download_ref

Download reference genomes and liftover chain files from UCSC.

Reference Genome Management

Downloading Reference Genomes

The module provides utilities for fetching reference FASTA files from UCSC and preparing them as bgzipped files for use with pysam and other genomic tools.


download_file

 download_file (url:str, filename:str, sub_dir:pathlib.Path=None,
                verbose:bool=True)

A general utility to download a file, with optional status messages and a progress bar.

Args:
    url (str): The URL of the file to download.
    filename (str): The name for the saved file.
    sub_dir (Path, optional): A subdirectory under the main data path. Defaults to None.
    verbose (bool, optional): If True, prints status messages and shows a progress bar. Defaults to True.

Returns:
    Path: The full path to the downloaded file, or None on error.
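
The source of download_file is not shown here, so the following is only a minimal sketch of the general technique (streaming a URL to disk in chunks and returning None on error). The name fetch_to_path and its chunk_size parameter are hypothetical, and the plain byte counter stands in for whatever progress bar the module actually uses.

```python
import urllib.request
from pathlib import Path
from typing import Optional


def fetch_to_path(url: str, dest: Path, chunk_size: int = 1 << 20,
                  verbose: bool = True) -> Optional[Path]:
    """Hypothetical helper: stream `url` to `dest`; return the path or None on error."""
    try:
        dest.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
            # Content-Length may be absent, in which case the total is unknown.
            total = response.headers.get("Content-Length")
            done = 0
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
                done += len(chunk)
                if verbose:
                    print(f"\r{done} / {total or '?'} bytes", end="")
        if verbose:
            print()
        return dest
    except (OSError, ValueError) as err:
        # urllib.error.URLError is a subclass of OSError; ValueError covers malformed URLs.
        if verbose:
            print(f"Download failed: {err}")
        return None
```

Reading in fixed-size chunks keeps memory use flat, which matters for multi-gigabyte genome files.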


is_bgzipped

 is_bgzipped (filepath:pathlib.Path)

Checks if a file is block-gzipped (BGZF) by reading its header.

Args:
    filepath (Path): The path to the file to check.

Returns:
    bool: True if the file is in BGZF format, False otherwise.
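
For illustration, a header check of this kind can be sketched as below. This is not the module's implementation; the name looks_like_bgzf is hypothetical. It relies on the BGZF layout from the SAM specification: a gzip header with the FEXTRA flag set, followed by a 'BC' extra subfield of length 2, and it assumes 'BC' is the first extra subfield, which is what bgzip writes.

```python
from pathlib import Path

# gzip magic (1f 8b), deflate method (08), FLG with FEXTRA set (04)
BGZF_MAGIC = b"\x1f\x8b\x08\x04"


def looks_like_bgzf(path: Path) -> bool:
    """Hypothetical check: True if the first 16 bytes match a BGZF block header."""
    with open(path, "rb") as fh:
        header = fh.read(16)
    return (
        len(header) == 16
        and header[:4] == BGZF_MAGIC
        and header[12:14] == b"BC"          # SI1, SI2 subfield identifiers
        and header[14:16] == b"\x02\x00"    # SLEN = 2, little-endian
    )
```

A file written by plain gzip fails the check because its FLG byte normally lacks the FEXTRA bit and it carries no 'BC' subfield.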


convert_to_bgzip

 convert_to_bgzip (input_path:pathlib.Path, output_path:pathlib.Path)

Converts a standard gzip file to a bgzip file using command-line tools. This function replicates the command: gunzip -c <input> | bgzip > <output>.

Args:
    input_path (Path): The path to the input gzip file.
    output_path (Path): The path for the output bgzip file.

Returns:
    bool: True if conversion was successful, False otherwise.
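
The shell pipeline quoted above can be reproduced from Python with two chained subprocesses. The sketch below is one possible way to do it, not necessarily how convert_to_bgzip is written; the name gzip_to_bgzip is hypothetical, and it assumes gunzip and bgzip (from htslib) are on PATH, returning False when they are not.

```python
import shutil
import subprocess
from pathlib import Path


def gzip_to_bgzip(input_path: Path, output_path: Path) -> bool:
    """Hypothetical equivalent of: gunzip -c <input> | bgzip > <output>."""
    if shutil.which("gunzip") is None or shutil.which("bgzip") is None:
        return False  # required command-line tools are not installed
    with open(output_path, "wb") as out:
        gunzip = subprocess.Popen(
            ["gunzip", "-c", str(input_path)], stdout=subprocess.PIPE
        )
        # bgzip -c reads stdin when no file is given and writes to stdout.
        bgzip = subprocess.Popen(["bgzip", "-c"], stdin=gunzip.stdout, stdout=out)
        # Close our copy of the pipe so gunzip sees SIGPIPE if bgzip exits early.
        gunzip.stdout.close()
        bgzip.wait()
        gunzip.wait()
    return gunzip.returncode == 0 and bgzip.returncode == 0
```

Piping the two processes together avoids writing the decompressed FASTA to disk, which for a human genome would be roughly 3 GB.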


get_ref_genome

 get_ref_genome (name:str, **kwargs)

Downloads a reference genome and ensures it is properly compressed with bgzip for use with pysam.

# Example: Download and prepare hg38 reference genome
# ref_path = get_ref_genome("hg38")
# print(f"Reference genome saved to: {ref_path}")
✅ Final file '/mnt/idms/home/magyary/.bs-cpg/hg38.fa.bgz' already exists.
'/mnt/idms/home/magyary/.bs-cpg/hg38.fa.bgz'
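
As a rough illustration of where such a download might come from, the sketch below builds the UCSC goldenPath URL for an assembly's FASTA and the local file name seen in the output above. Both helper names are hypothetical, and the URL pattern is an assumption based on UCSC's public bigZips layout rather than something confirmed from the source of get_ref_genome.

```python
# Assumed base URL of the UCSC download server.
UCSC_GOLDENPATH = "https://hgdownload.soe.ucsc.edu/goldenPath"


def ref_genome_url(name: str) -> str:
    """Hypothetical helper: URL of the gzipped FASTA for a UCSC assembly name."""
    return f"{UCSC_GOLDENPATH}/{name}/bigZips/{name}.fa.gz"


def bgzipped_filename(name: str) -> str:
    """Hypothetical helper: local file name after recompression with bgzip."""
    return f"{name}.fa.bgz"


print(ref_genome_url("hg38"))
print(bgzipped_filename("hg38"))
```

The recompression step matters because UCSC serves these files as standard gzip, while indexed random access (for example via pysam.FastaFile) needs BGZF.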

Liftover Chain Files

Download UCSC liftover chain files for converting coordinates between genome builds:


get_liftover_chain

 get_liftover_chain (genome_from:str, genome_to:str, **kwargs)

Download liftover chain file between genome versions from UCSC goldenPath liftOver.

Chain files enable coordinate conversion between different genome builds. For example, converting hg19 (GRCh37) coordinates to hg38 (GRCh38).

Args:
    genome_from (str): The original reference genome name (e.g., 'hg19', 'hg38', 'mm10').
    genome_to (str): The new reference genome name (e.g., 'hg19', 'hg38', 'mm10').
    **kwargs: Additional keyword arguments to be passed to download_file() (e.g., verbose=False).

Returns:
    Path: The path to the downloaded file, or None if an error occurred.

Examples:
    >>> # Download chain file for hg19 to hg38 conversion
    >>> chain_path = get_liftover_chain("hg19", "hg38")
    >>> print(f"Chain file saved to: {chain_path}")
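
UCSC names chain files as <from>To<To>.over.chain.gz, with the first letter of the target assembly capitalized, and stores them under the source assembly's liftOver directory. The sketch below builds such a URL; the name liftover_chain_url is hypothetical and the base URL is an assumption, so this may differ from what get_liftover_chain does internally.

```python
# Assumed base URL of the UCSC download server.
UCSC_GOLDENPATH = "https://hgdownload.soe.ucsc.edu/goldenPath"


def liftover_chain_url(genome_from: str, genome_to: str) -> str:
    """Hypothetical helper: URL of the UCSC chain file for genome_from -> genome_to."""
    # Only the first character of the target name is upper-cased: hg38 -> Hg38.
    target = genome_to[0].upper() + genome_to[1:]
    filename = f"{genome_from}To{target}.over.chain.gz"
    return f"{UCSC_GOLDENPATH}/{genome_from}/liftOver/{filename}"


print(liftover_chain_url("hg19", "hg38"))
```

Chain files are directional, so converting hg38 back to hg19 needs the separate hg38ToHg19 file from the hg38 directory.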