# Example: Get the base data path
base_path = get_base_data_path()
print(f"Data will be stored in: {base_path}")
Path('/mnt/idms/home/magyary/.bs-cpg')
The package uses a centralized configuration system to manage where data is stored locally. This is essential for consistency across different workflows and environments.
The get_base_data_path() function determines the data storage location using a three-tier priority system:
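1. Environment variable (BS_CPG_DATA): Highest priority, ideal for CI/CD pipelines and automated workflows.
2. Config file (~/.bs-cpg-config.json): For returning users, stores the preferred path.
3. Interactive prompt: Last resort, asks the user for the path.

This ordering lets the same code run unchanged in automated pipelines and in interactive sessions.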
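As a small illustration of the highest-priority tier, the environment variable can be set from Python before any data path is resolved. This is only a sketch: the temporary directory is a stand-in for a real storage location, and it assumes the package reads BS_CPG_DATA at call time rather than at import time.

```python
import os
import tempfile

# Stand-in storage location for this example; in CI this would usually be
# a workspace or cache directory chosen by the pipeline.
data_dir = tempfile.mkdtemp(prefix="bs-cpg-")

# Setting the variable affects only the current process and its children.
os.environ["BS_CPG_DATA"] = data_dir

# Assumption: a later get_base_data_path() call in this process would pick
# up this value without consulting the config file or prompting.
print(f"BS_CPG_DATA is set to: {os.environ['BS_CPG_DATA']}")
```

In a shell-driven pipeline the same effect is typically achieved by exporting the variable before the Python process starts.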
get_base_data_path ()
Determines the base data path with a clear priority:

1. BS_CPG_DATA environment variable.
2. Path stored in ~/.bs-cpg-config.json.
3. Prompts the user for the path as a last resort.
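The priority order above can be sketched as a standalone function. This is not the package's implementation: the name resolve_base_path, its parameters, the JSON key "base_data_path", and the choice to save the prompted answer back to the config file are all assumptions made for illustration.

```python
import json
import os
from pathlib import Path


def resolve_base_path(config_file, env=None, ask=input):
    """Hypothetical three-tier lookup: env var, then config file, then prompt."""
    env = os.environ if env is None else env

    # 1. The environment variable wins when it is set and non-empty.
    from_env = env.get("BS_CPG_DATA")
    if from_env:
        return Path(from_env).expanduser()

    # 2. Otherwise use a path saved in the JSON config file, if any.
    #    The key name "base_data_path" is an assumption.
    config_file = Path(config_file)
    if config_file.exists():
        try:
            saved = json.loads(config_file.read_text()).get("base_data_path")
        except (json.JSONDecodeError, AttributeError):
            saved = None  # unreadable or unexpected config: fall through
        if saved:
            return Path(saved).expanduser()

    # 3. Last resort: ask the user, then remember the answer (assumed behavior).
    chosen = Path(ask("Where should data be stored? ")).expanduser()
    config_file.write_text(json.dumps({"base_data_path": str(chosen)}))
    return chosen
```

Passing env and ask as parameters keeps the sketch testable without touching the real environment or blocking on input; a call such as resolve_base_path(Path.home() / ".bs-cpg-config.json") would use the real ones.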
read_sample_cpg (columns:list=None, force_download:bool=False)
Downloads and reads a sample CpG Parquet file.

This function fetches a sample dataset from the project’s GitHub repository. It caches the file locally to avoid re-downloading on subsequent calls.

Args:
    columns (list, optional): A list of columns to read from the file. Defaults to None (all columns).
    force_download (bool, optional): If True, forces a re-download of the file even if it exists locally. Defaults to False.

Returns:
    pd.DataFrame: A DataFrame containing the sample CpG data.

Examples:
    >>> # Load the entire sample dataset
    >>> df = read_sample_cpg()
    >>>
    >>> # Load only specific columns
    >>> df_subset = read_sample_cpg(columns=["chromosome", "pos"])
    >>>
    >>> # Force re-download if needed
    >>> df_fresh = read_sample_cpg(force_download=True)
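The caching behavior described for read_sample_cpg can be illustrated with a minimal download-if-missing helper. This is a sketch under assumptions, not the package's code: fetch_cached and its download parameter are hypothetical names, and the URL in the usage note is a placeholder rather than the project's real sample location.

```python
import urllib.request
from pathlib import Path


def fetch_cached(url, dest, force_download=False,
                 download=urllib.request.urlretrieve):
    """Download url to dest unless a cached copy already exists."""
    dest = Path(dest)
    if force_download or not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first so an interrupted transfer
        # never leaves a truncated file in the cache.
        tmp = dest.with_name(dest.name + ".part")
        download(url, tmp)
        tmp.replace(dest)
    return dest
```

A call such as fetch_cached("https://example.invalid/sample.parquet", base_path / "sample.parquet") would return the local path, which could then be passed to pandas.read_parquet(path, columns=columns) to load only the requested columns.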