# setup

## Data Path Management

The package uses a centralized configuration system to manage where data is stored locally. This is essential for consistency across different workflows and environments.

### Priority-based Path Resolution

The [`get_base_data_path()`](https://magistak.github.io/bs-cpg/setup.html#get_base_data_path) function determines the data storage location using a three-tier priority system:

1. **Environment Variable** (`BS_CPG_DATA`): Highest priority, ideal for CI/CD pipelines and automated workflows.
2. **Config File** (`~/.bs-cpg-config.json`): For returning users, stores the preferred path.
3. **Interactive Prompt**: Last resort, asks the user to specify a path and saves it for future use.

This ensures flexibility across different deployment scenarios.

------------------------------------------------------------------------

### get_base_data_path

> get_base_data_path ()

*Determines the base data path with a clear priority:*

1. `BS_CPG_DATA` environment variable.
2. Path stored in `~/.bs-cpg-config.json`.
3. Prompts the user for the path as a last resort.

------------------------------------------------------------------------

### read_sample_cpg

> read_sample_cpg (columns:list=None, force_download:bool=False)

*Downloads and reads a sample CpG Parquet file.*

This function fetches a sample dataset from the project's GitHub repository. It caches the file locally to avoid re-downloading on subsequent calls.

Args:

- `columns` (list, optional): A list of columns to read from the file. Defaults to None (all columns).
- `force_download` (bool, optional): If True, forces a re-download of the file even if it exists locally. Defaults to False.

Returns:

- `pd.DataFrame`: A DataFrame containing the sample CpG data.

Examples:

    >>> # Load the entire sample dataset
    >>> df = read_sample_cpg()
    >>>
    >>> # Load only specific columns
    >>> df_subset = read_sample_cpg(columns=["chromosome", "pos"])
    >>>
    >>> # Force re-download if needed
    >>> df_fresh = read_sample_cpg(force_download=True)

``` python
# Example: Get the base data path
base_path = get_base_data_path()
print(f"Data will be stored in: {base_path}")
```

    Path('/mnt/idms/home/magyary/.bs-cpg')
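
The three-tier lookup described under "Priority-based Path Resolution" can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: the function name `resolve_data_path`, its `config_file` and `prompt` parameters, and the `"data_path"` JSON key are all assumptions made for the example. Only the `BS_CPG_DATA` variable name and the order of the three tiers come from the documentation above.

``` python
import json
import os
from pathlib import Path


def resolve_data_path(config_file, prompt=input, env_var="BS_CPG_DATA"):
    """Hypothetical sketch of a three-tier path lookup (not the package's code)."""
    config_file = Path(config_file)

    # 1. The environment variable wins whenever it is set and non-empty.
    env_value = os.environ.get(env_var)
    if env_value:
        return Path(env_value).expanduser()

    # 2. Otherwise fall back to a path saved by an earlier run.
    #    The "data_path" key is an assumption; the real file layout may differ.
    if config_file.exists():
        saved = json.loads(config_file.read_text()).get("data_path")
        if saved:
            return Path(saved).expanduser()

    # 3. Last resort: ask the user, then persist the answer for next time.
    answer = Path(prompt("Where should data be stored? ")).expanduser()
    config_file.write_text(json.dumps({"data_path": str(answer)}))
    return answer
```

Passing the prompt as a parameter keeps the sketch usable in non-interactive settings such as tests, where a stub can stand in for `input`. In a CI job, setting `BS_CPG_DATA` means the config file and the prompt are never consulted.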