# setup


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Data Path Management

The package uses a centralized configuration system to decide where data
is stored locally, so that every workflow and environment reads from and
writes to the same location.

### Priority-based Path Resolution

The
[`get_base_data_path()`](https://magistak.github.io/bs-cpg/setup.html#get_base_data_path)
function determines the data storage location using a three-tier
priority system:

1.  **Environment Variable** (`BS_CPG_DATA`): Highest priority, ideal
    for CI/CD pipelines and automated workflows.
2.  **Config File** (`~/.bs-cpg-config.json`): For returning users,
    stores the preferred path.
3.  **Interactive Prompt**: Last resort, asks the user to specify a path
    and saves it for future use.

Because the environment variable is checked first, automated runs never
reach the interactive prompt, while interactive users are asked only
once.
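The three tiers above can be sketched as follows. This is a minimal
illustration of the lookup order, not the package's actual
implementation: the function name `resolve_data_path`, its parameters,
and the JSON key `data_path` are assumptions made for the example.

``` python
import json
import os
from pathlib import Path

ENV_VAR = "BS_CPG_DATA"
CONFIG_FILE = Path.home() / ".bs-cpg-config.json"
CONFIG_KEY = "data_path"  # hypothetical key; the real config layout may differ


def resolve_data_path(environ=os.environ, config_file=CONFIG_FILE, prompt=input):
    """Return the data path from the first source that provides one."""
    # 1. Environment variable: highest priority, never prompts.
    env_value = environ.get(ENV_VAR)
    if env_value:
        return Path(env_value).expanduser()

    # 2. Config file saved by an earlier session.
    config_file = Path(config_file)
    if config_file.exists():
        try:
            stored = json.loads(config_file.read_text()).get(CONFIG_KEY)
        except (json.JSONDecodeError, AttributeError):
            stored = None  # unreadable config: fall through to the prompt
        if stored:
            return Path(stored).expanduser()

    # 3. Interactive prompt: ask once, then persist the answer.
    answer = prompt("Where should data be stored? ").strip()
    path = Path(answer).expanduser()
    config_file.write_text(json.dumps({CONFIG_KEY: str(path)}))
    return path
```

Passing `environ`, `config_file`, and `prompt` as parameters keeps the
sketch easy to exercise without touching the real home directory or
blocking on user input.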

------------------------------------------------------------------------

<a
href="https://github.com/magistak/bs-cpg/blob/main/bs_cpg/setup.py#L19"
target="_blank" style="float:right; font-size:smaller">source</a>

### get_base_data_path

>  get_base_data_path ()

*Determines the base data path with a clear priority:*

1.  *`BS_CPG_DATA` environment variable.*
2.  *Path stored in `~/.bs-cpg-config.json`.*
3.  *Prompts the user for the path as a last resort.*

------------------------------------------------------------------------

<a
href="https://github.com/magistak/bs-cpg/blob/main/bs_cpg/setup.py#L46"
target="_blank" style="float:right; font-size:smaller">source</a>

### read_sample_cpg

>  read_sample_cpg (columns:list=None, force_download:bool=False)

*Downloads and reads a sample CpG Parquet file.*

This function fetches a sample dataset from the project’s GitHub
repository. It caches the file locally to avoid re-downloading on
subsequent calls.

**Args:**

- `columns` (list, optional): A list of columns to read from the file.
  Defaults to `None` (all columns).
- `force_download` (bool, optional): If `True`, forces a re-download of
  the file even if it exists locally. Defaults to `False`.

**Returns:**

- `pd.DataFrame`: A DataFrame containing the sample CpG data.

**Examples:**

- Load the entire sample dataset: `df = read_sample_cpg()`
- Load only specific columns:
  `df_subset = read_sample_cpg(columns=["chromosome", "pos"])`
- Force a re-download if needed:
  `df_fresh = read_sample_cpg(force_download=True)`
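The caching behaviour described above can be illustrated with a small
download-if-missing helper. This is a sketch under stated assumptions,
not the package's code: `cached_file`, `download_bytes`, and
`SAMPLE_URL` are hypothetical names, and the URL is a placeholder.

``` python
from pathlib import Path
from urllib.request import urlopen

# Placeholder URL; the real sample file location is not given in this section.
SAMPLE_URL = "https://example.com/sample_cpg.parquet"


def download_bytes(url):
    """Fetch the raw bytes at `url` (only called on a cache miss)."""
    with urlopen(url) as response:
        return response.read()


def cached_file(cache_dir, filename, url=SAMPLE_URL, force_download=False,
                fetch=download_bytes):
    """Return a local path to the file, downloading only when needed."""
    target = Path(cache_dir) / filename
    # Download on the first call, or whenever the caller forces a refresh.
    if force_download or not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch(url))
    return target
```

Given the returned path, `pd.read_parquet(path, columns=columns)` would
then load either all columns (`columns=None`) or only the requested
subset.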

``` python
# Example: get the base data path
from bs_cpg.setup import get_base_data_path

base_path = get_base_data_path()
base_path  # displayed as the cell output below
```

    Path('/mnt/idms/home/magyary/.bs-cpg')
