Developer Guide

Instructions for contributing to and developing the bs-cpg package.

Project Structure

The bs-cpg package uses nbdev for development, which means the source code and documentation are authored directly in Jupyter notebooks and then compiled into Python modules.

Key Directories

nbs/: Contains the Jupyter notebooks where all code and documentation are written
- index.ipynb: Main documentation homepage
- 00_download_processed_geo.ipynb: GEO data acquisition module
- 01_setup.ipynb: Configuration and data path management
- 02_download_ref.ipynb: Reference genome and chain file utilities
- 03_liftover_pos.ipynb: Genomic coordinate processing and liftover
- 04_developers.ipynb: This file
bs_cpg/: Auto-generated Python package (exported from notebooks)
_proc/: Processing notebooks (not part of the main package)
data/: Sample data files for testing

Configuration Files

nbdev.yml: nbdev configuration
settings.ini: Project metadata and package settings
pyproject.toml: Python project configuration
_quarto.yml: Quarto/Jupyter Book configuration for documentation

Setting Up Development Environment

1. Clone the Repository

git clone https://github.com/magistak/bs-cpg.git
cd bs-cpg

2. Install in Development Mode

Install the package in editable mode so that changes are reflected immediately:

pip install -e .

In editable (development) mode the installed package links to your working copy instead of copying it, so exported modules are importable without reinstalling.

3. Install Development Dependencies

pip install nbdev jupyter

4. Verify Installation

python -c "import bs_cpg; print(bs_cpg.__version__)"
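The same check can be done from a Python session without importing the package. The sketch below is a minimal example that assumes the distribution is registered under the name "bs-cpg" (an assumption based on the repository name); `installed_version` is a hypothetical helper, not part of bs-cpg.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "bs-cpg" is an assumed distribution name; adjust it if settings.ini says otherwise.
version = installed_version("bs-cpg")
print(version if version else "bs-cpg is not installed in this environment")
```

A `None` result usually means the editable install was run in a different environment than the one you are checking.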

Working with nbdev

Understanding Cell Directives

nbdev uses special cell directives (comments starting with #|) to control how cells are processed:

Directive	Purpose
`#| export`	Export this cell’s code to the package module
`#| hide`	Hide this cell in documentation but execute it
`#| default_exp <module_name>`	Set the default module for subsequent `#| export` cells
`#| eval: false`	Include in docs but don’t execute when running tests

Example Cell Structure

#| export
def my_function(x: int) -> str:
    """A helpful docstring."""
    return f"Result: {x}"
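The directives usually appear together in one notebook. The sketch below shows how three consecutive cells might look; the module name `example_utils` and the function `cpg_count` are hypothetical and only serve as illustration. Each directive sits on the first line of its cell, and because directives are ordinary comments, the code also runs as plain Python.

```python
#| default_exp example_utils
# Cell 1: names the target module. "example_utils" is a hypothetical name.

#| export
# Cell 2: exported to the module named in cell 1.
def cpg_count(seq: str) -> int:
    """Count CpG dinucleotides (a C immediately followed by a G) in seq."""
    return seq.upper().count("CG")

#| hide
# Cell 3: executed as a test, but left out of the rendered documentation.
assert cpg_count("acgcgt") == 2
assert cpg_count("") == 0
```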

Writing Tests in Notebooks

You can include tests directly after function definitions:

# Example/test
result = my_function(42)
assert result == "Result: 42"
print(f"✅ Test passed: {result}")
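When a function needs more than one check, a single test cell can loop over several inputs. The following is a sketch only: it redefines `my_function` as a stand-in so the block runs on its own, and the chosen inputs are arbitrary.

```python
def my_function(x: int) -> str:
    """Stand-in copy of the exported example function."""
    return f"Result: {x}"


# Map each input to the output we expect for it.
cases = {0: "Result: 0", 42: "Result: 42", -1: "Result: -1"}

for value, expected in cases.items():
    actual = my_function(value)
    # The message names the failing input, which helps when nbdev_test reports an error.
    assert actual == expected, f"my_function({value!r}) returned {actual!r}"
```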

Development Workflow

1. Make Changes

Edit the Jupyter notebooks in the nbs/ directory. Use the #| export directive to mark cells that should be part of the package.

2. Prepare and Export

Run the following command to:
- Export code from notebooks to Python modules (nbdev_export)
- Run tests in the notebooks (nbdev_test)
- Update the README from index.ipynb (nbdev_readme)

nbdev_prepare

This is the primary command you’ll use during development.

3. Preview Documentation Locally

To see how your documentation looks before publishing:

nbdev_preview

This starts a local web server showing the rendered documentation.

4. Run Specific Workflows

You can also run the individual nbdev commands when you don’t need the full pipeline:

# Export code from notebooks
nbdev_export

# Run tests defined in notebooks
nbdev_test

# Update README from index.ipynb
nbdev_readme

Adding New Functions

Step-by-Step Example

To add a new function to the bs-cpg package:

Open the appropriate notebook (e.g., 03_liftover_pos.ipynb for coordinate functions)

Add a markdown cell explaining the function:

### My New Function

Brief description of what the function does.

Add a code cell with #| export:

#| export
def my_new_function(data: list) -> int:
    """Comprehensive docstring with Args, Returns, Examples."""
    # implementation (here: sum the values, which is what the test below expects)
    result = sum(data)
    return result

Add test/example cells below:

# Example usage
result = my_new_function([1, 2, 3])
assert result == 6

Run nbdev_prepare to export and test:
```
nbdev_prepare
```
Verify the function is accessible from the package:
```
from bs_cpg.liftover_pos import my_new_function
```

Managing Dependencies

Core Dependencies

bs-cpg depends on several key packages:

pandas: Data manipulation and analysis
pysam: Reading/writing genomic files
geofetch: Integration with GEO (Gene Expression Omnibus)
tenacity: Automatic retry logic for network requests
liftover: Genomic coordinate conversion

Updating Dependencies

Edit the install_requires entry in settings.ini:

install_requires = 
    pandas
    pysam
    geofetch
    tenacity
    liftover

Then run:

pip install -e .
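To confirm what is declared after an edit, the entry can be read back with the standard library. This is a minimal sketch: the `[DEFAULT]` section header and the sample text are assumptions standing in for the real settings.ini, and `declared_requirements` is a hypothetical helper.

```python
import configparser

# Stand-in for settings.ini. The [DEFAULT] header is an assumption;
# the key name follows the snippet shown above.
SAMPLE_SETTINGS = """\
[DEFAULT]
version = 0.1.2
install_requires =
    pandas
    pysam
    geofetch
    tenacity
    liftover
"""


def declared_requirements(ini_text: str, key: str = "install_requires") -> list:
    """Return the whitespace-separated requirements stored under `key`."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Continuation lines are joined with newlines, so split() yields one name per entry.
    return parser["DEFAULT"].get(key, "").split()


print(declared_requirements(SAMPLE_SETTINGS))
```

For the real file, pass the contents of settings.ini instead of `SAMPLE_SETTINGS`.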

Troubleshooting

Import Errors After Editing

If you modify code and Python still sees old versions, you may need to reload the module:

# In Jupyter
%load_ext autoreload
%autoreload 2

Or reinstall in development mode:

pip install -e . --no-deps
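Outside Jupyter, where the autoreload extension is not available, a single module can be reloaded explicitly. The sketch below uses a hypothetical helper, `reload_module`, and demonstrates it on a standard-library module so that it runs anywhere; in a bs-cpg session you would pass the name of the module you just re-exported.

```python
import importlib
from types import ModuleType


def reload_module(name: str) -> ModuleType:
    """Import the module `name` if needed, then re-execute its source."""
    module = importlib.import_module(name)
    return importlib.reload(module)


# Demonstrated on "json"; substitute the bs_cpg module you edited.
json_module = reload_module("json")
print(json_module.__name__)
```

Names bound earlier with `from module import name` still point at the old objects after a reload, so re-run those imports as well.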

nbdev_prepare Fails

Check that all cells marked with #| export can run without errors:

nbdev_test  # Run notebook tests

Module Not Found

Make sure you’ve run nbdev_export to generate the Python files:

nbdev_export
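If the export ran but the import still fails, it can help to check which module name a notebook actually targets. The sketch below scans a notebook's code cells for the `#| default_exp` directive. The helper names and the in-memory sample are hypothetical, and the module name `liftover_pos` in the sample is an assumption.

```python
import json
import re
from pathlib import Path
from typing import Optional

# Matches a "#| default_exp <module>" directive at the start of any line.
DEFAULT_EXP = re.compile(r"^#\|\s*default_exp\s+(\S+)", re.MULTILINE)


def load_notebook(path: str) -> dict:
    """Read an .ipynb file, which is plain JSON, into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def default_exp_target(notebook: dict) -> Optional[str]:
    """Return the module named by the first default_exp directive, if any."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat stores source as a list of lines
            source = "".join(source)
        match = DEFAULT_EXP.search(source)
        if match:
            return match.group(1)
    return None


# In-memory stand-in for a notebook; the module name is an assumption.
sample = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Liftover"]},
        {"cell_type": "code", "source": ["#| default_exp liftover_pos\n"]},
    ]
}
print(default_exp_target(sample))
```

For a real notebook, call `default_exp_target(load_notebook("nbs/03_liftover_pos.ipynb"))` and import from `bs_cpg.<that name>`.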

Publishing Updates

Prepare for Release

Update version in settings.ini:
```
version = 0.1.2
```
Run final checks:
```
nbdev_prepare
nbdev_preview
```

Commit and push:

git add .
git commit -m "v0.1.2: Add new features"
git push
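The version edit in the first step can also be scripted. The sketch below is illustrative only: `bump_patch` and `set_version` are hypothetical helpers, and the sample text stands in for settings.ini. It assumes a plain MAJOR.MINOR.PATCH version string.

```python
import re


def bump_patch(version: str) -> str:
    """Increment the last component of a MAJOR.MINOR.PATCH version."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def set_version(ini_text: str, new_version: str) -> str:
    """Replace the value on the first line that starts with 'version ='."""
    return re.sub(
        r"(?m)^(version\s*=\s*).*$", rf"\g<1>{new_version}", ini_text, count=1
    )


# Stand-in for settings.ini; the surrounding keys are assumptions.
sample = "[DEFAULT]\nlib_name = bs-cpg\nversion = 0.1.1\n"
updated = set_version(sample, bump_patch("0.1.1"))
print(updated)
```

nbdev also provides an `nbdev_bump_version` command, which may be the simpler route if it fits your release process.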

Deploy Documentation

To deploy documentation to GitHub Pages:

nbdev_ghp_deploy

Build and Release Package

The package is configured to build and release automatically via GitHub Actions. To build it manually:

python -m pip install build
python -m build

Resources

nbdev Documentation: Complete guide to nbdev
Jupyter Notebook Documentation: Notebook features
Pandas Documentation: Data analysis library
pysam Documentation: Genomic file handling
UCSC Genome Browser: Reference for chain files and genomes