Developer Guide

Instructions for contributing to and developing the bs-cpg package.

Project Structure

The bs-cpg package uses nbdev for development, which means the source code and documentation are authored directly in Jupyter notebooks and then compiled into Python modules.

Key Directories

  • nbs/: Contains the Jupyter notebooks where all code and documentation are written
    • index.ipynb: Main documentation homepage
    • 00_download_processed_geo.ipynb: GEO data acquisition module
    • 01_setup.ipynb: Configuration and data path management
    • 02_download_ref.ipynb: Reference genome and chain file utilities
    • 03_liftover_pos.ipynb: Genomic coordinate processing and liftover
    • 04_developers.ipynb: This file
  • bs_cpg/: Auto-generated Python package (exported from notebooks)
  • _proc/: Processing notebooks (not part of the main package)
  • data/: Sample data files for testing

Configuration Files

  • nbdev.yml: nbdev configuration
  • settings.ini: Project metadata and package settings
  • pyproject.toml: Python project configuration
  • _quarto.yml: Quarto configuration for the documentation site

Setting Up Development Environment

1. Clone the Repository

git clone https://github.com/magistak/bs-cpg.git
cd bs-cpg

2. Install in Development Mode

Install the package in editable mode so changes are reflected immediately:

pip install -e .

Editable mode links the installed package to your working copy instead of copying the files.

3. Install Development Dependencies

pip install nbdev jupyter

4. Verify Installation

python -c "import bs_cpg; print(bs_cpg.__version__)"
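
The same check can be made from Python with a clearer message when the package is missing. The following is a minimal sketch, not part of bs-cpg: `installed_version` is a hypothetical helper, and the distribution name `bs-cpg` is assumed from the package name used in this guide.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "bs-cpg" is an assumed distribution name; adjust it if settings.ini differs.
    version = installed_version("bs-cpg")
    print(version if version else "bs-cpg is not installed in this environment")
```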

Working with nbdev

Understanding Cell Directives

nbdev uses special cell directives (comments starting with #|) to control how cells are processed:

Directive                      Purpose
#| export                      Export this cell’s code to the package module
#| hide                        Hide this cell in documentation but execute it
#| default_exp <module_name>   Set the default module for subsequent #| export cells
#| eval: false                 Include in docs but don’t execute when running tests

Example Cell Structure

#| export
def my_function(x: int) -> str:
    """A helpful docstring."""
    return f"Result: {x}"
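
To make the idea of a directive concrete, the sketch below collects the leading `#|` lines of a cell's source. It is a simplified illustration and not nbdev's own parser; `cell_directives` is a hypothetical name.

```python
from typing import List


def cell_directives(source: str) -> List[str]:
    """Collect the leading `#|` directive lines of a code cell.

    Simplified for illustration: scanning stops at the first line
    that is not a directive.
    """
    directives = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#|"):
            break
        # Drop the "#|" prefix and surrounding whitespace.
        directives.append(stripped[2:].strip())
    return directives


cell = '#| export\ndef my_function(x: int) -> str:\n    return f"Result: {x}"\n'
print(cell_directives(cell))  # ['export']
```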

Writing Tests in Notebooks

You can include tests directly after function definitions:

# Example/test
result = my_function(42)
assert result == "Result: 42"
print(f"✅ Test passed: {result}")
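
When a function needs more than one check, a small table of cases keeps the test cell short. This is one possible pattern using plain assertions; the extra input values are illustrative and do not come from the package.

```python
def my_function(x: int) -> str:
    """A helpful docstring."""
    return f"Result: {x}"


# Each pair is (input, expected output).
cases = [(42, "Result: 42"), (0, "Result: 0"), (-1, "Result: -1")]

for value, expected in cases:
    actual = my_function(value)
    # The message names the failing input, which helps when reading test output.
    assert actual == expected, f"my_function({value}) returned {actual!r}, expected {expected!r}"
```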

Development Workflow

1. Make Changes

Edit the Jupyter notebooks in the nbs/ directory. Use the #| export directive to mark cells that should be part of the package.

2. Prepare and Export

Run the following command to:

  • Export code from notebooks to Python modules (nbdev_export)
  • Run tests in the notebooks (nbdev_test)
  • Update the README from index.ipynb (nbdev_readme)

nbdev_prepare

This is the primary command you’ll use during development.

3. Preview Documentation Locally

To see how your documentation looks before publishing:

nbdev_preview

This starts a local web server showing the rendered documentation.

4. Run Specific Workflows

To run a single step instead of the full pipeline, use the individual nbdev commands:

# Export code from notebooks
nbdev_export

# Run tests defined in notebooks
nbdev_test

# Update README from index.ipynb
nbdev_readme
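
If you want to script these steps, for example in a pre-commit hook, the sketch below runs them in order and reports the first one that fails. `run_steps` and its `runner` parameter are hypothetical names introduced here; only the command names come from this guide.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Command names as listed in this guide.
STEPS = ("nbdev_export", "nbdev_test", "nbdev_readme")


def _run(command: str) -> int:
    """Run one command and return its exit code."""
    return subprocess.run([command]).returncode


def run_steps(steps: Sequence[str] = STEPS,
              runner: Callable[[str], int] = _run) -> Optional[str]:
    """Run steps in order; return the first failing command, or None if all pass."""
    for step in steps:
        if runner(step) != 0:
            return step
    return None
```

Calling `run_steps()` with no arguments invokes the real commands and therefore needs nbdev installed. Passing a stub such as `runner=lambda cmd: 0` exercises the control flow without it.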

Adding New Functions

Step-by-Step Example

To add a new function to the bs-cpg package:

  1. Open the appropriate notebook (e.g., 03_liftover_pos.ipynb for coordinate functions)

  2. Add a markdown cell explaining the function:

    ### My New Function
    
    Brief description of what the function does.
  3. Add a code cell with #| export:

    #| export
    def my_new_function(data: list) -> int:
        """Comprehensive docstring with Args, Returns, Examples."""
        # Placeholder implementation: summing is an assumption that matches the example below
        result = sum(data)
        return result
  4. Add test/example cells below:

    # Example usage
    result = my_new_function([1, 2, 3])
    assert result == 6
  5. Run nbdev_prepare to export and test:

    nbdev_prepare
  6. Verify the function is accessible from the package:

    from bs_cpg.liftover_pos import my_new_function
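
Exported modules list their public names in `__all__`, so the last step can also be checked programmatically. The sketch below is a hedged illustration; `is_exported` is a hypothetical helper, and it is demonstrated on a standard-library module so that it runs without bs-cpg installed.

```python
import importlib


def is_exported(module_name: str, name: str) -> bool:
    """Return True if `name` is listed in the module's `__all__`."""
    module = importlib.import_module(module_name)
    return name in getattr(module, "__all__", [])


# Demonstrated on a standard-library module so the snippet runs anywhere.
print(is_exported("json", "dumps"))  # True
```

For this package the call would look like `is_exported("bs_cpg.liftover_pos", "my_new_function")`, assuming that is the module the notebook exports to.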

Managing Dependencies

Core Dependencies

bs-cpg depends on several key packages:

  • pandas: Data manipulation and analysis
  • pysam: Reading/writing genomic files
  • geofetch: Integration with GEO (Gene Expression Omnibus)
  • tenacity: Automatic retry logic for network requests
  • liftover: Genomic coordinate conversion

Updating Dependencies

Edit the install_requires entry in settings.ini:

install_requires = 
    pandas
    pysam
    geofetch
    tenacity
    liftover

Then run:

pip install -e .
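
To confirm what the file declares after an edit, the entry can be read back with the standard library. This is a minimal sketch under two assumptions: the entry lives in the `[DEFAULT]` section, and packages are separated by whitespace with no spaces inside a version specifier. `read_requirements` is a hypothetical helper.

```python
import configparser
from typing import List


def read_requirements(ini_text: str, key: str = "install_requires") -> List[str]:
    """Parse settings.ini text and return the packages listed under `key`."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Assumes the key is stored in the [DEFAULT] section.
    raw = parser.get("DEFAULT", key, fallback="")
    return raw.split()


sample = """
[DEFAULT]
install_requires =
    pandas
    pysam
    geofetch
    tenacity
    liftover
"""
print(read_requirements(sample))
```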

Troubleshooting

Import Errors After Editing

If Python still sees the old version of a module after you edit it, enable automatic reloading:

# In Jupyter
%load_ext autoreload
%autoreload 2

Or reinstall in development mode:

pip install -e . --no-deps
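
Outside Jupyter, a module can be reloaded explicitly with the standard library. The sketch below uses `json` as a stand-in for a bs_cpg module so that it runs in any environment.

```python
import importlib
import json  # stand-in for a bs_cpg module so the snippet runs anywhere

# reload() re-executes the module's source and returns the same module object.
reloaded = importlib.reload(json)
print(reloaded is json)  # True
```

Names bound earlier with `from module import name` keep pointing at the old objects after a reload, so re-import them or access them through the module.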

nbdev_prepare Fails

Check that all cells marked with #| export can run without errors:

nbdev_test  # Run notebook tests

Module Not Found

Make sure you’ve run nbdev_export to generate the Python files:

nbdev_export
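
To see which modules Python can actually locate after the export, something like the sketch below can help. `missing_modules` is a hypothetical helper, and the bs_cpg module names you pass to it depend on how your notebooks are exported.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be located on the import path."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for dotted names whose parent package is itself missing.
            found = False
        if not found:
            missing.append(name)
    return missing


# "json" is always present; "bs_cpg" depends on your environment.
print(missing_modules(["json", "bs_cpg"]))
```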

Publishing Updates

Prepare for Release

  1. Update version in settings.ini:

    version = 0.1.2
  2. Run final checks:

    nbdev_prepare
    nbdev_preview
  3. Commit and push:

    git add .
    git commit -m "v0.1.2: Add new features"
    git push
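
Computing the next version string is easy to get wrong by hand. The sketch below increments the last component and assumes a plain MAJOR.MINOR.PATCH string with no suffixes; `bump_patch` is a hypothetical helper, not an nbdev function.

```python
def bump_patch(version: str) -> str:
    """Increment the last component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


print(bump_patch("0.1.1"))  # 0.1.2
```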

Deploy Documentation

To deploy documentation to GitHub Pages:

nbdev_ghp_deploy

Build and Release Package

The package is configured to build and release automatically via GitHub Actions. To build it manually:

python -m pip install build
python -m build

Resources