Developer Guide
Project Structure
The bs-cpg package uses nbdev for development, which means the source code and documentation are authored directly in Jupyter notebooks and then compiled into Python modules.
Key Directories
- nbs/: Contains the Jupyter notebooks where all code and documentation are written
  - index.ipynb: Main documentation homepage
  - 00_download_processed_geo.ipynb: GEO data acquisition module
  - 01_setup.ipynb: Configuration and data path management
  - 02_download_ref.ipynb: Reference genome and chain file utilities
  - 03_liftover_pos.ipynb: Genomic coordinate processing and liftover
  - 04_developers.ipynb: This file
- bs_cpg/: Auto-generated Python package (exported from notebooks)
- _proc/: Processing notebooks (not part of the main package)
- data/: Sample data files for testing
Configuration Files
- nbdev.yml: nbdev configuration
- settings.ini: Project metadata and package settings
- pyproject.toml: Python project configuration
- _quarto.yml: Quarto configuration for the documentation site
Setting Up Development Environment
1. Clone the Repository
git clone https://github.com/magistak/bs-cpg.git
cd bs-cpg

2. Install in Development Mode
Install the package with editable mode so changes are reflected immediately:
pip install -e .

This installs the package in editable/development mode, meaning the code is linked rather than copied.
3. Install Development Dependencies
pip install nbdev jupyter

4. Verify Installation
python -c "import bs_cpg; print(bs_cpg.__version__)"

Working with nbdev
Understanding Cell Directives
nbdev uses special cell directives (comments starting with #|) to control how cells are processed:
| Directive | Purpose |
|---|---|
| #\| export | Export this cell’s code to the package module |
| #\| hide | Hide this cell in documentation but execute it |
| #\| default_exp <module_name> | Set the default module for subsequent #\| export cells |
| #\| eval: false | Include in docs but don’t execute when running tests |
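The directives above combine in practice as in the following sketch (the module and function names are hypothetical, and each directive would normally head its own notebook cell):

```python
# Cell 1 — set the target module for this notebook (name is hypothetical)
#| default_exp utils

# Cell 2 — this function is exported to the package
#| export
def double(x: int) -> int:
    """Toy function exported to the package."""
    return x * 2

# Cell 3 — executed by nbdev_test but hidden from the rendered docs
#| hide
assert double(3) == 6
```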
Example Cell Structure
#| export
def my_function(x: int) -> str:
    """A helpful docstring."""
    return f"Result: {x}"

Writing Tests in Notebooks
You can include tests directly after function definitions:
# Example/test
result = my_function(42)
assert result == "Result: 42"
print(f"✅ Test passed: {result}")

Development Workflow
1. Make Changes
Edit the Jupyter notebooks in the nbs/ directory. Use the #| export directive to mark cells that should be part of the package.
2. Prepare and Export
Run the following command to:

- Export code from notebooks to Python modules (nbdev_export)
- Run tests in the notebooks (nbdev_test)
- Update the README from index.ipynb (nbdev_readme)

nbdev_prepare

This is the primary command you’ll use during development.
3. Preview Documentation Locally
To see how your documentation looks before publishing:
nbdev_preview

This starts a local web server showing the rendered documentation.
4. Run Specific Workflows
Individual nbdev commands (in case you don’t want to run everything):
# Export code from notebooks
nbdev_export
# Run tests defined in notebooks
nbdev_test
# Update README from index.ipynb
nbdev_readme

Adding New Functions
Step-by-Step Example
To add a new function to the bs-cpg package:
1. Open the appropriate notebook (e.g., 03_liftover_pos.ipynb for coordinate functions).

2. Add a markdown cell explaining the function:

### My New Function

Brief description of what the function does.

3. Add a code cell with #| export:

#| export
def my_new_function(data: list) -> int:
    """Comprehensive docstring with Args, Returns, Examples."""
    # implementation
    return result

4. Add test/example cells below:

# Example usage
result = my_new_function([1, 2, 3])
assert result == 6

5. Run nbdev_prepare to export and test:

nbdev_prepare

6. Verify the function is accessible from the package:
from bs_cpg.liftover_pos import my_new_function
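Filling in the placeholder body from the step above, a complete version of the example function might look like this (a sketch for illustration; the real functions in bs-cpg operate on genomic coordinates):

```python
#| export
def my_new_function(data: list) -> int:
    """Sum a list of integers.

    Args:
        data: Integers to sum.

    Returns:
        The total of all elements.

    Examples:
        >>> my_new_function([1, 2, 3])
        6
    """
    return sum(data)
```

This satisfies the test cell shown in step 4 (`my_new_function([1, 2, 3]) == 6`).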
Managing Dependencies
Core Dependencies
bs-cpg depends on several key packages:
- pandas: Data manipulation and analysis
- pysam: Reading/writing genomic files
- geofetch: Integration with GEO (Gene Expression Omnibus)
- tenacity: Automatic retry logic for network requests
- liftover: Genomic coordinate conversion
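To illustrate what tenacity provides, here is the retry-with-backoff pattern it automates, written with only the standard library (the helper and its parameters are a hypothetical sketch, not bs-cpg code; tenacity expresses the same logic declaratively with its @retry decorator):

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retry(op: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Run op(), retrying with exponential backoff on exceptions.

    This is the pattern tenacity automates for network requests
    such as GEO downloads.
    """
    for attempt in range(attempts):
        try:
            return op()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: propagate the last error
            time.sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
    raise AssertionError("unreachable")
```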
Updating Dependencies
Edit settings.ini under the install_requires section:
install_requires =
pandas
pysam
geofetch
tenacity
liftover

Then run:
pip install -e .

Troubleshooting
Import Errors After Editing
If you modify code and Python still sees old versions, you may need to reload the module:
# In Jupyter
%load_ext autoreload
%autoreload 2

Or reinstall in development mode:
pip install -e . --no-deps

nbdev_prepare Fails
Check that all cells marked with #| export can run without errors:
nbdev_test  # Run notebook tests

Module Not Found
Make sure you’ve run nbdev_export to generate the Python files:
nbdev_export

Publishing Updates
Prepare for Release
1. Update version in settings.ini:

version = 0.1.2

2. Run final checks:

nbdev_prepare
nbdev_preview

3. Commit and push:

git add .
git commit -m "v0.1.2: Add new features"
git push
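Because the released version lives in settings.ini (nbdev keeps its fields in the [DEFAULT] section), a quick sanity check before tagging can read it back with the standard library; a minimal sketch:

```python
import configparser

def read_version(path: str = "settings.ini") -> str:
    """Return the package version declared in an nbdev settings.ini."""
    cfg = configparser.ConfigParser()
    cfg.read(path)
    return cfg["DEFAULT"]["version"]
```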
Deploy Documentation
To deploy documentation to GitHub Pages:
nbdev_ghp_deploy

Build and Release Package
The package is configured to build and release automatically via GitHub Actions. To build it manually:
python -m pip install build
python -m build

Resources
- nbdev Documentation: Complete guide to nbdev
- Jupyter Notebook Documentation: Notebook features
- Pandas Documentation: Data analysis library
- pysam Documentation: Genomic file handling
- UCSC Genome Browser: Reference for chain files and genomes