# download_processed


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## GEO Integration with Resilient Fetching

This module provides a resilient wrapper around `geofetch.Geofetcher`
that adds:

- **Automatic Retries**: Uses exponential backoff to handle transient
  network failures.
- **Smart Caching**: Locally caches project metadata to avoid redundant
  network requests.
- **Clear Interface**: Simple methods for listing and downloading
  datasets.
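
The retry behaviour described in the first bullet can be sketched in a few lines. This is a minimal illustration of exponential backoff, not the wrapper's actual implementation: the helper name `retry_with_backoff`, its parameters, and the choice of retryable exception types are all assumptions made for this example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: tuple = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on transient errors with exponentially growing waits.

    Hypothetical helper: the real wrapper may use different limits and errors.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x, ... the base delay, capped at max_delay.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            # Add up to 10% random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

With a live instance, a call such as `retry_with_backoff(lambda: geo.get_projects(acc))` would apply the same policy to a metadata request, provided the failures surface as one of the exception types listed in `retry_on`.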

### Import and Setup

------------------------------------------------------------------------

<a
href="https://github.com/magistak/bs-cpg/blob/main/bs_cpg/download_processed.py#L28"
target="_blank" style="float:right; font-size:smaller">source</a>

### Geofetcher

>  Geofetcher (name:str='', metadata_root:str='', metadata_folder:str='',
>                  just_metadata:bool=False, refresh_metadata:bool=False,
>                  config_template:str|None=None,
>                  pipeline_samples:str|None=None,
>                  pipeline_project:str|None=None, skip:int=0,
>                  acc_anno:bool=False, use_key_subset:bool=False,
>                  processed:bool=False, data_source:str='samples',
>                  filter:str|None=None, filter_size:str|None=None,
>                  geo_folder:str='.', split_experiments:bool=False,
>                  bam_folder:str='', fq_folder:str='', sra_folder:str='',
>                  bam_conversion:bool=False, picard_path:str='',
>                  input:str|None=None, const_limit_project:int=50,
>                  const_limit_discard:int=1000, attr_limit_truncate:int=500,
>                  max_soft_size:str='1GB', discard_soft:bool=False,
>                  add_dotfile:bool=False, disable_progressbar:bool=False,
>                  add_convert_modifier:bool=False, opts:object|None=None,
>                  max_prefetch_size:str|int|None=None, **kwargs:object)

*Class to download or get projects, metadata, and data from GEO and SRA.*

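``` python
# Import path inferred from the source link above (bs_cpg/download_processed.py)
from bs_cpg.download_processed import Geofetcher

# Example: create a metadata-only Geofetcher instance
geo = Geofetcher(just_metadata=True)
acc = 'GSE51239'  # GEO series accession used in the examples below
print(f"Fetching metadata for project: {acc}")
```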

    [INFO] [18:08:41] Metadata folder: /mnt/idms/home/magyary/bs-dna-methyl/nbs/project_name
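
Several parameters in the signature above control processed-file downloads: `processed`, `data_source`, `filter`, `filter_size`, and `geo_folder`. The sketch below collects them into a keyword dictionary. It assumes the wrapper forwards these arguments to `geofetch` unchanged; the regex, size limit, and folder name are placeholder values chosen for illustration, not recommended settings.

```python
import re

# Keyword arguments taken from the Geofetcher signature above.
# All values are illustrative assumptions.
processed_kwargs = dict(
    processed=True,           # fetch processed files instead of raw SRA runs
    data_source="samples",    # use files attached to individual samples
    filter=r"\.bed\.gz$",     # hypothetical regex: keep only gzipped BED files
    filter_size="25MB",       # hypothetical cap on the size of each file
    geo_folder="./geo_data",  # hypothetical download directory
    just_metadata=False,      # download the files, not only the metadata
)

# Check the filename filter locally before touching the network.
pattern = re.compile(processed_kwargs["filter"])
print(bool(pattern.search("example_sample.bed.gz")))

# Network call, left commented out like the other examples on this page:
# geo_processed = Geofetcher(**processed_kwargs)
```

Keeping the options in a dictionary makes it easy to test the filename filter on a few names before starting a long download.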

### List Available Projects

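Query GEO for a project's metadata. The call returns a dictionary that
maps each project name (here `GSE51239_raw`) to a PEP `Project` holding
its sample table: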

``` python
# Example: Get project metadata
# projects = geo.get_projects(acc)
# projects
```

    [INFO] [18:08:41] Metadata folder: /mnt/idms/home/magyary/bs-dna-methyl/nbs/project_name
    [INFO] [18:08:41] Trying GSE51239 (not a file) as accession...
    [INFO] [18:08:41] Trying GSE51239 (not a file) as accession...
    [INFO] [18:08:41] Skipped 0 accessions. Starting now.
    [INFO] [18:08:41] Processing accession 1 of 1: 'GSE51239'
    [INFO] [18:08:43] Processed 48 samples.
    [INFO] [18:08:43] Expanding metadata list...
    [INFO] [18:08:43] Found SRA Project accession: SRP030612
    [INFO] [18:08:43] Downloading SRP030612 sra metadata
    [INFO] [18:08:46] Parsing SRA file to download SRR records
    [INFO] [18:08:46] Dry run, no data will be downloaded
    [INFO] [18:08:46] Finished processing 1 accession(s)
    [INFO] [18:08:46] Cleaning soft files ...
    [INFO] [18:08:46] Creating complete project annotation sheets and config file...

    {'GSE51239_raw': Project
     48 samples (showing first 20): hsperm-524-90, hsperm-530-90, hsperm-533-90, hsperm-534-90, h8c-1, h8c-2, hblast-1, hblast-2, hblast-3, hblastsingle-2, hblastsingle-5, hicm-1, hicm-2, hte-1, hte-2, hesp0-e1, hesp0-e4, hesp0-e5, hesp1-e1, hesp1-e4
     Sections: name, pep_version, sample_table, experiment_metadata, sample_modifiers, description}

### Download Processed Files

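To download the data files as well, call `get_projects` with
`just_metadata=False`. Passing `ignore_cache=True` forces a fresh fetch
instead of reusing the locally cached metadata:

``` python
# projects_files = geo.get_projects(acc, just_metadata=False, ignore_cache=True)
```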

    [INFO] [18:09:55] Metadata folder: /mnt/idms/home/magyary/bs-dna-methyl/nbs/project_name
    [INFO] [18:09:55] Trying GSE51239 (not a file) as accession...
    [INFO] [18:09:55] Trying GSE51239 (not a file) as accession...
    [INFO] [18:09:55] Skipped 0 accessions. Starting now.
    [INFO] [18:09:55] Processing accession 1 of 1: 'GSE51239'
    [INFO] [18:09:57] Processed 48 samples.
    [INFO] [18:09:57] Expanding metadata list...
    [INFO] [18:09:57] Found SRA Project accession: SRP030612
    [INFO] [18:09:57] Downloading SRP030612 sra metadata
    [INFO] [18:09:58] Parsing SRA file to download SRR records
    [INFO] [18:09:58] Getting SRR: SRR1003182  in (GSE51239)

    2025-07-28T16:09:58 prefetch.3.2.1: 1) Resolving 'SRR1003182'...
    2025-07-28T16:09:59 prefetch.3.2.1: Current preference is set to retrieve SRA Normalized Format files with full base quality scores

    [INFO] [18:10:00] Getting SRR: SRR1003183  in (GSE51239)

    2025-07-28T16:10:00 prefetch.3.2.1: 1) 'SRR1003182' is found locally 
    2025-07-28T16:10:00 prefetch.3.2.1: 1) Resolving 'SRR1003183'...
    2025-07-28T16:10:01 prefetch.3.2.1: Current preference is set to retrieve SRA Normalized Format files with full base quality scores
    2025-07-28T16:10:02 prefetch.3.2.1: 1) Downloading 'SRR1003183'...
    2025-07-28T16:10:02 prefetch.3.2.1:  SRA Normalized Format file is being retrieved
    2025-07-28T16:10:02 prefetch.3.2.1:  Downloading via HTTPS...
    2025-07-28T16:10:02 prefetch.3.2.1:    Continue download of 'SRR1003183' from 154660408
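
The `ignore_cache=True` argument in the call above implies a local cache that can be bypassed on demand. How the wrapper stores its cache is not shown on this page, so the following is only a sketch of the general pattern, assuming one JSON file per accession. The names `cached_fetch` and `fake_fetch` are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def cached_fetch(
    key: str,
    fetch: Callable[[str], dict],
    cache_dir: Path,
    ignore_cache: bool = False,
) -> dict:
    """Return fetch(key), reusing a JSON file in cache_dir unless ignore_cache is set."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{key}.json"
    if cache_file.exists() and not ignore_cache:
        return json.loads(cache_file.read_text())
    result = fetch(key)  # the expensive network call
    cache_file.write_text(json.dumps(result))  # store or refresh the cached copy
    return result


calls = []


def fake_fetch(accession: str) -> dict:
    """Stand-in for a network request; records each call."""
    calls.append(accession)
    return {"accession": accession, "n_samples": 48}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    cached_fetch("GSE51239", fake_fetch, cache)                     # miss: fetches
    cached_fetch("GSE51239", fake_fetch, cache)                     # hit: no fetch
    cached_fetch("GSE51239", fake_fetch, cache, ignore_cache=True)  # forced refetch

print(len(calls))  # 2
```

Writing the result back after a forced refetch keeps later cached reads consistent with the newest data.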

### Explore Downloaded Files

Once downloaded, you can explore the sample table to see available
processed files:
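
As a rough sketch of that exploration, the snippet below reads a sample annotation sheet with the standard library and lists the samples that have a file attached. The sheet here is an inline placeholder: the sample names come from the project output shown earlier, while the `file_url` column, its values, and the helper `list_processed_files` are assumptions for illustration and may not match the columns that `geofetch` actually writes.

```python
import csv
import io

# Placeholder annotation sheet. Sample names are taken from the output above;
# the `file_url` column and its values are hypothetical.
SHEET = """sample_name,file_url
hsperm-524-90,https://example.org/hsperm-524-90.bed.gz
h8c-1,
hblast-1,https://example.org/hblast-1.bed.gz
"""


def list_processed_files(handle, url_column="file_url"):
    """Return (sample_name, url) pairs for rows that have a non-empty URL."""
    reader = csv.DictReader(handle)
    return [
        (row["sample_name"], row[url_column])
        for row in reader
        if row.get(url_column)
    ]


files = list_processed_files(io.StringIO(SHEET))
for name, url in files:
    print(name, url)
```

If the wrapper returns standard `peppy` `Project` objects, the same table is also available in memory as a pandas DataFrame, for example through `projects["GSE51239_raw"].sample_table`, and inspecting its columns is the quickest way to find the real name of the file column.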
