sc-nsforest-qc-nf Documentation

sc-nsforest-qc-nf is a Nextflow pipeline for NSForest marker gene discovery and silhouette score quality control of single-cell RNA-seq data.

It orchestrates parallel execution of NSForest marker discovery and scsilhouette clustering quality control across multiple datasets and organs, with ontology-based cell filtering driven by cellxgene-harvester outputs.

It is part of the NIH NLM Cell Knowledge Network.

Python API (nsforest-cli)

Quick Start — Nextflow workflow

nextflow run main.nf \
    --datasets_csv data/homo_sapiens_kidney_harvester_final.csv \
    --organ        kidney \
    --uberon_json  data/uberon_kidney.json \
    --disease_json data/disease_normal.json \
    --hsapdv_json  data/hsapdv_adult_15.json \
    --outdir       results/kidney \
    -c             configs/macamd64.config

Warning

Pass --github_token via -params-file params.json or an environment variable. Never hardcode it in a config file, and never type it on the command line, where it may end up in shell history. See the README for full details.
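As a minimal sketch of the params-file approach, the snippet below writes params.json from an environment variable so the token never appears in a command. It assumes the token is exported as GITHUB_TOKEN (a variable name chosen here for illustration, not one the pipeline is known to read) and that the JSON key matches the parameter name github_token. The placeholder fallback exists only so the sketch runs on its own. The README remains the authoritative reference.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: the real token is already exported as GITHUB_TOKEN, for example
# by a secrets manager. The fallback is a placeholder for illustration only.
: "${GITHUB_TOKEN:=placeholder_token}"

# Write the params file inside a subshell so the restrictive umask
# (owner read/write only) does not leak into the rest of the session.
# printf does no JSON escaping; this is adequate for tokens made of
# letters, digits, and underscores.
(
  umask 077
  printf '{\n  "github_token": "%s"\n}\n' "$GITHUB_TOKEN" > params.json
)

# The pipeline can then be launched with the token read from the file,
# keeping the other Quick Start options unchanged:
#
#   nextflow run main.nf \
#       -params-file params.json \
#       --organ      kidney \
#       --outdir     results/kidney \
#       -c           configs/macamd64.config
```

Keep params.json out of version control (for example via .gitignore) and delete it once the run has finished.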

Repository Structure

sc-nsforest-qc-nf/
├── configs/                        # Platform-specific Nextflow configs
│   ├── aws.config
│   ├── macamd64.config
│   └── nexflow_biowulf.config
├── container/nsforest/             # Docker image for nsforest-cli
│   ├── Dockerfile
│   └── context/src/nsforest_cli/   # nsforest-cli Python package
├── docs/                           # Sphinx documentation
│   ├── parse_nf_docs.py            # Auto-generates RST from .nf docblocks
│   └── source/
├── modules/
│   ├── nsforest/                   # NSForest Nextflow process modules
│   ├── publish/                    # cell-kn publish module
│   └── scsilhouette/               # scsilhouette Nextflow process modules
├── main.nf                         # Pipeline entry point
└── nextflow.config                 # Default parameters and container config
