Scsilhouette Modules

Nextflow modules for the scsilhouette silhouette-score QC branch of the sc-nsforest-qc-nf workflow. These modules run inside the ghcr.io/nih-nlm/scsilhouette:1.0 container and are orchestrated by main.nf.

The scsilhouette package computes silhouette scores and produces visualizations that combine them with NSForest F-scores. For full details, see the scsilhouette repository and the scsilhouette documentation.

Execution order (runs in parallel with the NSForest branch):

  1. compute_silhouette_process — silhouette scores + cluster summary

  2. viz_summary_process — silhouette + F-score summary plot

  3. viz_distribution_process — cluster size vs silhouette distribution

  4. viz_dotplot_process — UMAP/embedding coloured by silhouette

Compute Silhouette Process

compute_silhouette_process

Source: modules/scsilhouette/compute_silhouette.nf

Compute Silhouette Module

Computes per-cell silhouette scores for each cluster using the specified embedding. Saves per-cell scores, per-cluster summary statistics, and an annotation JSON for downstream viz processes.
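As a rough illustration of what a per-cell silhouette score measures, the sketch below computes it in plain Python with Euclidean distance. It is not the scsilhouette implementation; the function name silhouette_per_cell and the toy coordinates are assumptions made for this example only. For each cell, a is the mean distance to the other cells in its own cluster, b is the smallest mean distance to the cells of any other cluster, and the score is (b - a) / max(a, b).

```python
import math
from collections import defaultdict


def silhouette_per_cell(points, labels):
    """Return one silhouette score per point using Euclidean distance.

    points: list of coordinate tuples (e.g. rows of an embedding).
    labels: list of cluster labels, same length as points.
    Illustrative only; hypothetical helper, not part of scsilhouette.
    """
    # Group point indices by cluster label.
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)
    if len(members) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = members[label]
        if len(own) == 1:
            # Common convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in idxs) / len(idxs)
            for other, idxs in members.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Toy 2D embedding with two well-separated clusters (made-up data).
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
toy_labels = ["A", "A", "B", "B"]
toy_scores = silhouette_per_cell(toy_points, toy_labels)
```

Scores close to 1 indicate a cell sits well inside its own cluster; scores near 0 or below indicate overlap with a neighbouring cluster.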

Input:

  • h5ad: Path to adata_filtered.h5ad

Output:

  • {organ}_{first_author}_{journal}_{year}_{cluster_header_safe}_{embedding_safe}_{vid}_cluster_summary.csv

  • {organ}_{first_author}_{journal}_{year}_{cluster_header_safe}_{embedding_safe}_{vid}_annotation.json
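The filename pattern above can be read as a prefix assembled from dataset metadata. The sketch below shows one way such a prefix might be built. The helper names safe_token and output_prefix, the sanitization rule (runs of non-alphanumeric characters become a single underscore), and the example values are all assumptions for illustration; the module's actual rule may differ.

```python
import re


def safe_token(value):
    """Make a value filename-friendly (assumed rule, not taken from the module)."""
    return re.sub(r"[^A-Za-z0-9]+", "_", str(value)).strip("_")


def output_prefix(organ, first_author, journal, year, cluster_header, embedding, vid):
    """Join the metadata fields in the order used by the output filename pattern."""
    parts = [
        organ,
        first_author,
        journal,
        year,
        safe_token(cluster_header),  # corresponds to {cluster_header_safe}
        safe_token(embedding),       # corresponds to {embedding_safe}
        vid,
    ]
    return "_".join(str(p) for p in parts)


# Placeholder values, chosen only to show the shape of the result.
prefix = output_prefix("lung", "Smith", "Journal", 2020, "cell type", "X_umap", "v1")
summary_name = f"{prefix}_cluster_summary.csv"
```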

Params referenced:

  • params.outdir

  • params.publish_mode

Compute Summary Stats Process

compute_summary_stats_process

Source: modules/scsilhouette/compute_summary_stats.nf

Compute Summary Statistics

Creates dataset-level summary statistics from cluster summaries. Computes median-of-medians and other aggregate metrics across all clusters.
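To make the median-of-medians idea concrete, here is a minimal sketch in plain Python. It takes the median silhouette score within each cluster, then the median of those per-cluster values. The function name median_of_medians and the input shape (pairs of cluster label and score) are assumptions for this example, not the module's interface.

```python
from collections import defaultdict
from statistics import median


def median_of_medians(rows):
    """rows: iterable of (cluster_label, silhouette_score) pairs.

    Returns (dataset-level median of per-cluster medians, per-cluster medians).
    Hypothetical helper for illustration only.
    """
    by_cluster = defaultdict(list)
    for cluster, score in rows:
        by_cluster[cluster].append(score)
    # One median per cluster.
    cluster_medians = {c: median(v) for c, v in by_cluster.items()}
    # Each cluster contributes exactly one value, regardless of its size.
    return median(cluster_medians.values()), cluster_medians


# Made-up scores: cluster A has three cells, B one, C two.
rows = [("A", 0.1), ("A", 0.3), ("A", 0.5), ("B", 0.8), ("C", -0.2), ("C", 0.0)]
overall, per_cluster = median_of_medians(rows)
```

Because every cluster counts once, a single very large cluster cannot dominate the dataset-level figure the way it would in a median over all cells.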

Input:

  • silhouette_scores: {prefix}_silhouette_scores.csv

  • cluster_summary: {prefix}_cluster_summary.csv

  • annotation: {prefix}_annotation.json

  • nsforest_results: {prefix}_results.csv (or NO_FILE sentinel)
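Passing a placeholder file named NO_FILE is a common Nextflow idiom for optional inputs. A script receiving the nsforest_results input could check for the sentinel before reading, roughly as sketched below. The function name load_fscores and the column names clusterName and f_score are assumptions for this example; check the actual NSForest results file before relying on them.

```python
import csv
from pathlib import Path

NO_FILE = "NO_FILE"  # sentinel filename used when no NSForest results exist


def load_fscores(path):
    """Return {cluster: f_score}, or None when the sentinel was passed.

    Column names are assumed for illustration.
    """
    path = Path(path)
    if path.name == NO_FILE:
        # Optional input was not provided; the caller skips F-score metrics.
        return None
    with path.open(newline="") as handle:
        return {
            row["clusterName"]: float(row["f_score"])
            for row in csv.DictReader(handle)
        }
```

Returning None rather than an empty dict lets the caller distinguish "no NSForest results supplied" from "results supplied but empty".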

Output:

  • Dataset-level summary with the fields: median/mean/std silhouette, quality tier counts, median/mean F-score, doi, collection_name, dataset_title, journal

Params referenced:

  • params.outdir

  • params.publish_mode

Viz 2D Projection Process

viz_2D_projection_process

Source: modules/scsilhouette/viz_2D_projection.nf

Viz 2D Projection Module

Generates an embedding scatter plot (UMAP/t-SNE/etc.) coloured by cluster identity, saved as both HTML (interactive) and SVG.

Input:

  • h5ad: Path to adata_filtered.h5ad

Output:

Params referenced:

  • params.outdir

  • params.publish_mode

Viz Distribution Process

viz_distribution_process

Source: modules/scsilhouette/viz_distribution.nf

Viz Distribution Module

Generates distribution plots of cluster cell counts (raw and log10) overlaid with mean/median silhouette scores per cluster.
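The quantities behind such a plot can be tabulated before any drawing happens. The sketch below derives, for each cluster, the raw cell count, its log10, and the mean and median silhouette score. The function name cluster_distribution, the dictionary keys, and the input shape are assumptions for this example and do not reflect the module's actual column names.

```python
import math
from collections import defaultdict
from statistics import mean, median


def cluster_distribution(rows):
    """rows: iterable of (cluster_label, silhouette_score) pairs.

    Returns one record per cluster, sorted by label. Hypothetical helper.
    """
    by_cluster = defaultdict(list)
    for cluster, score in rows:
        by_cluster[cluster].append(score)

    table = []
    for cluster, scores in sorted(by_cluster.items()):
        n_cells = len(scores)
        table.append({
            "cluster": cluster,
            "n_cells": n_cells,                    # raw count
            "log10_n_cells": math.log10(n_cells),  # log10 count
            "mean_silhouette": mean(scores),
            "median_silhouette": median(scores),
        })
    return table


# Made-up input: ten cells in cluster A, one cell in cluster B.
table = cluster_distribution([("A", 0.5)] * 10 + [("B", 0.2)])
```

The log10 scale keeps small clusters visible when cluster sizes span several orders of magnitude.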

Input:

  • silhouette_scores: {prefix}_silhouette_scores.csv

  • cluster_summary: {prefix}_cluster_summary.csv

  • annotation: {prefix}_annotation.json

Output:

Params referenced:

  • params.outdir

  • params.publish_mode

Viz Summary Process

viz_summary_process

Source: modules/scsilhouette/viz_summary.nf

Viz Summary Module

Generates an interactive summary plot that combines silhouette scores with NSForest F-scores per cluster. Also writes a dataset-level summary CSV.

Input:

  • silhouette_scores: {prefix}_silhouette_scores.csv

  • cluster_summary: {prefix}_cluster_summary.csv

  • annotation: {prefix}_annotation.json

  • nsforest_results: {prefix}_results.csv (or NO_FILE sentinel)

Output:

Params referenced:

  • params.outdir

  • params.publish_mode