Scsilhouette Modules
====================

Nextflow modules for the scsilhouette silhouette-score QC branch of the
``sc-nsforest-qc-nf`` workflow. These modules run inside the
``ghcr.io/nih-nlm/scsilhouette:1.0`` container and are orchestrated by
``main.nf``.

The ``scsilhouette`` package computes silhouette scores and produces
visualizations that combine them with NSForest F-scores. For full details
see the `scsilhouette repository `_ and `scsilhouette documentation `_.

Execution order (runs in parallel with the NSForest branch):

1. ``compute_silhouette_process``: silhouette scores and cluster summary
2. ``viz_summary_process``: silhouette and F-score summary plot
3. ``viz_distribution_process``: cluster size vs. silhouette distribution
4. ``viz_dotplot_process``: UMAP/embedding coloured by silhouette

Compute Silhouette Process
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. rubric:: ``compute_silhouette_process``

*Source:* ``modules/scsilhouette/compute_silhouette.nf``

Compute Silhouette Module

Computes per-cell silhouette scores for each cluster using the specified
embedding. Saves the per-cell scores, per-cluster summary statistics, and
an annotation JSON for the downstream visualization processes.

Input:
~~~~~~

@param tuple:

- meta: Map with organ, first_author, journal, year, author_cell_type,
  embedding, disease, dataset_version_id
- h5ad: Path to ``adata_filtered.h5ad``

Output:
~~~~~~~

@emit results: tuple(meta, [silhouette_scores.csv, cluster_summary.csv, annotation.json])

Flat filenames::

    {organ}_{first_author}_{journal}_{year}_{cluster_header_safe}_{embedding_safe}_{vid}_silhouette_scores.csv
    {organ}_{first_author}_{journal}_{year}_{cluster_header_safe}_{embedding_safe}_{vid}_cluster_summary.csv
    {organ}_{first_author}_{journal}_{year}_{cluster_header_safe}_{embedding_safe}_{vid}_annotation.json

**Params referenced:**

- ``params.outdir``
- ``params.publish_mode``

Compute Summary Stats Process
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. rubric:: ``compute_summary_stats_process``

*Source:* ``modules/scsilhouette/compute_summary_stats.nf``

Compute Summary Statistics

Creates dataset-level summary statistics from the cluster summaries,
including the median of the per-cluster median silhouette scores
(median-of-medians) and other aggregate metrics across all clusters.

Input:
~~~~~~

@param tuple:

- meta: Map with organ, first_author, journal, year, author_cell_type,
  embedding, doi, etc.
- silhouette_scores: {prefix}_silhouette_scores.csv
- cluster_summary: {prefix}_cluster_summary.csv
- annotation: {prefix}_annotation.json
- nsforest_results: {prefix}_results.csv (or the ``NO_FILE`` sentinel)

Output:
~~~~~~~

@emit summary: tuple(meta, {prefix}_dataset_summary.csv)

Contains: organ, first_author, journal, year, cluster_header, n_clusters,
n_cells, median/mean/std silhouette, quality tier counts, median/mean
F-score, doi, collection_name, dataset_title.

**Params referenced:**

- ``params.outdir``
- ``params.publish_mode``

Viz 2D Projection Process
^^^^^^^^^^^^^^^^^^^^^^^^^

.. rubric:: ``viz_2D_projection_process``

*Source:* ``modules/scsilhouette/viz_2D_projection.nf``

Viz 2D Projection Module

Generates an embedding scatter plot (UMAP, t-SNE, etc.) coloured by
cluster identity and saves it as both interactive HTML and SVG.

Input:
~~~~~~

@param tuple:

- meta: Map with organ, first_author, year, author_cell_type, embedding
- h5ad: Path to ``adata_filtered.h5ad``

Output:
~~~~~~~

@emit plots: tuple(meta, [2D_projection HTML and SVG])

Flat filenames::

    {organ}_{first_author}_{journal}_{year}_{cluster_header_safe}_{embedding_key}_2D_projection.{html,svg}

**Params referenced:**

- ``params.outdir``
- ``params.publish_mode``

Viz Distribution Process
^^^^^^^^^^^^^^^^^^^^^^^^

.. rubric:: ``viz_distribution_process``

*Source:* ``modules/scsilhouette/viz_distribution.nf``

Viz Distribution Module

Generates distribution plots of cluster cell counts (raw and log10),
overlaid with the mean and median silhouette score of each cluster.
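
The per-cluster quantities behind this plot can be sketched in plain
Python. This is a minimal, hypothetical illustration rather than the
``scsilhouette`` implementation: it assumes the per-cell cluster labels and
silhouette scores are available as two parallel lists, and the helper name
``cluster_distribution_table`` is invented for the example.

```python
import math
import statistics
from collections import defaultdict


def cluster_distribution_table(labels, scores):
    """Hypothetical helper (not part of scsilhouette): per-cluster cell
    counts, raw and log10, with mean and median silhouette score."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    # Group the per-cell silhouette scores by cluster label.
    by_cluster = defaultdict(list)
    for label, score in zip(labels, scores):
        by_cluster[label].append(score)

    rows = []
    for label, values in by_cluster.items():
        n = len(values)
        rows.append({
            "cluster": label,
            "n_cells": n,
            "log10_n_cells": math.log10(n),  # log scale for the size axis
            "mean_silhouette": statistics.fmean(values),
            "median_silhouette": statistics.median(values),
        })

    # Largest clusters first; ties keep first-seen order (stable sort).
    rows.sort(key=lambda r: r["n_cells"], reverse=True)
    return rows
```

For example, ``cluster_distribution_table(["a", "a", "b"], [0.2, 0.4, -0.1])``
lists cluster ``a`` first (2 cells, mean and median of about 0.3), then
cluster ``b`` (1 cell, ``log10_n_cells`` of 0.0). The log10 column keeps very
large and very small clusters readable on one axis.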

Input:
~~~~~~

@param tuple:

- meta: Map with organ, first_author, journal, year, author_cell_type
- silhouette_scores: {prefix}_silhouette_scores.csv
- cluster_summary: {prefix}_cluster_summary.csv
- annotation: {prefix}_annotation.json

Output:
~~~~~~~

@emit plots: tuple(meta, [distribution HTML and SVG])

Flat filenames::

    {organ}_{first_author}_{journal}_{year}_{cluster_header_safe}_{embedding_safe}_{vid}_distribution_*.{html,svg}

**Params referenced:**

- ``params.outdir``
- ``params.publish_mode``

Viz Summary Process
^^^^^^^^^^^^^^^^^^^

.. rubric:: ``viz_summary_process``

*Source:* ``modules/scsilhouette/viz_summary.nf``

Viz Summary Module

Generates an interactive summary plot that combines the silhouette score
and the NSForest F-score of each cluster. Also writes a dataset-level
summary CSV.

Input:
~~~~~~

@param tuple:

- meta: Map with organ, first_author, journal, year, author_cell_type,
  embedding, doi, etc.
- silhouette_scores: {prefix}_silhouette_scores.csv
- cluster_summary: {prefix}_cluster_summary.csv
- annotation: {prefix}_annotation.json
- nsforest_results: {prefix}_results.csv (or the ``NO_FILE`` sentinel)

Output:
~~~~~~~

@emit plots: tuple(meta, [summary SVG/HTML/CSV, dataset_summary CSV])

Flat filenames::

    {organ}_{first_author}_{journal}_{year}_{cluster_header_safe}_*.{csv,svg,html,json}

**Params referenced:**

- ``params.outdir``
- ``params.publish_mode``
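
Both ``viz_summary_process`` and ``compute_summary_stats_process`` accept an
NSForest results file that may be the ``NO_FILE`` sentinel. The following is
a minimal sketch of how such a step might attach F-scores to the cluster
summary and compute a median-of-medians. The sentinel file name, the column
names (``cluster``, ``median_silhouette``, ``clusterName``, ``f_score``), and
the helper names are assumptions for illustration, not the modules' actual
code.

```python
import csv
import statistics
from pathlib import Path

# Assumption: when NSForest produced no results, the optional input is a
# placeholder file whose name is exactly "NO_FILE".
NO_FILE = "NO_FILE"


def load_rows(path):
    """Hypothetical helper: read a CSV into dicts; the sentinel yields []."""
    if Path(path).name == NO_FILE:
        return []
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))


def merge_fscores(cluster_rows, nsforest_rows):
    """Attach an F-score (or None) to each per-cluster silhouette row.

    Column names are assumptions: "cluster" and "median_silhouette" in the
    cluster summary, "clusterName" and "f_score" in the NSForest results.
    """
    fscores = {r["clusterName"]: float(r["f_score"]) for r in nsforest_rows}
    return [
        {
            "cluster": r["cluster"],
            "median_silhouette": float(r["median_silhouette"]),
            # None when NSForest was skipped or has no row for this cluster.
            "f_score": fscores.get(r["cluster"]),
        }
        for r in cluster_rows
    ]


def dataset_summary(merged):
    """Dataset-level aggregates, including the median of per-cluster medians.

    Raises statistics.StatisticsError if there are no clusters at all.
    """
    medians = [r["median_silhouette"] for r in merged]
    fs = [r["f_score"] for r in merged if r["f_score"] is not None]
    return {
        "n_clusters": len(merged),
        "median_of_medians_silhouette": statistics.median(medians),
        "median_f_score": statistics.median(fs) if fs else None,
    }
```

When the NSForest input is the sentinel, every ``f_score`` is ``None`` and
the dataset-level median F-score is reported as ``None`` instead of
failing, which is one way to honour the optional-input role of the sentinel.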