🧩 Script Parameter Reference: 003_extractCellMarker.py
This script extracts marker genes from scRNA-seq data based on contribution scores and silhouette-based expression specificity.
It is typically run after 002_geneContribution.py and helps identify the most cell-type-specific genes.
🔧 Parameters
-i (Required)
Type: str
Description:
Path to the input directory that contains model results and attribution score matrices.
This directory is usually generated after running the model training pipeline (002_geneContribution.py).
-e (Required)
Type: str
Description:
Path to the scRNA-seq expression matrix.
Should be in cell × gene format, with cell_id and group as the first two columns.
Used to compute silhouette scores for gene-level expression specificity.
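The layout described above can be illustrated with a small made-up table. This is only a sketch under stated assumptions, not the script's own code: the comma delimiter, the gene names, and the helpers `load_matrix` and `gene_silhouette` are hypothetical, and the script's actual silhouette calculation may differ.

```python
import csv
import io

# Hypothetical toy matrix: cell_id and group first, then one column per gene.
# The comma delimiter is an assumption; the source does not state the file format.
TOY_MATRIX = """cell_id,group,GeneA,GeneB
c1,T,9.0,1.0
c2,T,8.0,5.0
c3,B,1.0,1.2
c4,B,0.0,4.8
"""

def load_matrix(text):
    """Parse the table and check that the first two columns are cell_id and group."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0]
    if header[:2] != ["cell_id", "group"]:
        raise ValueError("first two columns must be cell_id and group")
    genes = header[2:]
    groups = [r[1] for r in rows[1:]]
    expr = {g: [float(r[i + 2]) for r in rows[1:]] for i, g in enumerate(genes)}
    return groups, expr

def gene_silhouette(values, groups):
    """Mean silhouette width of one gene, using |x_i - x_j| as the distance."""
    scores = []
    for i, (x, g) in enumerate(zip(values, groups)):
        same = [abs(x - y) for j, (y, h) in enumerate(zip(values, groups))
                if h == g and j != i]
        if not same:
            scores.append(0.0)  # singleton group: silhouette taken as 0
            continue
        a = sum(same) / len(same)  # mean distance to cells of the same group
        b = min(                   # mean distance to the nearest other group
            sum(abs(x - y) for y, h in zip(values, groups) if h == other)
            / sum(1 for h in groups if h == other)
            for other in set(groups) if other != g
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)

groups, expr = load_matrix(TOY_MATRIX)
silhouettes = {gene: gene_silhouette(vals, groups) for gene, vals in expr.items()}
```

In this toy data, GeneA separates the two groups cleanly and gets a silhouette close to 1, while GeneB varies as much within groups as between them and gets a negative value.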
-o (Required)
Type: str
Description:
Output directory where marker gene results will be saved.
-p
Type: int
Default: 1000
Description:
Number of permutations used in the permutation test.
The test calculates the statistical association between gene contribution scores and cell types.
💡 Increase this if you want to output the result of the permutation test.
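To make the role of the permutation count concrete, here is a generic label-shuffling test. It is a sketch of the general technique only: the test statistic (difference in mean contribution score between one cell type and the rest), the function name `permutation_p_value`, and the toy scores are all assumptions, and the script may use a different statistic.

```python
import random

def permutation_p_value(scores, labels, target, n_perm=1000, seed=0):
    """Empirical p-value for 'target' cells having a higher mean score than the rest."""
    def mean_diff(lbls):
        inside = [s for s, l in zip(scores, lbls) if l == target]
        outside = [s for s, l in zip(scores, lbls) if l != target]
        return sum(inside) / len(inside) - sum(outside) / len(outside)

    observed = mean_diff(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between scores and labels
        if mean_diff(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical contribution scores of one gene across eight cells.
scores = [0.9, 0.8, 0.85, 0.95, 0.1, 0.2, 0.15, 0.05]
labels = ["T"] * 4 + ["B"] * 4
p_value = permutation_p_value(scores, labels, target="T", n_perm=1000)
```

With this estimator the smallest attainable p-value is 1 / (n_perm + 1), so a larger permutation count gives finer resolution at the cost of longer run time.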
-n
Type: int
Default: 3
Description:
Number of final marker genes to select for each cell type.
This is the final output size after filtering by contribution and expression specificity.
💡 Increase this if you want more marker genes per group (e.g. -n 5 for the top 5 markers per cell type).
-t
Type: int
Default: 10
Description:
Number of top candidate genes (by contribution score) selected for further filtering.
These genes will be filtered based on silhouette score and fold change.
🔍 Think of this as a preselection pool — top N genes ranked by importance for each group.
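The interplay between -t and -n can be sketched as a two-stage selection for a single cell type. This is an illustrative reading of the descriptions above, not the script's implementation: the function `select_markers`, the `min_fold_change` threshold, and the toy values are hypothetical, and the way the script combines silhouette score and fold change is not stated in this reference.

```python
def select_markers(contribution, silhouette, fold_change,
                   top_t=10, final_n=3, min_fold_change=1.0):
    """Pick top_t genes by contribution, then keep the final_n most specific ones."""
    # Stage 1 (-t): preselection pool ranked by contribution score.
    pool = sorted(contribution, key=contribution.get, reverse=True)[:top_t]
    # Stage 2: drop genes below the (hypothetical) fold-change threshold,
    # then rank the survivors by silhouette score.
    kept = [g for g in pool if fold_change[g] >= min_fold_change]
    kept.sort(key=silhouette.get, reverse=True)
    # Stage 3 (-n): final marker list for this cell type.
    return kept[:final_n]

# Hypothetical per-gene values for one cell type.
contribution = {"G1": 0.9, "G2": 0.8, "G3": 0.7, "G4": 0.6, "G5": 0.1}
silhouette = {"G1": 0.2, "G2": 0.7, "G3": 0.5, "G4": 0.9, "G5": 0.95}
fold_change = {"G1": 2.0, "G2": 3.0, "G3": 0.5, "G4": 4.0, "G5": 5.0}

markers = select_markers(contribution, silhouette, fold_change, top_t=4, final_n=2)
```

Here G5 has the best silhouette score but never reaches the final list because it falls outside the top 4 by contribution. Under this reading, the preselection pool bounds the final list, so -t should be at least as large as -n.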
✅ Example Usage
python 003_extractCellMarker.py \
-i results/output_dir/ \
-e data/scRNA_expression_matrix \
-o results/output_dir/ \
-n 5 -t 20 -p 1000
This extracts the top 5 marker genes per cell type from the top 20 attribution-ranked candidates, using 1000 permutations for the permutation test.
Need help? Contact the developer at zhao_yongbing@gibh.ac.cn.