🧩 Script Parameter Reference: 003_extractCellMarker.py

This script extracts marker genes from scRNA-seq data based on contribution scores and silhouette-based expression specificity.

It is typically used after running 002_geneContribution.py, and helps identify the most cell-type-specific genes.


🔧 Parameters

-i (Required)

Type: str
Description:
Path to the input directory that contains model results and attribution score matrices.
This directory is usually generated after running the model training pipeline (002_geneContribution.py).


-e (Required)

Type: str
Description:
Path to the scRNA-seq expression matrix.
The matrix should be in cell × gene format (one row per cell, one column per gene), with cell_id and group as the first two columns.
Used to compute silhouette scores for gene-level expression specificity.
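
As a rough illustration of the layout and of a per-gene silhouette score, here is a minimal sketch using only the Python standard library. It is not the script's actual implementation: the comma-separated file format, the toy values, the function names load_matrix and gene_silhouette, and the use of the absolute expression difference on a single gene as the distance are all assumptions made for this example.

```python
import csv
import io

# Hypothetical toy matrix in the layout described above:
# first column cell_id, second column group, then one column per gene.
# (Comma-separated text is an assumption; the real file format may differ.)
TOY_CSV = """cell_id,group,GeneA,GeneB
c1,T,9.0,1.0
c2,T,8.0,5.0
c3,B,1.0,2.0
c4,B,2.0,4.0
"""


def load_matrix(handle):
    """Return (groups, {gene: [value per cell]}) from a cell x gene table."""
    reader = csv.reader(handle)
    header = next(reader)
    genes = header[2:]  # everything after cell_id and group
    groups = []
    columns = {gene: [] for gene in genes}
    for row in reader:
        groups.append(row[1])
        for gene, value in zip(genes, row[2:]):
            columns[gene].append(float(value))
    return groups, columns


def gene_silhouette(values, groups):
    """Mean silhouette over cells for one gene, with |x - y| as the distance.

    Requires at least two distinct groups.
    """
    labels = sorted(set(groups))
    scores = []
    for i, (x, gi) in enumerate(zip(values, groups)):
        same = [abs(x - y)
                for j, (y, gj) in enumerate(zip(values, groups))
                if gj == gi and j != i]
        if not same:
            scores.append(0.0)  # common convention for a one-cell group
            continue
        a = sum(same) / len(same)  # mean distance within the cell's own group
        b = min(  # mean distance to the nearest other group
            sum(abs(x - y) for y, gj in zip(values, groups) if gj == lab)
            / groups.count(lab)
            for lab in labels if lab != gi
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


groups, columns = load_matrix(io.StringIO(TOY_CSV))
scores = {gene: gene_silhouette(vals, groups) for gene, vals in columns.items()}
print(scores)
```

In this toy data GeneA separates the two groups cleanly and gets a high score, while GeneB overlaps between groups and gets a negative one.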


-o (Required)

Type: str
Description:
Output directory where marker gene results will be saved.


-p

Type: int
Default: 1000
Description:
Number of permutations used in the permutation test.
The test assesses the statistical association between gene contribution scores and cell types.

💡 Increase this if you want to output the results of the permutation test.
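
To make the idea of a permutation test concrete, the sketch below shuffles the cell-type labels and counts how often the shuffled data look at least as extreme as the real data. This is a generic illustration, not the script's own code: the function name permutation_p_value, the choice of a difference in group means as the test statistic, and the toy scores are all assumptions.

```python
import random


def permutation_p_value(scores, groups, target, n_perm=1000, seed=0):
    """Permutation p-value for `target` cells having higher scores than the rest.

    scores: one contribution score per cell (hypothetical input).
    groups: one cell-type label per cell.
    """
    rng = random.Random(seed)

    def mean_diff(labels):
        inside = [s for s, g in zip(scores, labels) if g == target]
        outside = [s for s, g in zip(scores, labels) if g != target]
        return sum(inside) / len(inside) - sum(outside) / len(outside)

    observed = mean_diff(groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between scores and labels
        if mean_diff(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


toy_scores = [5.0, 5.2, 4.9, 5.1, 0.1, 0.2, 0.0, 0.3]
toy_groups = ["T", "T", "T", "T", "B", "B", "B", "B"]
p_value = permutation_p_value(toy_scores, toy_groups, target="T", n_perm=1000)
print(p_value)
```

In this sketch the smallest attainable value is 1 / (n_perm + 1), so a larger permutation count gives finer resolution at the cost of a longer run time.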


-n

Type: int
Default: 3
Description:
Number of final marker genes to select for each cell type.
This is the final output size after filtering by contribution and expression specificity.

💡 Increase this if you want more marker genes per group (e.g. -n 5 for top 5 markers per cell type).


-t

Type: int
Default: 10
Description:
Number of top candidate genes (by contribution score) selected for further filtering.
These genes will be filtered based on silhouette score and fold change.

🔍 Think of this as a preselection pool: the top -t genes ranked by contribution score for each group.
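
The relationship between -t and -n can be sketched as a two-stage selection for a single cell type. This is a simplified illustration under stated assumptions, not the script's actual logic: the function name select_markers and the toy scores are hypothetical, and a single specificity score stands in for the combined silhouette score and fold change filtering.

```python
def select_markers(contribution, specificity, t=10, n=3):
    """Pick n marker genes for one cell type from a pool of t candidates.

    contribution: {gene: contribution score} (hypothetical input).
    specificity:  {gene: specificity score}, a stand-in for the silhouette
                  score and fold change filtering.
    """
    # Stage 1 (-t): preselection pool, top t genes by contribution score.
    pool = sorted(contribution, key=contribution.get, reverse=True)[:t]
    # Stage 2 (-n): rank the pool by specificity and keep the top n.
    ranked = sorted(
        pool,
        key=lambda gene: specificity.get(gene, float("-inf")),
        reverse=True,
    )
    return ranked[:n]


toy_contribution = {"G1": 0.9, "G2": 0.8, "G3": 0.7, "G4": 0.1}
toy_specificity = {"G1": 0.2, "G2": 0.6, "G3": 0.5, "G4": 0.99}
markers = select_markers(toy_contribution, toy_specificity, t=3, n=2)
print(markers)
```

Note that G4 has the highest specificity in the toy data but is never considered, because it does not make it into the top 3 by contribution. A larger -t widens the pool that the specificity filter can draw from.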


✅ Example Usage

python 003_extractCellMarker.py \
  -i results/output_dir/ \
  -e data/scRNA_expression_matrix \
  -o results/output_dir/ \
  -n 5 -t 20 -p 1000

This extracts the top 5 marker genes per cell type from the 20 genes with the highest contribution scores, using 1000 permutations for the permutation test.


Need help? Contact the developer at zhao_yongbing@gibh.ac.cn.