Script Parameter Reference: 002_geneContribution.py
This script trains a neural network classifier on scRNA-seq data, then uses DeepLIFT to compute gene contribution scores for each cell type.
It performs hyperparameter optimization using Optuna and outputs an attribution matrix that reflects each gene's importance for each group.
It is typically used after running 001_prepareData.py, which formats the input expression matrix and metadata.
Parameters
-i, --input-dir (Required)
Type: str
Description:
Path to the input directory containing processed expression data and metadata (output of 001_prepareData.py).
-o, --output-dir (Required)
Type: str
Description:
Path to the directory where all trained models and contribution matrices will be saved.
Training Configuration
-b, --batch-size
Type: int
Default: 1024
Description:
Mini-batch size used for training.
--noise-sigma
Type: float
Default: 1.0
Description:
Standard deviation of Gaussian noise added to input features during training, to improve generalization.
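A minimal sketch of this kind of input-noise augmentation, assuming zero-mean Gaussian noise is added independently to every feature value. The function name `add_gaussian_noise` and the list-of-rows batch layout are hypothetical; the script's actual implementation may differ.

```python
import random

def add_gaussian_noise(batch, sigma=1.0, rng=None):
    """Return a noisy copy of `batch` (a list of feature rows).

    Each value receives independent noise drawn from N(0, sigma^2).
    The input batch is left unmodified.
    """
    if rng is None:
        rng = random.Random()
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in batch]

# Toy batch of two cells with three features each (made-up values).
batch = [[0.0, 1.5, 3.0], [2.0, 0.0, 0.5]]
noisy = add_gaussian_noise(batch, sigma=1.0, rng=random.Random(0))
```

Noise of this kind would normally be applied only to training batches, so validation and attribution would still see the original expression values.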
--upsample
Type: flag (store_true)
Default: False
Description:
If set, the script will automatically upsample minority cell types to balance training data.
Useful when group sizes are highly imbalanced.
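One common way to balance groups is random oversampling with replacement until every cell type matches the largest one. The sketch below illustrates that idea under the assumption that this is roughly what `--upsample` does; the helper `upsample_indices` is hypothetical and not taken from the script.

```python
import random
from collections import defaultdict

def upsample_indices(labels, rng=None):
    """Return sample indices in which every label appears equally often.

    All original indices are kept; minority groups are padded by
    drawing extra indices from the same group with replacement.
    """
    if rng is None:
        rng = random.Random()
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    target = max(len(idxs) for idxs in by_label.values())
    balanced = []
    for idxs in by_label.values():
        balanced.extend(idxs)
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))
    return balanced

# Toy labels: 5 cells of one type, 2 of another (made-up data).
labels = ["T cell"] * 5 + ["B cell"] * 2
balanced = upsample_indices(labels, rng=random.Random(0))
```

The returned indices can be used to select rows of the training matrix; held-out data would usually be left at its original class proportions.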
Hyperparameter Ranges (for Optuna)
These options set the ranges of network-architecture and training hyperparameters that Optuna searches.
--hidden-layers-range
Type: list[int]
Default: [1, 3]
Description:
Range for the number of hidden layers.
For example, [1, 3] means models with 1 to 3 hidden layers will be explored.
--dropout-range
Type: list[float]
Default: [0.2, 0.3]
Description:
Range of dropout rates to search. Dropout helps reduce overfitting during training.
--hidden-units-range
Type: list[int]
Default: [256, 512]
Description:
Range of hidden units per layer.
--lr-range
Type: list[float]
Default: [1e-5, 1e-3]
Description:
Range of learning rates to be tested during optimization.
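The four ranges above map naturally onto Optuna's `trial.suggest_int` and `trial.suggest_float` calls. The sketch below shows one way such a search space could be declared. The function name, the parameter names (`n_layers`, `dropout`, `hidden_units`, `lr`), the single shared layer width, and the log-scale sampling of the learning rate are all assumptions, not details confirmed by the script.

```python
def suggest_hyperparameters(trial,
                            layers_range=(1, 3),
                            dropout_range=(0.2, 0.3),
                            units_range=(256, 512),
                            lr_range=(1e-5, 1e-3)):
    """Draw one configuration from the documented default ranges.

    `trial` is expected to be an optuna.trial.Trial (or any object
    exposing the same suggest_int / suggest_float methods).
    """
    return {
        # Integer parameters: both bounds are inclusive in Optuna.
        "n_layers": trial.suggest_int("n_layers", *layers_range),
        "hidden_units": trial.suggest_int("hidden_units", *units_range),
        "dropout": trial.suggest_float("dropout", *dropout_range),
        # Log-scale sampling is an assumption of this sketch; it is a
        # common choice when a range spans several orders of magnitude.
        "lr": trial.suggest_float("lr", *lr_range, log=True),
    }
```

Inside an Optuna objective function, the returned dictionary would be used to build and train a model, and the objective would return the validation metric being optimized.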
Optuna Optimization Settings
-e, --epochs
Type: int
Default: 50
Description:
Maximum number of training epochs for each trial.
--global_trials
Type: int
Default: 30
Description:
Number of Optuna trials to run for global hyperparameter search.
--optuna_storage
Type: str
Default: sqlite:///db.sqlite3
Description:
URL of the Optuna storage database.
Can be SQLite (default) or a MySQL/PostgreSQL URL, which allows parallel tuning.
--study_name
Type: str
Default: scMarkerGene
Description:
Name of the Optuna study. Useful when reusing or resuming studies.
Refinement Stage
After global optimization, this stage fine-tunes the best model with a finer-grained search around the best hyperparameters found.
--refine_lr_num
Type: int
Default: 5
Description:
Number of learning rate candidates to test during refinement.
--refine_dropout_num
Type: int
Default: 3
Description:
Number of dropout rate candidates to test during refinement.
--refine_lr_ratio
Type: float
Default: 0.3
Description:
Relative search range around the best learning rate.
e.g., 0.3 means ±30% around the best value found.
--refine_dropout_ratio
Type: float
Default: 0.1
Description:
Relative search range around the best dropout rate.
e.g., 0.1 means ±10% around the best value found.
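To make the ratio and candidate-count options concrete, the sketch below generates candidates around a best value. It assumes the candidates are evenly spaced across the relative window and include both endpoints; the script may space them differently (for example on a log scale). The helper `refinement_candidates` and the "best" values used here are hypothetical.

```python
def refinement_candidates(best, ratio, num):
    """Return `num` evenly spaced values from best*(1-ratio) to best*(1+ratio)."""
    if num == 1:
        return [best]
    low, high = best * (1 - ratio), best * (1 + ratio)
    step = (high - low) / (num - 1)
    return [low + i * step for i in range(num)]

# Made-up "best" values standing in for the result of the global search.
lr_candidates = refinement_candidates(1e-4, ratio=0.3, num=5)
dropout_candidates = refinement_candidates(0.25, ratio=0.1, num=3)
```

With the defaults, the learning-rate window around 1e-4 runs from about 7e-5 to 1.3e-4, and the dropout window around 0.25 runs from about 0.225 to 0.275. If the two candidate lists were combined as a full grid, that would give 5 × 3 = 15 refinement runs.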
Example Usage
python 002_geneContribution.py \
-i results/output_dir/ \
-o results/output_dir/ \
-b 1024 \
--upsample \
--hidden-layers-range 1 3 \
--dropout-range 0.2 0.3 \
--hidden-units-range 256 512 \
--lr-range 1e-5 1e-3 \
-e 50 \
--global_trials 30 \
--refine_lr_num 5 \
--refine_dropout_num 3
This will run model training with balanced data, perform Optuna search, and save contribution results for marker extraction.
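In the command above, each range option takes two space-separated values (minimum, then maximum). A minimal sketch of how options like these could be declared with `argparse`, assuming `nargs=2`; the script's real parser may be set up differently.

```python
import argparse

parser = argparse.ArgumentParser(description="Range-option sketch")
# Each range option consumes exactly two values: MIN then MAX.
parser.add_argument("--hidden-layers-range", type=int, nargs=2,
                    default=[1, 3], metavar=("MIN", "MAX"))
parser.add_argument("--dropout-range", type=float, nargs=2,
                    default=[0.2, 0.3], metavar=("MIN", "MAX"))
parser.add_argument("--hidden-units-range", type=int, nargs=2,
                    default=[256, 512], metavar=("MIN", "MAX"))
parser.add_argument("--lr-range", type=float, nargs=2,
                    default=[1e-5, 1e-3], metavar=("MIN", "MAX"))

# Parse an explicit argument list so the sketch runs without a shell.
args = parser.parse_args(["--lr-range", "1e-5", "1e-3",
                          "--hidden-units-range", "256", "512"])
min_lr, max_lr = args.lr_range
```

Options that are not passed fall back to their defaults, so `args.hidden_layers_range` is `[1, 3]` here.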
Need help? Contact the developer at zhao_yongbing@gibh.ac.cn.