
🧩 Script Parameter Reference: 002_geneContribution.py

This script trains a neural network classifier on scRNA-seq data, then uses DeepLIFT to compute gene contribution scores for each cell type.
It performs hyperparameter optimization using Optuna, and outputs an attribution matrix that reflects each gene’s importance for each group.

It is typically used after running 001_prepareData.py, which formats the input expression matrix and metadata.


🔧 Parameters

-i, --input-dir (Required)

Type: str
Description:
Path to the input directory containing processed expression data and metadata (output of 001_prepareData.py).


-o, --output-dir (Required)

Type: str
Description:
Path to the directory where all trained models and contribution matrices will be saved.


πŸ‹οΈ Training Configuration

-b, --batch-size

Type: int
Default: 1024
Description:
Mini-batch size used for training.


--noise-sigma

Type: float
Default: 1.0
Description:
Standard deviation of Gaussian noise added to input features during training, to improve generalization.
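
As a rough illustration of what this kind of noise augmentation does, the sketch below adds zero-mean Gaussian noise to a toy expression vector. The function name `add_gaussian_noise` and the pure-Python list representation are assumptions made for this example; the script itself presumably operates on tensors inside its training loop.

```python
import random


def add_gaussian_noise(features, sigma=1.0, rng=None):
    """Return a copy of `features` with N(0, sigma^2) noise added to each value.

    Hypothetical helper for illustration only; not taken from the script.
    """
    rng = rng or random.Random()
    return [x + rng.gauss(0.0, sigma) for x in features]


rng = random.Random(0)        # fixed seed so the example is reproducible
cell = [0.0, 2.5, 1.2, 0.0]   # toy expression vector for one cell

noisy = add_gaussian_noise(cell, sigma=1.0, rng=rng)      # perturbed copy
unchanged = add_gaussian_noise(cell, sigma=0.0, rng=rng)  # sigma=0 disables the noise
```

Noise of this kind is normally applied only to training batches, so validation and attribution see the unperturbed values.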


--upsample

Type: flag (store_true)
Default: False
Description:
If set, the script will automatically upsample minority cell types to balance training data.
Useful when group sizes are highly imbalanced.
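
One common way to balance groups is random oversampling with replacement, sketched below. This is an assumption about the general technique, not a description of the script's actual implementation; the helper name `upsample_minority` and the toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def upsample_minority(labels, seed=0):
    """Return sample indices in which every group reaches the largest group's size.

    Hypothetical helper: minority groups are padded by drawing their own
    indices at random, with replacement.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for idx, label in enumerate(labels):
        by_group[label].append(idx)

    target = max(len(idxs) for idxs in by_group.values())
    balanced = []
    for idxs in by_group.values():
        balanced.extend(idxs)                                     # keep every original cell
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))  # pad with resampled cells
    return balanced


labels = ["T cell"] * 6 + ["B cell"] * 3 + ["NK cell"] * 1
indices = upsample_minority(labels)
counts = Counter(labels[i] for i in indices)  # every group now has 6 entries
```

Because minority cells are duplicated rather than synthesized, combining upsampling with input noise (see --noise-sigma) keeps the repeated cells from being exactly identical during training.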


🔍 Hyperparameter Ranges (for Optuna)

Each option below takes two values, a lower and an upper bound, that define the range Optuna searches for the corresponding architecture or training hyperparameter.

--hidden-layers-range

Type: list[int]
Default: [1, 3]
Description:
Range for the number of hidden layers.
e.g., [1, 3] means models with 1 to 3 hidden layers will be explored.


--dropout-range

Type: list[float]
Default: [0.2, 0.3]
Description:
Range of dropout rates to search. Dropout helps prevent overfitting during training.


--hidden-units-range

Type: list[int]
Default: [256, 512]
Description:
Range of hidden units per layer.


--lr-range

Type: list[float]
Default: [1e-5, 1e-3]
Description:
Range of learning rates to be tested during optimization.
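
The four range options above are passed as two space-separated values, as in the example usage. A minimal sketch of how such options could be declared with argparse follows; the use of `nargs=2` and the `MIN`/`MAX` metavar are assumptions for illustration, not a copy of the script's parser.

```python
import argparse

# Hypothetical parser mirroring the documented option names and defaults.
parser = argparse.ArgumentParser()
parser.add_argument("--hidden-layers-range", type=int, nargs=2,
                    default=[1, 3], metavar=("MIN", "MAX"))
parser.add_argument("--dropout-range", type=float, nargs=2,
                    default=[0.2, 0.3], metavar=("MIN", "MAX"))
parser.add_argument("--hidden-units-range", type=int, nargs=2,
                    default=[256, 512], metavar=("MIN", "MAX"))
parser.add_argument("--lr-range", type=float, nargs=2,
                    default=[1e-5, 1e-3], metavar=("MIN", "MAX"))

# Override two ranges; the other two fall back to their defaults.
args = parser.parse_args(
    ["--hidden-layers-range", "1", "2", "--lr-range", "1e-4", "1e-3"]
)
```

argparse converts the hyphens in option names to underscores, so the parsed values are available as `args.hidden_layers_range`, `args.lr_range`, and so on.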


📊 Optuna Optimization Settings

-e, --epochs

Type: int
Default: 50
Description:
Maximum number of training epochs for each trial.


--global_trials

Type: int
Default: 30
Description:
Number of Optuna trials to run for global hyperparameter search.


--optuna_storage

Type: str
Default: sqlite:///db.sqlite3
Description:
URL of the Optuna storage database.
Can be SQLite (default) or a MySQL/PostgreSQL URL for parallel tuning.


--study_name

Type: str
Default: scMarkerGene
Description:
Name of the Optuna study. Useful when reusing or resuming studies.


🧪 Refinement Stage

After global optimization, this stage fine-tunes the best model with finer-grained search around optimal hyperparameters.

--refine_lr_num

Type: int
Default: 5
Description:
Number of learning rate candidates to test during refinement.


--refine_dropout_num

Type: int
Default: 3
Description:
Number of dropout rate candidates to test during refinement.


--refine_lr_ratio

Type: float
Default: 0.3
Description:
Relative search range around the best learning rate.
e.g., 0.3 means ±30% around the best value found.


--refine_dropout_ratio

Type: float
Default: 0.1
Description:
Relative search range around the best dropout rate.
e.g., 0.1 means ±10% around the best value found.
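
To make the arithmetic concrete, the sketch below builds a refinement grid from a best value, a ratio, and a candidate count. It assumes evenly (linearly) spaced candidates across the range; the script may space them differently, and `refine_candidates` is a hypothetical helper, not a function from the script.

```python
def refine_candidates(best, ratio, num):
    """Return `num` evenly spaced values in [best*(1-ratio), best*(1+ratio)].

    Hypothetical helper; linear spacing is an assumption of this example.
    """
    if num == 1:
        return [best]
    low, high = best * (1 - ratio), best * (1 + ratio)
    step = (high - low) / (num - 1)
    return [low + i * step for i in range(num)]


# With the documented defaults and made-up "best" values from a global search:
lr_candidates = refine_candidates(best=1e-4, ratio=0.3, num=5)        # spans 7e-5 to 1.3e-4
dropout_candidates = refine_candidates(best=0.25, ratio=0.1, num=3)   # spans 0.225 to 0.275
```

If the refinement evaluates every combination, the defaults would give 5 × 3 = 15 additional training runs; whether the script uses a full grid is not stated here.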


✅ Example Usage

python 002_geneContribution.py \
  -i results/output_dir/ \
  -o results/output_dir/ \
  -b 1024 \
  --upsample \
  --hidden-layers-range 1 3 \
  --dropout-range 0.2 0.3 \
  --hidden-units-range 256 512 \
  --lr-range 1e-5 1e-3 \
  -e 50 \
  --global_trials 30 \
  --refine_lr_num 5 \
  --refine_dropout_num 3

This command trains the model on upsampled (balanced) data, runs the Optuna search followed by the refinement stage, and saves the contribution results for marker extraction.


Need help? Contact the developer at zhao_yongbing@gibh.ac.cn.