DenseNet121-based multi-label chest X-ray classification pipeline using the CheXpert-small dataset.
This repository is the research and experiment codebase for the capstone project.
The production web service is managed separately in capstone-cxr.
This project trains and evaluates a chest X-ray classification model for five CheXpert target findings:
- Atelectasis
- Cardiomegaly
- Consolidation
- Edema
- Pleural Effusion
The model predicts class-wise probabilities and supports Grad-CAM visualization for explainable inference.
| Item | Description |
|---|---|
| Project Type | Research PoC / Model Experiment Repository |
| Development Period | 2026.03 – 2026.04 |
| Main Task | Multi-label chest X-ray classification |
| Dataset | CheXpert-small |
| Backbone | DenseNet121 |
| Explainability | Grad-CAM |
This repository was developed as the research and model experimentation phase of the capstone project.
| Phase | Period | Description |
|---|---|---|
| Initial Setup | 2026.03 | CheXpert-small dataset structure setup, preprocessing policy design, and baseline DenseNet121 training pipeline implementation |
| Model Experiments | 2026.03 – 2026.04 | Uncertainty policy comparison, training configuration refinement, AUROC/AUPRC evaluation, and threshold tuning |
| Inference & Visualization | 2026.04 | Image-level inference pipeline, Grad-CAM visualization, error analysis, and reusable inference logic preparation |
- CheXpert-small data loading
- Frontal-view-only training policy
- DenseNet121 multi-label classifier
- Uncertainty label policy comparison: U-Ignore, U-Ones, U-Zeros
- BCEWithLogitsLoss with pos_weight
- AUROC and AUPRC evaluation
- F1-based threshold tuning
- Image-level inference
- Grad-CAM visualization
- Reusable inference service functions for later migration
```
chexpert_poc/
├── chexpert_poc/
│   ├── common/        # config, runtime, shared utilities
│   ├── datasets/      # CheXpert dataset and label handling
│   ├── evaluation/    # metrics, prediction tables, thresholds
│   ├── explain/       # Grad-CAM logic
│   ├── inference/     # inference, postprocess, artifact handling
│   ├── metrics/       # classification metrics
│   ├── models/        # DenseNet121 model definition
│   └── utils/         # training utilities
├── configs/
│   └── base.yaml
├── scripts/
│   ├── train.py
│   ├── eval.py
│   ├── threshold_tune.py
│   ├── error_analysis.py
│   ├── infer.py
│   └── gradcam_demo.py
└── README.md
```
The dataset is not included in this repository.
Expected local dataset path:
```
data/chexpert_small/raw/
├── train.csv
├── valid.csv
├── train/
└── valid/
```
Only frontal-view images are used, and the model is trained on five target labels:
- Atelectasis
- Cardiomegaly
- Consolidation
- Edema
- Pleural Effusion
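The frontal-view filter and label extraction can be sketched with the standard `csv` module. This is a minimal sketch, not the repository's dataset code: it assumes the column names of the public CheXpert CSVs (`Path`, `Frontal/Lateral`, and one column per finding), and the helper name `load_frontal_rows` is hypothetical. Blank cells (finding not mentioned) are kept as `None` so that the uncertainty policy can decide how to treat them.

```python
import csv
import io
from typing import Dict, List, Optional, TextIO, Union

# The five target findings used in this project.
TARGET_LABELS = [
    "Atelectasis",
    "Cardiomegaly",
    "Consolidation",
    "Edema",
    "Pleural Effusion",
]

Row = Dict[str, Union[str, Optional[float]]]


def load_frontal_rows(csv_file: TextIO) -> List[Row]:
    """Keep frontal-view rows and extract raw values for the target labels.

    Raw CheXpert values: 1.0 = positive, 0.0 = negative, -1.0 = uncertain,
    blank = not mentioned (returned as None in this sketch).
    """
    rows: List[Row] = []
    for record in csv.DictReader(csv_file):
        if record["Frontal/Lateral"] != "Frontal":
            continue  # drop lateral views
        row: Row = {"Path": record["Path"]}
        for name in TARGET_LABELS:
            cell = record[name].strip()
            row[name] = float(cell) if cell else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Tiny in-memory CSV with hypothetical paths, for illustration only.
    sample = io.StringIO(
        "Path,Frontal/Lateral,Atelectasis,Cardiomegaly,Consolidation,Edema,Pleural Effusion\n"
        "p1/view1_frontal.jpg,Frontal,1.0,,-1.0,0.0,1.0\n"
        "p1/view2_lateral.jpg,Lateral,1.0,,-1.0,0.0,1.0\n"
    )
    rows = load_frontal_rows(sample)
    print(len(rows), rows[0]["Consolidation"])  # 1 -1.0
```

`csv.DictReader` ignores the extra CheXpert columns (sex, age, other findings), so the same function can be pointed at `train.csv` or `valid.csv` opened in text mode.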
CheXpert contains uncertain labels. This project compares three uncertainty label policies:
| Policy | Description |
|---|---|
| U-Ignore | Exclude uncertain labels from loss calculation |
| U-Ones | Treat uncertain labels as positive |
| U-Zeros | Treat uncertain labels as negative |
The representative model uses U-Ignore: it achieved the highest mean test AUROC (0.8927) and avoids forcing uncertain labels into a positive or negative class. U-Zeros scored slightly higher on mean validation AUROC (0.8837) and mean test AUPRC (0.6597).
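The three policies can be expressed as a mapping from raw label values to training targets plus a loss mask. This is a hedged sketch, not the repository's implementation: the function name `apply_uncertainty_policy` is hypothetical, and treating blank (unmentioned) labels as negative is an assumed convention that this README does not state.

```python
from typing import List, Optional, Tuple

POLICIES = {"U-Ignore", "U-Ones", "U-Zeros"}


def apply_uncertainty_policy(
    raw: List[Optional[float]], policy: str
) -> Tuple[List[float], List[float]]:
    """Return (targets, mask); a mask value of 0.0 means 'exclude from the loss'."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    targets: List[float] = []
    mask: List[float] = []
    for value in raw:
        if value is None:
            # Assumption: unmentioned findings are treated as negative.
            targets.append(0.0)
            mask.append(1.0)
        elif value == -1.0:
            if policy == "U-Ignore":
                targets.append(0.0)  # placeholder, ignored because mask is 0
                mask.append(0.0)
            elif policy == "U-Ones":
                targets.append(1.0)
                mask.append(1.0)
            else:  # U-Zeros
                targets.append(0.0)
                mask.append(1.0)
        else:
            targets.append(float(value))
            mask.append(1.0)
    return targets, mask


if __name__ == "__main__":
    raw = [1.0, None, -1.0, 0.0, 1.0]  # one image, five target labels
    for policy in ("U-Ignore", "U-Ones", "U-Zeros"):
        print(policy, apply_uncertainty_policy(raw, policy))
```

Under U-Ignore the uncertain entry keeps a dummy target but contributes nothing to the loss, because its mask value is 0; U-Ones and U-Zeros keep every entry and only change the target.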
| Item | Setting |
|---|---|
| Backbone | DenseNet121 |
| Pretrained | ImageNet |
| Task | Multi-label classification |
| Input size | 320 × 320 |
| Batch size | 32 |
| Epochs | 10 |
| Optimizer | Adam |
| Learning rate | 1e-4 |
| Loss | BCEWithLogitsLoss + pos_weight |
| Metrics | AUROC, AUPRC |
| Threshold tuning | F1 grid search from 0.05 to 0.95 |
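The `BCEWithLogitsLoss + pos_weight` setting in the table above combines naturally with the U-Ignore mask. The sketch below reproduces the per-element form of PyTorch's `BCEWithLogitsLoss` in plain Python so the arithmetic is visible. It assumes `pos_weight` is the per-class ratio of negatives to positives among unmasked labels, a common choice that this README does not confirm, and the helper names are hypothetical. In PyTorch itself, a comparable result can be obtained with `nn.BCEWithLogitsLoss(pos_weight=..., reduction="none")` followed by a masked mean.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def compute_pos_weight(targets: Matrix, masks: Matrix) -> List[float]:
    """Per-class (#negatives / #positives) over unmasked labels (assumed rule)."""
    weights = []
    for c in range(len(targets[0])):
        pos = sum(t[c] * m[c] for t, m in zip(targets, masks))
        neg = sum((1.0 - t[c]) * m[c] for t, m in zip(targets, masks))
        weights.append(neg / pos if pos > 0 else 1.0)
    return weights


def masked_bce_with_logits(
    logits: Matrix, targets: Matrix, masks: Matrix, pos_weight: Sequence[float]
) -> float:
    """Mean of -[p*y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))] over mask == 1."""
    total, count = 0.0, 0.0
    for x_row, y_row, m_row in zip(logits, targets, masks):
        for c, (x, y, m) in enumerate(zip(x_row, y_row, m_row)):
            # log(sigmoid(x)) = -softplus(-x); log(1 - sigmoid(x)) = -softplus(x)
            loss = pos_weight[c] * y * softplus(-x) + (1.0 - y) * softplus(x)
            total += m * loss
            count += m
    return total / count if count > 0 else 0.0


if __name__ == "__main__":
    targets = [[1.0], [0.0], [0.0], [0.0]]
    masks = [[1.0], [1.0], [1.0], [0.0]]  # last label excluded (e.g. uncertain)
    pw = compute_pos_weight(targets, masks)
    print(pw)  # [2.0]
    print(masked_bce_with_logits([[0.0], [0.0], [0.0], [9.0]], targets, masks, pw))
```

Rewriting the log terms with `softplus` avoids computing `log(sigmoid(x))` directly, which would underflow for large negative logits.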
| Policy | Valid mean AUROC | Valid mean AUPRC | Test mean AUROC | Test mean AUPRC |
|---|---|---|---|---|
| U-Ignore | 0.8817 | 0.7387 | 0.8927 | 0.6494 |
| U-Ones | 0.8778 | 0.7216 | 0.8715 | 0.6116 |
| U-Zeros | 0.8837 | 0.7302 | 0.8903 | 0.6597 |
Representative U-Ignore model:
| Metric | Test Score |
|---|---|
| Mean AUROC | 0.8927 |
| Mean AUPRC | 0.6494 |
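For reference, the two reported metrics can be computed per label and then averaged. This sketch assumes that "Mean" denotes the unweighted (macro) average over the five labels and computes AUPRC as average precision. Both are assumptions about how the repository's metrics module works, and the function names are hypothetical. The pairwise AUROC loop is O(n²) and is written for clarity, not speed.

```python
from typing import Dict, List, Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive scores above random negative); ties count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AP = sum over score thresholds of (recall gain) * precision."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("AP needs at least one positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    prev_recall, ap, i = 0.0, 0.0, 0
    while i < len(order):
        # Samples with equal scores form a single threshold step.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if y_true[order[j]] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        recall = tp / total_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall, i = recall, j
    return ap


def mean_metrics(
    y_true: Sequence[Sequence[int]],
    scores: Sequence[Sequence[float]],
    labels: List[str],
) -> Tuple[Dict[str, Tuple[float, float]], float, float]:
    """Per-label (AUROC, AP) plus their macro means."""
    per_label = {}
    for c, name in enumerate(labels):
        y = [row[c] for row in y_true]
        s = [row[c] for row in scores]
        per_label[name] = (auroc(y, s), average_precision(y, s))
    mean_auroc = sum(a for a, _ in per_label.values()) / len(labels)
    mean_auprc = sum(p for _, p in per_label.values()) / len(labels)
    return per_label, mean_auroc, mean_auprc
```

AUROC is insensitive to class prevalence, while AUPRC is not, which is one reason the mean AUPRC values in the tables above sit well below the mean AUROC values.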
Model Evaluation Metrics and Threshold Selection
Class-specific thresholds were tuned on the validation set to maximize per-class F1-score.
| Label | Threshold |
|---|---|
| Atelectasis | 0.46 |
| Cardiomegaly | 0.11 |
| Consolidation | 0.47 |
| Edema | 0.34 |
| Pleural Effusion | 0.37 |
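The per-class search can be sketched as a scan over candidate thresholds on validation predictions for one label. The 0.05 to 0.95 range comes from the training configuration above; the 0.01 step, the `>=` decision rule, and the function names are assumptions for illustration.

```python
from typing import Sequence, Tuple


def f1_at_threshold(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """F1 = 2TP / (2TP + FP + FN) when predicting positive for prob >= threshold."""
    tp = fp = fn = 0
    for y, p in zip(y_true, probs):
        predicted = p >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(
    y_true: Sequence[int],
    probs: Sequence[float],
    start: float = 0.05,
    stop: float = 0.95,
    step: float = 0.01,
) -> Tuple[float, float]:
    """Return (best_threshold, best_f1); the lowest threshold wins on ties."""
    n_steps = int(round((stop - start) / step))
    best_t, best_f1 = start, -1.0
    for k in range(n_steps + 1):
        t = round(start + k * step, 4)  # avoid accumulated float drift
        f1 = f1_at_threshold(y_true, probs, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


if __name__ == "__main__":
    # Hypothetical validation labels and probabilities for one class.
    print(tune_threshold([1, 1, 0, 0], [0.8, 0.6, 0.4, 0.2]))  # (0.41, 1.0)
```

Because each class is tuned independently, a rare or confidently separated finding can end up with a very low threshold, as Cardiomegaly (0.11) does in the table above.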
Grad-CAM is used to visualize model attention over chest X-ray regions.
It is provided as supporting evidence and does not replace clinical interpretation.
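Grad-CAM weights each feature map of a convolutional layer by the spatial average of the target-class gradient, sums the weighted maps, and keeps the positive part. The sketch below shows only that arithmetic on nested lists. Capturing the activations and gradients (for example with forward and backward hooks on a late DenseNet121 feature layer), upsampling the map to the 320 × 320 input, and overlaying it on the image are left out. The function name is hypothetical, not the API of `explain/`.

```python
from typing import List

Map2D = List[List[float]]


def grad_cam(activations: List[Map2D], gradients: List[Map2D]) -> Map2D:
    """Grad-CAM map from K feature maps A^k and gradients dy_c/dA^k, each H x W.

    alpha_k = mean_ij(dy_c/dA^k_ij); cam = ReLU(sum_k alpha_k * A^k), scaled to [0, 1].
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel importance: global-average-pooled gradients.
    alphas = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for alpha, a in zip(alphas, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * a[i][j]
    # ReLU keeps only regions that increase the class score.
    cam = [[v if v > 0.0 else 0.0 for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0.0:
        cam = [[v / peak for v in row] for row in cam]  # normalize for display
    return cam


if __name__ == "__main__":
    acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
    print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 0.0]]
```

In the toy example the second channel has a negative weight, so the region it activates is suppressed by the ReLU rather than highlighted.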
Model Inference and Grad-CAM Visualization
Create and activate the virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
Check dataset:
```bash
python scripts/check_dataset.py --config configs/base.yaml
```
Train:
```bash
python scripts/train.py --config configs/base.yaml
```
Evaluate:
```bash
python scripts/eval.py \
  --config configs/base.yaml \
  --checkpoint outputs/train_runs/<run_id>/checkpoints/best.pt
```
Tune thresholds:
```bash
python scripts/threshold_tune.py \
  --config configs/base.yaml \
  --checkpoint outputs/train_runs/<run_id>/checkpoints/best.pt \
  --criterion f1
```
Run inference:
```bash
python scripts/infer.py \
  --config configs/base.yaml \
  --checkpoint outputs/train_runs/<run_id>/checkpoints/best.pt \
  --input path/to/image.jpg
```
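After inference, the model's five logits have to be turned into probabilities and positive/negative calls. A minimal sketch, assuming a sigmoid followed by the rule `probability >= threshold`, with the validation-tuned thresholds reported in the threshold table of this README. It also assumes the model outputs logits in the label order listed there. The names `postprocess` and `THRESHOLDS` are hypothetical and not the API of `chexpert_poc/inference/`.

```python
import math
from typing import Dict, Sequence

# Validation-tuned, per-class thresholds reported in this README.
THRESHOLDS: Dict[str, float] = {
    "Atelectasis": 0.46,
    "Cardiomegaly": 0.11,
    "Consolidation": 0.47,
    "Edema": 0.34,
    "Pleural Effusion": 0.37,
}


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def postprocess(logits: Sequence[float]) -> Dict[str, Dict[str, object]]:
    """Map logits (in THRESHOLDS order) to a probability and binary call per label."""
    if len(logits) != len(THRESHOLDS):
        raise ValueError("expected one logit per target label")
    result: Dict[str, Dict[str, object]] = {}
    for (label, threshold), logit in zip(THRESHOLDS.items(), logits):
        prob = sigmoid(logit)
        result[label] = {"probability": prob, "positive": prob >= threshold}
    return result


if __name__ == "__main__":
    for label, out in postprocess([0.0, -2.0, -5.0, 0.0, -0.6]).items():
        print(f"{label}: p={out['probability']:.3f} positive={out['positive']}")
```

With these thresholds, a Cardiomegaly probability of about 0.12 is already reported as positive, so the raw probability should be shown alongside the binary call.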
Run Grad-CAM:
```bash
python scripts/gradcam_demo.py \
  --config configs/base.yaml \
  --checkpoint outputs/train_runs/<run_id>/checkpoints/best.pt \
  --input path/to/image.jpg \
  --label "Pleural Effusion"
```
- This repository is for research and proof-of-concept experiments.
- The development period was 2026.03 – 2026.04.
- It is not a standalone medical device.
- Model outputs should be interpreted only as decision-support information.
- CheXpert data, model checkpoints, logs, and generated outputs are excluded from Git tracking.