A comprehensive, production-ready PyTorch project template with modular architecture, distributed training support, and modern tooling.
- 🧩 Modular Architecture: Registry-based component system for easy extensibility
- ⚙️ Configuration Management: Hierarchical config system with inheritance and CLI overrides
- 🚀 Distributed Training: Multi-node/multi-GPU training with DDP, FSDP, and DataParallel
- 📊 Experiment Tracking: MLflow and Weights & Biases integration with auto-visualization
- 🔧 Modern Tooling: uv package management, pre-commit hooks, Docker support
- 💾 Resume Training: Automatic checkpoint saving and loading with state preservation
- 🌐 Cross-Platform: Development on macOS (Apple Silicon MPS) and Linux with optimized builds
- 🐳 Development Environment: Devcontainer and Jupyter Lab integration
- ⚡ Performance Optimization: RAM caching, mixed precision, torch.compile support
- 📚 Auto Documentation: Sphinx-based API docs with live reloading
- 📱 Slack Notifications: Training completion and error notifications
- 🛡️ Error Handling: Robust error recovery and automatic retries
- Python: 3.11+
- Package Manager: uv
- CUDA: 12.8 (for GPU training)
- PyTorch: 2.7.1
Create a new project using this template:
```bash
# Option 1: Use as GitHub template (recommended)
# Click "Use this template" on GitHub

# Option 2: Clone and set up manually
git clone <your-repo-url>
cd your-project-name

# Option 3: Merge updates from this template
git remote add upstream https://github.com/mjun0812/PyTorch-Project-Template.git
git fetch upstream main
git merge --allow-unrelated-histories --squash upstream/main
```
```bash
# Copy environment template
cp template.env .env

# Edit .env with your API keys and settings
```
Example .env configuration:
```bash
# Slack notifications (optional)
# You can use either SLACK_TOKEN or SLACK_WEBHOOK_URL
SLACK_TOKEN="xoxb-your-token"
SLACK_CHANNEL="#notifications"
SLACK_USERNAME="Training Bot"

# Alternative: Webhook URL (simpler setup)
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/..."

# MLflow tracking
MLFLOW_TRACKING_URI="./result/mlruns"  # or remote URI

# Weights & Biases (optional)
WANDB_API_KEY="your-wandb-key"
```
Choose your preferred installation method:
```bash
# Install dependencies
uv sync

# Setup development environment
uv run pre-commit install

# Run training
uv run python train.py config/dummy.yaml
```
```bash
# Build container
./docker/build.sh

# Run training in container
./docker/run.sh python train.py config/dummy.yaml
```
Open the project in VS Code and use the devcontainer configuration for a consistent development environment.
Start with the dummy configuration to test your setup:
```bash
# Basic training with dummy dataset
python train.py config/dummy.yaml

# Override configuration from command line
python train.py config/dummy.yaml batch=32 gpu.use=0 optimizer.lr=0.001
```
This template uses hierarchical configuration with inheritance support:
```bash
# Use dot notation to modify nested values
python train.py config/dummy.yaml gpu.use=0,1 model.backbone.depth=50

# Multiple overrides
python train.py config/dummy.yaml batch=64 epoch=100 optimizer.lr=0.01

# View current configuration
python script/show_config.py config/dummy.yaml

# Batch edit configuration files
python script/edit_configs.py config/dummy.yaml "optimizer.lr=0.01,batch=64"
```
Configuration hierarchy (lowest to highest precedence; illustrated by the sketch after the list):
- Dataclass defaults (`src/config/config.py`)
- Base configs (`config/__base__/`)
- Experiment configs (`config/*.yaml`) with `__base__` inheritance
- CLI overrides
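Conceptually, each layer overrides the one before it. Purely as an illustration of that precedence (not the template's actual merge code, and with made-up keys and values), the layering behaves like nested dict merges:

```python
def deep_merge(low: dict, high: dict) -> dict:
    """Values in `high` take precedence; nested dicts are merged recursively."""
    out = dict(low)
    for key, value in high.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out

defaults   = {"batch": 16, "epoch": 100, "optimizer": {"lr": 0.1}}  # dataclass defaults
base_cfg   = {"batch": 32}                                          # config/__base__/
experiment = {"optimizer": {"lr": 0.01}}                            # config/my_experiment.yaml
cli        = {"batch": 64}                                          # e.g. `batch=64` on the CLI

cfg = deep_merge(deep_merge(deep_merge(defaults, base_cfg), experiment), cli)
assert cfg == {"batch": 64, "epoch": 100, "optimizer": {"lr": 0.01}}
```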
```bash
# Launch Jupyter Lab for experimentation
./script/run_notebook.sh

# Start MLflow UI for experiment tracking
./script/run_mlflow.sh

# View all registered components
python script/show_registers.py

# View model architecture
python script/show_model.py

# Visualize learning rate schedules
python script/show_scheduler.py

# View data transformation pipeline
python script/show_transform.py

# Clean up orphaned result directories
python script/clean_result.py

# Aggregate MLflow results to CSV
python script/aggregate_mlflow.py all

# Start documentation server (auto-reloads on changes)
./script/run_docs.sh
```
Scale your training across multiple GPUs and nodes:
```bash
# Use torchrun for DDP training (recommended)
./torchrun.sh 4 train.py config/dummy.yaml gpu.use="0,1,2,3"

# Alternative: DataParallel (not recommended for production)
python train.py config/dummy.yaml gpu.use="0,1,2,3" gpu.multi_strategy="dp"
```
```bash
# Master node (node 0)
./multinode.sh 2 4 12345 0 master-ip:12345 train.py config/dummy.yaml gpu.use="0,1,2,3"

# Worker nodes (node 1+)
./multinode.sh 2 4 12345 1 master-ip:12345 train.py config/dummy.yaml gpu.use="0,1,2,3"
```
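For reference, `torchrun` (and the multi-node script) launches one process per GPU; inside each process the training code typically initializes the process group and wraps the model in `DistributedDataParallel`. A minimal sketch of that standard pattern (not the template's exact runner code):

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp(model: torch.nn.Module) -> DDP:
    # torchrun sets RANK, LOCAL_RANK, WORLD_SIZE, and MASTER_ADDR for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    # Each process holds one replica; gradients are all-reduced across GPUs.
    return DDP(model.cuda(local_rank), device_ids=[local_rank])
```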
For very large models that don't fit in GPU memory:
```bash
python train.py config/dummy.yaml gpu.multi_strategy="fsdp" gpu.fsdp.min_num_params=100000000
```

Training results are automatically saved to:
```
result/[dataset_name]/[date]_[model_name]_[tag]/
├── config.yaml   # Complete configuration used
├── models/       # Model checkpoints (latest.pth, best.pth, epoch_N.pth)
├── optimizers/   # Optimizer states
└── schedulers/   # Scheduler states
```
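If you need to inspect a saved checkpoint outside the training and test entry points, it can be loaded directly with standard PyTorch. A minimal sketch (the exact contents and key names depend on how the checkpoint was saved, so treat the `"model"` key below as an assumption):

```python
import torch

# Path follows the result layout above.
ckpt_path = "result/dataset_name/20240108_ResNet_experiment/models/best.pth"
state = torch.load(ckpt_path, map_location="cpu")

# The file may hold a plain state_dict or a dict that wraps one; adjust as needed.
state_dict = state.get("model", state) if isinstance(state, dict) else state
# model.load_state_dict(state_dict)  # with `model` built from the same config
```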
Resume interrupted training using saved checkpoints:
```bash
# Resume from automatically saved checkpoint
python train.py result/dataset_name/20240108_ResNet_experiment/config.yaml

# Resume and extend training
python train.py result/dataset_name/20240108_ResNet_experiment/config.yaml epoch=200

# Resume with different configuration
python train.py result/dataset_name/20240108_ResNet_experiment/config.yaml gpu.use=1 batch=64
```
Run evaluation separately from training:
```bash
# Evaluate using saved model configuration
python test.py result/dataset_name/20240108_ResNet_experiment/config.yaml

# Evaluate with different GPU
python test.py result/dataset_name/20240108_ResNet_experiment/config.yaml gpu.use=1
```
Speed up training by caching datasets in RAM:
```bash
python train.py config/dummy.yaml use_ram_cache=true ram_cache_size_gb=16
```
Implement caching in your custom dataset:
```python
def __getitem__(self, idx):
    if self.cache is not None and idx in self.cache:
        data = self.cache.get(idx)
    else:
        data = self.load_data(idx)  # Your data loading logic
        if self.cache is not None:
            self.cache.set(idx, data)
    return data
```
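The snippet above assumes a cache object exposing `get`/`set` and membership checks. The template ships its own RAM cache; purely for illustration, a hypothetical minimal stand-in could look like this:

```python
import sys

class SimpleRamCache:
    """Hypothetical in-memory cache with a rough size limit (illustration only)."""

    def __init__(self, max_size_gb: float = 16.0):
        self._store: dict = {}
        self._max_bytes = int(max_size_gb * 1024**3)
        self._used_bytes = 0

    def __contains__(self, key) -> bool:
        return key in self._store

    def get(self, key):
        return self._store[key]

    def set(self, key, value) -> None:
        # Rough size estimate; tensors and nested objects need a better measure.
        size = sys.getsizeof(value)
        if self._used_bytes + size <= self._max_bytes:
            self._store[key] = value
            self._used_bytes += size
```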
```bash
# Enable automatic mixed precision with fp16
python train.py config/dummy.yaml use_amp=true amp_dtype="fp16"

# Use bfloat16 for newer hardware (A100, H100)
python train.py config/dummy.yaml use_amp=true amp_dtype="bf16"
```
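These flags correspond to PyTorch's automatic mixed precision. For reference, the underlying pattern looks roughly like the sketch below (a generic loop that assumes the model returns a loss; not the template's exact runner code):

```python
import torch

def train_one_epoch_amp(model, optimizer, dataloader, dtype=torch.float16):
    # Gradient scaling is mainly needed for fp16; bf16 usually works without it.
    scaler = torch.amp.GradScaler("cuda")
    for inputs, targets in dataloader:
        optimizer.zero_grad()
        # Run the forward pass in reduced precision.
        with torch.autocast(device_type="cuda", dtype=dtype):
            loss = model(inputs.cuda(), targets.cuda())
        scaler.scale(loss).backward()  # scale the loss to avoid fp16 underflow
        scaler.step(optimizer)         # unscale gradients, then step the optimizer
        scaler.update()
```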
```bash
# Enable PyTorch 2.0 compilation for speedup
python train.py config/dummy.yaml use_compile=true compile_backend="inductor"

# Alternative backends
python train.py config/dummy.yaml use_compile=true compile_backend="aot_eager"
```
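These options correspond to `torch.compile`. As a generic illustration (the toy model here is made up):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
# Compile once; the first iterations are slower while kernels are generated.
compiled = torch.compile(model, backend="inductor")
out = compiled(torch.randn(4, 128))
```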
Get notified about training progress and errors:
```bash
# Training will automatically send notifications on completion/error
python train.py config/dummy.yaml

# Manual notification testing
uv run --frozen pytest tests/test_slack_notification.py -v
```
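If you want to send an ad-hoc message with the same `SLACK_WEBHOOK_URL` from `.env` (independently of the template's own notifier, whose API may differ), the standard incoming-webhook call looks like this:

```python
import json
import os
import urllib.request

def notify_slack(text: str) -> None:
    # Slack incoming webhooks accept a JSON payload with a "text" field.
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]
    payload = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url, data=payload, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)

notify_slack("Training finished ✅")
```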
```
src/
├── config/        # Configuration management with inheritance
├── dataloaders/   # Dataset and DataLoader implementations
├── models/        # Model definitions and backbones
│   ├── backbone/  # Pre-trained backbones (ResNet, Swin, etc.)
│   ├── layers/    # Custom layers and building blocks
│   └── losses/    # Loss function implementations
├── optimizer/     # Optimizer builders (including ScheduleFree)
├── scheduler/     # Learning rate schedulers
├── transform/     # Data preprocessing and augmentation
├── evaluator/     # Metrics and evaluation
├── runner/        # Training and testing loops
└── utils/         # Utilities (logger, registry, torch utils)
```

```
config/
├── __base__/      # Base configuration templates
└── *.yaml         # Experiment configurations
```

```
script/            # Utility scripts
├── run_*.sh       # Service startup scripts
├── show_*.py      # Visualization tools
└── aggregate_*.py # Result aggregation tools
```
Components are registered using decorators for dynamic instantiation:
```python
from src.models import MODEL_REGISTRY

@MODEL_REGISTRY.register()
class MyModel(BaseModel):
    def __init__(self, ...):
        super().__init__()
        # Model implementation

# Custom name registration
@MODEL_REGISTRY.register("custom_name")
class AnotherModel(BaseModel):
    pass
```
Available registries:
- `MODEL_REGISTRY`: Model architectures
- `DATASET_REGISTRY`: Dataset implementations
- `TRANSFORM_REGISTRY`: Data transformations
- `OPTIMIZER_REGISTRY`: Optimizers
- `LR_SCHEDULER_REGISTRY`: Learning rate schedulers
- `EVALUATOR_REGISTRY`: Evaluation metrics
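For intuition, the registry pattern boils down to a name-to-class mapping that configs can reference by string. A hypothetical minimal version (the template's actual `Registry` in `src/utils` may offer more, such as building instances directly from config):

```python
class Registry:
    """Hypothetical minimal registry; illustrates the pattern, not the template's class."""

    def __init__(self, name: str):
        self.name = name
        self._classes: dict[str, type] = {}

    def register(self, name: str | None = None):
        def decorator(cls: type) -> type:
            self._classes[name or cls.__name__] = cls
            return cls
        return decorator

    def get(self, name: str) -> type:
        return self._classes[name]

MODEL_REGISTRY = Registry("model")

@MODEL_REGISTRY.register()
class MyModel:
    pass

model_cls = MODEL_REGISTRY.get("MyModel")  # dynamic lookup by the name used in a config
```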
The configuration system supports inheritance and modular composition:
```yaml
# config/my_experiment.yaml
__base__: "__base__/config.yaml"

# Override specific values
batch: 64
optimizer:
  lr: 0.001

# Import specific sections
transform:
  __import__: "__base__/transform/imagenet.yaml"
```
The template includes comprehensive error handling; a minimal sketch of the overall pattern follows the list below:
- Automatic Slack notifications for training completion and errors
- Graceful error recovery with detailed logging
- Checkpoint preservation even during failures
- Distributed training fault tolerance
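As a rough illustration of how these behaviors fit together (a generic sketch, not the template's actual runner; the `trainer`, `save_checkpoint`, and `notify` arguments are hypothetical placeholders):

```python
import traceback

def run_training(trainer, save_checkpoint, notify):
    """Generic sketch: always preserve state and notify, whatever happens."""
    try:
        trainer.fit()
        notify("Training completed ✅")
    except Exception:
        save_checkpoint(tag="on_failure")  # keep the latest state recoverable
        notify(f"Training failed ❌:\n{traceback.format_exc()}")
        raise                              # re-raise so the process exits non-zero
```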
```bash
# Run all tests
uv run --frozen pytest

# Run specific test modules
uv run --frozen pytest tests/test_modules.py
uv run --frozen pytest tests/test_slack_notification.py -v

# Run with verbose output
uv run --frozen pytest -v
```
```bash
# Format code
uv run --frozen ruff format .

# Check code quality
uv run --frozen ruff check .

# Fix auto-fixable issues
uv run --frozen ruff check . --fix
```
```bash
# Start documentation server with live reload
./script/run_docs.sh
```

```bash
# Build development image
./docker/build.sh

# Run commands in container
./docker/run.sh python train.py config/dummy.yaml
./docker/run.sh bash  # Interactive shell
```