# Added Health Endpoint (deef1e25)
# GET /health
Next, I added a pre-deployment check to the pipeline: running python -c "from main import app" before systemctl restart to validate imports. However, this only confirmed that the module imported cleanly; it couldn't catch errors that occurred during the actual service startup.
# Enhanced pre-import validation (deef1e25)
python -c "from main import app"
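As a rough illustration of what this check does and does not cover, here is a minimal Python sketch of the same idea. The helper name check_import is hypothetical and not part of the original pipeline; it only verifies that a module imports and exposes an attribute, which is why it cannot see failures that happen later, at service startup.

```python
import importlib
import sys


def check_import(module_name: str, attr: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # any import-time failure, not only ImportError
        print(f"import failed: {exc!r}", file=sys.stderr)
        return False
    if not hasattr(module, attr):
        print(f"{module_name} has no attribute {attr!r}", file=sys.stderr)
        return False
    return True
```

In a deploy script this could be called as check_import("main", "app"), exiting with a non-zero status when it returns False so the pipeline stops before the restart.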
Crucially, I implemented logic that automatically rolls back to the previous commit (HEAD~1) and restarts the service if the /health/ready endpoint doesn't respond correctly. Removing Gunicorn's --preload option was also vital. With --preload, Gunicorn imports the application once in the master process before forking workers, so an import error there takes down the entire service rather than a single worker.
# Implemented automatic rollback (deef1e25)
# GET /health/ready (readiness check including DB connection)
git reset --hard HEAD~1
# Removed Gunicorn --preload, enabled worker-independent imports (6fa120b9)
# --preload option removed
The Cause
The biggest issue was that nothing verified the service was actually available after deployment. A running process wasn't sufficient; I needed to confirm real dependencies such as the database connection. Furthermore, Gunicorn's --preload option loads the application once in the master process before the workers are forked, so an import error at that point became a critical failure for the entire service instead of being confined to one worker.
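To make the difference between "process is running" and "service is ready" concrete, here is a framework-agnostic sketch of a readiness check. It is not the post's actual /health/ready handler: the function names are hypothetical, and sqlite3 is used only as a stand-in for whatever database the real service connects to.

```python
import sqlite3
from typing import Callable, Dict, Tuple


def readiness(check_db: Callable[[], None]) -> Tuple[int, Dict[str, str]]:
    """Return (HTTP status, JSON-style body) for a readiness probe."""
    try:
        check_db()  # expected to raise if the database is unreachable
    except Exception as exc:
        return 503, {"status": "unavailable", "detail": str(exc)}
    return 200, {"status": "ready"}


def sqlite_ping(path: str = ":memory:") -> None:
    """Stand-in DB check: open a connection and run a trivial query."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("SELECT 1").fetchone()
    finally:
        conn.close()
```

A web framework route for /health/ready would call readiness(...) and return the status code and body; returning 503 on failure gives the deployment pipeline an unambiguous signal to act on.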
The Solution
Finally, I established the following 3-step safety net in my GitHub Actions deployment pipeline:
- Pre-import Validation: The deployment script now uses python -c "from main import app" to pre-check code import validity.
- Post-deployment Validation based on Detailed Health Check: It leverages the /health/ready endpoint, which checks database connections, to confirm the service is truly ready.
- Automatic Rollback: If the /health/ready endpoint responds abnormally, it automatically executes git reset --hard to the previous commit (HEAD~1) and restarts the service for a rapid recovery.
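The restart, wait, and rollback sequence described in the steps above could be sketched roughly as follows. This is an assumption-laden illustration, not the actual deployment script: the function names, the retry count, and the delay are hypothetical, and the probe and command runner are injectable so the logic can be exercised without touching a real service.

```python
import subprocess
import time
import urllib.request
from typing import Callable, Sequence


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError/HTTPError and socket timeouts are OSError subclasses
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int = 10,
                     delay: float = 3.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `probe` up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


def run(cmd: Sequence[str]) -> None:
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


def restart_with_rollback(probe: Callable[[], bool],
                          run_cmd: Callable[[Sequence[str]], None] = run,
                          **wait_kwargs) -> bool:
    """Restart the service; roll back one commit if it never becomes ready.

    Returns True if the new release is serving, False if a rollback happened.
    """
    run_cmd(["sudo", "systemctl", "restart", "myapp.service"])
    if wait_until_ready(probe, **wait_kwargs):
        return True
    run_cmd(["git", "reset", "--hard", "HEAD~1"])
    run_cmd(["sudo", "systemctl", "restart", "myapp.service"])
    return False
```

A call might look like restart_with_rollback(lambda: http_ok("http://127.0.0.1:8000/health/ready")), where the host and port are assumptions based on the bind address in the unit file. The sketch does not re-check health after rolling back, so a real pipeline would likely want to probe once more and fail loudly if the previous commit is also unhealthy.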
Additionally, I removed Gunicorn's --preload option, allowing each worker to import code independently. This prevents an import failure in a single worker from causing a complete service outage.
# Example GitHub Actions deploy.yml (partial)
# ...
      # Pre-import validation
      - name: Pre-import check
        run: python -c "from main import app"
      # Restart service and check health
      - name: Restart service and check health
        run: |
          sudo systemctl restart myapp.service
          # Wait for /health/ready response and validate (with timeout)
          # Execute auto-rollback logic on failure
# Example Gunicorn systemd unit file modification (using update_systemd_unit.sh script)
# ExecStart=/path/to/gunicorn --workers 4 --bind 0.0.0.0:8000 main:app --preload -> remove --preload
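For readers who keep Gunicorn settings in a config file rather than on the command line, the same change could be expressed as below. The file name gunicorn.conf.py is an assumption (the post edits the systemd unit directly); the worker count and bind address are copied from the ExecStart line above.

```python
# gunicorn.conf.py (hypothetical file; the original setup passes flags in ExecStart)
bind = "0.0.0.0:8000"
workers = 4
# False is Gunicorn's default and is equivalent to omitting --preload:
# each worker imports the application itself after being forked.
preload_app = False
```

It would be loaded with gunicorn -c gunicorn.conf.py main:app. Setting preload_app explicitly is redundant with the default, but it documents the decision next to the other settings.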
Results
- The risk of service outages due to import errors or code instability during deployment has been significantly reduced.
- Thanks to the automatic rollback system based on /health/ready, deployment failures now result in a quick recovery to the previous stable state.
- The overall stability of the production service has greatly improved.
Wrap-up — Avoid the Same Pitfalls
- [ ] Integrate detailed health check logic (e.g., GET /health/ready) into your deployment pipeline to accurately assess service availability.
- [ ] Implement automatic rollback to a previous version upon health check failure to minimize recovery time.
- [ ] Be aware that while Gunicorn's --preload option is convenient, worker-independent imports can offer higher stability. Choose wisely based on your needs (consider the trade-off in memory usage).
- [ ] Add a pre-import validation step before deployment to catch potential issues early on.