There is a specific kind of anxiety that comes with deploying to production on a Friday afternoon. Your monitoring dashboard is open in one tab, your terminal in another, and somewhere in the back of your mind you are calculating how long it will take to roll back if something goes wrong.
For the past two years, I have been running a handful of services on a single Hetzner VPS. Nothing glamorous: a Node.js API, a static frontend, a webhook processor, and a PostgreSQL database. The kind of setup where Kubernetes would be overkill but manual docker compose down && docker compose up means a few seconds of downtime every deploy.
Those few seconds add up. More importantly, they add up in customer-visible ways. A failed health check here, a dropped WebSocket connection there. So I built a blue-green deployment pipeline using nothing more than Docker Compose and Nginx.
The Architecture
The core idea is simple: run two identical copies of your application stack behind Nginx. At any given moment, one is "live" (receiving traffic) and the other is "standby" (either idle or running the previous version). When you deploy, you bring up the new version on the standby stack, verify it is healthy, then tell Nginx to switch traffic over.
The trick is that Nginx can reload its configuration without dropping existing connections. An nginx -s reload spins up new worker processes with the updated config while the old workers finish serving their in-flight requests, so traffic shifts from one upstream to the other without a gap. Combined with Docker Compose's ability to run multiple project instances using the -p flag, you get a surprisingly robust deployment pipeline.
Setting Up the Compose Files
I use a single docker-compose.yml with environment variable interpolation to distinguish between the blue and green stacks. The key is the project name and the port bindings.
version: "3.8"

services:
  api:
    image: myapp/api:${TAG:-latest}
    ports:
      - "${API_PORT:-3001}:3000"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=${DATABASE_URL}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 15s
    restart: unless-stopped

  worker:
    image: myapp/worker:${TAG:-latest}
    environment:
      - REDIS_URL=${REDIS_URL}
    restart: unless-stopped
When I deploy, the script sets API_PORT=3001 for the blue stack and API_PORT=3002 for the green stack. Nginx knows about both ports and routes to whichever is currently active.
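Conceptually, the two stacks are nothing more than two project instances of the same compose file, launched with different project names, ports, and tags. Stripped down to its essence (the tags here are placeholders), it amounts to:

# Two isolated copies of the same compose file, distinguished by project name.
# Compose prefixes container names with the project (blue-api-1, green-api-1, ...),
# so both stacks can run side by side on one host.
API_PORT=3001 TAG=v41 docker compose -p blue up -d    # previous release
API_PORT=3002 TAG=v42 docker compose -p green up -d   # new release

The deploy script below automates exactly this, plus the health check and the traffic swap.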
The Deployment Script
The deploy script is roughly 80 lines of bash. It determines which stack is currently active, brings up the other one with the new image tag, waits for health checks to pass, then swaps the Nginx upstream configuration.
#!/bin/bash
set -euo pipefail

# Determine which stack is currently live
ACTIVE=$(cat /etc/nginx/active-stack 2>/dev/null || echo "blue")

if [ "$ACTIVE" = "blue" ]; then
  TARGET="green"
  TARGET_PORT=3002
else
  TARGET="blue"
  TARGET_PORT=3001
fi

echo "Deploying to $TARGET stack (port $TARGET_PORT)..."

# Pull new images and start the target stack
TAG="$1" API_PORT="$TARGET_PORT" \
  docker compose -p "$TARGET" up -d --pull always

# Wait for the health check; abort (leaving the old stack live) if it never passes
echo "Waiting for health check..."
HEALTHY=0
for i in $(seq 1 30); do
  if curl -sf "http://localhost:$TARGET_PORT/health" > /dev/null; then
    echo "Health check passed on attempt $i"
    HEALTHY=1
    break
  fi
  sleep 2
done

if [ "$HEALTHY" -ne 1 ]; then
  echo "Health check never passed; $ACTIVE stays active." >&2
  exit 1
fi

# Swap the Nginx upstream to the new stack
sed -i "s/server 127.0.0.1:.*/server 127.0.0.1:$TARGET_PORT;/" \
  /etc/nginx/conf.d/upstream.conf
nginx -s reload

echo "$TARGET" > /etc/nginx/active-stack
echo "Deployed. $TARGET is now active."
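Assuming the script is saved as deploy.sh (the name is arbitrary), a deploy is a single command with the image tag as its only argument, and one way to roll back is simply to re-run it with the previous tag:

# Deploy a new tag to the standby stack, then flip traffic to it.
./deploy.sh v1.4.2

# Roll back by re-running with the previous tag: it lands on the other
# (now standby) stack and traffic flips again.
./deploy.sh v1.4.1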
I wrote the first version of this script at 11 PM on a Tuesday, after a deploy had knocked out our webhook processor for 14 seconds and a client noticed before our monitoring did. There is nothing like a customer support email to motivate infrastructure improvements.
Nginx Configuration
The Nginx configuration is minimal. The upstream block points to whichever port is currently active, and the proxy_pass directive forwards all traffic there. The important detail is the upstream definition living in a separate file that the deploy script can modify independently.
upstream app_backend {
    server 127.0.0.1:3001;
}
server {
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate     /etc/letsencrypt/live/api.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.example.com/privkey.pem;

    location / {
        proxy_pass http://app_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
Why Not Use Docker's Built-in Load Balancing?
Docker Compose can scale services and load-balance between them using the built-in round-robin DNS. The problem is that it does not give you control over the transition. You cannot tell Docker "drain connections from the old container before removing it." With Nginx in front, you have explicit control over when traffic shifts and can verify the new stack is healthy before committing.
Handling Database Migrations
The elephant in the room with blue-green deployments is database schema changes. If your new version expects a column that does not exist yet, you cannot simply switch traffic over after the migration runs because the old version (still running on the other stack for rollback purposes) might break.
My approach is borrowed from the Parallel Change pattern:
- Expand: Add the new column or table. Make it nullable or provide a default. Deploy this change first, without any application code that uses it.
- Migrate: Deploy the application code that writes to the new structure. Backfill historical data if needed.
- Contract: Once you are confident the new code is stable and the old stack will not be needed, remove the old column or constraint.
This means every breaking schema change becomes at least two deploys. It is more work up front, but it means you always have a safe rollback path.
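As a purely illustrative example (the table and column names are not from the real app), an expand step that prepares for splitting a full_name column might look like this, deployed on its own before any code that reads the new columns:

#!/bin/bash
# Expand step: add new, nullable columns so the old and new application
# versions can both run against the same schema. Names are illustrative.
psql "$DATABASE_URL" <<'SQL'
ALTER TABLE users ADD COLUMN IF NOT EXISTS first_name text;
ALTER TABLE users ADD COLUMN IF NOT EXISTS last_name  text;
SQL

The migrate step then backfills the new columns and switches writes over; the contract step, a separate deploy later, drops full_name once neither stack touches it.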
A colleague once told me that the best infrastructure is the kind that lets you sleep at night. This pipeline is not elegant. It is not cutting-edge. But I have not been woken up by a failed deploy since I built it, and that counts for more than architectural purity.
Monitoring the Swap
During the switchover window, I log a few metrics to make sure everything is healthy:
- Response time from the new stack's health endpoint (should be under 200ms)
- Active connection count on the old stack (should drain to zero within 30 seconds)
- Error rate from the Nginx access log (any spike triggers automatic rollback)
- Memory and CPU usage of the new containers (catching runaway processes early)
I use a simple bash script that polls these metrics for 60 seconds after the swap. If anything looks off, it automatically reverts the Nginx config and sends me a notification.
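The watcher itself is nothing fancy. A condensed sketch of the health-probe part and the automatic revert might look like this (the ports are passed in as arguments, the paths and 60-second window mirror the setup above, and the real script also watches error rate, connection drain, and container resource usage):

#!/bin/bash
set -euo pipefail

NEW_PORT="$1"   # port of the stack that just went live
OLD_PORT="$2"   # port of the previous stack, kept around for rollback

# Poll for 60 seconds after the swap; revert Nginx if any probe fails.
for i in $(seq 1 12); do
  if ! curl -sf --max-time 1 "http://localhost:$NEW_PORT/health" > /dev/null; then
    echo "Health probe failed after swap; rolling back." >&2
    sed -i "s/server 127.0.0.1:.*/server 127.0.0.1:$OLD_PORT;/" \
      /etc/nginx/conf.d/upstream.conf
    nginx -s reload
    exit 1
  fi
  sleep 5
done

echo "Post-swap checks passed."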
You do not need Kubernetes to achieve zero-downtime deployments. Docker Compose, Nginx, and about 80 lines of bash will get you surprisingly far on a single VPS.
The expand-migrate-contract pattern for database changes is more work per deploy but eliminates the "migration broke the old version" failure mode entirely.
The best infrastructure is not the most sophisticated. It is the kind you can debug at 2 AM without a reference guide.
What I Would Do Differently
If I were starting today, I would use Docker Compose's deploy configuration more aggressively. The update_config section with order: start-first gets you part of the way there without the custom scripting, although it is only honored when the stack runs under Swarm mode. I would also explore Traefik as a replacement for Nginx, since it has native Docker integration and can detect new containers automatically.
But the script works. It has deployed 247 times without a single second of downtime. And sometimes, that is exactly the right amount of engineering.