Rollback
Purpose
How to roll back a bad deploy. This page covers code rollback for API, RAG, and admin, plus the caveats for DB migrations (which are additive-only in HUPH). Pair with incident-playbook.en.md when a rollback is part of a larger incident.
Prerequisites
- SSH + sudo access to the production host
- The pre-deploy snapshot files from the deploy runbook (/tmp/pre-deploy-sha.txt and /tmp/pre-deploy-state.txt), or knowledge of the last-known-good commit SHA on main
Rollback triggers
Roll back immediately (no additional verification) if any of:
- Smoke tests fail after deploy
- docker-compose logs huph-api --tail 100 | grep -i error shows a non-trivial error spike
- Admin dashboard fails to load or realtime shows "Offline"
- WhatsApp webhook stops receiving messages (check 360dialog dashboard)
- Counselors on the team report problems
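The log-based trigger above leaves "non-trivial error spike" to judgment. A minimal sketch of making that check repeatable, assuming a hypothetical threshold of 5 matching lines in the last 100; the function names and the threshold are illustrative and not part of the HUPH tooling:

```shell
# count_error_lines: count lines on stdin that contain "error" (case-insensitive).
count_error_lines() {
    # grep -c prints 0 but exits 1 when nothing matches; "|| true" hides that exit code.
    grep -ci 'error' || true
}

# error_spike: succeed (exit 0) when stdin has more than THRESHOLD matching lines.
# THRESHOLD is a hypothetical value; tune it to the service's normal noise level.
error_spike() {
    threshold="${1:-5}"
    count="$(count_error_lines)"
    [ "$count" -gt "$threshold" ]
}

# Usage on the production host (not executed here):
#   docker-compose logs huph-api --tail 100 2>&1 | error_spike 5 && echo "roll back"
```

A plain line count cannot tell a new failure from routine noise, so it complements reading the matching lines rather than replacing it.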
Roll back after brief investigation (~5 minutes max) if:
- Slight latency increase
- Single-page rendering glitch
- Non-blocking log warnings
Layer 1 — Code rollback (preferred path)
The cleanest rollback is git revert + redeploy of the reverted commit.
Step 1 — Find the last-known-good SHA
cat /tmp/pre-deploy-sha.txt # from deploy runbook
If the snapshot file is missing:
cd /opt/huph
git log --oneline -20 # look for the commit before the bad one
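Step 1 can be wrapped in a small helper that refuses to continue on an empty or corrupted snapshot. This is a sketch only: it assumes the snapshot file holds the commit SHA as the first token of its first line (the file format is not specified here), and read_good_sha is a hypothetical name.

```shell
# read_good_sha: print the SHA stored in a snapshot file, or fail if the file
# is missing or does not start with a 7-40 character lowercase hex string.
read_good_sha() {
    file="${1:-/tmp/pre-deploy-sha.txt}"
    [ -r "$file" ] || { echo "snapshot file not readable: $file" >&2; return 1; }
    # Assumption: the SHA is the first whitespace-delimited token on line 1.
    sha="$(head -n 1 "$file" | awk '{print $1}')"
    if printf '%s' "$sha" | grep -Eq '^[0-9a-f]{7,40}$'; then
        printf '%s\n' "$sha"
    else
        echo "not a commit SHA: '$sha'" >&2
        return 1
    fi
}

# Usage (not executed here): fall back to the manual log inspection on failure.
#   GOOD_SHA="$(read_good_sha)" || git -C /opt/huph log --oneline -20
```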
Step 2 — Revert
cd /opt/huph
git revert HEAD # if bad deploy was the last commit
# OR
git revert <bad-sha> # for a specific bad commit
git log --oneline -5 # verify the revert is there
git push origin main # (or leave local if using a different flow)
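If the bad commit is a merge commit, a plain git revert stops with an error until a mainline parent is given with -m. The sketch below picks the right form from the commit's parent count. The helper names are hypothetical, and choosing parent 1 assumes the merge was made into main, so the first parent is the main-line history.

```shell
# parent_count: given the output of `git rev-list --parents -n 1 <sha>`
# (format: "<sha> <parent1> [<parent2> ...]"), print the number of parents.
parent_count() {
    # Word splitting is intentional: each SHA becomes one positional parameter.
    set -- $1
    echo $(( $# - 1 ))
}

# revert_command: print the git revert invocation appropriate for the commit.
#   $1 = commit SHA, $2 = output of `git rev-list --parents -n 1 <sha>`
revert_command() {
    sha="$1"
    parents_line="$2"
    if [ "$(parent_count "$parents_line")" -gt 1 ]; then
        echo "git revert -m 1 $sha"   # merge commit: revert relative to parent 1
    else
        echo "git revert $sha"        # ordinary commit
    fi
}

# Usage (not executed here):
#   revert_command "$BAD_SHA" "$(git rev-list --parents -n 1 "$BAD_SHA")"
```

The helper only prints the command, so the operator still reviews it before running it.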
Step 3 — Redeploy API
docker-compose build huph-api
docker-compose up -d huph-api
curl -s http://localhost:3101/health
Step 4 — Redeploy RAG (if needed)
docker-compose build huph-rag
docker-compose up -d huph-rag
# Wait ~90 seconds for models to load
curl -s http://localhost:3102/health
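Rather than sleeping a fixed ~90 seconds and checking once, the health endpoint can be polled until it answers. A minimal sketch, assuming the endpoint returns a non-error HTTP status once the models are loaded; wait_until is a hypothetical helper, and the 120-second timeout and 5-second interval are illustrative.

```shell
# wait_until: run a probe command every INTERVAL seconds until it succeeds
# or about TIMEOUT seconds have elapsed. Returns 0 on success, 1 on timeout.
#   $1 = timeout in seconds, $2 = interval in seconds, rest = probe command
wait_until() {
    timeout="$1"; interval="$2"; shift 2
    waited=0
    while ! "$@" >/dev/null 2>&1; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep "$interval"
        waited=$(( waited + interval ))
    done
    return 0
}

# Usage (not executed here); curl -f makes HTTP errors (>= 400) count as failure:
#   wait_until 120 5 curl -fsS http://localhost:3102/health || echo "RAG not healthy"
```

The same helper works for the API check in Step 3 by swapping the port.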
Step 5 — Redeploy admin (if needed)
cd /opt/huph/apps/admin
npm run build
sudo systemctl restart huph-admin
sudo systemctl status huph-admin --no-pager | head -10
Step 6 — Smoke
See the post-deploy smoke section in deploy.en.md.
Layer 2 — Checkout-to-previous-SHA rollback
If you cannot git revert cleanly (e.g. conflicts in unrelated
files, or revert ripples into too many components), check out the
previous SHA into a temporary state:
cd /opt/huph
git stash # if any local changes
git checkout <last-good-sha> -- .
# tracked files now match the old SHA; files added after it are left in place
docker-compose build huph-api
docker-compose up -d huph-api
After the immediate fire is out, resolve the conflict properly and commit the correct forward fix. Don't leave the repo with old-SHA files sitting in the working tree while HEAD still points at the bad commit.
Database migration rollback
HUPH migrations are additive-only — there are no down migrations. Columns are added, rarely removed. This means:
- A new column can be left in place after code rollback; the old code will simply ignore it
- A new index can be left in place; it consumes space but doesn't break anything
- A new table can be left in place; same rule
- A new trigger can be dangerous if the old code doesn't know about it — it may fire events the old code isn't ready to handle
If the migration is safe to leave
Do nothing. The old code ignores the new column/index/table.
If the migration is not safe to leave (trigger incompatibility)
Drop the trigger explicitly:
docker exec huph-postgres psql -U huph -d huph -c \
"DROP TRIGGER IF EXISTS <trigger_name> ON <table>;"
Document in the incident log. Plan a forward fix that either:
- Removes the trigger permanently if it was a mistake, OR
- Keeps the trigger and re-deploys fixed code that handles it
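Before dropping anything, it can help to confirm the exact trigger name on the table. The sketch below builds a query against the PostgreSQL pg_trigger catalog; list_triggers_sql is a hypothetical helper, and it assumes table names are plain lowercase identifiers in the default schema.

```shell
# list_triggers_sql: print a query listing user-defined triggers on a table.
list_triggers_sql() {
    table="$1"
    # Accept only plain lowercase identifiers so the value can be inlined safely.
    printf '%s' "$table" | grep -Eq '^[a-z_][a-z0-9_]*$' || {
        echo "invalid table name: $table" >&2
        return 1
    }
    # tgisinternal filters out triggers PostgreSQL creates for constraints.
    printf "SELECT tgname, tgenabled FROM pg_trigger WHERE tgrelid = '%s'::regclass AND NOT tgisinternal;\n" "$table"
}

# Usage (not executed here):
#   docker exec huph-postgres psql -U huph -d huph -c "$(list_triggers_sql <table>)"
```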
If the migration added a NOT NULL column
This is the nightmare case. The forward migration works, but the old code inserts rows without the new column → NOT NULL violation on INSERT (unless the column has a DEFAULT).
Fix: alter the column to nullable:
docker exec huph-postgres psql -U huph -d huph -c \
"ALTER TABLE <table> ALTER COLUMN <col> DROP NOT NULL;"
Then decide whether to forward-fix or keep nullable.
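To find which column is causing the violation, the NOT NULL columns without a default can be listed from information_schema. This is a sketch under assumptions: not_null_no_default_sql is a hypothetical helper, the table name is a plain lowercase identifier, and the result lists every such column on the table, including ones the old code already fills.

```shell
# not_null_no_default_sql: print a query listing columns on a table that are
# NOT NULL and have no default, i.e. columns every INSERT must supply.
not_null_no_default_sql() {
    table="$1"
    # Accept only plain lowercase identifiers so the value can be inlined safely.
    printf '%s' "$table" | grep -Eq '^[a-z_][a-z0-9_]*$' || {
        echo "invalid table name: $table" >&2
        return 1
    }
    printf "SELECT column_name FROM information_schema.columns WHERE table_name = '%s' AND is_nullable = 'NO' AND column_default IS NULL;\n" "$table"
}

# Usage (not executed here):
#   docker exec huph-postgres psql -U huph -d huph -c "$(not_null_no_default_sql <table>)"
```

Compare the result with the columns the old code writes; any column missing from the old INSERT is a candidate for DROP NOT NULL.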
Admin-only rollback (systemd)
If only the admin changed and you want to roll back without touching the API:
cd /opt/huph
git revert HEAD
cd apps/admin && npm run build
sudo systemctl restart huph-admin
Nginx continues serving — no vhost change needed.
Communication during rollback
As soon as you decide to roll back:
- Notify the team channel — "Rolling back deploy X due to Y"
- Stop any counselors from taking destructive actions — "Hold off on bulk updates for the next 5 minutes"
- Start a rollback timer — note the time
- Watch logs during the rollback deploy
- Post the all-clear when smoke passes
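The rollback timer from the list above can be as simple as recording an epoch timestamp at the start and printing the elapsed time with the all-clear. A minimal sketch; format_elapsed and ROLLBACK_START are hypothetical names.

```shell
# format_elapsed: convert a number of seconds to MM:SS.
format_elapsed() {
    printf '%02d:%02d\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

# Usage (not executed here):
#   ROLLBACK_START="$(date +%s)"        # when the rollback decision is made
#   ...perform the rollback...
#   format_elapsed $(( $(date +%s) - ROLLBACK_START ))   # include in the all-clear
```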
After rollback
- Open a post-mortem doc — use the template at incident-playbook.en.md
- Do NOT blame — focus on what the system let slip through
- Identify action items — what would have prevented this? Better tests? Staging environment? More careful migration?
- Forward-fix the original issue — rolling back is temporary; the actual bug still needs resolving
Gotchas
- Migration rollback is case-by-case. There is no generic "down migration" to run. Always think through the specific change first.
- git revert of a merge commit needs -m 1 to specify the parent: git revert -m 1 <merge-sha>.
- docker-compose up -d doesn't rebuild; you must run docker-compose build first if the Dockerfile or source changed.
- Admin systemctl restart without npm run build restarts the old build. The rollback does nothing. Always build first.
- Rollback is not free. Each rollback + re-deploy burns ~5 minutes of restart windows. Fast rollback is better than hesitating, but try to get the forward fix deployed ASAP.
See also
- Deploy — the inverse operation
- Incident playbook — broader incident response
- ClickHouse OOM recovery — specific runbook