Rollback

Purpose

How to roll back a bad deploy. This page covers code rollback for API, RAG, and admin, plus the caveats for DB migrations (which are additive-only in HUPH). Pair with incident-playbook.en.md when a rollback is part of a larger incident.

Prerequisites

  • SSH + sudo access to the production host
  • The pre-deploy snapshot files from the deploy runbook (/tmp/pre-deploy-sha.txt and /tmp/pre-deploy-state.txt), or knowledge of the last-known-good commit SHA on main

Rollback triggers

Roll back immediately, with no additional verification, if any of the following is true:

  • Smoke tests fail after deploy
  • docker-compose logs huph-api --tail 100 | grep -i error shows a non-trivial error spike
  • Admin dashboard fails to load or realtime shows "Offline"
  • WhatsApp webhook stops receiving messages (check 360dialog dashboard)
  • Counselors on the team report problems

Roll back after a brief investigation (~5 minutes max) if the problem is limited to:

  • Slight latency increase
  • Single-page rendering glitch
  • Non-blocking log warnings
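
The log-based trigger above is easier to judge against a number than by eye. A minimal sketch, assuming your team agrees on a cut-off such as 10 matching lines out of the last 100; the threshold and the helper names `count_error_lines` and `error_spike` are illustrative and not part of the HUPH tooling:

```shell
# Hypothetical helper: count lines on stdin that mention "error" (case-insensitive).
count_error_lines() {
  grep -ci 'error' || true   # grep exits 1 when nothing matches, but still prints 0
}

# Hypothetical helper: exit 0 when the error count reaches the threshold.
error_spike() {
  threshold="$1"
  count="$(count_error_lines)"
  [ "$count" -ge "$threshold" ]
}

# Usage on the production host (10 is an assumed threshold):
# docker-compose logs huph-api --tail 100 | error_spike 10 && echo "roll back"
```

A plain count still includes harmless lines that happen to contain the word "error", so treat it as a prompt to look at the log rather than as the decision itself.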

Layer 1 — Code rollback (preferred path)

The cleanest rollback is a git revert followed by a redeploy of the resulting commit.

Step 1 — Find the last-known-good SHA

cat /tmp/pre-deploy-sha.txt    # from deploy runbook

If the snapshot file is missing:

cd /opt/huph
git log --oneline -20   # look for the commit before the bad one
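
Before acting on the snapshot file, it can be worth checking that it really holds a commit SHA. A minimal sketch, assuming the deploy runbook writes the SHA as the first token of the file; that layout and the helper name `read_snapshot_sha` are assumptions:

```shell
# Hypothetical helper: print the SHA stored in the snapshot file, but only
# if the file is readable and its first token looks like a commit SHA.
read_snapshot_sha() {
  file="${1:-/tmp/pre-deploy-sha.txt}"
  [ -r "$file" ] || return 1
  # First whitespace-delimited token of the first line.
  sha="$(awk 'NR==1 {print $1}' "$file")"
  # Accept abbreviated (7+) or full (40) lowercase hex SHAs only.
  printf '%s\n' "$sha" | grep -Eq '^[0-9a-f]{7,40}$' || return 1
  printf '%s\n' "$sha"
}

# Usage: fall back to manual inspection when the snapshot is missing or malformed.
# good_sha="$(read_snapshot_sha)" || { cd /opt/huph && git log --oneline -20; }
```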

Step 2 — Revert

cd /opt/huph
git revert HEAD                  # if bad deploy was the last commit
# OR
git revert <bad-sha>             # for a specific bad commit
git log --oneline -5             # verify the revert is there
git push origin main             # (or leave local if using a different flow)
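
If the bad commit is a merge commit, a plain git revert fails and needs -m 1 (see Gotchas). A minimal sketch that picks the right form by counting parents; `revert_args` is a hypothetical helper, and it assumes parent 1 (the mainline) is the side you want to keep:

```shell
# Hypothetical helper: given one line of `git rev-list --parents -n 1 <sha>`
# output (the commit followed by its parents), print the arguments to pass
# to `git revert`.
revert_args() {
  # Word-split the line on purpose: $1 = commit, $2 and later = parents.
  set -- $1
  sha="$1"
  if [ "$#" -gt 2 ]; then
    # Two or more parents means a merge commit: name the mainline parent.
    printf '%s\n' "-m 1 $sha"
  else
    printf '%s\n' "$sha"
  fi
}

# Usage on the host:
# cd /opt/huph
# git revert $(revert_args "$(git rev-list --parents -n 1 <bad-sha>)")
```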

Step 3 — Redeploy API

docker-compose build huph-api
docker-compose up -d huph-api
curl -s http://localhost:3101/health

Step 4 — Redeploy RAG (if needed)

docker-compose build huph-rag
docker-compose up -d huph-rag
# Wait ~90 seconds for models to load
curl -s http://localhost:3102/health
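
Instead of waiting a fixed ~90 seconds, you can poll the health endpoint until it answers. A minimal sketch; `wait_until` is a hypothetical helper, and the 120-second budget and 5-second interval in the usage line are assumed values:

```shell
# Hypothetical helper: retry a command once per interval until it succeeds
# or the timeout (in seconds) is used up.
wait_until() {
  timeout_s="$1"; interval_s="$2"; shift 2
  waited=0
  while ! "$@" >/dev/null 2>&1; do
    [ "$waited" -ge "$timeout_s" ] && return 1
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
  return 0
}

# Usage: `curl -f` makes HTTP error statuses count as failures.
# wait_until 120 5 curl -fsS http://localhost:3102/health && echo "RAG healthy"
```

The same helper works for the API check on port 3101 with a shorter budget.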

Step 5 — Redeploy admin (if needed)

cd /opt/huph/apps/admin
npm run build
sudo systemctl restart huph-admin
sudo systemctl status huph-admin --no-pager | head -10

Step 6 — Smoke

See the post-deploy smoke section in deploy.en.md.

Layer 2 — Checkout-to-previous-SHA rollback

If you cannot git revert cleanly (e.g. conflicts in unrelated files, or revert ripples into too many components), check out the previous SHA into a temporary state:

cd /opt/huph
git stash                        # if any local changes
git checkout <last-good-sha> -- .
# tracked files now match the old SHA; HEAD has not moved, and files added after that SHA are still present
docker-compose build huph-api
docker-compose up -d huph-api

After the immediate fire is out, resolve the conflict properly and commit the correct forward fix. Don't leave the old tree sitting as uncommitted changes on top of main.
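
One thing to keep in mind with git checkout <last-good-sha> -- . is that it restores every file that exists in the old SHA but does not delete files that were added afterwards, so the build may still pick up newer files. A minimal sketch for listing those leftovers; `files_added_since` is a hypothetical helper name:

```shell
# Hypothetical helper: list files that exist in HEAD but were added after
# the given SHA. `git checkout <sha> -- .` leaves these in place.
files_added_since() {
  good_sha="$1"
  git diff --name-only --diff-filter=A "$good_sha" HEAD
}

# Usage on the host:
# cd /opt/huph
# files_added_since <last-good-sha>   # review, then remove by hand if needed
```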

Database migration rollback

HUPH migrations are additive-only — there are no down migrations. Columns are added, rarely removed. This means:

  • A new column can be left in place after code rollback; the old code will simply ignore it
  • A new index can be left in place; it consumes space but doesn't break anything
  • A new table can be left in place; same rule
  • A new trigger can be dangerous if the old code doesn't know about it — it may fire events the old code isn't ready to handle

If the migration is safe to leave

Do nothing. The old code ignores the new column/index/table.

If the migration is not safe to leave (trigger incompatibility)

Drop the trigger explicitly:

docker exec huph-postgres psql -U huph -d huph -c \
  "DROP TRIGGER IF EXISTS <trigger_name> ON <table>;"

Document in the incident log. Plan a forward fix that either:

  • Removes the trigger permanently if it was a mistake, OR
  • Keeps the trigger and re-deploys fixed code that handles it
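
Because the DROP statement is typed by hand in the middle of an incident, it may help to build it from validated names. A minimal sketch, assuming triggers and tables use plain lowercase unquoted identifiers; that convention and the helper name `drop_trigger_sql` are assumptions:

```shell
# Hypothetical helper: print a DROP TRIGGER statement, refusing anything
# that is not a plain lowercase identifier (assumed naming convention).
drop_trigger_sql() {
  for name in "$1" "$2"; do
    case "$name" in
      ''|[0-9]*|*[!a-z0-9_]*)
        echo "invalid identifier: $name" >&2
        return 1 ;;
    esac
  done
  printf 'DROP TRIGGER IF EXISTS %s ON %s;\n' "$1" "$2"
}

# Usage (the trigger and table names here are made up):
# sql="$(drop_trigger_sql notify_new_lead leads)" &&
#   docker exec huph-postgres psql -U huph -d huph -c "$sql"
```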

If the migration added a NOT NULL column

This is the nightmare case. The forward migration succeeds, but the old code inserts rows without the new column, so every such INSERT fails with a NOT NULL violation (unless the column has a DEFAULT).

Fix: alter the column to nullable:

docker exec huph-postgres psql -U huph -d huph -c \
  "ALTER TABLE <table> ALTER COLUMN <col> DROP NOT NULL;"

Then decide whether to forward-fix or keep nullable.
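
To find which columns can trip up the old code, you can list the NOT NULL columns on a table that have no default. A minimal sketch that prints such a query against information_schema.columns; the helper name `not_null_no_default_sql` and the lowercase-identifier check are assumptions:

```shell
# Hypothetical helper: print a query listing NOT NULL columns without a
# default on one table. These are the columns old INSERTs can fail on.
not_null_no_default_sql() {
  case "$1" in
    ''|[0-9]*|*[!a-z0-9_]*)
      echo "invalid table name: $1" >&2
      return 1 ;;
  esac
  printf "SELECT column_name FROM information_schema.columns WHERE table_name = '%s' AND is_nullable = 'NO' AND column_default IS NULL;\n" "$1"
}

# Usage (the table name here is made up):
# sql="$(not_null_no_default_sql leads)" &&
#   docker exec huph-postgres psql -U huph -d huph -c "$sql"
```

Columns that the database fills in by itself (identity columns, for example) may also show up in the result; compare the list against what the old code actually inserts.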

Admin-only rollback (systemd)

If only the admin changed and you want to roll back without touching the API:

cd /opt/huph
git revert HEAD
cd apps/admin && npm run build
sudo systemctl restart huph-admin

Nginx continues serving — no vhost change needed.

Communication during rollback

As soon as you decide to roll back:

  1. Notify the team channel — "Rolling back deploy X due to Y"
  2. Stop any counselors from taking destructive actions — "Hold off on bulk updates for the next 5 minutes"
  3. Start a rollback timer — note the time
  4. Watch logs during the rollback deploy
  5. Post the all-clear when smoke passes
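
The rollback timer from step 3 can be as simple as an epoch timestamp in a shell variable. A minimal sketch; `elapsed_since` is a hypothetical helper name:

```shell
# Hypothetical helper: format the time between two epoch timestamps.
# The second argument defaults to the current time.
elapsed_since() {
  start="$1"
  now="${2:-$(date +%s)}"
  secs=$((now - start))
  printf '%dm %ds\n' $((secs / 60)) $((secs % 60))
}

# Usage:
# rollback_start="$(date +%s)"      # step 3: start the timer
# ... run the rollback ...
# echo "Rollback took $(elapsed_since "$rollback_start")"
```

The figure is useful in the all-clear message and later in the post-mortem timeline.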

After rollback

  1. Open a post-mortem doc — use the template at incident-playbook.en.md
  2. Do NOT blame — focus on what the system let slip through
  3. Identify action items — what would have prevented this? Better tests? Staging environment? More careful migration?
  4. Forward-fix the original issue — rolling back is temporary; the actual bug still needs resolving

Gotchas

  1. Migration rollback is case-by-case. There is no generic "down migration" to run. Always think through the specific change first.
  2. git revert of a merge commit needs -m 1 to specify the parent: git revert -m 1 <merge-sha>.
  3. docker-compose up -d doesn't rebuild — you must docker-compose build first if the Dockerfile or source changed.
  4. Admin systemctl restart without npm run build restarts the old build. The rollback does nothing. Always build first.
  5. Rollback is not free. Each rollback + re-deploy burns ~5 minutes of restart windows. Fast rollback is better than hesitating, but try to get the forward fix deployed ASAP.

See also