AI Pipeline (Dify-based)
Purpose
Deep-dive into HUPH's AI generation pipeline. In the current
architecture (April 2026 onward), all retrieval-augmented generation
happens inside a self-hosted Dify stack — NOT in an in-repo
Python service. This page covers how Dify is wired up, where the
boundaries between apps/api and Dify live, and what the prompt
workflow looks like.
Prerequisites
- Architecture overview — know that Dify is the AI layer
- Familiarity with Dify's concepts (apps, workflows, datasets,
annotation reply) — see
https://dify.ai/docs
Historical note
HUPH previously had an apps/rag/ Python FastAPI service on port
3102 running BGE-M3 embeddings + BGE-reranker-v2-m3 + a direct
Claude Haiku call. That service was removed in early April 2026
and replaced by Dify. Files like apps/rag/main.py, apps/rag/venv/,
apps/rag/test_*.py no longer exist in this repo. Commits in the
git history that refer to huph-rag, port 3102, BGE-M3, or the
uph_knowledge_v2 Qdrant collection predate this replacement.
If you're landing on this page looking for the old apps/rag docs,
they do not apply — read on for the current architecture.
Dify stack
HUPH runs Dify as a separate docker-compose stack defined in
/opt/huph/docker-compose.dify.yml. The stack includes:
| Container | Purpose |
|---|---|
| huph-dify-api | Main Dify API server (port 5001 internal) — chat-messages, datasets, annotation reply |
| huph-dify-worker | Celery-like background worker for embedding + indexing jobs |
| huph-dify-web | Dify admin UI at https://dify.huph.val.id |
| huph-dify-sandbox | Sandbox for code execution inside workflows |
| huph-dify-plugin-daemon | Plugin runtime |
| huph-dify-ssrf-proxy | SSRF protection for outbound tool calls |
| huph-dify-beat | Scheduler for periodic tasks (memory limit bumped to 512m after the Apr 8 OOM) |
Dify uses its own Postgres + Redis (inside the stack) for its
metadata; it is NOT the same Postgres as HUPH's main postgres
service. Vector data lives in Milvus, brought up via
docker-compose.milvus.yml.
How apps/api talks to Dify
apps/api
│
│ HTTP (internal network: http://huph-dify-api:5001)
↓
Dify API
│
├── /v1/chat-messages (primary chat endpoint)
├── /v1/apps/annotations (FAQ annotation CRUD)
└── /v1/datasets/<id>/... (KB management)
Environment variables in apps/api that control the Dify
integration (set in root .env + passed through
docker-compose.yml environment block):
- `DIFY_API_URL=http://huph-dify-api:5001` — primary chat endpoint
- `DIFY_APP_API_KEY` — token for the chat-messages app
- `DIFY_KB_API_URL=http://huph-dify-api:5001` — same URL, different key scope
- `DIFY_KB_API_KEY` — token for the KB management API (crawl, upload, delete)
- `DIFY_DATASET_ID` — the main KB dataset UUID
- `DIFY_INTENT_DATASET_ID` — optional secondary dataset for intent routing
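As a sketch, apps/api could validate these variables once at startup and fail fast when one is missing. The `loadDifyConfig` helper below is illustrative, not the actual apps/api code; only the variable names come from the list above.

```typescript
// Illustrative startup check for the Dify-related env vars in apps/api.
// Variable names match the root .env keys listed above.
type DifyConfig = {
  apiUrl: string;
  appApiKey: string;
  kbApiUrl: string;
  kbApiKey: string;
  datasetId: string;
  intentDatasetId?: string; // optional secondary dataset
};

function requireEnv(
  env: Record<string, string | undefined>,
  name: string,
): string {
  const value = env[name];
  if (!value) throw new Error(`Missing required env var: ${name}`);
  return value;
}

function loadDifyConfig(env: Record<string, string | undefined>): DifyConfig {
  return {
    apiUrl: requireEnv(env, "DIFY_API_URL"),
    appApiKey: requireEnv(env, "DIFY_APP_API_KEY"),
    kbApiUrl: requireEnv(env, "DIFY_KB_API_URL"),
    kbApiKey: requireEnv(env, "DIFY_KB_API_KEY"),
    datasetId: requireEnv(env, "DIFY_DATASET_ID"),
    // Optional: only set when intent routing uses a second dataset.
    intentDatasetId: env["DIFY_INTENT_DATASET_ID"],
  };
}
```

Failing fast here is preferable to letting a missing key surface later as an opaque 401 from the Dify API.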
The HUPH API sends inputs to Dify chat-messages that include
persona variables (tone, emoji_usage, answer_length,
guidance_rules, etc.) — Dify's workflow reads those to build the
prompt.
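A sketch of what that request body might look like. The top-level fields (`query`, `inputs`, `response_mode`, `user`, `conversation_id`) follow Dify's chat-messages API; the persona keys mirror the variables named above, but the exact set HUPH sends is defined in apps/api, so treat this as illustrative.

```typescript
// Illustrative builder for a Dify /v1/chat-messages request body.
// Persona keys (tone, emoji_usage, ...) mirror the variables named above;
// the authoritative list lives in apps/api, not here.
type PersonaVars = {
  tone: string;
  emoji_usage: string;
  answer_length: string;
  guidance_rules: string;
};

type ChatMessagesBody = {
  query: string;
  inputs: Record<string, string>; // workflow variables, persona included
  response_mode: "blocking" | "streaming";
  user: string; // stable end-user identifier
  conversation_id?: string; // omit to start a new conversation
};

function buildChatBody(
  query: string,
  user: string,
  persona: PersonaVars,
  conversationId?: string,
): ChatMessagesBody {
  return {
    query,
    inputs: { ...persona },
    response_mode: "blocking",
    user,
    conversation_id: conversationId,
  };
}
```

`JSON.stringify` drops an undefined `conversation_id`, so a fresh conversation needs no special casing.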
Retrieval pipeline (inside Dify)
Dify's chat-messages workflow executes roughly these steps:
1. User query arrives
2. Annotation reply matcher runs (~300 ms):
- Fuzzy match against curated FAQ annotations
- If hit: return annotated answer, skip LLM
3. Otherwise: RAG workflow:
a. OpenAI embeddings API called with the query
(HUPH provides OPENAI_API_KEY to Dify)
b. Milvus similarity search → top N candidates
c. Optional rerank (Dify config-dependent)
d. Context assembled into prompt with persona variables
e. Claude call via Dify's Anthropic integration
f. Response + citations returned
4. Dify records the trace internally (and optionally to Langfuse)
The exact workflow lives in Dify admin UI at
https://dify.huph.val.id. Edit it from the Dify workflow canvas,
not in this repo — there is no Python file to change.
Annotation reply (FAQ) vs KB retrieval
Two different answer paths, both handled by Dify:
- Annotation reply — curated exact-match Q&A pairs. Managed via the
  HUPH admin `/knowledge/faq` page, which proxies to Dify's annotation
  API. Latency ~300 ms. Bypasses the LLM entirely.
- KB retrieval — semantic search over crawled documents. Latency
  ~6 seconds including the LLM call. Used when the annotation matcher
  doesn't find a hit.
See the FAQ operator guide and the KB operator guide for operator-facing flows.
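A sketch of how the admin proxy might create one annotation via the `/v1/apps/annotations` endpoint listed earlier. The `question`/`answer` field names follow Dify's annotation API, but verify them against your Dify version; the function and parameter names are ours, not HUPH's actual code.

```typescript
// Hypothetical proxy call: push one curated Q&A pair to Dify as an
// annotation. baseUrl/apiKey come from DIFY_API_URL / DIFY_APP_API_KEY.
type Annotation = { question: string; answer: string };

async function createAnnotation(
  baseUrl: string,
  apiKey: string,
  annotation: Annotation,
  fetchFn: typeof fetch = fetch, // injectable for tests
): Promise<unknown> {
  const res = await fetchFn(`${baseUrl}/v1/apps/annotations`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(annotation),
  });
  if (!res.ok) {
    throw new Error(`Dify annotation create failed: ${res.status}`);
  }
  return res.json();
}
```

Remember the gotcha below: this call should only ever happen via the `/knowledge/faq` admin path, after the local `faq_local` write.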
KB ingestion (crawler-worker)
apps/crawler-worker is a Node.js background worker that picks up
new / changed KB sources (web URLs, uploaded documents) from the
HUPH DB and dispatches ingestion jobs to the Dify KB API:
apps/crawler-worker
│
│ fetch + clean + chunk
↓
Dify KB API (/v1/datasets/<DIFY_DATASET_ID>/document/create-by-text)
│
↓
huph-dify-worker embeds + indexes into Milvus
The worker has retry logic (3 attempts, exponential backoff,
180 s timeout per page) per the project_kb_phase3_complete
memory. Current KB state is roughly 308 pages from uph.edu.
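The retry policy described above can be sketched as follows. The constants match the documented behaviour (3 attempts, exponential backoff, 180 s per-page timeout); the helper names are illustrative, not the worker's actual code.

```typescript
// Sketch of the crawler-worker retry policy: 3 attempts, exponential
// backoff, 180 s timeout per page. See apps/crawler-worker for the
// real implementation; names here are illustrative.
const MAX_ATTEMPTS = 3;
const BASE_DELAY_MS = 1_000; // backoff: 1 s, then 2 s
const PAGE_TIMEOUT_MS = 180_000;

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("page timed out")), ms);
  });
  try {
    return await Promise.race([p, timeout]);
  } finally {
    clearTimeout(timer); // don't keep the process alive on success
  }
}

async function ingestWithRetry<T>(job: () => Promise<T>): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      return await withTimeout(job(), PAGE_TIMEOUT_MS);
    } catch (err) {
      lastError = err;
      if (attempt < MAX_ATTEMPTS) {
        await sleep(BASE_DELAY_MS * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError;
}
```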
Observability
- Langfuse — Dify has optional Langfuse trace export. As of
  Mar 27 2026 it was blocked on a missing Langfuse SDK in Dify
  v1.13.3; verify the current Dify version if tracing is expected. See
  the project_kb_phase3_complete memory for the status at that time.
- Phoenix — HUPH's apps/api still emits OpenTelemetry traces to
  Phoenix on port 6006, but those capture the API-layer spans, not
  Dify internals.
- Dify admin UI — the fastest debugging tool for "why did the bot
  say X?" — shows prompt, retrieval, and response per conversation.
Gotchas
- There is no local RAG service anymore. Any instruction to run
  `npm run dev:rag`, `cd apps/rag`, or `curl localhost:3102` is
  obsolete — check the `docker-compose.yml` comments at lines 33, 40,
  and 54 for explicit confirmation.
- The Dify stack is memory-hungry. The Apr 8 megasession caught
  7 containers dying simultaneously when huph-dify-beat OOM'd during
  `flask upgrade-db`. Memory limits were bumped: dify-beat 256m → 512m,
  dify-worker 512m → 1g, dify-plugin-daemon 2g → 3g. Preserve these
  bumps when you edit `docker-compose.dify.yml`.
- Dify's Anthropic model ID must match what HUPH uses elsewhere:
  `claude-haiku-4-5` (no date suffix). Changing it inside a Dify
  workflow will cause version drift with the API-layer intent router
  fallback.
- Annotation sync goes through HUPH's `/knowledge/faq` page, which
  writes to the local `faq_local` table and then calls the Dify
  annotation API. Editing annotations directly in Dify bypasses
  `faq_local` and causes drift — always go through the admin.
- Milvus is NOT in the main `docker-compose.yml` — it lives in
  `docker-compose.milvus.yml`. Bring it up separately if needed
  locally: `docker-compose -f docker-compose.milvus.yml up -d`.
- Persona variables must all be set when the API dispatches to
  Dify, otherwise the workflow errors. Defaults come from the Chatbot
  Settings page (`/admin/settings/chatbot`), which stores them in
  PersonaConfig and syncs them to Dify's workflow variables.
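Given the persona gotcha, the dispatch path presumably merges stored PersonaConfig values over hard defaults so every workflow variable is always present. A minimal sketch, assuming the variable names from earlier in this page; the default values below are invented:

```typescript
// Sketch: guarantee every persona variable is set before dispatching
// to Dify. Default values here are INVENTED placeholders; the real
// ones live in PersonaConfig via /admin/settings/chatbot.
type Persona = {
  tone: string;
  emoji_usage: string;
  answer_length: string;
  guidance_rules: string;
};

const PERSONA_DEFAULTS: Persona = {
  tone: "friendly",
  emoji_usage: "minimal",
  answer_length: "medium",
  guidance_rules: "",
};

function resolvePersona(stored: Partial<Persona>): Persona {
  // Drop undefined values so they don't clobber defaults on spread.
  const defined = Object.fromEntries(
    Object.entries(stored).filter(([, v]) => v !== undefined),
  );
  return { ...PERSONA_DEFAULTS, ...defined };
}
```

The undefined-filtering matters: a naive `{ ...defaults, ...stored }` would overwrite a default with `undefined` when a stored field is unset, reintroducing exactly the workflow error this gotcha warns about.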
See also
- Architecture overview — system-level context
- Integrations — the full integration table
- API — where apps/api dispatches to Dify
- Debugging — Dify log locations
- project_kb_phase3_complete memory — KB crawl state + Dify IDs