AI Pipeline (Dify-based)
Purpose
Deep-dive into HUPH's AI generation pipeline. In the current
architecture (April 2026 onward), all retrieval-augmented generation
happens inside a self-hosted Dify stack — NOT in an in-repo
Python service. This page covers how Dify is wired up, where the
boundaries between apps/api and Dify live, and what the prompt
workflow looks like.
Prerequisites
- Architecture overview — know that Dify is the AI layer
- Familiarity with Dify's concepts (apps, workflows, datasets,
annotation reply) — see
https://dify.ai/docs
Historical note
HUPH previously had an apps/rag/ Python FastAPI service on port
3102 running BGE-M3 embeddings + BGE-reranker-v2-m3 + a direct
Claude Haiku call. That service was removed in early April 2026
and replaced by Dify. Files like apps/rag/main.py, apps/rag/venv/,
apps/rag/test_*.py no longer exist in this repo. Commits in the
git history that refer to huph-rag, port 3102, BGE-M3, or the
uph_knowledge_v2 Qdrant collection predate this replacement.
If you're landing on this page looking for the old apps/rag docs,
they do not apply — read on for the current architecture.
Dify stack
HUPH runs Dify as a separate docker-compose stack defined in
/opt/huph/docker-compose.dify.yml. The stack includes:
| Container | Purpose |
|---|---|
| huph-dify-api | Main Dify API server (port 5001 internal) — chat-messages, datasets, annotation reply |
| huph-dify-worker | Celery-like background worker for embedding + indexing jobs |
| huph-dify-web | Dify admin UI at https://dify.huph.val.id |
| huph-dify-sandbox | Sandbox for code execution inside workflows |
| huph-dify-plugin-daemon | Plugin runtime |
| huph-dify-ssrf-proxy | SSRF protection for outbound tool calls |
| huph-dify-beat | Scheduler for periodic tasks (memory limit bumped to 512m after the Apr 8 OOM) |
Dify uses its own Postgres + Redis (inside the stack) for its
metadata; it is NOT the same Postgres as HUPH's main postgres
service. Vector data lives in Milvus, brought up via
docker-compose.milvus.yml.
How apps/api talks to Dify
apps/api
│
│ HTTP (internal network: http://huph-dify-api:5001)
↓
Dify API
│
├── /v1/chat-messages (primary chat endpoint)
├── /v1/apps/annotations (FAQ annotation CRUD)
└── /v1/datasets/<id>/... (KB management)
Environment variables in apps/api that control the Dify
integration (set in root .env + passed through
docker-compose.yml environment block):
- `DIFY_API_URL=http://huph-dify-api:5001` — primary chat endpoint
- `DIFY_APP_API_KEY` — token for the chat-messages app
- `DIFY_KB_API_URL=http://huph-dify-api:5001` — same URL, different key scope
- `DIFY_KB_API_KEY` — token for the KB management API (crawl, upload, delete)
- `DIFY_DATASET_ID` — the main KB dataset UUID
- `DIFY_INTENT_DATASET_ID` — optional secondary dataset for intent routing
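As a sketch, apps/api could validate these variables once at startup and fail fast when one is missing. The `loadDifyConfig` helper below is illustrative, not the actual apps/api code; only the variable names come from the list above.

```typescript
// Illustrative startup check for the Dify-related env vars in apps/api.
// Variable names match the root .env keys listed above.
type DifyConfig = {
  apiUrl: string;
  appApiKey: string;
  kbApiUrl: string;
  kbApiKey: string;
  datasetId: string;
  intentDatasetId?: string; // optional secondary dataset
};

function requireEnv(
  env: Record<string, string | undefined>,
  name: string,
): string {
  const value = env[name];
  if (!value) throw new Error(`Missing required env var: ${name}`);
  return value;
}

function loadDifyConfig(env: Record<string, string | undefined>): DifyConfig {
  return {
    apiUrl: requireEnv(env, "DIFY_API_URL"),
    appApiKey: requireEnv(env, "DIFY_APP_API_KEY"),
    kbApiUrl: requireEnv(env, "DIFY_KB_API_URL"),
    kbApiKey: requireEnv(env, "DIFY_KB_API_KEY"),
    datasetId: requireEnv(env, "DIFY_DATASET_ID"),
    // Optional: only set when intent routing uses a second dataset.
    intentDatasetId: env["DIFY_INTENT_DATASET_ID"],
  };
}
```

Failing fast here is preferable to letting a missing key surface later as an opaque 401 from the Dify API.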
The HUPH API sends inputs to Dify chat-messages that include
persona variables (tone, emoji_usage, answer_length,
guidance_rules, etc.) — Dify's workflow reads those to build the
prompt.
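A sketch of what that request body might look like. The top-level fields (`query`, `inputs`, `response_mode`, `user`, `conversation_id`) follow Dify's chat-messages API; the persona keys mirror the variables named above, but the exact set HUPH sends is defined in apps/api, so treat this as illustrative.

```typescript
// Illustrative builder for a Dify /v1/chat-messages request body.
// Persona keys (tone, emoji_usage, ...) mirror the variables named above;
// the authoritative list lives in apps/api, not here.
type PersonaVars = {
  tone: string;
  emoji_usage: string;
  answer_length: string;
  guidance_rules: string;
};

type ChatMessagesBody = {
  query: string;
  inputs: Record<string, string>; // workflow variables, persona included
  response_mode: "blocking" | "streaming";
  user: string; // stable end-user identifier
  conversation_id?: string; // omit to start a new conversation
};

function buildChatBody(
  query: string,
  user: string,
  persona: PersonaVars,
  conversationId?: string,
): ChatMessagesBody {
  return {
    query,
    inputs: { ...persona },
    response_mode: "blocking",
    user,
    conversation_id: conversationId,
  };
}
```

`JSON.stringify` drops an undefined `conversation_id`, so a fresh conversation needs no special casing.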
Retrieval pipeline (inside Dify)
Dify's chat-messages workflow executes roughly these steps:
1. User query arrives
2. Annotation reply matcher runs (~300 ms):
- Fuzzy match against curated FAQ annotations
- If hit: return annotated answer, skip LLM
3. Otherwise: RAG workflow:
a. OpenAI embeddings API called with the query
(HUPH provides OPENAI_API_KEY to Dify)
b. Milvus similarity search → top N candidates
c. Optional rerank (Dify config-dependent)
d. Context assembled into prompt with persona variables
e. Claude call via Dify's Anthropic integration
f. Response + citations returned
4. Dify records the trace internally (and optionally to Langfuse)
The exact workflow lives in Dify admin UI at
https://dify.huph.val.id. Edit it from the Dify workflow canvas,
not in this repo — there is no Python file to change.
Annotation reply (FAQ) vs KB retrieval
Two different answer paths, both handled by Dify:
- Annotation reply — curated exact-match Q&A pairs. Managed via the
  HUPH admin `/knowledge/faq` page, which proxies to Dify's annotation
  API. Latency ~300 ms. Bypasses the LLM entirely.
- KB retrieval — semantic search over crawled documents. Latency
  ~6 seconds including the LLM call. Used when the annotation matcher
  doesn't find a hit.
See the FAQ operator guide and the KB operator guide for operator-facing flows.
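A sketch of how the admin proxy might create one annotation via the `/v1/apps/annotations` endpoint listed earlier. The `question`/`answer` field names follow Dify's annotation API, but verify them against your Dify version; the function and parameter names are ours, not HUPH's actual code.

```typescript
// Hypothetical proxy call: push one curated Q&A pair to Dify as an
// annotation. baseUrl/apiKey come from DIFY_API_URL / DIFY_APP_API_KEY.
type Annotation = { question: string; answer: string };

async function createAnnotation(
  baseUrl: string,
  apiKey: string,
  annotation: Annotation,
  fetchFn: typeof fetch = fetch, // injectable for tests
): Promise<unknown> {
  const res = await fetchFn(`${baseUrl}/v1/apps/annotations`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(annotation),
  });
  if (!res.ok) {
    throw new Error(`Dify annotation create failed: ${res.status}`);
  }
  return res.json();
}
```

Remember the gotcha below: this call should only ever happen via the `/knowledge/faq` admin path, after the local `faq_local` write.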
KB ingestion (crawler-worker)
apps/crawler-worker is a Node.js background worker that picks up
new / changed KB sources (web URLs, uploaded documents) from the
HUPH DB and dispatches ingestion jobs to the Dify KB API:
apps/crawler-worker
│
│ fetch + clean + chunk
↓
Dify KB API (/v1/datasets/<DIFY_DATASET_ID>/document/create-by-text)
│
↓
huph-dify-worker embeds + indexes into Milvus
The worker has retry logic (3 attempts, exponential backoff,
180 s timeout per page) per the project_kb_phase3_complete
memory. Current KB state is roughly 308 pages from uph.edu.
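The retry policy described above can be sketched as follows. The constants match the documented behaviour (3 attempts, exponential backoff, 180 s per-page timeout); the helper names are illustrative, not the worker's actual code.

```typescript
// Sketch of the crawler-worker retry policy: 3 attempts, exponential
// backoff, 180 s timeout per page. See apps/crawler-worker for the
// real implementation; names here are illustrative.
const MAX_ATTEMPTS = 3;
const BASE_DELAY_MS = 1_000; // backoff: 1 s, then 2 s
const PAGE_TIMEOUT_MS = 180_000;

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("page timed out")), ms);
  });
  try {
    return await Promise.race([p, timeout]);
  } finally {
    clearTimeout(timer); // don't keep the process alive on success
  }
}

async function ingestWithRetry<T>(job: () => Promise<T>): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      return await withTimeout(job(), PAGE_TIMEOUT_MS);
    } catch (err) {
      lastError = err;
      if (attempt < MAX_ATTEMPTS) {
        await sleep(BASE_DELAY_MS * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError;
}
```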
Observability
- Langfuse — Dify has optional Langfuse trace export. As of
  Mar 27 2026 it was blocked on a missing Langfuse SDK in Dify
  v1.13.3; verify the current Dify version if tracing is expected. See
  the project_kb_phase3_complete memory for the status at that time.
- Phoenix — HUPH's apps/api still emits OpenTelemetry traces to
  Phoenix on port 6006, but those capture the API-layer spans, not
  Dify internals.
- Dify admin UI — the fastest debugging tool for "why did the bot
  say X?" — shows prompt, retrieval, and response per conversation.
Gotchas
- There is no local RAG service anymore. Any instruction to run
  `npm run dev:rag`, `cd apps/rag`, or `curl localhost:3102` is
  obsolete — check the `docker-compose.yml` comments at lines 33, 40,
  and 54 for explicit confirmation.
- The Dify stack is memory-hungry. The Apr 8 megasession caught
  7 containers dying simultaneously when huph-dify-beat OOM'd during
  `flask upgrade-db`. Memory limits were bumped: dify-beat 256m → 512m,
  dify-worker 512m → 1g, dify-plugin-daemon 2g → 3g. Preserve these
  bumps when you edit `docker-compose.dify.yml`.
- Dify's Anthropic model ID must match what HUPH uses elsewhere:
  `claude-haiku-4-5` (no date suffix). Changing it inside a Dify
  workflow will cause version drift with the API-layer intent router
  fallback.
- Annotation sync goes through HUPH's `/knowledge/faq` page, which
  writes to the local `faq_local` table and then calls the Dify
  annotation API. Editing annotations directly in Dify bypasses
  `faq_local` and causes drift — always go through the admin.
- Milvus is NOT in the main `docker-compose.yml` — it lives in
  `docker-compose.milvus.yml`. Bring it up separately if needed
  locally: `docker-compose -f docker-compose.milvus.yml up -d`.
- Persona variables must all be set when the API dispatches to
  Dify, otherwise the workflow errors. Defaults come from the Chatbot
  Settings page (`/admin/settings/chatbot`), which stores them in
  PersonaConfig and syncs them to Dify's workflow variables.
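Given the persona gotcha, the dispatch path presumably merges stored PersonaConfig values over hard defaults so every workflow variable is always present. A minimal sketch, assuming the variable names from earlier in this page; the default values below are invented:

```typescript
// Sketch: guarantee every persona variable is set before dispatching
// to Dify. Default values here are INVENTED placeholders; the real
// ones live in PersonaConfig via /admin/settings/chatbot.
type Persona = {
  tone: string;
  emoji_usage: string;
  answer_length: string;
  guidance_rules: string;
};

const PERSONA_DEFAULTS: Persona = {
  tone: "friendly",
  emoji_usage: "minimal",
  answer_length: "medium",
  guidance_rules: "",
};

function resolvePersona(stored: Partial<Persona>): Persona {
  // Drop undefined values so they don't clobber defaults on spread.
  const defined = Object.fromEntries(
    Object.entries(stored).filter(([, v]) => v !== undefined),
  );
  return { ...PERSONA_DEFAULTS, ...defined };
}
```

The undefined-filtering matters: a naive `{ ...defaults, ...stored }` would overwrite a default with `undefined` when a stored field is unset, reintroducing exactly the workflow error this gotcha warns about.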
See also
- Architecture overview — system-level context
- Integrations — the full integration table
- API — where apps/api dispatches to Dify
- Debugging — Dify log locations
- project_kb_phase3_complete memory — KB crawl state + Dify IDs