AI Pipeline (Dify-based)

Purpose

Deep-dive into HUPH's AI generation pipeline. In the current architecture (April 2026 onward), all retrieval-augmented generation happens inside a self-hosted Dify stack — NOT in an in-repo Python service. This page covers how Dify is wired up, where the boundaries between apps/api and Dify live, and what the prompt workflow looks like.

Prerequisites

  • Architecture overview — know that Dify is the AI layer
  • Familiarity with Dify's concepts (apps, workflows, datasets, annotation reply) — see https://dify.ai/docs

Historical note

HUPH previously had an apps/rag/ Python FastAPI service on port 3102 running BGE-M3 embeddings + BGE-reranker-v2-m3 + a direct Claude Haiku call. That service was removed in early April 2026 and replaced by Dify. Files like apps/rag/main.py, apps/rag/venv/, and apps/rag/test_*.py no longer exist in this repo. Commits in the git history that refer to huph-rag, port 3102, BGE-M3, or the uph_knowledge_v2 Qdrant collection predate this replacement.

If you're landing on this page looking for the old apps/rag docs, they do not apply — read on for the current architecture.

Dify stack

HUPH runs Dify as a separate docker-compose stack defined in /opt/huph/docker-compose.dify.yml. The stack includes:

Container                  Purpose
huph-dify-api              Main Dify API server (port 5001 internal) — chat-messages, datasets, annotation reply
huph-dify-worker           Celery-like background worker for embedding + indexing jobs
huph-dify-web              Dify admin UI at https://dify.huph.val.id
huph-dify-sandbox          Sandbox for code execution inside workflows
huph-dify-plugin-daemon    Plugin runtime
huph-dify-ssrf-proxy       SSRF protection for outbound tool calls
huph-dify-beat             Scheduler for periodic tasks (memory-bumped to 512m after the Apr 8 OOM)

Dify uses its own Postgres + Redis (inside the stack) for its metadata; it is NOT the same Postgres as HUPH's main postgres service. Vector data lives in Milvus, brought up via docker-compose.milvus.yml.

How apps/api talks to Dify

apps/api
    │
    │ HTTP (internal network: http://huph-dify-api:5001)
    ↓
Dify API
    │
    ├── /v1/chat-messages          (primary chat endpoint)
    ├── /v1/apps/annotations        (FAQ annotation CRUD)
    └── /v1/datasets/<id>/...       (KB management)

Environment variables in apps/api that control the Dify integration (set in the root .env and passed through the docker-compose.yml environment block):

  • DIFY_API_URL=http://huph-dify-api:5001 — base URL for chat-messages requests
  • DIFY_APP_API_KEY — token for the chat-messages app
  • DIFY_KB_API_URL=http://huph-dify-api:5001 — same URL, different key scope
  • DIFY_KB_API_KEY — token for the KB management API (crawl, upload, delete)
  • DIFY_DATASET_ID — the main KB dataset UUID
  • DIFY_INTENT_DATASET_ID — optional secondary dataset for intent routing

The HUPH API's chat-messages requests carry persona variables in the inputs (tone, emoji_usage, answer_length, guidance_rules, etc.) — Dify's workflow reads those to build the prompt.
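A minimal sketch of how such a request could be assembled. The persona variable names and endpoint come from this page; the request body shape (query / inputs / user / response_mode) follows Dify's chat-messages API; buildChatRequest itself is illustrative, not the real HUPH helper:

```typescript
// Illustrative request builder for Dify's /v1/chat-messages endpoint.
// Persona field names are the ones listed above; everything else is a sketch.

interface PersonaVars {
  tone: string;
  emoji_usage: string;
  answer_length: string;
  guidance_rules: string;
}

function buildChatRequest(
  baseUrl: string,   // DIFY_API_URL
  apiKey: string,    // DIFY_APP_API_KEY
  query: string,
  userId: string,
  persona: PersonaVars,
) {
  return {
    url: `${baseUrl}/v1/chat-messages`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        query,
        inputs: persona,           // the Dify workflow reads these to build the prompt
        user: userId,              // stable per-end-user id
        response_mode: "blocking", // or "streaming" for SSE
      }),
    },
  };
}
```

Dispatch with `await fetch(req.url, req.init)` over the internal Docker network.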

Retrieval pipeline (inside Dify)

Dify's chat-messages workflow executes approximately:

1. User query arrives
2. Annotation reply matcher runs (~300 ms):
   - Fuzzy match against curated FAQ annotations
   - If hit: return annotated answer, skip LLM
3. Otherwise: RAG workflow:
   a. OpenAI embeddings API called with the query
      (HUPH provides OPENAI_API_KEY to Dify)
   b. Milvus similarity search → top N candidates
   c. Optional rerank (Dify config-dependent)
   d. Context assembled into prompt with persona variables
   e. Claude call via Dify's Anthropic integration
   f. Response + citations returned
4. Dify records the trace internally (and optionally to Langfuse)
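For orientation only, the decision flow above can be sketched as code. None of these functions exist in this repo — the real logic lives in the Dify workflow canvas, and the stub signatures here are hypothetical placeholders for the matcher, retrieval, and LLM steps:

```typescript
// Orientation sketch of the workflow's two answer paths. All three injected
// functions are hypothetical stand-ins for nodes on the Dify canvas.

type Answer = { text: string; source: "annotation" | "rag"; citations?: string[] };

async function answerQuery(
  query: string,
  matchAnnotation: (q: string) => Promise<string | null>, // fuzzy FAQ matcher (~300 ms)
  searchMilvus: (q: string) => Promise<string[]>,         // embed query + top-N similarity search
  callClaude: (prompt: string) => Promise<string>,        // LLM call via Dify's Anthropic integration
): Promise<Answer> {
  // Step 2: annotation reply matcher — on a hit, the LLM is skipped entirely
  const annotated = await matchAnnotation(query);
  if (annotated !== null) return { text: annotated, source: "annotation" };

  // Step 3: RAG path — retrieve context, assemble prompt, call Claude
  const chunks = await searchMilvus(query);
  const prompt = `Context:\n${chunks.join("\n---\n")}\n\nQuestion: ${query}`;
  const text = await callClaude(prompt);
  return { text, source: "rag", citations: chunks };
}
```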

The exact workflow lives in Dify admin UI at https://dify.huph.val.id. Edit it from the Dify workflow canvas, not in this repo — there is no Python file to change.

Annotation reply (FAQ) vs KB retrieval

Two different answer paths, both handled by Dify:

  • Annotation reply — curated exact-match Q&A pairs. Managed via HUPH admin /knowledge/faq page which proxies to Dify's annotation API. Latency ~300 ms. Bypasses the LLM entirely.
  • KB retrieval — semantic search over crawled documents. Latency ~6 seconds including LLM call. Used when the annotation matcher doesn't find a hit.

See the FAQ operator guide and the KB operator guide for operator-facing flows.
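The annotation sync order matters (see the gotchas below): the admin page writes to faq_local first, then mirrors the pair to Dify. A sketch of the Dify-side call — only the /v1/apps/annotations path comes from this page; the {question, answer} body shape is an assumption about Dify's annotation endpoint:

```typescript
// Illustrative builder for a Dify annotation-create call. The body shape
// {question, answer} is assumed, not confirmed from this page.

function buildAnnotationRequest(
  baseUrl: string,  // DIFY_API_URL
  apiKey: string,   // DIFY_APP_API_KEY
  question: string,
  answer: string,
) {
  return {
    url: `${baseUrl}/v1/apps/annotations`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ question, answer }),
    },
  };
}
```

Editing annotations directly in Dify skips faq_local entirely, which is exactly the drift the gotchas warn about.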

KB ingestion (crawler-worker)

apps/crawler-worker is a Node.js background worker that picks up new or changed KB sources (web URLs, uploaded documents) from the HUPH DB and dispatches ingestion jobs to the Dify KB API:

apps/crawler-worker
    │
    │ fetch + clean + chunk
    ↓
Dify KB API (/v1/datasets/<DIFY_DATASET_ID>/document/create-by-text)
    │
    ↓
huph-dify-worker embeds + indexes into Milvus

The worker has retry logic (3 attempts, exponential backoff, 180 s timeout per page) per the project_kb_phase3_complete memory. The current KB holds ~308 pages from uph.edu.
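The retry policy above can be sketched as a generic helper. This is illustrative only — the real implementation lives in apps/crawler-worker, and the function name and delay base are assumptions; only the 3-attempt / exponential-backoff / 180 s-timeout figures come from this page:

```typescript
// Sketch of the stated retry policy: 3 attempts, exponential backoff,
// 180 s per-page timeout. Names and base delay are illustrative.

async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 1000,
  timeoutMs = 180_000,
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    try {
      // Race the page job against the per-page timeout
      return await Promise.race([
        fn(),
        new Promise<never>((_, reject) => {
          timer = setTimeout(() => reject(new Error("page timeout")), timeoutMs);
        }),
      ]);
    } catch (err) {
      lastErr = err;
    } finally {
      clearTimeout(timer); // avoid a dangling timeout after the race settles
    }
    if (i < attempts - 1) {
      // Exponential backoff between attempts: base, 2x, 4x, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  throw lastErr;
}
```

Each KB page fetch/upload would be wrapped in `withRetry(() => ingestPage(url))`, with `ingestPage` standing in for the real job function.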

Observability

  • Langfuse — Dify has optional Langfuse trace export. As of Mar 27 2026 it was blocked on missing Langfuse SDK in Dify v1.13.3; verify current Dify version if tracing is expected. See project_kb_phase3_complete memory for the status at that time.
  • Phoenix — HUPH's apps/api still emits OpenTelemetry traces to Phoenix on port 6006, but those capture the API-layer spans, not Dify internals.
  • Dify admin UI — the fastest debugging tool for "why did the bot say X?" — shows prompt, retrieval, response per conversation.

Gotchas

  1. There is no local RAG service anymore. Any instruction to run npm run dev:rag, cd apps/rag, or curl localhost:3102 is obsolete — check docker-compose.yml comments lines 33 + 40 + 54 for explicit confirmation.
  2. Dify stack is memory-hungry. The Apr 8 megasession caught 7 containers dying simultaneously when huph-dify-beat OOM'd during flask upgrade-db. Memory limits bumped: dify-beat 256m→512m, dify-worker 512m→1g, dify-plugin-daemon 2g→3g. Preserve these bumps when you edit docker-compose.dify.yml.
  3. Dify's Anthropic model ID must match what HUPH uses elsewhere: claude-haiku-4-5 (no date suffix). Changing it inside a Dify workflow will cause version drift with the API-layer intent router fallback.
  4. Annotation sync is through HUPH's /knowledge/faq page, which writes to the local faq_local table and then calls the Dify annotation API. Editing annotations directly in Dify bypasses faq_local and causes drift — always go through the admin.
  5. Milvus is NOT in the main docker-compose.yml — it lives in docker-compose.milvus.yml. Bring it up separately if needed locally: docker-compose -f docker-compose.milvus.yml up -d.
  6. Persona variables must all be set when the API dispatches to Dify, otherwise the workflow errors. Defaults come from the Chatbot Settings page /admin/settings/chatbot which stores them in the PersonaConfig and syncs to Dify's workflow variables.

See also

  • Architecture overview — system-level context
  • Integrations — the full integration table
  • API — where apps/api dispatches to Dify
  • Debugging — Dify log locations
  • project_kb_phase3_complete memory — KB crawl state + Dify IDs