Architecture
System overview
Section titled “System overview”Alquimia has two layers:
┌─────────────────────────────────────────────────────────────┐│ External Clients ││ (HTTP, SSE, Slack, WhatsApp, Email) │└──────────────────────┬──────────────────────────────────────┘ │ ▼┌─────────────────────────────────────────────────────────────┐│ alquimia-runtime (FastAPI) ││ ││ Auth → Registry → Event ingestion → SSE streaming ││ Redis (state) · PostgreSQL (persistence) · S3 (blobs) │└──────────────────────┬──────────────────────────────────────┘ │ CloudEvents ▼┌─────────────────────────────────────────────────────────────┐│ alquimia-core (SDK) ││ ││ Controller → Stages → Worklog → evaluate() ││ Tools (MCP / Llama Stack / A2A) · Memory · Evaluation │└─────────────────────────────────────────────────────────────┘alquimia-runtime handles HTTP, auth, session state, and event routing. It delegates all agent reasoning to alquimia-core via CloudEvents.
alquimia-runtime router map
Section titled “alquimia-runtime router map”The runtime exposes the following HTTP routers. All are served by the master instance. Workers expose only the health probe.
| Prefix | Visibility | Description |
|---|---|---|
/auth | Public | Authentication endpoints (Keycloak OIDC flows) |
/event | Public | Inference submission, SSE streaming, tool approval/completion |
/context | Public | Session context retrieval, blob upload/download |
/registry | Public | Agentspace, agent, secret, parameter, and OCI distribution management |
/health | Public (no auth) | Liveness and readiness probes |
/knowledge | Public | Knowledge base file and topic management (RAG) |
/task | Public | Controller state management (run/stop/restart) |
/worklog | Public | Inference audit history |
/webhooks | Public | Outbound webhook subscription management |
All requests carry an X-Request-ID header (generated or propagated) for log correlation across the Kafka event sequence.
alquimia-core internals
Section titled “alquimia-core internals”The controller and stages
Section titled “The controller and stages”Every inference request flows through a three-stage pipeline:
AssistantInference command │ ▼┌─────────────────┐│ Preprocess │ Shield inference (content moderation, classification)│ Stage │ Attachment normalization└────────┬────────┘ │ ▼┌─────────────────┐│ Process │ Tool/agent discovery│ Stage │ LLM response inference│ │ Tool execution loop (up to max_steps)└────────┬────────┘ │ ▼┌─────────────────┐│ Answer │ Long-term memory flush│ Stage │ Context persistence│ │ Final AssistantInferenceResponse└─────────────────┘Each stage implements three methods:
act(worklog)— emits new commands when work is neededack(event, worklog)— processes incoming response eventshas_finished(worklog)— returnsTruewhen the stage is done
The evaluate() loop
Section titled “The evaluate() loop”evaluate() is the recursive orchestration function:
async def evaluate(controller, event, registry, tool_executor=None, _depth=0): async for emitted in controller.process(event): if isinstance(emitted, AssistantInferenceResponse): return emitted response = await _handle_event(emitted, controller, registry, ...) # feed response back into evaluate()_handle_event() dispatches each emitted command to the appropriate handler:
ResponseInference→ calls the LLMServerToolExecution→ calls an MCP/Llama Stack toolA2AInference→ recursively calls another agentContextFlush→ runs the memory summarizerContextPersistence→ saves the session
Maximum recursion depth is MAX_EVALUATE_DEPTH = 10,000.
The worklog
Section titled “The worklog”The Worklog is an append-only log of all events processed during a single inference run. It provides:
- O(1) lookups by
(event_class, control_id) - LangChain message conversion for LLM prompts
- Tool execution tracking (pending, completed, errors)
alquimia-runtime internals
Section titled “alquimia-runtime internals”Async inference via Kafka
Section titled “Async inference via Kafka”Inference is non-blocking. The master and workers share no direct HTTP path — all coordination goes through Kafka.
POST /event/infer/{assistant_id}signs and publishes anAssistantInferenceCloudEvent to Kafka and returns atask_idimmediately.- A worker consumes the event, verifies its HMAC-SHA256 signature, and dispatches it in-process via
kafka/dispatcher.py. - The dispatcher drives the controller, which emits follow-up events back to Kafka and enqueues worklog records to Redis.
GET /event/stream/{task_id}streams those worklog records as SSE untilAssistantInferenceResponseis received.
Client Master (runtime) Kafka Worker (runtime) Redis │ │ │ │ │ │ POST /infer │ │ │ │ │───────────────>│ │ │ │ │ │ publish + sign CE │ │ │ │ │──────────────────>│ │ │ │ {task_id} │ │ │ │ │<───────────────│ │ consume + verify│ │ │ │ │────────────────>│ │ │ │ │ dispatch() │ │ │ │ │ │ enqueue records│ │ │ │ │───────────────>│ │ GET /stream │ │ │ │ │───────────────>│ │ │ │ │ │ dequeue │ │ │ │ │<────────────────────────────────────────────────────│ │ SSE: record │ │ │ │ │<───────────────│ │ │ │ │ SSE: final │ │ │ │ │<───────────────│ │ │ │Master and worker modes
Section titled “Master and worker modes”A single container image runs in two modes, selected by ALQUIMIA_RUNTIME_MODE:
| Mode | Role | Kafka | HTTP API |
|---|---|---|---|
master (default) | Serves the external API, registry, context, and knowledge base | Producer only | Full — all routers |
worker | Kafka consumer, executes inference in-process | Consumer + producer | Health probe only |
Scale worker replicas to match Kafka partition count for maximum throughput.
Key design decisions
Section titled “Key design decisions”Kafka as the event bus. All master-to-worker communication uses Apache Kafka. CloudEvents provide a standardised envelope — every message is HMAC-SHA256 signed before publishing and verified before dispatch. Workers drop any event with a missing or invalid signature.
Redis as the universal state store. Redis serves as message queue, key-value store, and distributed lock coordinator. This minimizes infrastructure complexity while supporting the async, event-driven architecture.
Pluggable authentication. The AuthProvider enum allows the same codebase to operate in API-token mode (development), JWT mode (microservice-to-microservice), or Keycloak OIDC mode (production) without code changes.
Registry-driven agent configuration. Agent configurations are stored in a TinyDB-backed registry with OCI artifact support. This enables version-controlled, portable agent packages that can be published, pulled, and validated independently of the runtime.
Observability by default. Every request carries an X-Request-ID (generated or propagated). The runtime emits OpenTelemetry traces and logs via FastAPIInstrumentor and a loguru→OTEL bridge. The core SDK emits 18 OTEL metrics covering token consumption, latency, tool invocations, shield calls, empathy rule matches, and agent lifecycle events. All three signals share CommonAttributes dimensions (assistant_id, session_id, task_id) for cross-signal correlation. See Observability for the full model.
Related pages
Section titled “Related pages”- Observability — metrics, traces, logs, and correlation
- Event Model — the typed command/response system
- Agent Configuration — AssistantConfig structure
- evaluate() reference — the orchestration function
- Inference Endpoints — the public HTTP API
Source repositories
Section titled “Source repositories”| Repository | What lives here |
|---|---|
Alquimia-ai/alquimia-core | evaluate(), controller, stages, worklog, tools, memory, registry, CLI |
Alquimia-ai/alquimia-runtime | FastAPI routers, Redis storage, PostgreSQL models, Kafka event pipeline (kafka/), Docker Compose, Kubernetes manifests (k8s/) |