Architecture

System overview

Alquimia has two layers:

┌─────────────────────────────────────────────────────────────┐
│                    External Clients                         │
│          (HTTP, SSE, Slack, WhatsApp, Email)                │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│                 alquimia-runtime (FastAPI)                  │
│                                                             │
│  Auth → Registry → Event ingestion → SSE streaming         │
│  Redis (state) · PostgreSQL (persistence) · S3 (blobs)     │
└──────────────────────┬──────────────────────────────────────┘
                       │ CloudEvents
                       ▼
┌─────────────────────────────────────────────────────────────┐
│                  alquimia-core (SDK)                        │
│                                                             │
│  Controller → Stages → Worklog → evaluate()                │
│  Tools (MCP / Llama Stack / A2A) · Memory · Evaluation     │
└─────────────────────────────────────────────────────────────┘

alquimia-runtime handles HTTP, auth, session state, and event routing. It delegates all agent reasoning to alquimia-core via CloudEvents.

alquimia-runtime router map

The runtime exposes the following HTTP routers. All are served by the master instance. Workers expose only the health probe.

Prefix	Visibility	Description
`/auth`	Public	Authentication endpoints (Keycloak OIDC flows)
`/event`	Public	Inference submission, SSE streaming, tool approval/completion
`/context`	Public	Session context retrieval, blob upload/download
`/registry`	Public	Agentspace, agent, secret, parameter, and OCI distribution management
`/health`	Public (no auth)	Liveness and readiness probes
`/knowledge`	Public	Knowledge base file and topic management (RAG)
`/task`	Public	Controller state management (run/stop/restart)
`/worklog`	Public	Inference audit history
`/webhooks`	Public	Outbound webhook subscription management

All requests carry an X-Request-ID header (generated or propagated) for log correlation across the Kafka event sequence.

alquimia-core internals

The controller and stages

Every inference request flows through a three-stage pipeline:

AssistantInference command
         │
         ▼
┌─────────────────┐
│  Preprocess     │  Shield inference (content moderation, classification)
│  Stage          │  Attachment normalization
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Process        │  Tool/agent discovery
│  Stage          │  LLM response inference
│                 │  Tool execution loop (up to max_steps)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Answer         │  Long-term memory flush
│  Stage          │  Context persistence
│                 │  Final AssistantInferenceResponse
└─────────────────┘

Each stage implements three methods:

act(worklog) — emits new commands when work is needed
ack(event, worklog) — processes incoming response events
has_finished(worklog) — returns True when the stage is done

The evaluate() loop

evaluate() is the recursive orchestration function:

async def evaluate(controller, event, registry, tool_executor=None, _depth=0):
    async for emitted in controller.process(event):
        if isinstance(emitted, AssistantInferenceResponse):
            return emitted
        response = await _handle_event(emitted, controller, registry, ...)
        # feed response back into evaluate()

_handle_event() dispatches each emitted command to the appropriate handler:

ResponseInference → calls the LLM
ServerToolExecution → calls an MCP/Llama Stack tool
A2AInference → recursively calls another agent
ContextFlush → runs the memory summarizer
ContextPersistence → saves the session

Maximum recursion depth is MAX_EVALUATE_DEPTH = 10,000.

The worklog

The Worklog is an append-only log of all events processed during a single inference run. It provides:

O(1) lookups by (event_class, control_id)
LangChain message conversion for LLM prompts
Tool execution tracking (pending, completed, errors)

alquimia-runtime internals

Async inference via Kafka

Inference is non-blocking. The master and workers share no direct HTTP path — all coordination goes through Kafka.

POST /event/infer/{assistant_id} signs and publishes an AssistantInference CloudEvent to Kafka and returns a task_id immediately.
A worker consumes the event, verifies its HMAC-SHA256 signature, and dispatches it in-process via kafka/dispatcher.py.
The dispatcher drives the controller, which emits follow-up events back to Kafka and enqueues worklog records to Redis.
GET /event/stream/{task_id} streams those worklog records as SSE until AssistantInferenceResponse is received.

Client          Master (runtime)     Kafka          Worker (runtime)    Redis
  │                │                   │                 │                │
  │ POST /infer    │                   │                 │                │
  │───────────────>│                   │                 │                │
  │                │ publish + sign CE │                 │                │
  │                │──────────────────>│                 │                │
  │ {task_id}      │                   │                 │                │
  │<───────────────│                   │ consume + verify│                │
  │                │                   │────────────────>│                │
  │                │                   │   dispatch()    │                │
  │                │                   │                 │ enqueue records│
  │                │                   │                 │───────────────>│
  │ GET /stream    │                   │                 │                │
  │───────────────>│                   │                 │                │
  │                │ dequeue           │                 │                │
  │                │<────────────────────────────────────────────────────│
  │ SSE: record    │                   │                 │                │
  │<───────────────│                   │                 │                │
  │ SSE: final     │                   │                 │                │
  │<───────────────│                   │                 │                │

Master and worker modes

A single container image runs in two modes, selected by ALQUIMIA_RUNTIME_MODE:

Mode	Role	Kafka	HTTP API
`master` (default)	Serves the external API, registry, context, and knowledge base	Producer only	Full — all routers
`worker`	Kafka consumer, executes inference in-process	Consumer + producer	Health probe only

Scale worker replicas to match Kafka partition count for maximum throughput.

Key design decisions

Kafka as the event bus. All master-to-worker communication uses Apache Kafka. CloudEvents provide a standardised envelope — every message is HMAC-SHA256 signed before publishing and verified before dispatch. Workers drop any event with a missing or invalid signature.

Redis as the universal state store. Redis serves as message queue, key-value store, and distributed lock coordinator. This minimizes infrastructure complexity while supporting the async, event-driven architecture.

Pluggable authentication. The AuthProvider enum allows the same codebase to operate in API-token mode (development), JWT mode (microservice-to-microservice), or Keycloak OIDC mode (production) without code changes.

Registry-driven agent configuration. Agent configurations are stored in a TinyDB-backed registry with OCI artifact support. This enables version-controlled, portable agent packages that can be published, pulled, and validated independently of the runtime.

Observability by default. Every request carries an X-Request-ID (generated or propagated). The runtime emits OpenTelemetry traces and logs via FastAPIInstrumentor and a loguru→OTEL bridge. The core SDK emits 18 OTEL metrics covering token consumption, latency, tool invocations, shield calls, empathy rule matches, and agent lifecycle events. All three signals share CommonAttributes dimensions (assistant_id, session_id, task_id) for cross-signal correlation. See Observability for the full model.

Observability — metrics, traces, logs, and correlation
Event Model — the typed command/response system
Agent Configuration — AssistantConfig structure
evaluate() reference — the orchestration function
Inference Endpoints — the public HTTP API

Source repositories

Repository	What lives here
`Alquimia-ai/alquimia-core`	`evaluate()`, controller, stages, worklog, tools, memory, registry, CLI
`Alquimia-ai/alquimia-runtime`	FastAPI routers, Redis storage, PostgreSQL models, Kafka event pipeline (`kafka/`), Docker Compose, Kubernetes manifests (`k8s/`)