Skip to content

Architecture

Alquimia has two layers:

┌─────────────────────────────────────────────────────────────┐
│ External Clients │
│ (HTTP, SSE, Slack, WhatsApp, Email) │
└──────────────────────┬──────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ alquimia-runtime (FastAPI) │
│ │
│ Auth → Registry → Event ingestion → SSE streaming │
│ Redis (state) · PostgreSQL (persistence) · S3 (blobs) │
└──────────────────────┬──────────────────────────────────────┘
│ CloudEvents
┌─────────────────────────────────────────────────────────────┐
│ alquimia-core (SDK) │
│ │
│ Controller → Stages → Worklog → evaluate() │
│ Tools (MCP / Llama Stack / A2A) · Memory · Evaluation │
└─────────────────────────────────────────────────────────────┘

alquimia-runtime handles HTTP, auth, session state, and event routing. It delegates all agent reasoning to alquimia-core via CloudEvents.

The runtime exposes the following HTTP routers. All are served by the master instance. Workers expose only the health probe.

PrefixVisibilityDescription
/authPublicAuthentication endpoints (Keycloak OIDC flows)
/eventPublicInference submission, SSE streaming, tool approval/completion
/contextPublicSession context retrieval, blob upload/download
/registryPublicAgentspace, agent, secret, parameter, and OCI distribution management
/healthPublic (no auth)Liveness and readiness probes
/knowledgePublicKnowledge base file and topic management (RAG)
/taskPublicController state management (run/stop/restart)
/worklogPublicInference audit history
/webhooksPublicOutbound webhook subscription management

All requests carry an X-Request-ID header (generated or propagated) for log correlation across the Kafka event sequence.

Every inference request flows through a three-stage pipeline:

AssistantInference command
┌─────────────────┐
│ Preprocess │ Shield inference (content moderation, classification)
│ Stage │ Attachment normalization
└────────┬────────┘
┌─────────────────┐
│ Process │ Tool/agent discovery
│ Stage │ LLM response inference
│ │ Tool execution loop (up to max_steps)
└────────┬────────┘
┌─────────────────┐
│ Answer │ Long-term memory flush
│ Stage │ Context persistence
│ │ Final AssistantInferenceResponse
└─────────────────┘

Each stage implements three methods:

  • act(worklog) — emits new commands when work is needed
  • ack(event, worklog) — processes incoming response events
  • has_finished(worklog) — returns True when the stage is done

evaluate() is the recursive orchestration function:

async def evaluate(controller, event, registry, tool_executor=None, _depth=0):
async for emitted in controller.process(event):
if isinstance(emitted, AssistantInferenceResponse):
return emitted
response = await _handle_event(emitted, controller, registry, ...)
# feed response back into evaluate()

_handle_event() dispatches each emitted command to the appropriate handler:

  • ResponseInference → calls the LLM
  • ServerToolExecution → calls an MCP/Llama Stack tool
  • A2AInference → recursively calls another agent
  • ContextFlush → runs the memory summarizer
  • ContextPersistence → saves the session

Maximum recursion depth is MAX_EVALUATE_DEPTH = 10,000.

The Worklog is an append-only log of all events processed during a single inference run. It provides:

  • O(1) lookups by (event_class, control_id)
  • LangChain message conversion for LLM prompts
  • Tool execution tracking (pending, completed, errors)

Inference is non-blocking. The master and workers share no direct HTTP path — all coordination goes through Kafka.

  1. POST /event/infer/{assistant_id} signs and publishes an AssistantInference CloudEvent to Kafka and returns a task_id immediately.
  2. A worker consumes the event, verifies its HMAC-SHA256 signature, and dispatches it in-process via kafka/dispatcher.py.
  3. The dispatcher drives the controller, which emits follow-up events back to Kafka and enqueues worklog records to Redis.
  4. GET /event/stream/{task_id} streams those worklog records as SSE until AssistantInferenceResponse is received.
Client Master (runtime) Kafka Worker (runtime) Redis
│ │ │ │ │
│ POST /infer │ │ │ │
│───────────────>│ │ │ │
│ │ publish + sign CE │ │ │
│ │──────────────────>│ │ │
│ {task_id} │ │ │ │
│<───────────────│ │ consume + verify│ │
│ │ │────────────────>│ │
│ │ │ dispatch() │ │
│ │ │ │ enqueue records│
│ │ │ │───────────────>│
│ GET /stream │ │ │ │
│───────────────>│ │ │ │
│ │ dequeue │ │ │
│ │<────────────────────────────────────────────────────│
│ SSE: record │ │ │ │
│<───────────────│ │ │ │
│ SSE: final │ │ │ │
│<───────────────│ │ │ │

A single container image runs in two modes, selected by ALQUIMIA_RUNTIME_MODE:

ModeRoleKafkaHTTP API
master (default)Serves the external API, registry, context, and knowledge baseProducer onlyFull — all routers
workerKafka consumer, executes inference in-processConsumer + producerHealth probe only

Scale worker replicas to match Kafka partition count for maximum throughput.

Kafka as the event bus. All master-to-worker communication uses Apache Kafka. CloudEvents provide a standardised envelope — every message is HMAC-SHA256 signed before publishing and verified before dispatch. Workers drop any event with a missing or invalid signature.

Redis as the universal state store. Redis serves as message queue, key-value store, and distributed lock coordinator. This minimizes infrastructure complexity while supporting the async, event-driven architecture.

Pluggable authentication. The AuthProvider enum allows the same codebase to operate in API-token mode (development), JWT mode (microservice-to-microservice), or Keycloak OIDC mode (production) without code changes.

Registry-driven agent configuration. Agent configurations are stored in a TinyDB-backed registry with OCI artifact support. This enables version-controlled, portable agent packages that can be published, pulled, and validated independently of the runtime.

Observability by default. Every request carries an X-Request-ID (generated or propagated). The runtime emits OpenTelemetry traces and logs via FastAPIInstrumentor and a loguru→OTEL bridge. The core SDK emits 18 OTEL metrics covering token consumption, latency, tool invocations, shield calls, empathy rule matches, and agent lifecycle events. All three signals share CommonAttributes dimensions (assistant_id, session_id, task_id) for cross-signal correlation. See Observability for the full model.

RepositoryWhat lives here
Alquimia-ai/alquimia-coreevaluate(), controller, stages, worklog, tools, memory, registry, CLI
Alquimia-ai/alquimia-runtimeFastAPI routers, Redis storage, PostgreSQL models, Kafka event pipeline (kafka/), Docker Compose, Kubernetes manifests (k8s/)