Inference Endpoints
The inference API is the primary interface for running agents. All endpoints are served by alquimia-runtime on port 8080.
Authentication
All endpoints except the health probes and the channel webhook endpoints require a bearer token in the Authorization header:

    Authorization: Bearer <token>

The token format depends on AUTH_PROVIDER:
| AUTH_PROVIDER | Token format | Configuration |
|---|---|---|
| api_token (default) | Static string matching API_TOKEN env var | Set API_TOKEN in .env |
| jwt | Signed JWT (HMAC HS256 by default) | Set JWT_SECRET, JWT_ALGORITHM |
| keycloak | Keycloak OIDC access token | Set KEYCLOAK_* env vars |
See Configuration Reference for full auth settings.
Health endpoints
Health endpoints do not require authentication. Use them for Kubernetes liveness and readiness probes.
GET /health/liveness
Liveness probe: is the process alive?
Returns 200 OK if the process is running. Never checks external dependencies. Kubernetes restarts the pod if this fails.
    curl http://localhost:8080/health/liveness
    # "OK"

GET /health/readiness
Readiness probe: can the service handle traffic?
Returns 200 OK only if all critical dependencies are reachable: PostgreSQL (verified with SELECT 1) and Redis (verified with PING). Kubernetes stops routing traffic if this fails — the pod stays alive.
    curl http://localhost:8080/health/readiness
    # "OK"

Error responses:
| Status | Condition |
|---|---|
| 500 | PostgreSQL or Redis unreachable |
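Outside Kubernetes (for example in a test harness or deploy script), the readiness probe can be polled before sending traffic. This is a rough sketch; the function names, retry count, and delay are illustrative choices, not part of the runtime.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True only when the endpoint answers 200 OK."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors such as 500, refused connections, and timeouts.
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int = 30, delay: float = 1.0) -> bool:
    """Call probe() until it succeeds or the attempts are used up."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Usage (requires a running runtime):
# wait_until_ready(lambda: http_probe("http://localhost:8080/health/readiness"))
```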
Request correlation headers
Every request to the runtime carries two sets of correlation identifiers.
X-Request-ID
The runtime generates or propagates an X-Request-ID header on every request:
| Direction | Behaviour |
|---|---|
| Inbound | If the client provides X-Request-ID, it is used as-is. If omitted, the runtime generates a UUID v4. |
| Outbound | The ID is always echoed back in the response X-Request-ID header. |
X-Request-ID is per-HTTP-request. Use it to correlate logs for a single HTTP call. Use task_id to correlate everything across an entire inference run (multiple HTTP calls + CloudEvent hops).
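A client that wants its own logs to line up with the runtime's can set the ID itself and check the echo. The helper names below are hypothetical; only the header name and the UUID v4 fallback come from the table above.

```python
import uuid
from typing import Mapping, Optional


def with_request_id(headers: Mapping[str, str], request_id: Optional[str] = None) -> dict:
    """Return a copy of headers carrying X-Request-ID (a fresh UUID v4 if none is given)."""
    merged = dict(headers)
    merged["X-Request-ID"] = request_id or str(uuid.uuid4())
    return merged


def same_request(sent: Mapping[str, str], received: Mapping[str, str]) -> bool:
    """Check that the response echoed the ID that was sent."""
    return sent.get("X-Request-ID") == received.get("X-Request-ID")


sent = with_request_id({"Authorization": "Bearer example"}, "req-42")
```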
CommonAttributes headers
The inference endpoints accept the following headers, which are propagated as observability dimensions through the entire execution stack (metrics, traces, and logs):
| Header | Description |
|---|---|
| assistant-id | The agent being invoked |
| agentspace-id | The agentspace containing the agent |
| user-id | The end user |
| session-id | The conversation session |
| task-id | The inference task |
| channel-id | The channel (optional) |
These headers are extracted by telemetry_metadata_middleware and attached to every OTEL metric, trace span, and log record for the duration of the request. See Observability for details.
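On the client side these headers are plain strings. A small sketch of assembling them follows; the function name is an assumption, and the header names are the ones from the table above.

```python
from typing import Optional


def common_attribute_headers(
    assistant_id: str,
    agentspace_id: str,
    user_id: str,
    session_id: str,
    task_id: str,
    channel_id: Optional[str] = None,
) -> dict:
    """Build the CommonAttributes headers; channel-id is sent only when given."""
    headers = {
        "assistant-id": assistant_id,
        "agentspace-id": agentspace_id,
        "user-id": user_id,
        "session-id": session_id,
        "task-id": task_id,
    }
    if channel_id is not None:
        headers["channel-id"] = channel_id
    return headers


attrs = common_attribute_headers("my-agent", "default", "user-1", "session-abc", "task-abc123")
```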
Inference flow
Alquimia uses an async, event-driven inference model:
1. POST /event/infer/{assistant_id} signs and publishes the request as a CloudEvent to Kafka, then returns a task_id immediately.
2. A worker instance consumes the event from Kafka, verifies its signature, and executes the agent in-process.
3. GET /event/stream/{task_id} streams worklog records as SSE until the final AssistantInferenceResponse arrives.
This decoupling means your client never blocks waiting for the LLM. Long-running agents with multiple tool calls stream progress in real time.
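From the client's point of view the flow is two calls: submit, then stream. The skeleton below shows that shape with the HTTP transport injected as plain callables, so it makes no assumptions about a particular HTTP library. The names run_inference, submit, and stream are hypothetical.

```python
from typing import Callable, Iterable, Optional

FINAL_EVENT = "AssistantInferenceResponse"


def run_inference(
    submit: Callable[[str, dict], dict],
    stream: Callable[[str], Iterable[dict]],
    assistant_id: str,
    body: dict,
    on_progress: Optional[Callable[[dict], None]] = None,
) -> dict:
    """Submit a request, then consume worklog records until the final answer."""
    task = submit(assistant_id, body)  # POST /event/infer returns at once
    for record in stream(task["task_id"]):  # GET /event/stream/{task_id}
        if record.get("event_class") == FINAL_EVENT:
            return record
        if on_progress is not None:
            on_progress(record)  # intermediate tool calls and model steps
    raise RuntimeError("stream ended before the final AssistantInferenceResponse")


# Stand-in transports for illustration; real ones would perform the HTTP calls.
def fake_submit(assistant_id: str, body: dict) -> dict:
    return {"task_id": "task-abc123", "assistant_id": assistant_id}


def fake_stream(task_id: str) -> Iterable[dict]:
    yield {"event_class": "ToolExecutionResponse", "status": "success"}
    yield {"event_class": FINAL_EVENT, "result": {"content": "done"}, "status": "success"}


progress = []
final = run_inference(fake_submit, fake_stream, "my-agent", {"query": "hi"}, progress.append)
```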
POST /event/infer/{assistant_id}
Submit an inference request. Returns immediately with a task_id.
Path parameters
| Parameter | Type | Description |
|---|---|---|
| assistant_id | string | The agent to invoke. Must exist in the target agentspace. |
Query parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| agentspace_id | string | "default" | The agentspace containing the agent |
Request body
    {
      "query": "What is the status of ticket INC-1234?",
      "user_id": "user-1",
      "session_id": "session-abc",
      "task_id": "task-xyz",
      "knowledge_base": [],
      "extra_instructions": {},
      "evaluation_strategy": null
    }

| Field | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | The user’s message |
| user_id | string | Yes | User identifier for session scoping |
| session_id | string | Yes | Session identifier for conversation continuity |
| task_id | string | No | Custom task ID. Auto-generated (UUID) if omitted. |
| knowledge_base | KnowledgeBase[] | No | Additional knowledge bases to inject for this request only |
| extra_instructions | dict[str, str] | No | Named prompt clauses to inject into the system prompt |
| evaluation_strategy | EvaluationStrategy | No | Override the agent’s configured evaluation strategy for this request |
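The required and optional fields above translate into a small request builder. This sketch only constructs a urllib Request object and does not send it; the function name and the base URL argument are assumptions.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote, urlencode


def build_infer_request(
    base_url: str,
    token: str,
    assistant_id: str,
    query: str,
    user_id: str,
    session_id: str,
    agentspace_id: str = "default",
    task_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /event/infer/{assistant_id} request."""
    for name, value in (("query", query), ("user_id", user_id), ("session_id", session_id)):
        if not value:
            raise ValueError(f"{name} is required")
    body = {"query": query, "user_id": user_id, "session_id": session_id}
    if task_id is not None:
        body["task_id"] = task_id  # otherwise the runtime generates a UUID
    path = quote(assistant_id, safe="")
    params = urlencode({"agentspace_id": agentspace_id})
    return urllib.request.Request(
        f"{base_url}/event/infer/{path}?{params}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


request = build_infer_request(
    "http://localhost:8080", "example-token", "my-agent",
    "What is the status of ticket INC-1234?", "user-1", "session-abc",
)
```

Passing the result to urllib.request.urlopen would perform the call against a running runtime.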
Response — 200 OK
    {
      "task_id": "task-abc123",
      "session_id": "session-abc",
      "user_id": "user-1",
      "assistant_id": "my-agent",
      "agentspace_id": "default"
    }

Response headers
| Header | Description |
|---|---|
| X-Request-ID | The request correlation ID (generated or propagated) |
Error responses
| Status | Condition |
|---|---|
| 401 | Missing or invalid Authorization header |
| 404 | Agent not found in the agentspace |
| 500 | Connection error to Redis or CloudEvent sink |
Example
    curl -X POST "http://localhost:8080/event/infer/my-agent?agentspace_id=default" \
      -H "Authorization: Bearer $API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "query": "What is the status of ticket INC-1234?",
        "user_id": "user-1",
        "session_id": "session-abc"
      }'

GET /event/stream/{task_id}
Stream inference progress as Server-Sent Events (SSE). Emits worklog records until AssistantInferenceResponse is received.
Path parameters
| Parameter | Type | Description |
|---|---|---|
| task_id | string | The task ID returned by POST /event/infer |
Response
A stream of SSE events. Each event is a JSON-encoded WorklogRecord:

    data: {"event_class": "ResponseInferenceResponse", "result": {...}, "status": "success"}
    data: {"event_class": "ToolExecutionResponse", "tool_name": "search_tickets", "result": "INC-1234: Open", "status": "success"}
    data: {"event_class": "AssistantInferenceResponse", "result": {"content": "Ticket INC-1234 is currently Open."}, "status": "success"}

The stream ends when event_class == "AssistantInferenceResponse".
Special event classes to watch for:
| event_class | Action required |
|---|---|
| HumanApprovalRequired | Call POST /event/tool-approval to approve or reject |
| ClientToolExecution | Execute the tool client-side and call POST /event/tool-completion |
| AssistantInferenceResponse | Final answer — close the stream |
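A client can decode the stream line by line and branch on event_class. The parser below is a sketch that assumes each event arrives as a single data: line, as in the samples above; it does not handle multi-line data fields or other SSE field types. The ACTIONS labels are illustrative names, not runtime values.

```python
import json
from typing import Iterable, Iterator

# Maps the special event classes to the client-side action they call for.
ACTIONS = {
    "HumanApprovalRequired": "call /event/tool-approval",
    "ClientToolExecution": "run tool, then call /event/tool-completion",
    "AssistantInferenceResponse": "close the stream",
}


def iter_worklog_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded records from SSE lines, stopping after the final answer."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        record = json.loads(line[len("data:"):].strip())
        yield record
        if record.get("event_class") == "AssistantInferenceResponse":
            return


sample = [
    'data: {"event_class": "ToolExecutionResponse", "status": "success"}\n',
    "\n",
    'data: {"event_class": "HumanApprovalRequired", "control_id": "tool-call-xyz"}\n',
    "\n",
    'data: {"event_class": "AssistantInferenceResponse", "status": "success"}\n',
    'data: {"event_class": "NeverReached"}\n',
]
seen = [record["event_class"] for record in iter_worklog_records(sample)]
todo = [ACTIONS[name] for name in seen if name in ACTIONS]
```

Any iterable of text lines works as input, including a decoded HTTP response body read line by line.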
Error responses
| Status | Condition |
|---|---|
| 401 | Missing or invalid Authorization header |
| 404 | No ongoing task with the given task_id |
Example
    curl -N "http://localhost:8080/event/stream/task-abc123" \
      -H "Authorization: Bearer $API_TOKEN"

POST /event/tool-approval
Approve or reject a tool execution that requires human approval.
Call this endpoint when the SSE stream emits a HumanApprovalRequired event. The agent blocks until a response is received.
Request body
    {
      "control_id": "tool-call-xyz",
      "result": true,
      "message": ""
    }

| Field | Type | Description |
|---|---|---|
| control_id | string | The control_id from the HumanApprovalRequired event |
| result | bool | true to approve, false to reject |
| message | string | Rejection reason (required when result is false) |
Required context headers
These headers identify the active inference session. Copy them from the original POST /event/infer response:
| Header | Description |
|---|---|
| task-id | The task ID from the inference request |
| session-id | The session ID |
| user-id | The user ID |
| assistant-id | The assistant ID |
| agentspace-id | The agentspace ID |
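The request body rules (message required on rejection) are easy to enforce before the call is made. A minimal sketch, with a hypothetical function name:

```python
def tool_approval_body(control_id: str, approve: bool, message: str = "") -> dict:
    """Build the POST /event/tool-approval body for an approve or reject decision."""
    if not control_id:
        raise ValueError("control_id from the HumanApprovalRequired event is required")
    if not approve and not message:
        raise ValueError("message is required when rejecting")
    return {"control_id": control_id, "result": approve, "message": message}


approved = tool_approval_body("tool-call-xyz", True)
rejected = tool_approval_body("tool-call-xyz", False, "Ticket must stay open")
```

The context headers listed above still have to accompany the request; they are taken from the values returned by POST /event/infer.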
Example
    curl -X POST http://localhost:8080/event/tool-approval \
      -H "Authorization: Bearer $API_TOKEN" \
      -H "Content-Type: application/json" \
      -H "task-id: task-abc123" \
      -H "session-id: session-abc" \
      -H "user-id: user-1" \
      -H "assistant-id: my-agent" \
      -H "agentspace-id: default" \
      -d '{"control_id": "tool-call-xyz", "result": true, "message": ""}'

POST /event/tool-completion
Submit the result of a client-side tool execution.
Call this endpoint when the SSE stream emits a ClientToolExecution event and your application has executed the tool locally.
Request body
    {
      "control_id": "tool-call-xyz",
      "tool_name": "my_tool",
      "result": "Tool output here",
      "status": "success"
    }

| Field | Type | Description |
|---|---|---|
| control_id | string | The control_id from the ClientToolExecution event |
| tool_name | string | The tool name |
| result | any | The tool’s output |
| status | "success" \| "error" | Execution status |
| message | string | Error message (when status is "error") |
Required context headers
Same as /event/tool-approval: task-id, session-id, user-id, assistant-id, agentspace-id.
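A client-side tool run maps naturally onto the success and error shapes of this body. In the sketch below the arguments dictionary is an assumption: this page does not describe how a ClientToolExecution event carries tool arguments, so adapt that part to the actual event payload.

```python
from typing import Any, Callable, Dict


def complete_client_tool(
    control_id: str,
    tool_name: str,
    tool: Callable[..., Any],
    arguments: Dict[str, Any],
) -> dict:
    """Run a local tool and build the POST /event/tool-completion body."""
    try:
        result = tool(**arguments)
    except Exception as exc:  # report any tool failure back to the agent
        return {
            "control_id": control_id,
            "tool_name": tool_name,
            "result": None,
            "status": "error",
            "message": str(exc),
        }
    return {
        "control_id": control_id,
        "tool_name": tool_name,
        "result": result,
        "status": "success",
    }


def add(a: int, b: int) -> int:
    """Example local tool (hypothetical)."""
    return a + b


ok = complete_client_tool("tool-call-xyz", "add", add, {"a": 2, "b": 3})
failed = complete_client_tool("tool-call-xyz", "add", add, {"a": 2})
```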
Channel inference
Agents can receive messages from external channels (WhatsApp, Slack, Email). Channel endpoints normalize provider-specific payloads into the standard inference pipeline.
Channels are declared in the agent’s channels array in its AssistantConfig. Each channel has a channel_id that maps to the {channel_id} path segment in the URL. The runtime looks up the channel by this ID and calls its adapter.
    {
      "assistant_id": "support-agent",
      "channels": [
        {
          "provider_id": "whatsapp",
          "channel_id": "whatsapp-prod",
          "whatsapp_access_token": { "$secretRef": "WHATSAPP_ACCESS_TOKEN" },
          "whatsapp_api_base_url": { "$secretRef": "WHATSAPP_API_BASE_URL" },
          "whatsapp_assistant_phone_number_id": { "$secretRef": "WHATSAPP_PHONE_NUMBER_ID" },
          "whatsapp_verify_token": { "$secretRef": "WHATSAPP_VERIFY_TOKEN" }
        }
      ]
    }

See Channels for full configuration details for WhatsApp, Slack, and Email.
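The channel lookup described above amounts to matching the URL's {channel_id} against the channels array. The function below illustrates that matching on a config dictionary; it is not the runtime's implementation, and the function name and the LookupError are assumptions.

```python
def find_channel(assistant_config: dict, channel_id: str) -> dict:
    """Return the channel entry whose channel_id matches the URL path segment."""
    for channel in assistant_config.get("channels", []):
        if channel.get("channel_id") == channel_id:
            return channel
    # The runtime reports an unknown channel as a 400 error.
    raise LookupError(f"channel_id {channel_id!r} not found in agent config")


config = {
    "assistant_id": "support-agent",
    "channels": [{"provider_id": "whatsapp", "channel_id": "whatsapp-prod"}],
}
channel = find_channel(config, "whatsapp-prod")
```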
GET /event/infer/{assistant_id}/{channel_id}
Webhook validation endpoint. Used by channel providers (e.g., WhatsApp Business API) to verify the webhook URL during setup. Does not require authentication.
| Parameter | Type | Description |
|---|---|---|
| assistant_id | string | The agent to validate for |
| channel_id | string | Must match a channel_id in the agent’s channels array |
POST /event/infer/{assistant_id}/{channel_id}
Receive a message from a channel and dispatch it as an inference request. Does not require authentication; the channel provider’s own verification mechanism is used instead.
The request body format is provider-specific (WhatsApp JSON webhook, Slack URL-encoded form, etc.). The channel adapter normalizes it into an AssistantInference command. The response is a CommonAttributes object, identical to the one returned by POST /event/infer/{assistant_id}.
| Parameter | Type | Description |
|---|---|---|
| assistant_id | string | The agent to invoke |
| channel_id | string | Must match a channel_id in the agent’s channels array |
Error responses:
| Status | Condition |
|---|---|
| 400 | channel_id not found in agent config |
| 400 | Channel produced no commands from the payload |
Complete inference example
End-to-end: submit a request, stream the response, handle a tool approval.

    # 1. Submit inference
    RESPONSE=$(curl -s -X POST "http://localhost:8080/event/infer/support-agent" \
      -H "Authorization: Bearer $API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "query": "Close ticket INC-1234 with resolution: resolved by restart",
        "user_id": "user-1",
        "session_id": "session-abc"
      }')

    # Quote the variable so the JSON reaches jq unmodified by word splitting
    TASK_ID=$(echo "$RESPONSE" | jq -r '.task_id')

    # 2. Stream results (in background or separate terminal)
    curl -N "http://localhost:8080/event/stream/$TASK_ID" \
      -H "Authorization: Bearer $API_TOKEN" &

    # 3. If HumanApprovalRequired event arrives, approve it:
    curl -X POST http://localhost:8080/event/tool-approval \
      -H "Authorization: Bearer $API_TOKEN" \
      -H "Content-Type: application/json" \
      -H "task-id: $TASK_ID" \
      -H "session-id: session-abc" \
      -H "user-id: user-1" \
      -H "assistant-id: support-agent" \
      -H "agentspace-id: default" \
      -d '{"control_id": "<control_id from event>", "result": true, "message": ""}'

Related pages
- Context & Knowledge API: session context, blob uploads, knowledge base management
- Registry API: manage agents and agentspaces
- Configuration Reference: auth and environment variables
- Event Model: the typed CloudEvents that drive inference
- Observability: X-Request-ID, CommonAttributes, and correlation