
Inference Endpoints

The inference API is the primary interface for running agents. All endpoints are served by alquimia-runtime on port 8080.

All endpoints (except health probes) require a bearer token in the Authorization header:

Authorization: Bearer <token>

The token format depends on AUTH_PROVIDER:

| AUTH_PROVIDER | Token format | Configuration |
|---|---|---|
| api_token (default) | Static string matching API_TOKEN env var | Set API_TOKEN in .env |
| jwt | Signed JWT (HMAC HS256 by default) | Set JWT_SECRET, JWT_ALGORITHM |
| keycloak | Keycloak OIDC access token | Set KEYCLOAK_* env vars |

See Configuration Reference for full auth settings.


Health endpoints do not require authentication. Use them for Kubernetes liveness and readiness probes.

Liveness probe — is the process alive?

Returns 200 OK if the process is running. Never checks external dependencies. Kubernetes restarts the pod if this fails.

curl http://localhost:8080/health/liveness
# "OK"

Readiness probe — can the service handle traffic?

Returns 200 OK only if all critical dependencies are reachable: PostgreSQL (verified with SELECT 1) and Redis (verified with PING). Kubernetes stops routing traffic if this fails — the pod stays alive.

curl http://localhost:8080/health/readiness
# "OK"

Error responses:

| Status | Condition |
|---|---|
| 500 | PostgreSQL or Redis unreachable |
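Outside Kubernetes (for example in a deployment script or an integration test), the readiness endpoint can be polled before any inference traffic is sent. The sketch below is one way to do that with the standard library; `http_probe` and `wait_until_ready` are hypothetical helper names, and the attempt count and delay are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and error statuses such as 500
        # (urlopen raises HTTPError, a URLError subclass, for those).
        return False


def wait_until_ready(
    probe: Callable[[], bool], attempts: int = 30, delay: float = 1.0
) -> bool:
    """Call probe until it succeeds or the attempts are exhausted."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # no sleep after the final failed attempt
    return False
```

A typical call would be `wait_until_ready(lambda: http_probe("http://localhost:8080/health/readiness"))`. Taking the probe as a callable keeps the retry loop testable without a running service.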


Every request to the runtime carries two sets of correlation identifiers.

The runtime generates or propagates an X-Request-ID header on every request:

| Direction | Behaviour |
|---|---|
| Inbound | If the client provides X-Request-ID, it is used as-is. If omitted, the runtime generates a UUID v4. |
| Outbound | The ID is always echoed back in the response X-Request-ID header. |

X-Request-ID is per-HTTP-request. Use it to correlate logs for a single HTTP call. Use task_id to correlate everything across an entire inference run (multiple HTTP calls + CloudEvent hops).
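The inbound rule can be pictured as a small function. This is an illustration of the documented behaviour, not the runtime's actual middleware; `resolve_request_id` is a hypothetical name, and treating an empty header value as "omitted" is an assumption.

```python
import uuid
from typing import Mapping


def resolve_request_id(headers: Mapping[str, str]) -> str:
    """Reuse the client's X-Request-ID if present, else generate a UUID v4."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-request-id" and value:
            return value
    return str(uuid.uuid4())


propagated = resolve_request_id({"x-request-id": "req-42"})  # client-supplied ID
generated = resolve_request_id({})                           # fresh UUID v4
```

On the client side, sending your own X-Request-ID makes it possible to search the runtime logs for a call using an ID you already know.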

The inference endpoints accept the following headers, which are propagated as observability dimensions through the entire execution stack (metrics, traces, and logs):

| Header | Description |
|---|---|
| assistant-id | The agent being invoked |
| agentspace-id | The agentspace containing the agent |
| user-id | The end user |
| session-id | The conversation session |
| task-id | The inference task |
| channel-id | The channel (optional) |

These headers are extracted by telemetry_metadata_middleware and attached to every OTEL metric, trace span, and log record for the duration of the request. See Observability for details.
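A client can assemble these headers from the identifiers it already holds. The helper below is a minimal sketch; the function name `correlation_headers` is an assumption, while the header names are the ones documented above.

```python
from typing import Dict, Optional


def correlation_headers(
    assistant_id: str,
    agentspace_id: str,
    user_id: str,
    session_id: str,
    task_id: str,
    channel_id: Optional[str] = None,
) -> Dict[str, str]:
    """Build the observability headers accepted by the inference endpoints."""
    headers = {
        "assistant-id": assistant_id,
        "agentspace-id": agentspace_id,
        "user-id": user_id,
        "session-id": session_id,
        "task-id": task_id,
    }
    if channel_id is not None:
        headers["channel-id"] = channel_id  # optional, only sent when known
    return headers


example = correlation_headers(
    "my-agent", "default", "user-1", "session-abc", "task-abc123"
)
```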


Alquimia uses an async, event-driven inference model:

  1. POST /event/infer/{assistant_id} — signs and publishes the request as a CloudEvent to Kafka, then returns a task_id immediately.
  2. A worker instance consumes the event from Kafka, verifies its signature, and executes the agent in-process.
  3. GET /event/stream/{task_id} — streams worklog records as SSE until the final AssistantInferenceResponse arrives.

This decoupling means your client never blocks waiting for the LLM. Long-running agents with multiple tool calls stream progress in real time.


Submit an inference request. Returns immediately with a task_id.

| Parameter | Type | Description |
|---|---|---|
| assistant_id | string | The agent to invoke. Must exist in the target agentspace. |

| Parameter | Type | Default | Description |
|---|---|---|---|
| agentspace_id | string | "default" | The agentspace containing the agent |

{
  "query": "What is the status of ticket INC-1234?",
  "user_id": "user-1",
  "session_id": "session-abc",
  "task_id": "task-xyz",
  "knowledge_base": [],
  "extra_instructions": {},
  "evaluation_strategy": null
}

| Field | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | The user’s message |
| user_id | string | Yes | User identifier for session scoping |
| session_id | string | Yes | Session identifier for conversation continuity |
| task_id | string | No | Custom task ID. Auto-generated (UUID) if omitted. |
| knowledge_base | KnowledgeBase[] | No | Additional knowledge bases to inject for this request only |
| extra_instructions | dict[str, str] | No | Named prompt clauses to inject into the system prompt |
| evaluation_strategy | EvaluationStrategy | No | Override the agent’s configured evaluation strategy for this request |

{
  "task_id": "task-abc123",
  "session_id": "session-abc",
  "user_id": "user-1",
  "assistant_id": "my-agent",
  "agentspace_id": "default"
}

| Header | Description |
|---|---|
| X-Request-ID | The request correlation ID (generated or propagated) |

| Status | Condition |
|---|---|
| 401 | Missing or invalid Authorization header |
| 404 | Agent not found in the agentspace |
| 500 | Connection error to Redis or CloudEvent sink |

curl -X POST "http://localhost:8080/event/infer/my-agent?agentspace_id=default" \
-H "Authorization: Bearer $API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"query": "What is the status of ticket INC-1234?",
"user_id": "user-1",
"session_id": "session-abc"
}'
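The same request can be built from Python with the standard library. This is a sketch under a few assumptions: `build_infer_request` is a hypothetical helper, only the three required body fields plus an optional task_id are sent, and the request is constructed but not sent, so it can be inspected first.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_infer_request(
    base_url: str,
    token: str,
    assistant_id: str,
    query: str,
    user_id: str,
    session_id: str,
    agentspace_id: str = "default",
    task_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /event/infer/{assistant_id} request."""
    path = urllib.parse.quote(assistant_id, safe="")
    params = urllib.parse.urlencode({"agentspace_id": agentspace_id})
    body = {"query": query, "user_id": user_id, "session_id": session_id}
    if task_id is not None:
        body["task_id"] = task_id  # otherwise the runtime generates one
    return urllib.request.Request(
        f"{base_url}/event/infer/{path}?{params}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_infer_request(
    "http://localhost:8080", "my-token", "my-agent",
    "What is the status of ticket INC-1234?", "user-1", "session-abc",
)
```

Passing `req` to `urllib.request.urlopen` sends it; the JSON response body then carries the task_id to use with the stream endpoint.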

Stream inference progress as Server-Sent Events (SSE). Emits worklog records until AssistantInferenceResponse is received.

| Parameter | Type | Description |
|---|---|---|
| task_id | string | The task ID returned by POST /event/infer |

A stream of SSE events. Each event is a JSON-encoded WorklogRecord:

data: {"event_class": "ResponseInferenceResponse", "result": {...}, "status": "success"}
data: {"event_class": "ToolExecutionResponse", "tool_name": "search_tickets", "result": "INC-1234: Open", "status": "success"}
data: {"event_class": "AssistantInferenceResponse", "result": {"content": "Ticket INC-1234 is currently Open."}, "status": "success"}

The stream ends when event_class == "AssistantInferenceResponse".
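A client loop over this stream can be sketched as a generator that decodes each `data:` line and stops after the final record. The sketch assumes, as in the sample above, that every event is a single `data:` line holding one complete JSON object; `iter_worklog` is a hypothetical name, and the input is any iterable of text lines (for example a decoded HTTP response body).

```python
import json
from typing import Iterable, Iterator

FINAL_EVENT = "AssistantInferenceResponse"


def iter_worklog(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded worklog records, stopping after the final response."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        record = json.loads(line[len("data:"):].strip())
        yield record
        if record.get("event_class") == FINAL_EVENT:
            return  # the stream is complete; stop reading


sample = [
    'data: {"event_class": "ToolExecutionResponse", "status": "success"}',
    "",
    'data: {"event_class": "AssistantInferenceResponse", "status": "success"}',
    'data: {"event_class": "NeverReached"}',
]
records = list(iter_worklog(sample))
```

Because the generator returns as soon as it sees the final event, the caller can close the underlying connection right after the loop ends.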

Special event classes to watch for:

| event_class | Action required |
|---|---|
| HumanApprovalRequired | Call POST /event/tool-approval to approve or reject |
| ClientToolExecution | Execute the tool client-side and call POST /event/tool-completion |
| AssistantInferenceResponse | Final answer; close the stream |
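In client code this table usually turns into a small dispatch step. The following sketch maps each special event class to the follow-up it requires and treats every other record as progress output; `required_action` and the action labels are illustrative assumptions.

```python
from typing import Optional

# Follow-up required for each special event class listed above.
ACTIONS = {
    "HumanApprovalRequired": "POST /event/tool-approval",
    "ClientToolExecution": "POST /event/tool-completion",
    "AssistantInferenceResponse": "close stream",
}


def required_action(record: dict) -> Optional[str]:
    """Return the follow-up a record requires, or None for progress records."""
    return ACTIONS.get(record.get("event_class"))


approval = required_action({"event_class": "HumanApprovalRequired"})
progress = required_action({"event_class": "ToolExecutionResponse"})
```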

| Status | Condition |
|---|---|
| 401 | Missing or invalid Authorization header |
| 404 | No ongoing task with the given task_id |

curl -N "http://localhost:8080/event/stream/task-abc123" \
-H "Authorization: Bearer $API_TOKEN"

Approve or reject a tool execution that requires human approval.

Call this endpoint when the SSE stream emits a HumanApprovalRequired event. The agent blocks until a response is received.

{
  "control_id": "tool-call-xyz",
  "result": true,
  "message": ""
}

| Field | Type | Description |
|---|---|---|
| control_id | string | The control_id from the HumanApprovalRequired event |
| result | bool | true to approve, false to reject |
| message | string | Rejection reason (required when result is false) |

These headers identify the active inference session. Copy them from the original POST /event/infer response:

| Header | Description |
|---|---|
| task-id | The task ID from the inference request |
| session-id | The session ID |
| user-id | The user ID |
| assistant-id | The assistant ID |
| agentspace-id | The agentspace ID |

curl -X POST http://localhost:8080/event/tool-approval \
-H "Authorization: Bearer $API_TOKEN" \
-H "Content-Type: application/json" \
-H "task-id: task-abc123" \
-H "session-id: session-abc" \
-H "user-id: user-1" \
-H "assistant-id: my-agent" \
-H "agentspace-id: default" \
-d '{"control_id": "tool-call-xyz", "result": true, "message": ""}'
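Both parts of this call can be derived mechanically: the session headers from the POST /event/infer response, and the body from the approval decision. The sketch below assumes the response fields shown earlier; `session_headers` and `approval_body` are hypothetical helper names, and the client-side check simply enforces the rule that a rejection needs a message.

```python
SESSION_FIELDS = ("task_id", "session_id", "user_id", "assistant_id", "agentspace_id")


def session_headers(infer_response: dict) -> dict:
    """Map the infer response fields to the hyphenated session headers."""
    return {name.replace("_", "-"): infer_response[name] for name in SESSION_FIELDS}


def approval_body(control_id: str, approve: bool, message: str = "") -> dict:
    """Build the tool-approval body; a rejection must carry a reason."""
    if not approve and not message:
        raise ValueError("message is required when result is false")
    return {"control_id": control_id, "result": approve, "message": message}


infer_response = {
    "task_id": "task-abc123",
    "session_id": "session-abc",
    "user_id": "user-1",
    "assistant_id": "my-agent",
    "agentspace_id": "default",
}
headers = session_headers(infer_response)
body = approval_body("tool-call-xyz", approve=True)
```

Serialising `body` with `json.dumps` and sending it with these headers (plus Authorization and Content-Type) reproduces the curl call above.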

Submit the result of a client-side tool execution.

Call this endpoint when the SSE stream emits a ClientToolExecution event and your application has executed the tool locally.

{
  "control_id": "tool-call-xyz",
  "tool_name": "my_tool",
  "result": "Tool output here",
  "status": "success"
}

| Field | Type | Description |
|---|---|---|
| control_id | string | The control_id from the ClientToolExecution event |
| tool_name | string | The tool name |
| result | any | The tool’s output |
| status | "success" \| "error" | Execution status |
| message | string | Error message (when status is "error") |

Same headers as /event/tool-approval: task-id, session-id, user-id, assistant-id, agentspace-id.
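One way to produce this body is to wrap the local tool call so that success and failure both yield a valid payload. This is a sketch only: `tool_completion_body` is a hypothetical helper, the tool is any Python callable, and sending `"result": null` on failure is an assumption, since the page does not say what result should contain when status is "error".

```python
from typing import Any, Callable


def tool_completion_body(
    control_id: str, tool_name: str, tool: Callable[..., Any], *args: Any, **kwargs: Any
) -> dict:
    """Run a client-side tool and build the tool-completion body."""
    try:
        result = tool(*args, **kwargs)
    except Exception as exc:  # report any tool failure back to the agent
        return {
            "control_id": control_id,
            "tool_name": tool_name,
            "result": None,  # assumption: no output is sent on error
            "status": "error",
            "message": str(exc),
        }
    return {
        "control_id": control_id,
        "tool_name": tool_name,
        "result": result,
        "status": "success",
    }


ok = tool_completion_body("tool-call-xyz", "my_tool", lambda x: x.upper(), "done")
failed = tool_completion_body("tool-call-xyz", "my_tool", lambda: 1 / 0)
```

Catching the exception keeps the agent from waiting indefinitely on a tool that crashed locally; the error is reported through status and message instead.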


Agents can receive messages from external channels (WhatsApp, Slack, Email). Channel endpoints normalize provider-specific payloads into the standard inference pipeline.

Channels are declared in the agent’s channels array in its AssistantConfig. Each channel has a channel_id that maps to the {channel_id} path segment in the URL. The runtime looks up the channel by this ID and calls its adapter.

Agent config with a WhatsApp channel
{
  "assistant_id": "support-agent",
  "channels": [
    {
      "provider_id": "whatsapp",
      "channel_id": "whatsapp-prod",
      "whatsapp_access_token": { "$secretRef": "WHATSAPP_ACCESS_TOKEN" },
      "whatsapp_api_base_url": { "$secretRef": "WHATSAPP_API_BASE_URL" },
      "whatsapp_assistant_phone_number_id": { "$secretRef": "WHATSAPP_PHONE_NUMBER_ID" },
      "whatsapp_verify_token": { "$secretRef": "WHATSAPP_VERIFY_TOKEN" }
    }
  ]
}

See Channels for full configuration details for WhatsApp, Slack, and Email.

GET /event/infer/{assistant_id}/{channel_id}

Webhook validation endpoint. Used by channel providers (e.g., WhatsApp Business API) to verify the webhook URL during setup. Does not require authentication.

| Parameter | Type | Description |
|---|---|---|
| assistant_id | string | The agent to validate for |
| channel_id | string | Must match a channel_id in the agent’s channels array |

POST /event/infer/{assistant_id}/{channel_id}

Receive a message from a channel and dispatch it as an inference request. Does not require authentication — the channel provider’s own verification mechanism is used instead.

The request body format is provider-specific (WhatsApp JSON webhook, Slack URL-encoded form, etc.). The channel adapter normalizes it into an AssistantInference command. The response is a CommonAttributes object identical to POST /event/infer/{assistant_id}.

| Parameter | Type | Description |
|---|---|---|
| assistant_id | string | The agent to invoke |
| channel_id | string | Must match a channel_id in the agent’s channels array |

Error responses:

| Status | Condition |
|---|---|
| 400 | channel_id not found in agent config |
| 400 | Channel produced no commands from the payload |


End-to-end: submit a request, stream the response, handle a tool approval.

# 1. Submit inference
RESPONSE=$(curl -s -X POST "http://localhost:8080/event/infer/support-agent" \
-H "Authorization: Bearer $API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"query": "Close ticket INC-1234 with resolution: resolved by restart",
"user_id": "user-1",
"session_id": "session-abc"
}')
TASK_ID=$(echo "$RESPONSE" | jq -r '.task_id')
# 2. Stream results (in background or separate terminal)
curl -N "http://localhost:8080/event/stream/$TASK_ID" \
-H "Authorization: Bearer $API_TOKEN" &
# 3. If HumanApprovalRequired event arrives, approve it:
curl -X POST http://localhost:8080/event/tool-approval \
-H "Authorization: Bearer $API_TOKEN" \
-H "Content-Type: application/json" \
-H "task-id: $TASK_ID" \
-H "session-id: session-abc" \
-H "user-id: user-1" \
-H "assistant-id: support-agent" \
-H "agentspace-id: default" \
-d '{"control_id": "<control_id from event>", "result": true, "message": ""}'