Kubernetes & OpenShift

alquimia-runtime ships with production-ready Kubernetes manifests in k8s/ structured as Kustomize overlays. This guide covers deploying to vanilla Kubernetes and OpenShift / ROSA.

For local development with Docker Compose, see Docker Compose.

Prerequisites

| Tool         | Version                               |
| ------------ | ------------------------------------- |
| kubectl / oc | ≥ 1.27 / ≥ 4.13                       |
| kustomize    | ≥ 5.0 (bundled in `kubectl apply -k`) |

The manifests target a namespace called alquimia-runtime. Create it first:

Kubernetes:

```
kubectl create namespace alquimia-runtime
```

OpenShift:

```
oc new-project alquimia-runtime
```

Infrastructure dependencies

The runtime does not provision its own backing services. These must already exist and be reachable from the cluster before applying the manifests:

| Service             | Used for                                | Notes                             |
| ------------------- | --------------------------------------- | --------------------------------- |
| Apache Kafka 3.x    | Event bus                               | Master publishes, workers consume |
| PostgreSQL 16       | Knowledge base, worklog, webhooks       |                                   |
| Redis 7             | Conversation context, distributed locks |                                   |
| Qdrant              | Vector store (master only)              |                                   |
| HashiCorp Vault     | Agent secret resolution                 |                                   |
| S3-compatible store | Blob storage                            | e.g. MinIO, AWS S3, Ceph          |
| OCI registry        | Agent package publish/pull              |                                   |

Repository layout

```
k8s/
├── base/
│   ├── kustomization.yaml           # base resource list
│   ├── master-deployment.yaml       # master Deployment (API + s3-sync sidecar)
│   ├── worker-deployment.yaml       # worker Deployment (Kafka consumer + s3-sync sidecar)
│   └── service.yaml                 # ClusterIP + headless Services
└── overlays/
    └── dev/
        ├── kustomization.yaml       # image tag, resource patches, generators
        ├── .secrets/                # gitignored plaintext files (never commit these)
        │   ├── config               # app ConfigMap (env vars)
        │   ├── postgres             # POSTGRES_* credentials
        │   ├── redis                # REDIS_URL
        │   ├── s3                   # BLOB_S3_* credentials
        │   ├── vault                # VAULT_TOKEN
        │   ├── auth                 # API_TOKEN
        │   ├── kafka-signing        # KAFKA_SIGNING_KEY
        │   ├── qdrant               # QDRANT_URL, QDRANT_API_KEY
        │   └── .registryconfigjson  # OCI pull credentials
        └── patches/
            ├── master-resource-limits.yaml
            └── worker-resource-limits.yaml
```

Prepare secret files

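The dev overlay reads plaintext files from k8s/overlays/dev/.secrets/:
- KEY=value files (one pair per line) for the ConfigMap and Secrets.
- One JSON file (.registryconfigjson) for registry pull credentials.

These files are gitignored and must be created manually.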
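The sketch below shows how a KEY=value file becomes ConfigMap or Secret data. It assumes the overlay passes these files to Kustomize's `configMapGenerator` / `secretGenerator` through `envs:`; the tree above lists generators in the overlay's kustomization.yaml. Parsing rules, approximated:
- Blank lines and lines starting with `#` are skipped.
- Everything after the first `=` is kept verbatim, including any trailing `# comment`.

`parse_env_file` is a hypothetical helper. It is not part of alquimia-runtime or Kustomize.

```python
import os
import re

# Approximates the Kubernetes env-var-name rule: letters, digits, '-', '.', '_',
# and the first character must not be a digit.
_ENV_NAME = re.compile(r"[-._a-zA-Z][-._a-zA-Z0-9]*")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=value lines roughly the way Kustomize reads `envs:` files."""
    data: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.lstrip()
        # Blank lines and full-line comments are ignored.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not _ENV_NAME.fullmatch(key):
            raise ValueError(f"line {lineno}: invalid key {key!r}")
        # The value is everything after the first '='; inline comments are NOT stripped.
        # A line without '=' takes its value from the local environment.
        data[key] = value if sep else os.environ.get(key, "")
    return data


if __name__ == "__main__":
    sample = (
        "# full-line comments are skipped\n"
        "BLOB_S3_SECURE=true\n"
        "BLOB_S3_PROVIDER=Minio # this trailing text becomes part of the value\n"
        "QDRANT_API_KEY=\n"
    )
    for k, v in parse_env_file(sample).items():
        print(f"{k}={v!r}")
```

The practical consequence is to keep comments on their own lines in these files.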

`.secrets/config` — ConfigMap

Non-secret runtime environment variables:

```
ALQUIMIA_REGISTRY_SECRET_RESOLVER=vault
ALQUIMIA_OCI_REGISTRY_DEFAULT=<registry-host>:<port>
ORAS_INSECURE=true
ORAS_PLAIN_HTTP=true
VAULT_ADDR=http://<vault-host>:<port>
VAULT_MOUNT_POINT=secret
ALQUIMIA_REGISTRY_DIR=/var/lib/alquimia/registry
KAFKA_BOOTSTRAP_SERVERS=<broker-host>:<port>
DEBUG=false
IS_ALLOWED_CREDENTIALS=true
OTEL_ALQUIMIA_SERVICE_NAME=alquimia-runtime
```

`.secrets/redis`

```
REDIS_URL=redis://<host>:<port>/0
```

`.secrets/postgres`

```
POSTGRES_HOST=<host>
POSTGRES_PORT=5432
POSTGRES_DB=alquimia
POSTGRES_USERNAME=<username>
POSTGRES_PASSWORD=<password>
POSTGRES_SCHEMA=postgresql
```

`.secrets/s3`

```
BLOB_S3_ENDPOINT_URL=http://<minio-or-s3-endpoint>
BLOB_S3_ACCESS_KEY=<access-key>
BLOB_S3_SECRET_KEY=<secret-key>
BLOB_S3_BUCKET_NAME=alquimia
BLOB_S3_REGION_NAME=us-east-1
BLOB_S3_SECURE=true
# Supported providers: https://rclone.org/s3/
BLOB_S3_PROVIDER=Minio
```

`.secrets/auth`

```
API_TOKEN=<strong-random-token>
```

Generate a token with:

```
python -c "import secrets; print(secrets.token_urlsafe(32))"
```

`.secrets/kafka-signing`

```
KAFKA_SIGNING_KEY=<64-char-hex>
```

This key must be identical on the master and all workers because it authenticates every Kafka event. Generate one with:

```
python -c "import secrets; print(secrets.token_hex(32))"
```
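This page does not document the signing scheme itself; the troubleshooting section only mentions an alquimiasignature value. The sketch below illustrates why every pod needs the same key, and it rests on several assumptions:
- The signature is an HMAC-SHA256 tag over the raw event bytes.
- `sign_event` and `verify_event` are hypothetical names.

This is not alquimia-runtime's implementation.

```python
import hashlib
import hmac
import secrets


def sign_event(key_hex: str, payload: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for an event payload (illustrative scheme)."""
    return hmac.new(bytes.fromhex(key_hex), payload, hashlib.sha256).hexdigest()


def verify_event(key_hex: str, payload: bytes, signature: str) -> bool:
    """Constant-time check that `signature` was produced with this side's key."""
    return hmac.compare_digest(sign_event(key_hex, payload), signature)


if __name__ == "__main__":
    master_key = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    payload = b'{"event": "example"}'
    sig = sign_event(master_key, payload)

    print(verify_event(master_key, payload, sig))             # same key: True
    print(verify_event(secrets.token_hex(32), payload, sig))  # different key: False
```

A worker holding a different key computes a different tag for the same bytes. Under this assumed scheme, it would reject every event from the master.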

`.secrets/vault`

```
VAULT_TOKEN=<scoped-vault-token>
```

Create a scoped token:

```
vault token create -policy=alquimia-runtime -ttl=720h -renewable=true
```

`.secrets/qdrant`

```
QDRANT_URL=http://<qdrant-host>:<port>
QDRANT_API_KEY=
```

Leave QDRANT_API_KEY empty for unauthenticated Qdrant.

`.secrets/.registryconfigjson`

Standard Docker config.json for your OCI registry:

```
{
  "auths": {
    "<registry-host>": {
      "username": "<username>",
      "password": "<password>",
      "auth": "<base64(username:password)>"
    }
  }
}
```

Generate the auth value with:

```
echo -n "username:password" | base64
```
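If you prefer generating the file to hand-editing it, the small Python sketch below writes the same structure. Notes on the sketch:
- `build_registry_config` and `write_registry_config` are hypothetical helpers.
- The host and credentials in the usage comment are placeholders.
- Restricting the file mode to 0600 is an extra precaution, not something this page requires.

```python
import base64
import json
from pathlib import Path


def build_registry_config(host: str, username: str, password: str) -> dict:
    """Return a Docker config.json-style dict with credentials for one registry."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"auths": {host: {"username": username, "password": password, "auth": auth}}}


def write_registry_config(path: Path, host: str, username: str, password: str) -> None:
    """Write the config as pretty-printed JSON, readable only by the owner."""
    config = build_registry_config(host, username, password)
    path.write_text(json.dumps(config, indent=2) + "\n")
    path.chmod(0o600)


# Example (placeholder values):
# write_registry_config(Path("k8s/overlays/dev/.secrets/.registryconfigjson"),
#                       "<registry-host>", "<username>", "<password>")
```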

Deploy to Kubernetes

Verify secret files are in place

```
ls -A k8s/overlays/dev/.secrets/   # -A also lists dotfiles such as .registryconfigjson
# auth  config  kafka-signing  postgres  qdrant  redis  s3  vault  .registryconfigjson
```
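A small pre-flight check can catch a missing file or key before Kustomize or the pods would. Notes on the minimal Python sketch:
- `REQUIRED` mirrors the keys shown on this page; adjust it if your setup differs.
- `check_secrets` is a hypothetical helper, not part of the repository.

```python
import json
import sys
from pathlib import Path

# Keys shown on this page for each KEY=value file (config is only checked for existence).
REQUIRED = {
    "auth": {"API_TOKEN"},
    "kafka-signing": {"KAFKA_SIGNING_KEY"},
    "postgres": {"POSTGRES_HOST", "POSTGRES_PORT", "POSTGRES_DB",
                 "POSTGRES_USERNAME", "POSTGRES_PASSWORD", "POSTGRES_SCHEMA"},
    "qdrant": {"QDRANT_URL", "QDRANT_API_KEY"},
    "redis": {"REDIS_URL"},
    "s3": {"BLOB_S3_ENDPOINT_URL", "BLOB_S3_ACCESS_KEY", "BLOB_S3_SECRET_KEY",
           "BLOB_S3_BUCKET_NAME", "BLOB_S3_REGION_NAME", "BLOB_S3_SECURE",
           "BLOB_S3_PROVIDER"},
    "vault": {"VAULT_TOKEN"},
}


def env_keys(path: Path) -> set[str]:
    """Return the keys defined in a KEY=value file, ignoring blanks and # comments."""
    keys = set()
    for line in path.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            keys.add(line.split("=", 1)[0])
    return keys


def check_secrets(secrets_dir: Path) -> list[str]:
    """Return human-readable problems; an empty list means everything is present."""
    problems = []
    for name, keys in REQUIRED.items():
        path = secrets_dir / name
        if not path.is_file():
            problems.append(f"missing file: {name}")
            continue
        missing = sorted(keys - env_keys(path))
        if missing:
            problems.append(f"{name}: missing keys {', '.join(missing)}")
    if not (secrets_dir / "config").is_file():
        problems.append("missing file: config")
    registry = secrets_dir / ".registryconfigjson"
    if not registry.is_file():
        problems.append("missing file: .registryconfigjson")
    else:
        try:
            data = json.loads(registry.read_text())
            if not isinstance(data, dict) or "auths" not in data:
                problems.append(".registryconfigjson: no top-level 'auths' object")
        except json.JSONDecodeError as exc:
            problems.append(f".registryconfigjson: invalid JSON ({exc})")
    return problems


if __name__ == "__main__":
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("k8s/overlays/dev/.secrets")
    for issue in check_secrets(target) or ["all expected secret files and keys are present"]:
        print(issue)
```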

Preview the generated manifests (dry run)
```
kubectl kustomize k8s/overlays/dev/
```

Apply to the cluster

```
kubectl apply -k k8s/overlays/dev/ -n alquimia-runtime
```

Watch pods come up

```
kubectl get pods -n alquimia-runtime -w
```

Expected steady state — each pod shows 2/2 because both the alquimia-runtime container and the s3-sync sidecar must be ready:

```
alquimia-runtime-master-<hash>   2/2   Running   0   60s
alquimia-runtime-worker-<hash>   2/2   Running   0   60s
alquimia-runtime-worker-<hash>   2/2   Running   0   60s
```

Verify health

```
kubectl exec -n alquimia-runtime \
  $(kubectl get pod -l role=master -n alquimia-runtime -o jsonpath='{.items[0].metadata.name}') \
  -c alquimia-runtime \
  -- curl -sf http://localhost:8080/health/readiness
# → "OK"
```
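For scripted checks, such as CI or upgrade scripts, polling is often more convenient than a single exec. This minimal Python sketch assumes the master's port 8080 has been forwarded locally, for example with `kubectl port-forward -n alquimia-runtime deployment/alquimia-runtime-master 8080:8080`. `wait_until_ready` is a hypothetical helper, not part of the runtime.

```python
import http.client
import time
import urllib.request


def wait_until_ready(url: str, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until it returns HTTP 200; give up after `timeout_s` seconds."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError/HTTPError (e.g. a 500 from the readiness probe), refused
            # connections and socket timeouts are all OSError subclasses.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_until_ready("http://localhost:8080/health/readiness")` returns True as soon as the probe answers 200. It returns False if that does not happen within two minutes.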

Deploy to OpenShift / ROSA

Verify secret files are in place (same as Kubernetes)

Apply with oc

```
oc apply -k k8s/overlays/dev/ -n alquimia-runtime
```

Watch pods come up
```
oc get pods -n alquimia-runtime -w
```

Create a Route for external access

```
# HTTP
oc expose svc/alquimia-runtime -n alquimia-runtime

# TLS edge termination (recommended)
oc create route edge alquimia-runtime-tls \
  --service=alquimia-runtime \
  --hostname=api.alquimia.<your-domain> \
  -n alquimia-runtime
```

Expose the master externally

Kubernetes — Ingress

```
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: alquimia-runtime
  namespace: alquimia-runtime
  annotations:
    # SSE streams need long-lived connections
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
spec:
  # Assumes the ingress-nginx controller, which the annotations above target
  ingressClassName: nginx
  rules:
    - host: api.alquimia.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: alquimia-runtime
                port:
                  number: 80
  tls:
    - hosts:
        - api.alquimia.example.com
      secretName: alquimia-tls
```

Key resource details

| Resource               | Value                                                 |
| ---------------------- | ----------------------------------------------------- |
| Master replicas        | 1 (serialised registry writes via TinyDB + Redlock)   |
| Worker replicas        | 2 (dev); scale to match Kafka partition count         |
| Container port         | 8080                                                  |
| Readiness probe        | GET /health/readiness                                 |
| Liveness probe         | GET /health/liveness                                  |
| Startup probe          | GET /health/liveness (90 retries × 10 s)              |
| CPU request / limit    | 500m / 1 (dev overlay)                                |
| Memory request / limit | 1Gi / 2Gi                                             |
| Worker registry        | emptyDir synced from S3 via s3-sync sidecar           |
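The startup-probe row can be read as `failureThreshold: 90` and `periodSeconds: 10`. That is an assumption about how the manifests encode "90 retries × 10 s". Under that reading, the container gets roughly this long to answer /health/liveness before the kubelet restarts it, plus any `initialDelaySeconds`. Liveness and readiness checks only begin after the startup probe succeeds.

$$
t_{\max} \approx \text{failureThreshold} \times \text{periodSeconds} = 90 \times 10\,\mathrm{s} = 900\,\mathrm{s} = 15\,\mathrm{min}
$$

If startup can take longer than that on your infrastructure, raise the startup probe's `failureThreshold` rather than loosening the liveness probe.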

Scaling workers

Workers are stateless Kafka consumers. All replicas share the same consumer group, so Kafka distributes partitions across them automatically.

```
kubectl scale deployment alquimia-runtime-worker --replicas=4 -n alquimia-runtime
```

To make the change permanent, set `replicas: 4` in k8s/overlays/dev/patches/worker-resource-limits.yaml. Otherwise, the next `kubectl apply -k` resets the count to the value in the manifests. Throughput stops scaling at one replica per Kafka partition. Replicas beyond the partition count join the group but receive no partitions and sit idle.
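The simplified Python model below illustrates why extra replicas sit idle. It is a plain round-robin sketch, not the assignment strategy your Kafka client actually uses, and `assign_partitions` is hypothetical. The key property does hold for every real strategy: within one consumer group, each partition is owned by at most one consumer at a time.

```python
def assign_partitions(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Spread partition ids over consumers round-robin; each partition gets exactly one owner."""
    if not consumers:
        raise ValueError("need at least one consumer")
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


if __name__ == "__main__":
    workers = [f"worker-{i}" for i in range(4)]
    for name, parts in assign_partitions(3, workers).items():
        print(name, parts or "idle")
```

With 3 partitions and 4 workers, worker-3 receives no partitions. Adding a fourth partition, not a fifth worker, is what would raise parallelism.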

Upgrading

Build and push the new image

```
docker build -t alquimiaai/runtime:<new-tag> runtime/
docker push alquimiaai/runtime:<new-tag>
```

Update newTag in k8s/overlays/dev/kustomization.yaml and re-apply

```
kubectl apply -k k8s/overlays/dev/ -n alquimia-runtime
```

Run database migrations once the master pod is healthy

```
kubectl exec -n alquimia-runtime \
  $(kubectl get pod -l role=master -n alquimia-runtime -o jsonpath='{.items[0].metadata.name}') \
  -c alquimia-runtime \
  -- uv run alembic upgrade head
```

Tear down

```
kubectl delete -k k8s/overlays/dev/ -n alquimia-runtime --ignore-not-found=true

# Delete the namespace entirely (removes PVCs too)
kubectl delete namespace alquimia-runtime
```

Troubleshooting

Workers log missing or invalid alquimiasignature and drop all events

KAFKA_SIGNING_KEY differs between master and workers. Regenerate and redeploy:

```
NEW_KEY=$(python -c "import secrets; print(secrets.token_hex(32))")
# Overwrite the overlay's secret file with the new key
printf 'KAFKA_SIGNING_KEY=%s\n' "$NEW_KEY" > k8s/overlays/dev/.secrets/kafka-signing
kubectl apply -k k8s/overlays/dev/ -n alquimia-runtime
kubectl rollout restart deployment/alquimia-runtime-master deployment/alquimia-runtime-worker -n alquimia-runtime
```

GET /health/readiness returns 500

This usually means PostgreSQL or Redis is unreachable. Check two things:
- POSTGRES_HOST in the alquimia-postgres Secret and REDIS_URL in the alquimia-redis Secret point to the correct in-cluster DNS names.
- Network policies allow traffic from the alquimia-runtime namespace to those services.
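The hedged Python sketch below narrows down which dependency is failing. It opens a plain TCP connection to the host and port from POSTGRES_HOST / POSTGRES_PORT and REDIS_URL, and it assumes the following:
- It runs inside the alquimia-runtime container, presumably with Python available and those variables set from the Secrets, so DNS and network policies match the runtime's view.
- `tcp_reachable` and `redis_target` are hypothetical helpers.

A successful connect only proves the port is open, not that the credentials work.

```python
import os
import socket
from urllib.parse import urlparse


def tcp_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # DNS failure, refused connection, timeout, ...
        return False


def redis_target(url: str) -> tuple[str, int]:
    """Extract (host, port) from a redis:// URL, defaulting to port 6379."""
    parsed = urlparse(url)
    return parsed.hostname or "localhost", parsed.port or 6379


if __name__ == "__main__":
    checks = {
        "postgres": (os.environ.get("POSTGRES_HOST", "localhost"),
                     int(os.environ.get("POSTGRES_PORT", "5432"))),
        "redis": redis_target(os.environ.get("REDIS_URL", "redis://localhost:6379/0")),
    }
    for name, (host, port) in checks.items():
        status = "reachable" if tcp_reachable(host, port) else "UNREACHABLE"
        print(f"{name} {host}:{port} -> {status}")
```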

Vault token expired — registry secret resolution fails

```
vault token renew <token>
# or create a new one:
NEW_TOKEN=$(vault token create -policy=alquimia-runtime -ttl=720h -field=token)
kubectl create secret generic alquimia-vault \
  --from-literal=VAULT_TOKEN="$NEW_TOKEN" \
  -n alquimia-runtime \
  --dry-run=client -o yaml | kubectl apply -f -
kubectl rollout restart deployment/alquimia-runtime-master -n alquimia-runtime
# Also put the new token in k8s/overlays/dev/.secrets/vault so the next
# kubectl apply -k does not restore the old one
```

- Docker Compose: local development setup
- Configuration Reference: all environment variables
- Kafka & Event Bus: how the Kafka event pipeline works
- Inference Endpoints: the public HTTP API

Source

Alquimia-ai/alquimia-runtime — k8s/ manifests and Kustomize overlays