
Memory

Alquimia has two memory layers: short-term (what the LLM sees in the current prompt) and long-term (what gets summarized or erased when context grows too large).

Short-term memory controls which messages from the conversation history are included in the LLM prompt. Without a strategy, all messages are included, which can exceed the model’s context window.

The max_tokens strategy walks the conversation backwards, grouping messages by human turn (an Interaction), and keeps messages until the token budget is exhausted.

{
  "short_term_memory_strategy": [
    {
      "memory_strategy_id": "max_tokens",
      "memory_max_tokens": 8000
    }
  ]
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| memory_strategy_id | "max_tokens" | required | Strategy identifier |
| memory_max_tokens | int | 10000 | Maximum tokens to include. -1 = include nothing |
| conditions | string[] | [] | Empathy-style conditions to filter which interactions to include |
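
The backwards walk described above can be sketched in a few lines of Python. This is a minimal illustration, not Alquimia's implementation: the message shape, the "human" role name, the word-count tokenizer, and the choice to keep or drop whole interactions rather than single messages are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # hypothetical shape: {"role": ..., "content": ...}


def group_into_interactions(messages: List[Message]) -> List[List[Message]]:
    """Group messages by human turn: each human message opens a new interaction."""
    interactions: List[List[Message]] = []
    for msg in messages:
        if msg["role"] == "human" or not interactions:
            interactions.append([])
        interactions[-1].append(msg)
    return interactions


def count_tokens(msg: Message) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(msg["content"].split())


def select_short_term(
    messages: List[Message],
    memory_max_tokens: int = 10000,
    token_counter: Callable[[Message], int] = count_tokens,
) -> List[Message]:
    """Walk interactions newest-first and keep them while the budget allows."""
    kept: List[List[Message]] = []
    budget = memory_max_tokens
    for interaction in reversed(group_into_interactions(messages)):
        cost = sum(token_counter(m) for m in interaction)
        if cost > budget:  # assumption: stop at the first interaction that does not fit
            break
        kept.append(interaction)
        budget -= cost
    # Restore chronological order before building the prompt.
    return [m for interaction in reversed(kept) for m in interaction]
```

With a budget of -1 the first comparison already fails, so nothing is kept, which matches the documented meaning of that value.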

Long-term memory strategies trigger when the conversation grows beyond a threshold. They either summarize the history or erase it, keeping only the most recent interactions.

All long-term strategies share these trigger fields:

| Field | Description |
| --- | --- |
| input_tokens_threshold | Trigger when the LLM’s input token count exceeds this value |
| interaction_threshold_qty | Trigger when the number of human turns exceeds this value |
| interaction_threshold_tokens | Trigger when total interaction tokens exceed this value |
| on_tools_success_threshold | Trigger when any of these tool names succeed |
| on_tools_error_threshold | Trigger when any of these tool names error |
| interaction_keep | Number of most-recent interactions to keep after the strategy runs |
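
One way to read the trigger fields is as independent conditions, any one of which fires the strategy. The sketch below encodes that reading. Treat it as an assumption: the page does not state whether the conditions are combined with OR or AND, the ConversationStats type is hypothetical, and treating -1 as "disabled" for every numeric threshold generalizes the "-1 = never" note given for the neuralyzer fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Triggers:
    # Field names follow the table above; the -1 defaults are an assumption.
    input_tokens_threshold: int = -1
    interaction_threshold_qty: int = -1
    interaction_threshold_tokens: int = -1
    on_tools_success_threshold: List[str] = field(default_factory=list)
    on_tools_error_threshold: List[str] = field(default_factory=list)


@dataclass
class ConversationStats:
    # Hypothetical snapshot of the conversation after an inference.
    input_tokens: int = 0
    interaction_qty: int = 0
    interaction_tokens: int = 0
    tools_succeeded: List[str] = field(default_factory=list)
    tools_errored: List[str] = field(default_factory=list)


def exceeds(value: int, threshold: int) -> bool:
    """A negative threshold is treated as disabled; otherwise compare strictly."""
    return threshold >= 0 and value > threshold


def should_trigger(t: Triggers, s: ConversationStats) -> bool:
    """Assumed semantics: the strategy fires if ANY configured condition holds."""
    return (
        exceeds(s.input_tokens, t.input_tokens_threshold)
        or exceeds(s.interaction_qty, t.interaction_threshold_qty)
        or exceeds(s.interaction_tokens, t.interaction_threshold_tokens)
        or any(name in t.on_tools_success_threshold for name in s.tools_succeeded)
        or any(name in t.on_tools_error_threshold for name in s.tools_errored)
    )
```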

The summarizer strategy uses Chain of Density (CoD) summarization: it iteratively compresses the conversation into a progressively denser summary over cod_max_loops iterations.

{
  "long_term_memory_strategy": [
    {
      "long_term_memory_id": "summarizer",
      "interaction_threshold_qty": 20,
      "interaction_threshold_tokens": 20000,
      "cod_max_loops": 5,
      "interaction_keep": 3,
      "instructions": "Focus on action items and decisions made.",
      "knowledge_base": {
        "collection_id": "session-summaries",
        "search_mode": "always"
      }
    }
  ]
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| long_term_memory_id | "summarizer" | required | Strategy identifier |
| interaction_threshold_qty | int | 20 | Trigger after this many interactions |
| interaction_threshold_tokens | int | 20000 | Trigger after this many tokens |
| cod_max_loops | int | 5 | Number of CoD densification passes |
| interaction_keep | int | 0 | Interactions to keep after summarization |
| instructions | string | null | Custom summarization instructions |
| knowledge_base | KnowledgeBase | null | Store summaries in a vector store for RAG |
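
The shape of the densification loop can be sketched as follows. This only illustrates the iteration controlled by cod_max_loops: the densify callable stands in for an LLM call and is hypothetical, as is its signature, and toy_densify exists purely so the example runs without a model.

```python
from typing import Callable, Optional

# Hypothetical signature: (transcript, previous_summary, instructions) -> new summary
Densifier = Callable[[str, str, Optional[str]], str]


def chain_of_density(
    transcript: str,
    densify: Densifier,
    cod_max_loops: int = 5,
    instructions: Optional[str] = None,
) -> str:
    """Refine a summary over a fixed number of passes, feeding each result back in."""
    summary = ""
    for _ in range(cod_max_loops):
        # Each pass sees the full transcript plus the summary from the previous pass.
        summary = densify(transcript, summary, instructions)
    return summary


def toy_densify(transcript: str, previous_summary: str, instructions: Optional[str]) -> str:
    """Toy stand-in for an LLM: each pass admits one more word of the transcript."""
    words = transcript.split()
    n = len(previous_summary.split()) + 1
    return " ".join(words[:n])
```

In a real setup the densify step would prompt a model to fold missing details into the previous summary, optionally guided by the instructions string.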

The neuralyzer strategy erases memory beyond a threshold. It is simpler and faster than summarization; use it when you don’t need to retain historical context.

{
  "long_term_memory_strategy": [
    {
      "long_term_memory_id": "neuralyzer",
      "interaction_threshold_qty": 30,
      "interaction_keep": 5
    }
  ]
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| long_term_memory_id | "neuralyzer" | required | Strategy identifier |
| interaction_threshold_qty | int | -1 | Trigger after this many interactions (-1 = never) |
| interaction_threshold_tokens | int | -1 | Trigger after this many tokens (-1 = never) |
| interaction_keep | int | 0 | Interactions to keep after erasure |
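
The erase-and-keep behavior can be pictured as a list truncation. The sketch below is an assumption-laden illustration rather than the actual implementation: it models interactions as an opaque list, handles only the interaction-count trigger, and assumes "trigger after this many interactions" means strictly more than the threshold.

```python
from typing import List, TypeVar

T = TypeVar("T")


def neuralyze(
    interactions: List[T],
    interaction_threshold_qty: int = -1,
    interaction_keep: int = 0,
) -> List[T]:
    """Drop everything except the most recent interactions once the threshold is passed."""
    # -1 means the count trigger never fires.
    if interaction_threshold_qty < 0 or len(interactions) <= interaction_threshold_qty:
        return list(interactions)
    # Guard the zero case explicitly: interactions[-0:] would return the whole list.
    if interaction_keep <= 0:
        return []
    return list(interactions[-interaction_keep:])
```

With the configuration shown above (threshold 30, keep 5), a session of 31 interactions would be cut down to its last 5.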

You can combine short-term and long-term strategies:

{
  "profile": {
    "short_term_memory_strategy": [
      { "memory_strategy_id": "max_tokens", "memory_max_tokens": 8000 }
    ],
    "long_term_memory_strategy": [
      {
        "long_term_memory_id": "summarizer",
        "interaction_threshold_qty": 20,
        "cod_max_loops": 3,
        "interaction_keep": 2
      }
    ]
  }
}

Flow: When the long-term strategy triggers, it summarizes the conversation and keeps the last 2 interactions. The short-term strategy then limits what the LLM sees to 8,000 tokens from those kept interactions.

After each inference, the conversation is persisted according to persistence_strategy:

| Value | Behavior |
| --- | --- |
| INCREMENTAL | Append new messages to the existing session (default) |
| FLUSH | Replace the session with the current conversation |
| EPHEMERAL | Do not persist; the session is lost after inference |
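
The difference between the three values is easiest to see against a toy session store. The following sketch is illustrative only: the in-memory dict, the persist function, and its parameters are hypothetical, and it assumes EPHEMERAL simply skips the write rather than deleting anything already stored.

```python
from enum import Enum
from typing import Dict, List


class PersistenceStrategy(Enum):
    INCREMENTAL = "INCREMENTAL"
    FLUSH = "FLUSH"
    EPHEMERAL = "EPHEMERAL"


def persist(
    store: Dict[str, List[str]],
    session_id: str,
    conversation: List[str],
    new_messages: List[str],
    strategy: PersistenceStrategy = PersistenceStrategy.INCREMENTAL,
) -> None:
    """Write the outcome of one inference to a toy in-memory session store."""
    if strategy is PersistenceStrategy.INCREMENTAL:
        # Append only what this inference produced.
        store.setdefault(session_id, []).extend(new_messages)
    elif strategy is PersistenceStrategy.FLUSH:
        # Overwrite the stored session with the conversation as it stands now.
        store[session_id] = list(conversation)
    elif strategy is PersistenceStrategy.EPHEMERAL:
        # Assumption: nothing is written at all.
        pass
```

FLUSH is the natural fit after a long-term strategy has rewritten the history, since appending would leave the old messages in place.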