
POST /context returns a single ranked, citation-tagged text block sized to your token budget. It is designed to be prepended directly to an LLM system prompt without any further processing. The provider selects and ranks the most relevant memories for the given query, formats them as prose, and reports which memory IDs it drew from.

Request body

query (string, required)
The user's question or task description. The provider uses this to rank and select which memories to include in the context block.

user_id (string, required)
The user whose memories to draw from. Context is always scoped to a single user.

scope (string, optional)
Restrict the context to memories whose scope starts with this prefix (e.g. coding to pull only coding-related memories); see the sketch after this list.

token_budget (integer, default: 500)
Maximum number of tokens the provider should use when constructing the returned text. The provider will truncate or omit lower-ranked memories to stay within this budget.
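
Scope matching is a plain string-prefix check on each memory's scope value. A minimal sketch of the rule, using hypothetical stored scopes (the provider applies the equivalent check server-side):

# Hypothetical scope values, used only to illustrate the prefix rule.
stored_scopes = ["coding/python", "coding/git", "cooking"]

requested = "coding"
matched = [s for s in stored_scopes if s.startswith(requested)]
print(matched)  # ['coding/python', 'coding/git']; 'cooking' fails the prefix check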

Response — 200 OK

text (string, required)
The ranked, citation-tagged text block ready for LLM prompt injection. Citations are embedded inline (e.g. [mem_abc123]) so the model can reference them.

citations (object[], required)
List of memory sources used to build the context block, each with a memory_id and a relevance score.

token_count (integer)
The actual token count of the returned text. May be null if the provider does not track token counts.

Example

curl -s -X POST http://localhost:8080/context \
  -H "Content-Type: application/json" \
  -d '{
    "query": "set up a new node project",
    "user_id": "u1",
    "scope": "coding",
    "token_budget": 300
  }'
The response:

{
  "text": "The user prefers pnpm over npm for package management [mem_abc123]. They use dark mode in their editor [mem_def456].",
  "citations": [
    { "memory_id": "mem_abc123", "score": 0.94 },
    { "memory_id": "mem_def456", "score": 0.61 }
  ],
  "token_count": 38
}
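
Outside the SDK, any HTTP client can make the same call. A minimal sketch with Python's requests library, mirroring the curl request above:

import requests

resp = requests.post(
    "http://localhost:8080/context",
    json={
        "query": "set up a new node project",
        "user_id": "u1",
        "scope": "coding",
        "token_budget": 300,
    },
    timeout=10,
)
resp.raise_for_status()
body = resp.json()

print(body["text"])
for citation in body["citations"]:
    print(citation["memory_id"], citation["score"])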

Python SDK equivalent

# `mem` is an already-initialized OpenMem SDK client (construction not shown here).
ctx = mem.context(
    query="set up a new node project",
    user_id="u1",
    scope="coding",
    token_budget=300,
)
print(ctx.text)         # the ranked, citation-tagged context block
print(ctx.citations)    # e.g. [{"memory_id": "mem_abc123", "score": 0.94}, ...]
print(ctx.token_count)  # may be None if the provider does not track token counts
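
Because token_count may be null (None in the SDK), guard before relying on it. A minimal sketch with a crude fallback; the four-characters-per-token ratio is a rough heuristic, not part of the API:

# token_count can be None when the provider does not track token counts.
if ctx.token_count is not None:
    used = ctx.token_count
else:
    # Rough heuristic: ~4 characters per token for English prose.
    used = len(ctx.text) // 4

print(f"context block uses roughly {used} tokens")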

LLM integration

Prepend ctx.text to your system prompt to ground the model in the user’s memory:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# `mem` is an already-initialized OpenMem SDK client.
ctx = mem.context(
    query=user_message,
    user_id=uid,
    scope="coding",
    token_budget=400,
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": f"Relevant user memory:\n{ctx.text}\n\nCitations: {ctx.citations}",
        },
        {"role": "user", "content": user_message},
    ],
)
Keep token_budget well below your model’s context window limit so there is room for the system prompt, conversation history, and the model’s response. A value between 300 and 600 tokens is a good starting point for most use cases.
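
One way to pick a budget is to derive it from the target model's context window instead of hard-coding it. A minimal sketch, reusing the names from the snippet above; the window size and reservation below are assumptions to tune per model:

CONTEXT_WINDOW = 128_000  # assumed context window for the target model
RESERVED = 6_000          # assumed room for system prompt, history, and the response

# Clamp to the documented 300-600 starting range; for large windows the cap dominates.
token_budget = max(300, min(600, CONTEXT_WINDOW - RESERVED))
ctx = mem.context(query=user_message, user_id=uid, token_budget=token_budget)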