
POST /context returns a single ranked, citation-tagged text block sized to your token budget. It is designed to be prepended directly to an LLM system prompt without any further processing. The provider selects and ranks the most relevant memories for the given query, formats them as prose, and reports which memory IDs it drew from.

Request body

query (string, required)
The user's question or task description. The provider uses this to rank and select which memories to include in the context block.

user_id (string, required)
The user whose memories to draw from. Context is always scoped to a single user.

scope (string, optional)
Restrict the context to memories whose scope starts with this prefix (e.g. coding to pull only coding-related memories); see the sketch after this list.

token_budget (integer, default: 500)
Maximum number of tokens the provider should use when constructing the returned text. The provider will truncate or omit lower-ranked memories to stay within this budget.
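
Scope matching is a plain string-prefix check on each memory's scope value. A minimal sketch of the rule, using hypothetical stored scopes (the provider applies the equivalent check server-side):

# Hypothetical scope values, used only to illustrate the prefix rule.
stored_scopes = ["coding/python", "coding/git", "cooking"]

requested = "coding"
matched = [s for s in stored_scopes if s.startswith(requested)]
print(matched)  # ['coding/python', 'coding/git']; 'cooking' fails the prefix check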

Response — 200 OK

text (string, required)
The ranked, citation-tagged text block ready for LLM prompt injection. Citations are embedded inline (e.g. [mem_abc123]) so the model can reference them.

citations (object[], required)
List of memory sources used to build the context block, each with a memory_id and a relevance score.

token_count (integer)
The actual token count of the returned text. May be null if the provider does not track token counts.

Example

curl -s -X POST http://localhost:8080/context \
  -H "Content-Type: application/json" \
  -d '{
    "query": "set up a new node project",
    "user_id": "u1",
    "scope": "coding",
    "token_budget": 300
  }'
The response:

{
  "text": "The user prefers pnpm over npm for package management [mem_abc123]. They use dark mode in their editor [mem_def456].",
  "citations": [
    { "memory_id": "mem_abc123", "score": 0.94 },
    { "memory_id": "mem_def456", "score": 0.61 }
  ],
  "token_count": 38
}
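
Outside the SDK, any HTTP client can make the same call. A minimal sketch with Python's requests library, mirroring the curl request above:

import requests

resp = requests.post(
    "http://localhost:8080/context",
    json={
        "query": "set up a new node project",
        "user_id": "u1",
        "scope": "coding",
        "token_budget": 300,
    },
    timeout=10,
)
resp.raise_for_status()
body = resp.json()

print(body["text"])
for citation in body["citations"]:
    print(citation["memory_id"], citation["score"])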

Python SDK equivalent

# `mem` is an already-initialized OpenMem SDK client (construction not shown here).
ctx = mem.context(
    query="set up a new node project",
    user_id="u1",
    scope="coding",
    token_budget=300,
)
print(ctx.text)         # the ranked, citation-tagged context block
print(ctx.citations)    # e.g. [{"memory_id": "mem_abc123", "score": 0.94}, ...]
print(ctx.token_count)  # may be None if the provider does not track token counts
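
Because token_count may be null (None in the SDK), guard before relying on it. A minimal sketch with a crude fallback; the four-characters-per-token ratio is a rough heuristic, not part of the API:

# token_count can be None when the provider does not track token counts.
if ctx.token_count is not None:
    used = ctx.token_count
else:
    # Rough heuristic: ~4 characters per token for English prose.
    used = len(ctx.text) // 4

print(f"context block uses roughly {used} tokens")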

LLM integration

Prepend ctx.text to your system prompt to ground the model in the user’s memory:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# `mem` is an already-initialized OpenMem SDK client.
ctx = mem.context(
    query=user_message,
    user_id=uid,
    scope="coding",
    token_budget=400,
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": f"Relevant user memory:\n{ctx.text}\n\nCitations: {ctx.citations}",
        },
        {"role": "user", "content": user_message},
    ],
)
Keep token_budget well below your model’s context window limit so there is room for the system prompt, conversation history, and the model’s response. A value between 300 and 600 tokens is a good starting point for most use cases.
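
One way to pick a budget is to derive it from the target model's context window instead of hard-coding it. A minimal sketch, reusing the names from the snippet above; the window size and reservation below are assumptions to tune per model:

CONTEXT_WINDOW = 128_000  # assumed context window for the target model
RESERVED = 6_000          # assumed room for system prompt, history, and the response

# Clamp to the documented 300-600 starting range; for large windows the cap dominates.
token_budget = max(300, min(600, CONTEXT_WINDOW - RESERVED))
ctx = mem.context(query=user_message, user_id=uid, token_budget=token_budget)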