The context() method bridges your stored memories and your LLM calls. Instead of retrieving a raw list of search results and assembling them yourself, you call mem.context() with a query and a token budget, and receive back a single text blob that is already ranked by relevance, trimmed to fit your budget, and annotated with citation metadata. Prepend it to your system prompt and your model has access to the most relevant facts about the user without any extra formatting work on your end.

What context() returns

mem.context() returns a ContextBlock with three fields:
  • text — ranked text ready for prompt injection
  • citations — list of {memory_id, score} objects for attribution
  • token_count — actual tokens used (always ≤ token_budget)
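
For example: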
from openmem import Memory

mem = Memory(provider="postgres", url="postgresql://localhost/omp")

ctx = mem.context(
    query="set up a new node project",
    user_id="kek",
    scope="coding/*",
    token_budget=500,
)

# ctx.text: ranked text ready for prompt injection
# ctx.citations: list of {memory_id, score} for attribution
# ctx.token_count: actual tokens used

OpenAI integration

Pass ctx.text directly into the system message. Include ctx.citations if you want to surface sources to the user or log attribution data.
from openai import OpenAI
from openmem import Memory

client = OpenAI()
mem = Memory(provider="postgres", url="postgresql://localhost/omp")

def chat(user_id: str, user_message: str) -> str:
    ctx = mem.context(
        query=user_message,
        user_id=user_id,
        scope="coding/*",
        token_budget=400,
    )

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"Relevant user memory:\n{ctx.text}\n\nCitations: {ctx.citations}"
            },
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
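
Calling the helper is then a single line. The user id here is reused from the example above:
reply = chat(user_id="kek", user_message="set up a new node project")
print(reply)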

Storing memories from conversations

Call mem.add() after each turn to build up the memory store over time. Extract discrete facts from the conversation rather than storing full message transcripts.
def chat_with_memory(user_id: str, user_message: str, assistant_reply: str) -> None:
    # Inject existing memory into the next prompt (shown above), then
    # store any new facts once the turn completes
    mem.add(
        content=f"User asked about: {user_message}",
        user_id=user_id,
        scope="conversation/history",
    )
    # Extract preferences, facts, or decisions from user_message and
    # assistant_reply, and store each one as a separate record so future
    # searches can surface them precisely (see the sketch below)
Storing one fact per add() call produces better search results than storing full conversation turns. Smaller, focused records score higher on semantic similarity and fit into tight token budgets more efficiently.
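
As a sketch of that per-fact pattern, each extracted fact becomes its own record. The facts list and the coding/preferences scope here are illustrative placeholders, not part of the API:
facts = [
    "Prefers TypeScript for new Node projects",
    "Uses pnpm instead of npm",
    "Targets Node 20 LTS",
]
for fact in facts:
    # One focused record per fact, so each can be ranked independently
    mem.add(content=fact, user_id="kek", scope="coding/preferences")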

Choosing a token budget

A good rule of thumb is to allocate 10–20% of your prompt budget to memory, rather than a fixed share of the model’s full context window. For gpt-4o (128 k context window), if your prompts run around 4 k tokens in total, including a 2 k system prompt, then 400–800 tokens for memory is a reasonable starting point. Increase the budget for tasks that are heavily preference-driven (coding assistants, writing tools); decrease it for tasks that rely more on the current message than on history (single-turn Q&A, classification).
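
A minimal way to apply that rule is sketched below; the expected prompt size is an assumption to replace with a measured value from your own traffic:
# Assumption: prompts average ~4k tokens (2k system prompt plus user
# message and conversation history); measure before tuning
EXPECTED_PROMPT_TOKENS = 4_000
MEMORY_FRACTION = 0.15  # middle of the 10-20% range

ctx = mem.context(
    query=user_message,
    user_id=user_id,
    token_budget=int(EXPECTED_PROMPT_TOKENS * MEMORY_FRACTION),  # 600
)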

Checking provider capabilities

Not all providers support rich context ranking. Before relying on scored output, verify that the provider’s features indicate support.
caps = mem.capabilities()
if caps.features.vector_search:
    ctx = mem.context(query=user_message, user_id=user_id, token_budget=400)
else:
    # Fall back to a simple keyword search and manual formatting
    results = mem.search(query=user_message, user_id=user_id, limit=5)
    ctx_text = "\n".join(r.memory.content for r in results)
The result of mem.capabilities() is cached for the lifetime of the Memory instance, so calling it on every turn adds no latency after the first call.