Analysis
Anyone who has used an AI assistant for real work knows the frustration. You tell it on Monday that your team writes everything in TypeScript and that you're building a payments app called PayFlow. By Tuesday it has forgotten both, and you're typing the same context all over again. Every chat starts from zero.
That gap is what Mem0 sets out to close. It's an open-source "memory layer" that sits between your agent and its conversations, quietly pulling out the facts worth keeping (preferences, project details, decisions) and handing them back when they're relevant later. The agent stops being a goldfish and starts behaving like a colleague who actually remembers what you told it.
For Australian teams weighing up where to put their AI effort, the practical appeal is twofold. The memory stays useful across sessions, so your staff stop re-explaining themselves, and because Mem0 can run on your own servers, the sensitive context never has to leave your infrastructure. The rest of this guide shows how to wire it up.
Prerequisites
- Python 3.10 or later
- `pip install mem0ai`
- A vector store (Chroma, Qdrant, or PostgreSQL)
- An LLM API key for the memory extraction step
The version and install requirements above match Mem0's Python quickstart, and the supported vector stores are listed in its vector store overview, where Qdrant is the default.
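A small preflight script can confirm the first two prerequisites before you go further. This is a hypothetical helper, not something Mem0 ships. It checks the interpreter against the 3.10 floor listed above, and it checks whether the `mem0` package (the import name for `mem0ai`) can be found.

```python
# preflight.py
# Hypothetical pre-install check; not part of Mem0.
import importlib.util
import sys

MIN_PYTHON = (3, 10)


def meets_python_requirement(version=None, minimum=MIN_PYTHON):
    """True if the (major, minor, ...) version tuple is at least `minimum`."""
    version = tuple(version if version is not None else sys.version_info)
    return version[:2] >= minimum


def mem0_installed():
    """True if the `mem0` package (installed via `pip install mem0ai`) is importable."""
    return importlib.util.find_spec("mem0") is not None


if __name__ == "__main__":
    print("Python OK:", meets_python_requirement())
    print("mem0ai installed:", mem0_installed())
```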
Step-by-Step Framework
Step 1: Install and Configure
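```bash
pip install mem0ai
```

```python
# mem0_config.py
import os

from mem0 import Memory

config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "host": "localhost",
            "port": 6333,
            # Must match the embedder's output size (1536 for text-embedding-3-small)
            "embedding_model_dims": 1536,
        },
    },
    "llm": {
        "provider": "anthropic",
        "config": {
            "model": "claude-sonnet-4-6",  # hyphenated API model ID
            "api_key": os.environ["ANTHROPIC_API_KEY"],
        },
    },
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small",
            "api_key": os.environ["OPENAI_API_KEY"],
        },
    },
}

# Memory() expects a config object; from_config() builds one from a plain dict
m = Memory.from_config(config)
```

This config has three details worth noting. First, the model string is the hyphenated claude-sonnet-4-6. Sonnet 4.6 is a real Anthropic model, but the dotted form claude-sonnet-4.6 won't resolve against the API (see the Claude API model IDs). Second, the settings are passed as a dict to Memory.from_config(). This is how Mem0's docs build a configured instance. The API keys are read from environment variables rather than hard-coded. Third, the embedder side needs no changes. OpenAI's text-embedding-3-small returns 1536-dimensional vectors by default, which is why embedding_model_dims is set to 1536.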
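That config has two pitfalls: a dotted model ID, and an embedding size that doesn't match the embedder. You can catch both before Mem0 makes any API call. The sketch below is a hypothetical helper, not a Mem0 API. Its dimension table covers only OpenAI's default output sizes, and the dot check is a simple heuristic based on Anthropic's hyphenated IDs.

```python
# check_config.py
# Hypothetical sanity check for a Mem0-style config dict; not part of Mem0.
KNOWN_EMBEDDING_DIMS = {
    # Default output sizes of OpenAI embedding models
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def check_config(config):
    """Return a list of human-readable problems found in the config."""
    problems = []

    llm = config.get("llm", {})
    model = llm.get("config", {}).get("model", "")
    # Anthropic API model IDs are hyphenated, so a dot is a red flag
    if llm.get("provider") == "anthropic" and "." in model:
        problems.append(f"Anthropic model ID {model!r} contains a dot; use the hyphenated form")

    embed_model = config.get("embedder", {}).get("config", {}).get("model")
    dims = config.get("vector_store", {}).get("config", {}).get("embedding_model_dims")
    expected = KNOWN_EMBEDDING_DIMS.get(embed_model)
    if expected is not None and dims is not None and dims != expected:
        problems.append(f"embedding_model_dims={dims} but {embed_model} returns {expected}")

    return problems


if __name__ == "__main__":
    draft = {
        "llm": {"provider": "anthropic", "config": {"model": "claude-sonnet-4.6"}},
        "embedder": {"provider": "openai", "config": {"model": "text-embedding-3-small"}},
        "vector_store": {"provider": "qdrant", "config": {"embedding_model_dims": 1536}},
    }
    for problem in check_config(draft):
        print("config problem:", problem)
```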
You don't have to use Anthropic, by the way. Mem0 isn't tied to one model vendor. OpenAI, Ollama, and other local models are all configurable options for the extraction step, set through the same config (see the quickstart).
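Switching providers only touches the "llm" block of the config dict. Everything else stays the same. The sketch below is a hypothetical helper that returns a copy of the config with that block swapped. The "llama3.1" model tag is an assumption; substitute a model you have actually pulled into Ollama, and add any provider-specific keys your setup needs.

```python
# swap_llm.py
# Hypothetical helper: switching the extraction model only touches the "llm" block.
import copy


def with_llm(config, provider, model, **extra):
    """Return a deep copy of `config` whose "llm" section uses a different provider."""
    new_config = copy.deepcopy(config)
    new_config["llm"] = {"provider": provider, "config": {"model": model, **extra}}
    return new_config


base_config = {
    "vector_store": {"provider": "qdrant", "config": {"host": "localhost", "port": 6333}},
    "llm": {"provider": "anthropic", "config": {"model": "claude-sonnet-4-6"}},
}

# "llama3.1" is an assumed model tag: use one you have pulled into Ollama
local_config = with_llm(base_config, "ollama", "llama3.1")
# Then, as in Step 1: m = Memory.from_config(local_config)
```

Copying rather than mutating keeps the original config intact. You can keep a hosted setup and a local one side by side and choose between them at start-up.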
Step 2: Add Memories
```python
# add_memories.py
from mem0_config import m  # the Memory instance configured in Step 1

# Mem0 automatically extracts facts from conversations
result = m.add(
    messages=[
        {"role": "user", "content": "I prefer TypeScript over Python for frontend work."},
        {"role": "assistant", "content": "Noted! I'll use TypeScript for all frontend code I generate for you."},
    ],
    user_id="alex-chen",
    metadata={"category": "preferences", "topic": "programming"},
)
print(result)
# Illustrative output (IDs are UUIDs; the wording of extracted facts varies):
# {'results': [
#     {'id': '...', 'memory': 'Prefers TypeScript over Python for frontend work', 'event': 'ADD'}
# ]}

# More memories
m.add(
    messages=[
        {"role": "user", "content": "I'm working on a fintech app called PayFlow."},
        {"role": "assistant", "content": "I'll remember that you're building PayFlow, a fintech app."},
    ],
    user_id="alex-chen",
)
```

You're not telling Mem0 what to store. You hand it the raw exchange, and add() works out which facts are worth keeping: here, the TypeScript preference and the PayFlow project. Treat the printed return value as a guide to the idea rather than a contract. Recent SDK versions wrap the extracted facts in a "results" list, but field names have shifted between releases, so check what your installed version returns.
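Because add() decides for itself what to keep, a new exchange can revise an existing memory as well as add a new one. Each returned item carries an event label saying which happened. A quick tally makes this visible while you're testing. This is a hypothetical helper. It assumes the {"results": [...]} shape of recent SDK versions, and it also accepts a bare list in case your version returns one.

```python
# summarise_add.py
# Hypothetical helper: count what an add() call did, by event type.
from collections import Counter


def summarise_add(result):
    """Tally event labels such as "ADD" or "UPDATE" in an add() result."""
    # Assumed shape: {"results": [{"id": ..., "memory": ..., "event": ...}, ...]}
    items = result.get("results", []) if isinstance(result, dict) else result
    return Counter(item.get("event", "UNKNOWN") for item in items)


if __name__ == "__main__":
    sample = {"results": [
        {"id": "1", "memory": "Prefers TypeScript for frontend", "event": "ADD"},
        {"id": "2", "memory": "Building PayFlow, a fintech app", "event": "UPDATE"},
    ]}
    print(dict(summarise_add(sample)))
```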
Step 3: Retrieve Relevant Memories
```python
# retrieve.py
from mem0_config import m  # the Memory instance configured in Step 1

# Retrieves the stored memories most relevant to the query
memories = m.search(
    query="Write a React component for my app",
    user_id="alex-chen",
)
# Recent SDK versions return {"results": [...]}, each item carrying "memory" and "score"
for mem in memories["results"]:
    print(f"[{mem['score']:.2f}] {mem['memory']}")
# Illustrative output:
# [0.92] User prefers TypeScript over Python for frontend
# [0.78] User is building PayFlow, a fintech app
```

This is where the embeddings earn their keep. search() compares the query against stored memories semantically and returns the closest matches, each with a relevance score. Only the memories that actually bear on the question surface. A request to "write a React component" pulls the frontend preference to the top, not some unrelated fact buried in last month's chat.
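Here is the ranking idea in miniature, showing why the frontend preference outranks an unrelated fact: cosine similarity between a query vector and stored vectors. This is a toy. The 3-dimensional vectors are made up and stand in for real 1536-dimensional embeddings. In Mem0 the comparison happens inside the vector store, whose scoring metric may differ.

```python
# toy_search.py
# Toy illustration of semantic ranking with cosine similarity.
# The vectors are invented; real embeddings come from the embedder model.
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank(query_vec, memories, limit=5):
    """Return (score, text) pairs sorted from most to least similar."""
    scored = [(cosine(query_vec, vec), text) for text, vec in memories.items()]
    return sorted(scored, reverse=True)[:limit]


memories = {
    "User prefers TypeScript over Python for frontend": [0.9, 0.1, 0.2],
    "User is building PayFlow, a fintech app": [0.5, 0.6, 0.1],
    "User's favourite coffee is a flat white": [0.0, 0.1, 0.95],  # made-up unrelated fact
}
query = [0.8, 0.3, 0.1]  # pretend embedding of "Write a React component for my app"

for score, text in rank(query, memories):
    print(f"[{score:.2f}] {text}")
```

The unrelated memory still gets a score, just a low one. A limit (or a score threshold) is what keeps it out of the prompt.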
Step 4: Integrate with an Agent
```python
# agent_with_memory.py
from mem0 import Memory


class MemoryAugmentedAgent:
    def __init__(self, llm_client, memory: Memory | None = None):
        # llm_client is a placeholder for your own async model wrapper exposing
        # complete(system=..., messages=...); it is not part of Mem0
        self.llm = llm_client
        self.memory = memory or Memory()  # Memory() falls back to Mem0's default config

    async def chat(self, user_id: str, message: str) -> str:
        # 1. Retrieve relevant memories (recent SDKs wrap them in a "results" list)
        results = self.memory.search(query=message, user_id=user_id)["results"]

        # 2. Build context from the top five memories only
        memory_context = "\n".join(f"- {item['memory']}" for item in results[:5])

        # 3. Generate response with memory context
        system_prompt = (
            "You are a helpful assistant. Here are relevant facts about the user:\n"
            f"{memory_context}\n"
            "Use these facts to personalise your response."
        )
        response = await self.llm.complete(
            system=system_prompt,
            messages=[{"role": "user", "content": message}],
        )

        # 4. Store the interaction so Mem0 can extract new facts from it
        self.memory.add(
            messages=[
                {"role": "user", "content": message},
                {"role": "assistant", "content": response},
            ],
            user_id=user_id,
        )
        return response
```

The loop is the whole pattern in four steps. Search before you answer, fold the top few memories into the system prompt, and generate the reply. Then store the new exchange so the next turn is a little smarter. Capping the context at the top five (results[:5]) keeps the prompt tight; you don't want to dump a user's entire history into every call.
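You can exercise the same four-step loop without any API keys by swapping in stand-ins. In this hypothetical sketch, InMemoryStore and fake_llm replace Mem0 and a real model client. Keyword overlap replaces embedding search, and the raw message is stored instead of extracted facts. What the sketch does show faithfully is the order of the steps and the top-five cap.

```python
# agent_loop_sketch.py
# Hypothetical, self-contained sketch of the search -> prompt -> generate -> store loop.
# InMemoryStore and fake_llm are stand-ins, not Mem0 or a real model client.
import asyncio


class InMemoryStore:
    def __init__(self):
        self.items = []  # list of (user_id, text) pairs

    def search(self, query, user_id, limit=5):
        # Crude keyword overlap in place of embedding similarity
        words = set(query.lower().split())
        hits = [text for uid, text in self.items
                if uid == user_id and words & set(text.lower().split())]
        return hits[:limit]

    def add(self, text, user_id):
        self.items.append((user_id, text))


async def fake_llm(system, message):
    # A real client would call a model; this only reports how many facts it received
    n_facts = sum(1 for line in system.splitlines() if line.startswith("- "))
    return f"({n_facts} facts used) reply to: {message}"


async def chat(store, user_id, message, max_memories=5):
    facts = store.search(message, user_id, limit=max_memories)  # 1. retrieve
    context = "\n".join(f"- {fact}" for fact in facts)  # 2. build context
    system = f"Relevant facts about the user:\n{context}"
    reply = await fake_llm(system, message)  # 3. generate
    store.add(f"user said: {message}", user_id)  # 4. store the exchange
    return reply


if __name__ == "__main__":
    store = InMemoryStore()
    store.add("prefers typescript for frontend work", "alex-chen")
    store.add("is building payflow, a fintech app", "alex-chen")
    print(asyncio.run(chat(store, "alex-chen", "write a typescript component")))
```

Because max_memories defaults to five, a user with dozens of matching memories still sends only five lines of context to the model.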
Step 5: Memory Management
```python
# memory_management.py
from mem0_config import m  # the Memory instance configured in Step 1

# Get all memories for a user (recent SDKs wrap them in a "results" list)
all_memories = m.get_all(user_id="alex-chen")["results"]
print(f"Total memories: {len(all_memories)}")

# Memory IDs are UUIDs, so take one from get_all() instead of hard-coding it
memory_id = all_memories[0]["id"]

# Update a memory
m.update(memory_id=memory_id, data="User prefers TypeScript for frontend and Rust for backend")

# History of changes: each event records the previous and new text
for event in m.history(memory_id=memory_id):
    print(f"{event.get('created_at')}: {event.get('event')} - "
          f"{event.get('old_memory')} -> {event.get('new_memory')}")

# Delete a memory
m.delete(memory_id=memory_id)
```

Memories aren't write-once. People change their minds, projects wrap up, and stale facts cause more harm than no facts at all. The update, delete, get_all, and history methods (all part of the documented Memory API) give you the controls to keep the store honest. The history call is particularly handy for auditing, because it shows how a given memory has changed over time.
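One simple way to keep the store honest is an age-based sweep. List a user's memories, flag anything that hasn't changed within a set period, then review or delete it. The policy and the 180-day default are assumptions for illustration. The sketch also assumes each record carries an ISO-8601 updated_at or created_at timestamp; confirm that against what get_all() returns in your SDK version.

```python
# prune_stale.py
# Hypothetical pruning policy: flag memories not touched within `max_age_days`.
from datetime import datetime, timedelta, timezone


def stale_memory_ids(records, max_age_days=180, now=None):
    """Return IDs of records whose latest timestamp is older than the cutoff."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for rec in records:
        stamp = rec.get("updated_at") or rec.get("created_at")
        if not stamp:
            continue  # no timestamp: leave it for a human to review
        ts = datetime.fromisoformat(stamp)
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # treat naive timestamps as UTC
        if ts < cutoff:
            stale.append(rec["id"])
    return stale
```

Wired to Mem0, the flagged IDs would feed m.delete(memory_id=...). Ideally a person glances at them first, since an old memory isn't necessarily a wrong one.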
Step 6: Self-Hosted Deployment
```yaml
# docker-compose.yml
version: '3.8'
services:
  mem0:
    image: mem0/mem0:latest
    ports:
      - "8000:8000"
    environment:
      - VECTOR_STORE_PROVIDER=qdrant
      - VECTOR_STORE_HOST=qdrant
      - LLM_PROVIDER=anthropic
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - EMBEDDER_PROVIDER=openai
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    depends_on:
      - qdrant
      - postgres
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: mem0
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}  # set in .env; don't commit a real password
    volumes:
      - postgres_data:/var/lib/postgresql/data
volumes:
  qdrant_data:
  postgres_data:
```

This is the part that matters most if you're handling client data under Australian privacy obligations. Mem0 ships an open-source FastAPI server that you can run on your own infrastructure via Docker Compose, so the memory store never leaves your control (see the self-hosted setup).
Two caveats on the compose file above. The image tag mem0/mem0:latest is illustrative; the official self-host image is published as `mem0/mem0-api-server` on Docker Hub, with the server listening on internal port 8000 and the official compose mapping it to host port 8888. And the Qdrant-plus-Postgres combination shown here is a valid setup, but it isn't Mem0's documented default; the default self-host stack pairs Postgres with pgvector and Neo4j. Adapt the file to the official image and your chosen stores before deploying.
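Once the stack is up, confirm the containers are actually listening before you point anything at them. This hypothetical helper just opens a TCP connection. The ports are the ones in the compose file above (Qdrant on 6333, the Mem0 server on 8000). Change them if you switch to the official compose file's 8888 mapping.

```python
# check_services.py
# Hypothetical reachability check for the self-hosted containers.
import socket


def port_open(host, port, timeout=1.0):
    """True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in [("qdrant", 6333), ("mem0", 8000)]:
        print(f"{name}:{port} reachable = {port_open('localhost', port)}")
```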
Do/Don't
| Do | Don't |
|---|---|
| Store user preferences and project context | Store sensitive credentials or PII |
| Use memory to personalise responses | Rely solely on conversation history |
| Update memories when user preferences change | Let stale memories override current context |
| Self-host for data privacy | Send user data to managed memory without consent |
| Periodically clean irrelevant memories | Keep all memories forever |
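The first row of the table is easiest to honour if sensitive strings never reach add() at all. Below is a minimal sketch of a pre-filter. It is a hypothetical helper, and its two regular expressions (sk-style API keys and email addresses) are illustrative only, nowhere near a complete PII detector.

```python
# redact.py
# Hypothetical pre-filter: mask obvious secrets and emails before calling m.add().
import re

PATTERNS = [
    (re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"), "[REDACTED_KEY]"),  # sk-... style API keys
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[REDACTED_EMAIL]"),  # rough email match
]


def redact(text):
    """Apply each pattern in turn and return the masked text."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def redact_messages(messages):
    """Return a copy of chat messages with each "content" field redacted."""
    return [{**msg, "content": redact(msg["content"])} for msg in messages]
```

The filtered list then goes to m.add(messages=redact_messages(messages), user_id=...) in place of the raw one, leaving the original messages untouched.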
A note on the performance figure
You may see Mem0 credited with "sub-50ms retrieval for 10,000 memories". Treat that number as unconfirmed. We couldn't find a published source backing it, and it conflicts with Mem0's own LOCOMO benchmark paper. That paper reports search latency of about 148ms at the median and around 200ms at p95. Those figures are fast enough for interactive use, but plan against them rather than the rounder claim.
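You can also measure latency on your own stack instead of relying on any headline figure. This sketch times a callable repeatedly and reports approximate p50 and p95 using the standard library's statistics.quantiles. Here search_fn is a placeholder you would bind to a real query, for example `lambda: m.search(query="...", user_id="alex-chen")`.

```python
# measure_latency.py
# Sketch: time repeated calls and report approximate p50/p95 in milliseconds.
import statistics
import time


def latency_percentiles(search_fn, runs=50):
    """Call search_fn `runs` times and return (p50_ms, p95_ms)."""
    if runs < 2:
        raise ValueError("need at least two runs to compute percentiles")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        search_fn()
        samples.append((time.perf_counter() - start) * 1000)
    # 99 cut points: index 49 is roughly p50, index 94 roughly p95
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[49], cuts[94]
```

Run it against a store that matches your production size and hardware. Latency depends heavily on the vector store, the network hop to it, and the embedding call made for every query.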
Conclusion
Mem0 turns a stateless agent into one that remembers who it's talking to. Embedding-based retrieval keeps only relevant memories in front of the model, and automatic fact extraction spares you from hand-curating what's worth keeping. If privacy is a concern, self-host it; if you're already running an agent framework, the API drops in without much ceremony. Either way, the payoff is continuity: agents that build on past conversations instead of starting cold every time.





