How-to Guide

How to implement agent memory with Mem0.

Add persistent, intelligent memory to your AI agents using Mem0, the memory layer that remembers user preferences, facts, and conversation history across sessions.

Decision

Start narrow

Use the article to decide the smallest useful workflow worth testing before expanding the system.

Risk to watch

Hype drift

Avoid turning a practical adoption step into a broad transformation promise nobody can verify.

Proof to collect

Business signal

Write down the owner, data boundary, review point, and measurable outcome before the first build.

TL;DR

[Mem0](https://mem0.ai/) is an open-source memory layer that gives AI agents persistent memory across conversations. Unlike simple chat history, Mem0 extracts facts, preferences, and insights from interactions and makes them available in future sessions. This guide covers configuration, adding and retrieving memories, wiring Mem0 into a custom agent, and self-hosting.

Key takeaways

  • Memory types: User preferences, facts, conversation history, entities
  • Architecture: Embedding-based retrieval with relevance scoring
  • Integration: Works with any LLM via simple API
  • Privacy: Self-hostable; data stays in your infrastructure
  • Performance: Reportedly sub-50ms retrieval for 10K memories (unconfirmed; see note below)

Analysis

Anyone who has used an AI assistant for real work knows the frustration. You tell it on Monday that your team writes everything in TypeScript and that you're building a payments app called PayFlow. By Tuesday it has forgotten both, and you're typing the same context all over again. Every chat starts from zero.

That gap is what Mem0 sets out to close. It's an open-source "memory layer" that sits between your agent and its conversations, quietly pulling out the facts worth keeping (preferences, project details, decisions) and handing them back when they're relevant later. The agent stops being a goldfish and starts behaving like a colleague who actually remembers what you told it.

For Australian teams weighing up where to put their AI effort, the practical appeal is twofold. The memory stays useful across sessions, so your staff stop re-explaining themselves, and because Mem0 can run on your own servers, the sensitive context never has to leave your infrastructure. The rest of this guide shows how to wire it up.

Prerequisites

  • Python 3.10 or later
  • pip install mem0ai
  • A vector store (Chroma, Qdrant, or PostgreSQL)
  • An LLM API key for the memory extraction step

The version and install requirements above match Mem0's Python quickstart, and the supported vector stores are listed in its vector store overview, where Qdrant is the default.
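
Before going further, it can save time to confirm the prerequisites from a script rather than by eye. The sketch below is one way to do that using only the standard library; the function name check_prerequisites is made up for this guide, and it only checks the Python version and whether the mem0 module is importable, not the vector store or API keys.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version listed in the prerequisites


def check_prerequisites(module_name="mem0"):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}"
        )
    # `pip install mem0ai` provides the importable package `mem0`.
    if importlib.util.find_spec(module_name) is None:
        problems.append(
            f"Module '{module_name}' not importable; run: pip install mem0ai"
        )
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("MISSING:", problem)
```

An empty result means the interpreter and package are in place; the vector store and LLM key still need checking separately.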

Step-by-Step Framework

Step 1: Install and Configure

pip install mem0ai

# mem0_config.py
import os

from mem0 import Memory

# Read keys from the environment rather than hard-coding them.
config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "host": "localhost",
            "port": 6333,
            "embedding_model_dims": 1536,  # must match the embedder's output size
        },
    },
    "llm": {
        "provider": "anthropic",
        "config": {
            "model": "claude-sonnet-4-6",
            "api_key": os.environ["ANTHROPIC_API_KEY"],
        },
    },
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small",
            "api_key": os.environ["OPENAI_API_KEY"],
        },
    },
}

# Memory takes its settings through from_config(), not constructor keyword arguments.
m = Memory.from_config(config)

Two details in the config above are easy to get wrong. The Anthropic model identifier is hyphenated: claude-sonnet-4-6, per the Claude API model IDs. The dotted form claude-sonnet-4.6 won't resolve against the API. On the embedder side, OpenAI's text-embedding-3-small returns 1536-dimensional vectors by default, which is why embedding_model_dims is set to 1536.

You don't have to use Anthropic, by the way. Mem0 works with any LLM through the same API, and OpenAI, Ollama, and local models are all configurable options for the extraction step (see the quickstart).
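
Swapping providers mostly comes down to changing one block of the config dictionary. As a rough sketch, the helper below assembles a config for a given LLM provider and pulls API keys from the environment. The function name build_mem0_config and the provider-to-variable mapping are assumptions made for this guide, not part of Mem0, and the model names you pass in are your own choice.

```python
import os


def build_mem0_config(llm_provider, llm_model, env=os.environ):
    """Assemble a Mem0-style config dict, reading API keys from the environment."""
    # Hypothetical mapping from provider name to the env var holding its key.
    key_vars = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}

    llm_config = {"model": llm_model}
    if llm_provider in key_vars:
        # Raises KeyError if the variable is unset, which fails fast and loudly.
        llm_config["api_key"] = env[key_vars[llm_provider]]
    # Providers without a key (for example a local Ollama server) get no api_key.

    return {
        "vector_store": {
            "provider": "qdrant",
            "config": {
                "host": "localhost",
                "port": 6333,
                "embedding_model_dims": 1536,
            },
        },
        "llm": {"provider": llm_provider, "config": llm_config},
        "embedder": {
            "provider": "openai",
            "config": {
                "model": "text-embedding-3-small",
                "api_key": env["OPENAI_API_KEY"],
            },
        },
    }
```

The returned dictionary is meant to be handed to Memory.from_config(). Check the provider names and any provider-specific options against Mem0's documentation for the version you have installed.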

Step 2: Add Memories

# add_memories.py
# Mem0 automatically extracts facts from conversations

result = m.add(
    messages=[
        {"role": "user", "content": "I prefer TypeScript over Python for frontend work."},
        {"role": "assistant", "content": "Noted! I'll use TypeScript for all frontend code I generate for you."}
    ],
    user_id="alex-chen",
    metadata={"category": "preferences", "topic": "programming"}
)

print(result)
# Illustrative output (field names vary by SDK version):
# {'results': [
#   {'id': 'mem_001', 'memory': 'User prefers TypeScript over Python for frontend', 'event': 'ADD'}
# ]}

# More memories
m.add(
    messages=[
        {"role": "user", "content": "I'm working on a fintech app called PayFlow."},
        {"role": "assistant", "content": "I'll remember that you're building PayFlow, a fintech app."}
    ],
    user_id="alex-chen"
)

You're not telling Mem0 what to store. You hand it the raw exchange and add() works out which facts are worth keeping, in this case the TypeScript preference and the PayFlow project. The exact shape of the printed return dict here is illustrative; treat it as a guide to the idea rather than a contract, since the current SDK may format its output slightly differently.
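
Because the return shape can shift between SDK versions, it helps to read it through a small tolerant helper rather than indexing it directly. The sketch below accepts either a results/memory layout or a memories/text layout. Both layouts are assumptions to verify against your installed version, and extracted_facts is a name invented for this guide.

```python
def extracted_facts(add_result):
    """Return (event, text) pairs from an add() result, tolerating shape differences."""
    if isinstance(add_result, dict):
        # Try the two list keys this guide assumes may appear.
        items = add_result.get("results") or add_result.get("memories") or []
    else:
        # Some versions may return a bare list (or None when nothing was stored).
        items = add_result or []

    facts = []
    for item in items:
        text = item.get("memory") or item.get("text") or ""
        facts.append((item.get("event", "UNKNOWN"), text))
    return facts


sample = {"results": [{"id": "mem_001", "memory": "Prefers TypeScript", "event": "ADD"}]}
for event, text in extracted_facts(sample):
    print(f"{event}: {text}")
```

Logging these pairs after each add() call is a cheap way to see what the extraction step actually decided to keep.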

Step 3: Retrieve Relevant Memories

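# retrieve.py
# Retrieve the memories most relevant to a query

response = m.search(
    query="Write a React component for my app",
    user_id="alex-chen"
)

# Recent SDK versions wrap hits in {"results": [...]}; older ones returned a bare list.
memories = response["results"] if isinstance(response, dict) else response

for mem in memories:
    print(f"[{mem['score']:.2f}] {mem['memory']}")  # the stored text lives under "memory"

# Illustrative output:
# [0.92] User prefers TypeScript over Python for frontend
# [0.78] User is building PayFlow, a fintech app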

This is where the embeddings earn their keep. search() compares the query against stored memories semantically and returns the closest matches with a relevance score on each, so only the memories that actually bear on the question surface. A request to "write a React component" pulls the frontend preference to the top, not some unrelated fact buried in last month's chat.
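
To make "compares semantically" concrete, here is a toy version of the idea: rank stored items by cosine similarity to a query vector. It is not Mem0's implementation. The real scoring happens inside the vector store and the metric may differ, and the three-dimensional vectors below are hand-made stand-ins for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_memories(query_vec, stored, top_k=2):
    """Return the top_k (score, text) pairs, highest similarity first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hand-made toy vectors; real embeddings have hundreds or thousands of dimensions.
stored = [
    ("User prefers TypeScript over Python for frontend", [0.9, 0.1, 0.0]),
    ("User is building PayFlow, a fintech app", [0.5, 0.5, 0.1]),
    ("User's favourite coffee is a flat white", [0.0, 0.1, 0.9]),
]
query = [1.0, 0.2, 0.0]  # stands in for the embedding of the React request

for score, text in rank_memories(query, stored):
    print(f"[{score:.2f}] {text}")
```

The coffee preference points in a different direction from the query, so it scores near zero and never reaches the prompt.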

Step 4: Integrate with an Agent

# agent_with_memory.py
from mem0 import Memory

class MemoryAugmentedAgent:
    def __init__(self, llm_client, memory=None):
        # llm_client is any object with an async complete(system=..., messages=...)
        # method that returns the reply text (a placeholder interface, not a real SDK).
        self.llm = llm_client
        # Pass in a configured Memory instance, or fall back to Mem0's defaults.
        self.memory = memory or Memory()

    async def chat(self, user_id: str, message: str) -> str:
        # 1. Retrieve relevant memories
        found = self.memory.search(
            query=message,
            user_id=user_id
        )
        # Recent SDK versions wrap hits in {"results": [...]}; older ones returned a list.
        relevant_memories = found["results"] if isinstance(found, dict) else found

        # 2. Build context from memories
        memory_context = "\n".join([
            f"- {mem['memory']}" for mem in relevant_memories[:5]
        ])

        # 3. Generate response with memory context
        system_prompt = f"""You are a helpful assistant. Here are relevant facts about the user:
{memory_context}

Use these facts to personalise your response."""

        response = await self.llm.complete(
            system=system_prompt,
            messages=[{"role": "user", "content": message}]
        )

        # 4. Store the interaction
        self.memory.add(
            messages=[
                {"role": "user", "content": message},
                {"role": "assistant", "content": response}
            ],
            user_id=user_id
        )

        return response

The loop is the whole pattern in four steps: search before you answer, fold the top few memories into the system prompt, generate the reply, then store the new exchange so the next turn is a little smarter. Capping it at the top five (relevant_memories[:5]) keeps the prompt tight; you don't want to dump a user's entire history into every call.
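
A cap on count is one lever; a floor on relevance is another. The sketch below combines the two when building the prompt context. It assumes each hit is a dictionary with a score where higher means more relevant, and text under either memory or text. Both are assumptions about the search output, and build_memory_context is a name invented here.

```python
def build_memory_context(hits, limit=5, min_score=0.0):
    """Format the highest-scoring hits as a bullet list for the system prompt."""
    # Drop weak matches first, then keep only the strongest few.
    kept = [hit for hit in hits if hit.get("score", 0.0) >= min_score]
    kept.sort(key=lambda hit: hit.get("score", 0.0), reverse=True)

    lines = [f"- {hit.get('memory') or hit.get('text', '')}" for hit in kept[:limit]]
    if not lines:
        # Give the model an explicit signal instead of an empty section.
        return "(no stored facts about this user yet)"
    return "\n".join(lines)


hits = [
    {"memory": "User prefers TypeScript over Python for frontend", "score": 0.92},
    {"memory": "User is building PayFlow, a fintech app", "score": 0.78},
    {"memory": "User once asked about sourdough starters", "score": 0.12},
]
print(build_memory_context(hits, limit=5, min_score=0.5))
```

The right threshold depends on the embedding model and the store's scoring metric, so treat 0.5 as a placeholder to tune against real queries.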

Step 5: Memory Management

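# memory_management.py
# "mem_001" is a placeholder; use an id returned by add(), search(), or get_all().

# Update a memory
m.update(memory_id="mem_001", data="User prefers TypeScript for frontend and Rust for backend")

# Delete a memory
m.delete(memory_id="mem_001")

# Get all memories for a user
all_memories = m.get_all(user_id="alex-chen")
# Recent SDK versions wrap the list in {"results": [...]}; unwrap before counting.
items = all_memories["results"] if isinstance(all_memories, dict) else all_memories
print(f"Total memories: {len(items)}")

# History of changes (field names can vary by SDK version)
history = m.history(memory_id="mem_001")
for event in history:
    print(f"{event['created_at']}: {event['event']} - {event.get('new_memory')}")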

Memories aren't write-once. People change their minds, projects wrap up, and stale facts cause more harm than no facts at all. The update, delete, get_all, and history methods (all part of the documented Memory API) give you the controls to keep the store honest. The history call in particular is handy for auditing: it shows how a given memory has changed over time.
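
One way to act on this is a periodic sweep that flags memories nobody has touched in a while. The sketch below picks out stale ids from a list of memory records. It assumes each record carries an id plus ISO 8601 created_at or updated_at strings, which is an assumption about the get_all() output, and the 180-day default is an arbitrary placeholder.

```python
from datetime import datetime, timedelta, timezone


def find_stale(memories, max_age_days=180, now=None):
    """Return ids of memories not created or updated within max_age_days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)

    stale = []
    for mem in memories:
        stamp = mem.get("updated_at") or mem.get("created_at")
        if not stamp:
            continue  # no timestamp: leave it for manual review
        touched = datetime.fromisoformat(stamp)
        if touched.tzinfo is None:
            # Assume UTC when the timestamp carries no offset.
            touched = touched.replace(tzinfo=timezone.utc)
        if touched < cutoff:
            stale.append(mem["id"])
    return stale
```

Review the flagged ids before passing them to delete(); an old memory is not necessarily a wrong one.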

Step 6: Self-Hosted Deployment

# docker-compose.yml
version: '3.8'
services:
  mem0:
    image: mem0/mem0:latest
    ports:
      - "8000:8000"
    environment:
      - VECTOR_STORE_PROVIDER=qdrant
      - VECTOR_STORE_HOST=qdrant
      - LLM_PROVIDER=anthropic
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - EMBEDDER_PROVIDER=openai
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    depends_on:
      - qdrant
      - postgres

  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage

  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: mem0
      POSTGRES_PASSWORD: password
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  qdrant_data:
  postgres_data:

This is the part that matters most if you're handling client data under Australian privacy obligations: Mem0 ships an open-source FastAPI server you can run on your own infrastructure via Docker Compose, so the memory store never leaves your control (see the self-hosted setup).

Two caveats on the compose file above. The image tag mem0/mem0:latest is illustrative; the official self-host image is published as `mem0/mem0-api-server` on Docker Hub, with the server listening on internal port 8000 and the official compose mapping it to host port 8888. And the Qdrant-plus-Postgres combination shown here is a valid setup, but it isn't Mem0's documented default; the default self-host stack pairs Postgres with pgvector and Neo4j. Adapt the file to the official image and your chosen stores before deploying.
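
Once the stack is up, a quick reachability check saves some head-scratching before you point an agent at it. The sketch below only confirms that something is accepting TCP connections on a port; it says nothing about whether the service behind it is healthy. The port numbers in the usage loop come from the compose file above and should be changed to match whatever you actually deploy.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Ports as mapped in the example compose file; adjust to your deployment.
    for name, port in {"qdrant": 6333, "mem0": 8000}.items():
        state = "reachable" if port_is_open("localhost", port) else "NOT reachable"
        print(f"{name} on port {port}: {state}")
```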

Do/Don't

| Do | Don't |
| --- | --- |
| Store user preferences and project context | Store sensitive credentials or PII |
| Use memory to personalise responses | Rely solely on conversation history |
| Update memories when user preferences change | Let stale memories override current context |
| Self-host for data privacy | Send user data to managed memory without consent |
| Periodically clean irrelevant memories | Keep all memories forever |
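
Keeping credentials and PII out of the store is easier if messages are scrubbed before they ever reach add(). The sketch below is a deliberately simple starting point. The two regular expressions catch email addresses and sk- style keys only, so treat it as an illustration of where the filter sits rather than a complete PII solution. The function names are invented for this guide.

```python
import re

# Deliberately narrow patterns; extend them for your own data.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
API_KEY = re.compile(r"\bsk-[A-Za-z0-9_-]{8,}")


def redact(text):
    """Replace email addresses and sk- style keys with placeholders."""
    text = EMAIL.sub("[redacted-email]", text)
    return API_KEY.sub("[redacted-key]", text)


def redact_messages(messages):
    """Return a copy of the chat messages with each content field scrubbed."""
    return [{**msg, "content": redact(msg["content"])} for msg in messages]


messages = [
    {"role": "user", "content": "Email me at alex@example.com, key sk-ant-abc12345XYZ"},
]
print(redact_messages(messages))
```

Calling redact_messages() on the exchange just before it is stored keeps the original text in the live conversation while the memory layer only ever sees the scrubbed version.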

A note on the performance figure

The "sub-50ms retrieval for 10,000 memories" number quoted earlier should be treated as unconfirmed. We couldn't find a published source backing it, and it runs against Mem0's own LOCOMO benchmark paper, which reports search latency closer to 148ms at the median and around 200ms at p95. Fast enough for interactive use, but plan against the published figures rather than the rounder claim.
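
Better still, measure it on your own stack. The sketch below times repeated calls to any function and reports the median and 95th percentile in milliseconds, which are the same two statistics quoted above. The helper name measure_latency_ms is invented here, and wall-clock timing from a single client is only a rough guide.

```python
import statistics
import time


def measure_latency_ms(fn, runs=50):
    """Call fn() `runs` times; return (median_ms, p95_ms) of wall-clock latency."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    median = statistics.median(samples)
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(samples, n=20)[-1]
    return median, p95
```

In practice you would pass something like lambda: m.search(query="Write a React component", user_id="alex-chen") against a store loaded with a realistic number of memories, and run it from the same network location as your agent.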

Conclusion

Mem0 turns a stateless agent into one that remembers who it's talking to. The embedding-based retrieval keeps only relevant memories in front of the model, and the automatic fact extraction spares you from hand-curating what's worth keeping. If privacy is a concern, self-host it; if you're already running an agent framework, the API drops in without much ceremony. Either way, the payoff is continuity: agents that build on past conversations instead of starting cold every time.

Source trail

Primary references to keep this briefing grounded

AI and automation information changes quickly. Use these official or primary references to verify the claims, pricing, product behaviour, and compliance details before committing budget or production data.

What to do next

  1. Pick the smallest useful workflow that proves the pattern.
  2. Write down the owner, data boundary, review point, and success measure.
  3. Review the result after the first real run and decide whether to scale, change, or stop.

Want help applying this? Explore AI agent design systems.

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.
