Back to news

How-to Guide

How to build a knowledge graph with Cognee.

Use Cognee to automatically extract entities, relationships, and insights from documents to build queryable knowledge graphs that power advanced RAG systems.

AI Kick Start editorial image for How to build a knowledge graph with Cognee.

Decision

Start narrow

Use the article to decide the smallest useful workflow worth testing before expanding the system.

Risk to watch

Hype drift

Avoid turning a practical adoption step into a broad transformation promise nobody can verify.

Proof to collect

Business signal

Write down the owner, data boundary, review point, and measurable outcome before the first build.

TL;DR

TL;DR: Cognee turns your documents into a structured knowledge graph: it pulls out the entities, works out how they connect, and stores the result in a queryable graph database. Where ordinary RAG fetches chunks of text, graph RAG follows the relationships, which lets it answer questions that span several connected facts. This guide walks through building the full pipeline.

Key takeaways

  • Extraction: Automatic entity and relationship extraction from documents
  • Storage: Neo4j or NetworkX for graph storage
  • Querying: Natural language to Cypher queries
  • RAG boost: Graph RAG reportedly beats vector RAG on multi-hop questions by around 23% (figure unconfirmed; published benchmarks span roughly 6.5-26%)
  • Integration: Drop-in replacement for vector-only RAG

Analysis

Most teams that have tried retrieval-augmented generation know the frustration. You feed your handbook, your specs and your internal docs into a vector database, ask a question, and the answer comes back confident but shallow. Ask something that needs two or three facts joined together, such as "which services depend on the auth service?", and it falls apart. Vector search finds passages that look similar to your question. It does not understand how the things in those passages relate.

That gap is what knowledge graphs are meant to close. Instead of storing your documents as a pile of text fragments, a graph stores them as a web of entities and the links between them. Cognee, an open-source project from topoteretes, tries to make that automatic: drop in your files, and it reads them, identifies the people, systems and concepts inside, maps the connections, and saves the lot to a graph database you can query in plain English.

For an Australian business team sitting on years of documentation, the appeal is straightforward. You stop guessing which paragraph the model happened to retrieve, and start asking questions that follow the actual structure of your knowledge. The rest of this guide shows how to stand up that pipeline end to end.

One caveat before we start: Cognee moves quickly, and a few of the method names in the code below are illustrative rather than the exact current API. Where that matters, the article flags it so you check the live docs before copying anything verbatim.

Analysis

Prerequisites

  • Python 3.10+
  • Neo4j Community Edition (or Docker)
  • pip install cognee neo4j networkx
  • Documents to process (PDF, text, or markdown)

Cognee is published on PyPI, so the install is a plain `pip install cognee`.

Step-by-Step Framework

Step 1: Install and Configure

pip install cognee[all] neo4j networkx

# Start Neo4j (Docker)
docker run -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  neo4j:5-community
# config.py
import cognee

cognee.config.set({
    "llm_provider": "anthropic",
    "llm_model": "claude-sonnet-4.6",
    "llm_api_key": "sk-ant-your-key",

    "graph_db_provider": "neo4j",
    "graph_db_url": "bolt://localhost:7687",
    "graph_db_username": "neo4j",
    "graph_db_password": "password",

    "vector_db_provider": "lancedb",
    "embedding_model": "text-embedding-3-large"
})

A few notes on this config. Cognee does support Anthropic as an LLM provider, and `claude-sonnet-4.6` (canonical id claude-sonnet-4-6, released 17 February 2026) is a valid model to point it at. The graph and vector backends are real too: Cognee lists Neo4j, NetworkX and Kuzu for graph storage, and LanceDB, pgvector, Qdrant and Weaviate for vectors, per the Cognee configuration docs. LanceDB and NetworkX are the embedded defaults for local work.

The one thing to double-check: the exact key shape. The dict above is a simplified illustration. In practice Cognee often takes env-style keys such as LLM_PROVIDER, LLM_MODEL and GRAPH_DB_PROVIDER, so copy from the live configuration docs rather than this snippet if you hit errors.

Step 2: Ingest Documents

# ingest.py
import cognee

async def ingest_documents(file_paths: list[str]):
    for path in file_paths:
        await cognee.add(path)

    # Process, extract entities and relationships
    await cognee.cognify()

# Ingest
await ingest_documents([
    "docs/company-handbook.pdf",
    "docs/product-specs.md",
    "docs/api-documentation.md",
    "docs/engineering-blog-posts/"
])

These two calls are the heart of Cognee. cognee.add() ingests your data, and cognee.cognify() does the work of turning that text into a graph: chunking, embedding, summarising, and writing out the nodes and edges. Both are confirmed in the Cognee docs. You point it at files or whole folders and let the pipeline run.

Step 3: Query the Knowledge Graph

# query.py
import cognee

# Natural language query
results = await cognee.search(
    query="What authentication methods does the API support?",
    search_type="GRAPH"  # Uses knowledge graph
)

for result in results:
    print(f"Source: {result.source}")
    print(f"Answer: {result.text}")
    print(f"Confidence: {result.confidence}")
    print(f"Related entities: {result.related_entities}")

# Specific entity lookup
entities = await cognee.get_entities("authentication")
for entity in entities:
    print(f"Entity: {entity.name} ({entity.type})")
    print(f"Relationships: {entity.relationships}")

Cognee does support graph-based search, but check the signature against the current API before you run this. The real call uses search(query_text=..., query_type=SearchType.GRAPH_COMPLETION) with a SearchType enum, not the query= and search_type="GRAPH" string arguments shown here, according to the Cognee search docs. Treat the get_entities helper as illustrative as well; it does not appear in the current Python API reference, so confirm the actual entity-lookup method before depending on it.

Step 4: Custom Entity Types

# custom_entities.py
from cognee import EntityType, RelationshipType

# Define custom entity types
cognee.register_entity_types([
    EntityType(name="API_ENDPOINT", description="REST API endpoint"),
    EntityType(name="DATABASE_TABLE", description="Database table"),
    EntityType(name="MICROSERVICE", description="Microservice component"),
    EntityType(name="DEPLOYMENT_TARGET", description="Deployment environment")
])

# Define relationship types
cognee.register_relationship_types([
    RelationshipType(name="CALLS", description="Service A calls Service B"),
    RelationshipType(name="STORES_DATA_IN", description="Service stores data in table"),
    RelationshipType(name="DEPLOYS_TO", description="Service deploys to environment"),
    RelationshipType(name="AUTHENTICATES_VIA", description="Uses auth method")
])

The idea here is sound: generic entity types only get you so far, and defining types that match your own domain (endpoints, tables, services, deploy targets) gives the graph far more useful structure. That said, the specific helpers register_entity_types and register_relationship_types do not appear in Cognee's official API reference, so treat this snippet as pseudo-code that shows the pattern. Look up how the current version lets you declare custom types before wiring it in.

Step 5: Build Graph RAG Pipeline

# graph_rag.py
class GraphRAG:
    def __init__(self):
        self.cognee = cognee

    async def answer(self, question: str) -> str:
        # Step 1: Extract entities from question
        question_entities = await self.extract_entities(question)

        # Step 2: Find relevant subgraph
        subgraph = await self.cognee.get_subgraph(
            entities=question_entities,
            depth=2  # 2-hop traversal
        )

        # Step 3: Generate answer with context
        context = self.format_subgraph(subgraph)

        response = await self.cognee.llm.complete(
            prompt=f"""Answer the question using the provided knowledge graph context.

Context:
{context}

Question: {question}

Answer:"""
        )

        return response

    def format_subgraph(self, subgraph) -> str:
        lines = []
        for node in subgraph.nodes:
            lines.append(f"Entity: {node.name} ({node.type})")
            for rel in node.relationships:
                lines.append(f"  → {rel.type} → {rel.target.name}")
        return "\n".join(lines)

This is the part that earns graph RAG its keep. Pull the entities out of the question, fetch the slice of the graph around them, walk two hops out, and hand that connected context to the model. Two notes of caution: get_subgraph and cognee.llm.complete are not documented methods in the current Cognee surface, so read this as the shape of the pattern rather than a copy-paste recipe. Build your traversal and your completion call against whatever the live API actually exposes.

Step 6: Visualise the Graph

# visualise.py
import networkx as nx
import matplotlib.pyplot as plt

async def visualise_graph():
    # Export from Cognee to NetworkX
    G = await cognee.to_networkx()

    plt.figure(figsize=(20, 20))
    pos = nx.spring_layout(G, k=2, iterations=50)

    # Color by entity type
    node_colors = []
    for node in G.nodes():
        entity_type = G.nodes[node].get('type', 'unknown')
        colors = {
            'PERSON': '#ff9999',
            'ORGANIZATION': '#99ccff',
            'API_ENDPOINT': '#99ff99',
            'DATABASE_TABLE': '#ffcc99',
            'MICROSERVICE': '#cc99ff'
        }
        node_colors.append(colors.get(entity_type, '#cccccc'))

    nx.draw(G, pos, node_color=node_colors, with_labels=True,
            node_size=2000, font_size=8, font_weight='bold')
    plt.savefig('knowledge_graph.png', dpi=150, bbox_inches='tight')
    print("Graph saved to knowledge_graph.png")

Seeing the graph helps you trust it. Export to NetworkX, colour the nodes by type, and you get a picture of how your knowledge actually hangs together, which is often where you spot missing or wrong links. The to_networkx export shown here is, again, not a documented method name in the current docs, so check how your version exposes a NetworkX export. The matplotlib drawing code itself is standard.

Do/Don't

DoDon't
Define custom entity types for your domainRely solely on generic entity types
Use 2-3 hop depth for most queriesTraverse unlimited depth (slow, noisy)
Validate extracted entitiesTrust extraction without verification
Combine graph RAG with vector RAGReplace vector RAG entirely
Version your knowledge graphOverwrite without backup

Conclusion

Cognee closes the distance between a folder of unstructured documents and something you can actually interrogate. Graph RAG is at its best on questions that need you to follow a chain of relationships, the "which services depend on the auth service?" kind that plain vector search struggles with. Start with the automatic extraction, define the entity types that matter for your domain, and run graph RAG alongside your existing vector search rather than ripping it out. Just verify the method names against the live Cognee docs as you go, because the project is still moving fast.

Source trail

Primary references to keep this briefing grounded

AI and automation information changes quickly. Use these official or primary references to verify the claims, pricing, product behaviour, and compliance details before committing budget or production data.

What to do next

  1. Pick the smallest useful workflow that proves the pattern.
  2. Write down the owner, data boundary, review point, and success measure.
  3. Review the result after the first real run and decide whether to scale, change, or stop.

Want help applying this? Explore AI consulting & strategy.

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.

Explore with AI

Use the article as a decision prompt

Summarise this AI Kick Start article for an Australian business owner. Focus on the useful decision, the risks, and the first practical next step: How to build a knowledge graph with Cognee

Turn this into a practical roadmap.

Use the guide as a starting point, then map the first workflow worth building.

Book an AI strategy call