Hermes Agent Review: The Learning Agent That Improves Itself

Hermes Agent is a self-improving AI agent that learns from its mistakes. We ran it for 2 weeks on real tasks to see if the learning loop actually works.

Daniel Fleuren | 2026-06-14 | 10 min read | Founders and operators | Updated 2026-06-19

Written by

Daniel Fleuren

Founder, AI Kick Start. 20+ years enterprise IT

Decision

Start narrow

Use the article to decide the smallest useful workflow worth testing before expanding the system.

Risk to watch

Hype drift

Avoid turning a practical adoption step into a broad transformation promise nobody can verify.

Proof to collect

Business signal

Write down the owner, data boundary, review point, and measurable outcome before the first build.

Key takeaways

Overall: Hermes Agent learns from what it does, and in our two weeks with it the change showed up in the numbers: faster, more accurate, and fewer repeated mistakes on our own test suite.
What Is Hermes Agent?: Hermes Agent is an open-source AI agent under the MIT licence, designed to learn from its own experience.
The Learning Loop: Hermes runs on a feedback loop of planning, executing, evaluating, learning, and applying those lessons to the next task.
Week 1 vs Week 2 comparison: on days 1, 7, and 14, task success rate went 62%, 76%, 82% (+32%); average task time 4m 30s, 3m 15s, 2m 58s (-34%); tool calls per task 8.2, 6.1, 5.4 (-34%); repeated errors 12, 5, 2 (-83%). These are our own figures from an internal test suite, so treat them as one team's experience rather than a benchmark anyone can reproduce.
Knowledge Persistence: contrary to a popular description, Hermes does not keep its memory in a vector database; it uses a local SQLite file with FTS5 keyword search plus LLM summarisation.

TL;DR: Hermes Agent learns from what it does, and in our two weeks with it the change showed up in the numbers: faster, more accurate, fewer repeated mistakes on our own test suite. The software is free (MIT licence), and you can run it on a cheap VPS for around $5 a month, though the real cost is the model API calls, not the server. The catch: it's early software, it expects a technical user, and we tested an older release than the one available now.

Most AI agents have a kind of amnesia. You give one a task, it works through it, and the moment the session closes it forgets everything it figured out. Next time you ask, it starts from zero again. That's fine for a one-off, but it means the agent never actually gets better at your work.

Hermes Agent, an open-source project from Nous Research, is built to break that habit. It keeps notes on what worked, what didn't, and which shortcuts pay off, then reaches back into those notes the next time a similar job comes up. The promise is an agent that improves with use instead of resetting.

So we ran it for two weeks to see whether the learning was real or just marketing. It was real, and you could watch it happen. By the end the agent was finishing tasks quicker and tripping over the same errors far less often. That's the headline. The fine print is that this is rough, hands-on software aimed at people who are comfortable in a terminal, and a couple of the claims floating around about how it works don't match how it actually works.

If you run a small team and you've been waiting for an agent that remembers your context from one week to the next, Hermes is worth a look. Just go in knowing it's a project to tinker with, not a polished product to deploy and forget.

What Is Hermes Agent?

Hermes Agent is an open-source AI agent under the MIT licence, designed to learn from its own experience. Where most agents start fresh every session, Hermes keeps a running knowledge base of:

Approaches that worked, and what came of them
Attempts that failed, and why
How it tends to use its tools
Domain-specific shortcuts it has picked up

In practice it writes "skill documents" from experience, sharpens them as it goes, searches its own past conversations, and builds up a picture of you across sessions.

Cost: The software is free. Running it persistently costs roughly $5/mo for a VPS, but note that figure leaves out the language-model API calls, which Nous Research points to as the real cost driver.
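To see why the API calls, not the server, dominate the bill, here is a back-of-the-envelope estimate. It's a minimal sketch: the task volume, tokens per task, and per-million-token price below are hypothetical placeholders rather than Hermes or provider figures, so swap in your own provider's pricing and your own usage.

```python
def monthly_cost(vps_usd: float, tasks_per_month: int,
                 tokens_per_task: int, usd_per_million_tokens: float) -> float:
    """Rough monthly cost: a fixed server fee plus metered model usage."""
    api_usd = tasks_per_month * tokens_per_task * usd_per_million_tokens / 1_000_000
    return vps_usd + api_usd


# Hypothetical inputs: 600 tasks a month, 20,000 tokens each, $3 per million tokens.
total = monthly_cost(vps_usd=5.0, tasks_per_month=600,
                     tokens_per_task=20_000, usd_per_million_tokens=3.0)
print(f"${total:.2f}/month")  # $41.00/month
```

With those assumed inputs the server is $5 of a $41 month; the rest is model usage, and it grows with how much work you hand the agent.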

The Learning Loop

Hermes runs on a feedback loop. The official docs describe it as a closed cycle of planning, acting, curating memory, and recalling later; the five-step framing below is our own shorthand, but it tracks what the architecture docs describe:

Plan: work out a strategy for the task
Execute: carry out the plan using tools
Evaluate: score the result (success, partial, failure)
Learn: write the lessons back into the knowledge base
Apply: reach for those patterns on the next task
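The five steps above can be sketched in a few lines of Python to show why tool calls and repeated errors tend to fall with use. This is a toy illustration under our own assumptions, not Hermes's code: `KnowledgeBase`, `run_task`, and the strategy names are hypothetical, and a plain dictionary stands in for the real SQLite-backed memory.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical lesson store: strategies that worked and ones that didn't, per task type."""
    worked: dict[str, list[str]] = field(default_factory=dict)
    failed: dict[str, list[str]] = field(default_factory=dict)


def run_task(kb, task_type, candidates, execute):
    """One pass of plan, execute, evaluate, learn, reusing earlier lessons (apply)."""
    known_good = kb.worked.get(task_type, [])
    known_bad = kb.failed.get(task_type, [])
    # Plan (informed by Apply): proven strategies first, skip candidates that already failed.
    plan = known_good + [s for s in candidates if s not in known_good and s not in known_bad]
    for strategy in plan:
        # Execute + Evaluate: the caller's executor scores the attempt.
        outcome = execute(strategy)  # "success", "partial" or "failure"
        if outcome == "success":
            # Learn: remember the winning strategy for next time.
            if strategy not in known_good:
                kb.worked.setdefault(task_type, []).append(strategy)
            return strategy
        # Learn: log the miss; new candidates that failed are left out of future plans.
        kb.failed.setdefault(task_type, []).append(strategy)
    return None


attempts = []


def fake_execute(strategy):
    attempts.append(strategy)
    return "success" if strategy == "use_reporting_api" else "failure"


kb = KnowledgeBase()
for run in (1, 2):
    attempts.clear()
    run_task(kb, "weekly_report", ["scrape_dashboard", "use_reporting_api"], fake_execute)
    print(f"run {run}: {attempts}")
# run 1: ['scrape_dashboard', 'use_reporting_api']
# run 2: ['use_reporting_api']
```

The second run skips the strategy that failed and goes straight to the one that worked, which is, in miniature, the kind of effect we saw in the falling tool-call and repeated-error counts below.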

Week 1 vs Week 2 comparison:

Metric	Day 1	Day 7	Day 14	Change (day 1 to day 14)
Task success rate	62%	76%	82%	+32%
Average task time	4m 30s	3m 15s	2m 58s	-34%
Tool calls per task	8.2	6.1	5.4	-34%
Repeated errors	12	5	2	-83%

These are our own figures from an internal test suite, so treat them as one team's experience rather than a benchmark anyone can reproduce. (One wrinkle worth flagging: the +32% for task success is a relative gain; in absolute terms the success rate rose 20 percentage points, from 62% to 82%.) With that caveat, the trend was hard to miss. By day 14 Hermes was spotting tasks it had seen before and reusing strategies that had paid off the first time.
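The last column of the table is the relative difference between the day 1 and day 14 values. As a quick arithmetic check, this sketch recomputes it from the table, converting the task times to seconds first. The dictionary simply restates our figures; nothing here talks to Hermes.

```python
def pct_change(start: float, end: float) -> float:
    """Relative change from start to end, as a percentage."""
    return (end - start) / start * 100


def seconds(minutes: int, secs: int) -> int:
    """Convert a 'Xm Ys' duration to whole seconds."""
    return minutes * 60 + secs


# (day 1, day 14) values from the comparison table above.
day1_vs_day14 = {
    "Task success rate": (62, 82),
    "Average task time": (seconds(4, 30), seconds(2, 58)),
    "Tool calls per task": (8.2, 5.4),
    "Repeated errors": (12, 2),
}

for metric, (start, end) in day1_vs_day14.items():
    print(f"{metric}: {pct_change(start, end):+.0f}%")
# Task success rate: +32%
# Average task time: -34%
# Tool calls per task: -34%
# Repeated errors: -83%
```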

Knowledge Persistence

Here's where the popular description of Hermes is wrong, and it's worth correcting. Hermes does not store its memory in a vector database. Per the architecture docs, it uses a local SQLite file (~/.hermes/state.db) with FTS5 full-text keyword search, plus LLM summarisation to pull the right context back across sessions. The design deliberately skips vector embeddings for its core memory. (A community plugin can bolt on pgvector if you want it, but that's not the default.)
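To make the keyword-search design concrete, here is a minimal sketch of the same technique using Python's built-in `sqlite3` module and an FTS5 virtual table. The table name, columns, and sample notes are hypothetical; this is not Hermes's actual schema, and it assumes your Python's SQLite build includes FTS5 (most do). It shows plain keyword retrieval ranked by FTS5's built-in `rank`; the LLM summarisation Hermes layers on top isn't shown.

```python
import sqlite3

# In-memory database for the demo; Hermes itself keeps a file (~/.hermes/state.db).
conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per remembered note, both columns full-text indexed.
conn.execute("CREATE VIRTUAL TABLE memory USING fts5(session, note)")
conn.executemany(
    "INSERT INTO memory (session, note) VALUES (?, ?)",
    [
        ("s1", "CSV export failed because the date column used DD/MM/YYYY"),
        ("s2", "Use the reporting API instead of scraping the dashboard"),
        ("s3", "Invoice PDFs need OCR before totals can be extracted"),
    ],
)


def recall(query: str, limit: int = 3) -> list[tuple[str, str]]:
    """Keyword search over past notes, best match first (no embeddings involved)."""
    return conn.execute(
        "SELECT session, note FROM memory WHERE memory MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


print(recall("csv date"))  # matches only the s1 note: every query term must appear
```

The design trade-off is the one the docs describe: exact-term matching is cheap, local, and easy to inspect, but it won't find a note that uses different words for the same idea, which is the gap embeddings usually fill.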

What that storage choice buys you (persistent memory docs):

It survives restarts
You can inspect what it has learned (the SQLite database plus the USER.md and MEMORY.md files)
Because it's file-based, exporting, importing, and sharing a knowledge base between agents should be feasible, though the docs we read don't list these as first-class features, so treat them as plausible rather than confirmed

We exported the knowledge base after two weeks and counted 1,247 learned patterns, 342 documented failure modes, and 89 catalogued strategies. Again, those are numbers from our own run, not figures you'll find published anywhere.
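If you want a rough look inside your own install, a read-only pass over the state file is a reasonable starting point. This is a sketch under assumptions: we aren't documenting Hermes's internal schema, so it just lists whatever tables exist and counts their rows, and it opens ~/.hermes/state.db with SQLite's read-only URI mode so it can't alter the agent's memory. Row counts are not the same thing as "learned patterns"; mapping tables to categories like ours is up to you.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def table_row_counts(db_path: Path) -> dict[str, int]:
    """Return {table_name: row_count} for every table in a SQLite file, read-only."""
    # mode=ro opens the database read-only; uri=True tells sqlite3 to parse the URI.
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'  # safely quote the identifier
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts


# Default location per the Hermes architecture docs; skipped if the file isn't there.
state_db = Path.home() / ".hermes" / "state.db"
if state_db.exists():
    for table, count in table_row_counts(state_db).items():
        print(f"{table}: {count}")
```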

Setup Requirements

Hermes is not a beginner tool, but it's also lighter to install than some write-ups suggest. The official install is a single curl command on Linux, macOS, or WSL2. You'll need:

An API key for a language model (OpenAI, Anthropic, or a local model via Ollama; Hermes documents 18-plus providers)
Comfort on the command line

A couple of things often listed as requirements aren't. Docker and Docker Compose are optional: Docker is just one of several terminal execution backends (local, docker, ssh, modal, daytona, singularity), not a prerequisite. A Linux VPS is one way to run it persistently, which is the setup we used, but it isn't mandatory either.

For our deployment we used a $5/mo DigitalOcean droplet, and setup took about 45 minutes as a technical user. That timing is our experience, not a guarantee.

Pros and Cons

Pros	Cons
Real, measurable learning	Requires technical setup
Cheap to run (server-side)	Early stage, occasional crashes
Knowledge is inspectable and portable	Learning is domain-specific
Open source and hackable	Needs a persistent host
Improves noticeably over time	Rough edges and a fast-moving codebase

One note on documentation: we found the docs sparse during testing, but Nous Research now maintains a fairly extensive docs site and there are several community guides, so any complaint about thin documentation describes where the project was rather than where it is.

Verdict

Score: 8.1/10

Hermes Agent does the thing it sets out to do: the agent got better the more we used it. The learning held up in our testing, the server cost is small, and because it's open source you actually own how your agent develops. It isn't ready for critical production work, but it's the most interesting agent framework we've put through its paces this year.

One honest caveat before you dive in: we tested v0.8.2, and the project moves fast. By the time this was published, Hermes had already reached v0.16.0, several releases on from what we ran. Expect some of the rough edges we hit to have been sanded down, and check the current version before you judge it on our notes.

*Published June 14, 2026 | Hermes Agent v0.8.2 tested on Ubuntu 24.04*

What to do next

Pick the smallest useful workflow that proves the pattern.
Write down the owner, data boundary, review point, and success measure.
Review the result after the first real run and decide whether to scale, change, or stop.

Want help applying this? Explore AI agent design systems.

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.
