Hermes Agent Review: The Learning Agent That Improves Itself
TL;DR: Hermes Agent learns from what it does. Over our two weeks with it, the improvement showed up in the numbers on our own test suite: faster task completion, higher success rates, and fewer repeated mistakes. The software is free (MIT licence), and you can run it on a cheap VPS for around $5 a month, though the real cost is the model API calls, not the server. The catch: it's early software, it expects a technical user, and we tested an older release than the one available now.
Most AI agents have a kind of amnesia. You give one a task, it works through it, and the moment the session closes it forgets everything it figured out. Next time you ask, it starts from zero again. That's fine for a one-off, but it means the agent never actually gets better at your work.
Hermes Agent, an open-source project from Nous Research, is built to break that habit. It keeps notes on what worked, what didn't, and which shortcuts pay off, then reaches back into those notes the next time a similar job comes up. The promise is an agent that improves with use instead of resetting.
So we ran it for two weeks to see whether the learning was real or just marketing. It was real, and you could watch it happen. By the end the agent was finishing tasks quicker and tripping over the same errors far less often. That's the headline. The fine print is that this is rough, hands-on software aimed at people who are comfortable in a terminal, and a couple of the claims floating around about its internals don't match how it's actually built.
If you run a small team and you've been waiting for an agent that remembers your context from one week to the next, Hermes is worth a look. Just go in knowing it's a project to tinker with, not a polished product to deploy and forget.
What Is Hermes Agent?
Hermes Agent is an open-source AI agent under the MIT licence, designed to learn from its own experience. Where most agents start fresh every session, Hermes keeps a running knowledge base of:
- Approaches that worked, and what came of them
- Attempts that failed, and why
- How it tends to use its tools
- Domain-specific shortcuts it has picked up
In practice it writes "skill documents" from experience, sharpens them as it goes, searches its own past conversations, and builds up a picture of you across sessions.
Cost: The software is free. Running it persistently costs roughly $5/mo for a VPS, but note that figure leaves out the language-model API calls, which Nous Research points to as the real cost driver.
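A quick back-of-envelope model shows why the API bill, not the server, tends to dominate. This Python sketch splits a month into the fixed VPS fee and per-task token costs. Every price and volume in the example is a hypothetical placeholder, not a published rate, so substitute your provider's pricing and your own usage.

```python
def monthly_cost(
    vps_usd: float,
    tasks_per_day: int,
    input_tokens_per_task: int,
    output_tokens_per_task: int,
    usd_per_m_input: float,
    usd_per_m_output: float,
    days: int = 30,
) -> dict:
    """Split a monthly bill into fixed server cost and variable model-API cost."""
    tasks = tasks_per_day * days
    # Token prices are quoted per million tokens, hence the division.
    api = tasks * (
        input_tokens_per_task * usd_per_m_input
        + output_tokens_per_task * usd_per_m_output
    ) / 1_000_000
    return {"server": vps_usd, "api": round(api, 2), "total": round(vps_usd + api, 2)}


# Hypothetical usage: 40 tasks a day, 20k input + 2k output tokens per task,
# at made-up prices of $3 / $15 per million input / output tokens.
print(monthly_cost(5.0, 40, 20_000, 2_000, 3.0, 15.0))
# {'server': 5.0, 'api': 108.0, 'total': 113.0}
```

Even with these invented numbers, the $5 server ends up as a rounding error next to the token spend. The spend also scales with how many tool calls and how much context each task uses.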
The Learning Loop
Hermes runs on a feedback loop. The official docs describe it as a closed cycle of planning, acting, curating memory, and recalling later; the five-step framing below is our own shorthand, but it tracks what the architecture docs describe:
- Plan: work out a strategy for the task
- Execute: carry out the plan using tools
- Evaluate: score the result (success, partial, failure)
- Learn: write the lessons back into the knowledge base
- Apply: reach for those patterns on the next task
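To make the loop concrete, here is a minimal Python sketch of the five steps. It is our own illustration, not Hermes's code. KnowledgeBase, run_task, and the propose/execute/evaluate callables are hypothetical stand-ins, and the "memory" is an in-process dictionary rather than Hermes's on-disk store.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class Outcome(Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILURE = "failure"


@dataclass
class KnowledgeBase:
    """Hypothetical in-memory stand-in for the agent's persistent knowledge base."""
    # task kind -> list of (strategy, outcome) records, oldest first
    records: Dict[str, List[Tuple[str, Outcome]]] = field(default_factory=dict)

    def best_strategy(self, kind: str) -> Optional[str]:
        """Apply: return the most recent strategy that succeeded for this kind of task."""
        for strategy, outcome in reversed(self.records.get(kind, [])):
            if outcome is Outcome.SUCCESS:
                return strategy
        return None

    def learn(self, kind: str, strategy: str, outcome: Outcome) -> None:
        """Learn: record the result so later tasks can reuse or avoid it."""
        self.records.setdefault(kind, []).append((strategy, outcome))


def run_task(
    kb: KnowledgeBase,
    kind: str,
    propose: Callable[[str], str],
    execute: Callable[[str], bool],
    evaluate: Callable[[bool], Outcome],
) -> Outcome:
    # Plan (and Apply): reuse a known-good strategy if there is one, else ask the planner
    strategy = kb.best_strategy(kind) or propose(kind)
    result = execute(strategy)         # Execute
    outcome = evaluate(result)         # Evaluate
    kb.learn(kind, strategy, outcome)  # Learn
    return outcome


# Usage: a planner that tries two ideas in turn; only the second one works.
ideas = iter(["grep the logs", "parse the logs with a regex"])


def propose(kind: str) -> str:
    return next(ideas)


def execute(strategy: str) -> bool:
    return strategy == "parse the logs with a regex"


def evaluate(ok: bool) -> Outcome:
    return Outcome.SUCCESS if ok else Outcome.FAILURE


kb = KnowledgeBase()
outcomes = [run_task(kb, "log-triage", propose, execute, evaluate) for _ in range(3)]
print([o.value for o in outcomes])  # ['failure', 'success', 'success']
```

The third run never calls the planner. It goes straight to the strategy recorded as a success. A real agent also has to judge which past tasks count as "similar", which this sketch sidesteps by using an exact task label.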
Week 1 vs Week 2 comparison:
| Metric | Day 1 | Day 7 | Day 14 | Improvement |
|---|---|---|---|---|
| Task success rate | 62% | 76% | 82% | +32% |
| Average task time | 4m 30s | 3m 15s | 2m 58s | -34% |
| Tool calls per task | 8.2 | 6.1 | 5.4 | -34% |
| Repeated errors | 12 | 5 | 2 | -83% |
These are our own figures from an internal test suite, so treat them as one team's experience rather than a benchmark anyone can reproduce. One wrinkle worth flagging: the Improvement column is relative to day 1. Task success rose 20 percentage points, from 62% to 82%, which is a 32% relative gain. With that caveat, the trend was hard to miss. By day 14 Hermes was recognising tasks it had seen before and reusing strategies that had paid off the first time.
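This short Python snippet recomputes the Improvement column from the table's day-1 and day-14 values, with times converted to seconds. It is a quick way to check the table or to rerun the same comparison on your own numbers.

```python
def relative_change(day1: float, day14: float) -> float:
    """Percentage change from day 1 to day 14, relative to the day-1 value."""
    return (day14 - day1) / day1 * 100


# Day-1 and day-14 values from the comparison table; times are in seconds.
table = {
    "Task success rate": (62, 82),
    "Average task time": (4 * 60 + 30, 2 * 60 + 58),
    "Tool calls per task": (8.2, 5.4),
    "Repeated errors": (12, 2),
}

for metric, (day1, day14) in table.items():
    print(f"{metric}: {relative_change(day1, day14):+.0f}%")

# Prints +32%, -34%, -34% and -83%, matching the Improvement column.
# The success rate's absolute gain is 82 - 62 = 20 percentage points.
```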
Knowledge Persistence
Here's where the popular description of Hermes is wrong, and it's worth correcting. Hermes does not store its memory in a vector database. Per the architecture docs, it uses a local SQLite file (~/.hermes/state.db) with FTS5 full-text keyword search, plus LLM summarisation to pull the right context back across sessions. The design deliberately skips vector embeddings for its core memory. (A community plugin can bolt on pgvector if you want it, but that's not the default.)
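To show what keyword recall without embeddings looks like, here's a toy memory store built with Python's built-in sqlite3 module and an FTS5 index. The table name, columns, and recall helper are hypothetical, not the actual schema inside state.db. The LLM summarisation step that Hermes layers on top is left out. The sketch assumes your SQLite build includes FTS5, which most do.

```python
import sqlite3

# Hypothetical schema: one FTS5 table of notes; the session id is stored but not indexed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE memories USING fts5(session UNINDEXED, content)")
conn.executemany(
    "INSERT INTO memories (session, content) VALUES (?, ?)",
    [
        ("s1", "Deploy failed: forgot to run database migrations before restart"),
        ("s2", "User prefers summaries as bullet points"),
        ("s3", "Running migrations first fixed the deploy"),
    ],
)


def recall(db: sqlite3.Connection, query: str, limit: int = 5) -> list:
    """Full-text keyword search over past notes, best bm25 matches first."""
    return db.execute(
        "SELECT session, content FROM memories WHERE memories MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


print(recall(conn, "deploy AND migrations"))  # notes from sessions s1 and s3
print(recall(conn, "bullet"))                 # the s2 preference note
```

Matching is on words, not meaning, so a query for "rollout" would not surface the deploy notes. That is the trade-off of skipping embeddings. The upside is a plain file you can open and query yourself.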
What that storage choice buys you, per the persistent memory docs:
- It survives restarts
- You can inspect what it has learned (the SQLite state file plus the USER.md and MEMORY.md state files)
- Because it's file-based, exporting, importing, and sharing a knowledge base between agents is feasible, though those weren't called out as first-class features in the docs we read, so treat them as plausible rather than confirmed
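If you want to look inside the state file yourself, a few lines of Python will list every table and its row count without assuming anything about the schema. This is a sketch. It opens the file read-only, and the ~/.hermes/state.db path is the one given in the architecture docs. The agent may be writing to the file while it runs, so you may prefer to point the script at a copy.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def describe_state(db_path: Path) -> dict:
    """Map each table in a SQLite file to its row count, opening the file read-only."""
    uri = db_path.expanduser().resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'  # safe identifier quoting
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts


if __name__ == "__main__":
    state = Path("~/.hermes/state.db")
    if state.expanduser().exists():
        for table, rows in describe_state(state).items():
            print(f"{table}: {rows} rows")
```

Full-text tables also create internal shadow tables, so expect a few more entries than the ones the agent writes to directly.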
We exported the knowledge base after two weeks and counted 1,247 learned patterns, 342 documented failure modes, and 89 catalogued strategies. Again, those are numbers from our own run, not figures you'll find published anywhere.
Setup Requirements
Hermes is not a beginner tool, but it's also lighter to install than some write-ups suggest. The official install is a single curl command on Linux, macOS, or WSL2. You'll need:
- An API key for a language model (OpenAI, Anthropic, or a local model via Ollama; Hermes documents 18-plus providers)
- Comfort on the command line
A couple of things often listed as requirements aren't. Docker and Docker Compose are optional: Docker is just one of several terminal execution backends (local, docker, ssh, modal, daytona, singularity), not a prerequisite. A Linux VPS is one way to run it persistently, which is the setup we used, but it isn't mandatory either.
For our deployment we used a $5/mo DigitalOcean droplet, and setup took about 45 minutes as a technical user. That timing is our experience, not a guarantee.
Pros and Cons
| Pros | Cons |
|---|---|
| Real, measurable learning | Requires technical setup |
| Cheap to run (server-side) | Early stage, occasional crashes |
| Knowledge is inspectable and portable | Learning is domain-specific |
| Open source and hackable | Needs a persistent host |
| Improves noticeably over time | Rough edges and a fast-moving codebase |
One note on that last "con": we found the docs sparse during testing. Nous Research now maintains a fairly extensive docs site, though, and there are several community guides. Thin documentation is best read as a snapshot of where the project was when we tested it, not where it is now.
Verdict
Score: 8.1/10
Hermes Agent does the thing it sets out to do: the agent got better the more we used it. The learning held up in our testing, the server cost is small, and because it's open source you actually own how your agent develops. It isn't ready for critical production work, but it's the most interesting agent framework we've put through its paces this year.
One honest caveat before you dive in: we tested v0.8.2, and the project moves fast. By the time this was published, Hermes had already reached v0.16.0, several releases on from what we ran. Expect some of the rough edges we hit to have been sanded down, and check the current version before you judge it on our notes.
*Published June 14, 2026 | Hermes Agent v0.8.2 tested on Ubuntu 24.04*