Analysis
Picture a new contractor who turns up at your office every morning with total amnesia. They're competent and they'll do whatever you ask, but they remember nothing from yesterday. You explain your filing system again. You restate your preferences again. They make the same wrong assumption they made last week, because for them it's the first time. That is roughly how today's AI agents behave.
Nous Research, the open-source AI group behind the Hermes models, put out a tool in February 2026 aimed squarely at that gap. It's called Hermes Agent: an MIT-licensed, open-source agent designed to carry memory forward between sessions instead of wiping the slate clean each time (NousResearch/hermes-agent on GitHub). The project built a large GitHub following quickly, which suggests the problem it targets is one a lot of developers feel.
For an Australian business, the "so what" is straightforward. A lot of the friction in using AI tools is the repetition: re-teaching the same context, re-correcting the same mistakes. An agent that genuinely remembers your patterns is worth more than one that's marginally smarter on day one. Whether Hermes delivers on that promise in practice is the open question, and it's worth looking at how the thing is actually built before getting too excited.
The underlying flaw is common to most AI agents: they don't learn. Start a fresh session with a coding agent, a research assistant or a task bot and it begins from the same base state every time. It doesn't recall what worked last time, pick up your preferences, or get sharper with practice. That is the gap Hermes Agent is built to close.
One caveat up front. The public write-up this article draws on describes Hermes in terms of an architecture that Nous reportedly calls "episodic memory with structured generalisation," built on three memory layers. Independent analysis of the actual project paints a simpler picture, so treat the layered design below as the claimed model rather than confirmed internals. As described, the system keeps three separate layers: a short-term working memory for the task at hand; a medium-term episodic memory holding successful and failed strategies from earlier sessions; and a long-term semantic memory that extracts general principles from those past experiences.
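To make the claimed structure concrete, here is a minimal Python sketch of the three layers as the write-up describes them. It is an illustration under stated assumptions, not Hermes code: the class and field names are hypothetical, and the only behaviour modelled is that working memory resets between sessions while the other two layers persist.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredMemory:
    # Hypothetical names; this mirrors the claimed model, not Hermes internals.
    working: list[str] = field(default_factory=list)    # context for the current task
    episodic: list[dict] = field(default_factory=list)  # past strategies and outcomes
    semantic: list[str] = field(default_factory=list)   # distilled general rules

    def start_session(self) -> None:
        # Only working memory is wiped; episodic and semantic layers persist.
        self.working.clear()


mem = LayeredMemory()
mem.working.append("user asked for a CSV parser")
mem.episodic.append({"task": "csv parsing", "outcome": "success"})
mem.semantic.append("read the official docs before guessing at an API")

mem.start_session()
print(len(mem.working), len(mem.episodic), len(mem.semantic))  # 0 1 1
```

A stateless agent would, in effect, clear all three lists at the start of every session. The claimed design clears only the first.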
The Three-Layer Memory Architecture
The working memory layer is said to behave much like a standard agent context window, holding the running conversation and the tool outputs relevant to the current job. The claimed difference lies in what happens when that memory fills up. Instead of simply discarding old context at the limit, the reported design uses a learned compression model to condense less-relevant context into short summaries, which are then promoted into episodic memory. (Note: independent reviews describe the real project's memory as a simpler two-file, character-capped setup plus full-text session search, not a learned-compression pipeline, so read this layer as the claimed mechanism.)
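The claimed compress-and-promote step can be sketched as follows. This is a hypothetical illustration. The function names, the character budget and the relevance scores are all assumptions, and plain truncation stands in for the learned compression model the write-up describes.

```python
def summarise(text: str, max_chars: int = 40) -> str:
    # Stand-in for the claimed learned compression model: plain truncation.
    return text if len(text) <= max_chars else text[: max_chars - 3] + "..."


def fit_context(working: list[str], relevance: list[float],
                budget: int, episodic: list[str]) -> list[str]:
    """Greedily keep the most relevant items within a character budget.

    Items that don't fit are summarised and promoted to episodic memory
    rather than discarded. Kept items retain their original order.
    """
    order = sorted(range(len(working)), key=lambda i: relevance[i], reverse=True)
    keep_idx, used = set(), 0
    for i in order:
        if used + len(working[i]) <= budget:
            keep_idx.add(i)
            used += len(working[i])
    for i, item in enumerate(working):
        if i not in keep_idx:
            episodic.append(summarise(item))
    return [item for i, item in enumerate(working) if i in keep_idx]


episodic: list[str] = []
working = [
    "tool output: 500 lines of logs from the failed build step",
    "user: use tabs not spaces",
    "assistant: fixed the import error",
]
kept = fit_context(working, relevance=[0.2, 0.9, 0.7], budget=60, episodic=episodic)
print(kept)      # ['user: use tabs not spaces', 'assistant: fixed the import error']
print(episodic)  # one truncated summary of the build log
```

The low-relevance build log is the item that gets squeezed out. Rather than vanishing, it survives as a short summary in the episodic store.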
The episodic memory is, on this account, where the actual learning is meant to happen. After each task, the system reportedly runs an automatic post-mortem covering which strategies it tried, which worked, which failed and why, and whether anything surprising came up. That review supposedly produces structured "experience records" stored in a vector database with detailed metadata tags. (This vector-database-and-learned-ranking description is not corroborated by independent analysis of the project, which points to Markdown skill files and full-text session search instead, so treat the specifics as unconfirmed.)
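A post-mortem that yields a structured record might look like the sketch below. The field names, the 1 to 5 difficulty scale and the `post_mortem` helper are all hypothetical. Records are plain Python objects here rather than entries in a vector database.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ExperienceRecord:
    # Hypothetical field names; the real record format is unconfirmed.
    task_type: str
    domain: str
    difficulty: int                # assumed scale: 1 (easy) to 5 (hard)
    strategies_tried: list[str]
    worked: list[str]
    failed: dict[str, str]         # strategy -> reason it failed
    surprises: list[str] = field(default_factory=list)

    @property
    def outcome(self) -> str:
        return "success" if self.worked else "failure"


def post_mortem(task_type: str, domain: str, difficulty: int,
                attempts: list[tuple[str, bool, str]]) -> ExperienceRecord:
    """Turn (strategy, succeeded, note) tuples into a structured record."""
    return ExperienceRecord(
        task_type=task_type,
        domain=domain,
        difficulty=difficulty,
        strategies_tried=[s for s, _, _ in attempts],
        worked=[s for s, ok, _ in attempts if ok],
        failed={s: note for s, ok, note in attempts if not ok},
    )


rec = post_mortem("api-integration", "payments", 3, [
    ("guess the endpoint from naming conventions", False, "404: endpoint renamed in v2"),
    ("read the official API reference", True, ""),
])
print(rec.outcome)  # success
print(json.dumps(asdict(rec), indent=2))
```

Keeping the failure reason alongside the failed strategy is what would let a later task avoid repeating the same mistake, not just the same approach.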
When a new task starts, Hermes is described as searching this episodic memory for relevant past runs. The retrieval is said to go past plain semantic similarity, using a learned ranking model that weighs task type, domain, difficulty, and outcome to surface the best precedents. The claimed payoff: a developer who favours certain coding patterns finds that, over time, Hermes leans into those patterns without being told to.
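A toy scorer shows how ranking on metadata differs from plain similarity search. This is a hypothetical sketch. Bag-of-words cosine similarity stands in for embeddings, and hand-picked weights stand in for the learned ranking model, whose actual features and weights have not been published.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    # Bag-of-words cosine similarity: a crude stand-in for embedding similarity.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


# Hand-picked weights, purely illustrative; the claimed design learns these.
WEIGHTS = {"similarity": 1.0, "task_type": 0.5, "domain": 0.3, "difficulty": 0.2, "outcome": 0.4}


def score(query: dict, record: dict) -> float:
    return (
        WEIGHTS["similarity"] * cosine(query["text"], record["text"])
        + WEIGHTS["task_type"] * (query["task_type"] == record["task_type"])
        + WEIGHTS["domain"] * (query["domain"] == record["domain"])
        # Closer difficulty scores higher (difficulty on an assumed 1-5 scale).
        + WEIGHTS["difficulty"] * (1 - abs(query["difficulty"] - record["difficulty"]) / 4)
        + WEIGHTS["outcome"] * (record["outcome"] == "success")
    )


def retrieve(query: dict, records: list[dict], k: int = 2) -> list[dict]:
    return sorted(records, key=lambda r: score(query, r), reverse=True)[:k]


records = [
    {"id": "a", "text": "parse json from rest api", "task_type": "coding",
     "domain": "web", "difficulty": 2, "outcome": "success"},
    {"id": "b", "text": "parse json from rest api", "task_type": "coding",
     "domain": "web", "difficulty": 2, "outcome": "failure"},
    {"id": "c", "text": "write a poem about rain", "task_type": "writing",
     "domain": "creative", "difficulty": 1, "outcome": "success"},
]
query = {"text": "parse json from a rest api", "task_type": "coding", "domain": "web", "difficulty": 3}
print([r["id"] for r in retrieve(query, records)])  # ['a', 'b']
```

Records `a` and `b` are equally similar to the query. The outcome term is what lifts the successful precedent above the failed one, which pure similarity search could not do.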
The semantic memory layer is described as distilling general principles from the pile of episodic records. These take the form of structured rules, the kind of thing a senior engineer says out loud: "when you hit an unfamiliar API, read the official docs before guessing," or "when a test fails on and off, suspect a race condition before a logic bug." On this point the picture is closer to reality: Hermes does produce human-readable, user-editable memory and skill files you can inspect, change, or switch off, which gives a level of transparency that purely neural systems lack (analysis of the Hermes memory system). The separate "semantic memory tier" framing, though, is part of the layered model that independent reviews don't confirm.
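Independent reviews describe a simpler design: plain-text notes with a character cap, plus full-text search over past sessions. That design is also easy to sketch, and because the memory is plain text, a user can open the file and edit or delete a rule directly. The file name, the cap value and both helper functions below are hypothetical, and substring matching stands in for a real full-text index.

```python
import tempfile
from pathlib import Path

CHAR_CAP = 500  # hypothetical cap; no specific limit is given above


def add_rule(path: Path, rule: str, cap: int = CHAR_CAP) -> bool:
    """Append a rule as a Markdown bullet unless it would push the file past the cap."""
    current = path.read_text(encoding="utf-8") if path.exists() else ""
    line = f"- {rule}\n"
    if len(current) + len(line) > cap:
        return False  # a real system might consolidate notes or ask the user here
    path.write_text(current + line, encoding="utf-8")
    return True


def search_sessions(sessions: dict[str, str], query: str) -> list[str]:
    """Return IDs of sessions whose transcript contains every query term."""
    terms = query.lower().split()
    return [sid for sid, text in sessions.items()
            if all(term in text.lower() for term in terms)]


with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "notes.md"  # hypothetical file name
    add_rule(notes, "When you hit an unfamiliar API, read the official docs before guessing.")
    add_rule(notes, "When a test fails on and off, suspect a race condition before a logic bug.")
    too_long = add_rule(notes, "x" * CHAR_CAP)  # rejected: would exceed the cap
    contents = notes.read_text(encoding="utf-8")  # plain text a user can read or edit

print(contents)
print(too_long)  # False

sessions = {
    "s1": "Fixed a flaky test caused by a race condition in the worker pool",
    "s2": "Wrote the CSV export for the billing report",
}
print(search_sessions(sessions, "race condition"))  # ['s1']
```

The cap forces the notes to stay short enough to read at a glance. That is part of what makes this kind of memory inspectable in a way a vector store is not.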

Evaluation Results
Nous Research has reportedly published evaluation data for Hermes Agent, and the quoted figures are encouraging, but treat them with caution: no public Nous benchmark could be matched to the specific numbers below. On a custom benchmark said to measure task-completion efficiency across repeated similar tasks, Hermes reportedly shows a 34% reduction in completion time and a 28% drop in error rate between the first and tenth attempt at a given task type (Source: Nous Research evaluation, 2026, unverified; the only documented Nous figure is roughly 40% faster completion once an agent has built up 20+ of its own skills).
The more interesting claim is positive transfer: skills picked up in one domain reportedly lift performance in related ones. An agent that's done a lot of web scraping is said to do better on API integration work, presumably because both lean on reading structured data and handling authentication. The reported transfer effect is modest, averaging around 8% improvement in related domains (Source: Nous Research, 2026, unverified; this transfer figure was not found in any published source).
Limitations and Concerns
Hermes comes with trade-offs, starting with overhead. Every query reportedly has to pull from the episodic and semantic stores, which is said to add 200 to 500 ms of latency compared with a stateless agent (unverified; no published benchmark supports this figure). Storage also grows over time, and Nous hasn't published a clear strategy for consolidating or forgetting old memories. Left unchecked, a long-running Hermes deployment could accumulate gigabytes of experience records of steadily declining value.
There's a safety angle too. An agent that learns from what it sees can also learn bad habits if it's fed malicious input. Nous has reportedly added a moderation layer that flags potentially harmful learned behaviours for a human to review, but that safeguard is unconfirmed and hasn't been independently tested.




