Briefing
For about a year, the fashionable way to build software with AI was to type what you wanted in plain English and let the agent sort out the rest. It even had a name: "vibe coding," a phrase Andrej Karpathy coined in early 2025 that went on to become Collins Dictionary's word of the year. The demos were genuinely impressive. The code that reached production often was not.
By the middle of 2026, a lot of the strongest engineering teams had quietly changed tack. The new idea going around is "context engineering," which is a fancy way of saying you stop leaning on a clever sentence and start being deliberate about what the AI can actually see when it works. Karpathy himself now calls vibe coding "passé."
The shift matters for any business shipping software with these tools, because it changes where the effort goes. Less time crafting the perfect instruction, more time making sure the agent has the right files, rules, history, and limits in front of it. The prompt is still the question you ask. The context is the body of knowledge the agent answers from.
Here is the catch worth being honest about: context engineering is a real and well-documented trend, but a lot of the framing below (the five-layer model, the target numbers) is one practitioner's playbook rather than benchmarked fact. Treat it as a sensible way to think, not gospel.
Why Vibe Coding Failed
Vibe coding worked well enough for a few kinds of work: brand-new prototypes, standard CRUD operations, and integrations against well-documented APIs. It struggled with almost everything else: legacy codebases, performance-sensitive code, security-critical systems, and any domain carrying unwritten rules that never made it into the training data.
The failure tended to look the same each time. Short on context, the agent would produce code that read as correct but broke some constraint nobody had written down. It would reach for patterns that were common in its training data but at odds with how the team actually did things. It would miss the edge cases that anyone who had spent a week in the codebase would have spotted straight away.
Vibe coding assumed the prompt held everything the agent needed. It does not. The prompt is a query. Context is the database.
The Five Layers of Context
Good context engineering feeds the agent information across five layers. (Worth flagging: this five-layer split is the author's own framework, useful but not a settled industry standard.)
Layer 1: Code Context
The agent needs to see the code that matters, not just the file open in front of it. That means:
- Files that call the function being changed
- Files that implement the interfaces in play
- Test files that exercise the code paths you are touching
- Configuration files that change how things behave
Claude Code's Task system reads through the project, pulls in the files it judges relevant, and manages the context window as it goes. It does this by exploring and reading rather than by running a formal static call-graph analysis. Hermes runs FTS5 full-text search over its past sessions to surface relevant history. The better engineers add explicit pointers on top: "Also look at src/auth/middleware.ts and tests/integration/auth.test.ts."
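One rough way to build that list of explicit pointers by hand is a plain text search for the symbol being changed. The sketch below is only an illustration: the function name `find_related_files`, the default suffix list, and the substring match are all assumptions, and it is not how Claude Code or Hermes select files.

```python
from pathlib import Path


def find_related_files(root: Path, symbol: str,
                       suffixes: tuple[str, ...] = (".ts", ".py", ".json")) -> list[Path]:
    """Return files under root whose text mentions symbol.

    A crude substring match standing in for real caller lookup: it will
    also match comments and unrelated names that contain the symbol.
    """
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        if symbol in text:
            hits.append(path)
    return hits
```

Calling `find_related_files(Path("src"), "verifyToken")` (a hypothetical symbol) would return candidate callers, tests, and config files to name in the prompt. The result still needs a human pass, since a text match says nothing about whether a file actually matters to the change.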
Layer 2: Convention Context
Every codebase carries conventions that never make it into a lint rule or a style guide. They live in code review comments, team chats, and the habits of senior engineers. This is the hardest context to hand over, because so much of it is unspoken.
The fix is a CONVENTIONS.md: a living document that records the team's standards as they evolve. Not just "we use 2 spaces" but the real stuff. "We prefer early returns over nested conditionals." "We use neverthrow for error handling in new code but allow try/catch in legacy modules." "Database queries go through the repository layer, never straight from a controller."
Layer 3: Historical Context
What has been tried before, and why did it fall over? Tools pair up here: Hermes works alongside Honcho (plastic-labs/honcho) for this kind of memory, though it is worth being clear that Honcho is a separate Plastic Labs product bolted on via integration, not something native to Hermes. OpenClaw's `MEMORY.md` does it by hand. Without this layer, agents keep repeating the same mistakes. "We tried ORM X two years ago and dropped it because of performance problems with large joins" is exactly the kind of note that saves wasted effort.
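For the by-hand approach, even a keyword filter over a notes file can surface an old decision before the agent repeats it. This is a minimal sketch that assumes one note per line in a MEMORY.md-style file; the `search_notes` helper is hypothetical and is not part of OpenClaw, Hermes, or Honcho.

```python
def search_notes(notes_text: str, keywords: list[str]) -> list[str]:
    """Return non-empty lines that mention any keyword, case-insensitively.

    Assumes one note per line; real memory files may be structured differently.
    """
    wanted = [k.lower() for k in keywords]
    return [
        line.strip()
        for line in notes_text.splitlines()
        if line.strip() and any(k in line.lower() for k in wanted)
    ]


notes = """\
We tried ORM X two years ago and dropped it because of performance problems with large joins.
Switched CI to parallel test runs last spring.
"""
relevant = search_notes(notes, ["orm", "joins"])
```

Here `relevant` holds only the ORM note, which can then be pasted into the context alongside the task.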
Layer 4: Constraint Context
The hard limits the agent has to respect: "This must run on Node 18." "This endpoint handles 10k RPM." "This runs in a browser with strict CSP headers." "This processes PII and must not log raw values." Keep constraints explicit, number them, and refer back to them in the prompt.
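Numbering can be as simple as prefixing each constraint with a stable label that the prompt then cites. The sketch below assumes a `C1`, `C2` labelling scheme and a made-up logging task; neither comes from any particular tool.

```python
def number_constraints(constraints: list[str]) -> str:
    """Give each constraint a stable label (C1, C2, ...) that a prompt can cite."""
    return "\n".join(f"C{i}: {text}" for i, text in enumerate(constraints, start=1))


constraints = [
    "This must run on Node 18.",
    "This processes PII and must not log raw values.",
]
block = number_constraints(constraints)

# Hypothetical task text that refers back to the labels.
prompt = f"{block}\n\nAdd request logging to the endpoint. Respect C1 and C2."
```

Stable labels also make review easier, because a reviewer can check the output against C1 and C2 one at a time.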
Layer 5: Intent Context
What is the actual goal behind the task? "Refactor this function" is a task. "Refactor this function so we can reuse it in the new billing service" is intent, and it tells the agent how to make trade-offs. If reuse is the point, the agent should favour a clean interface over a performance tweak.
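Taken together, the five layers can be modelled as one small structure that is filled in before the agent is asked anything. This is a sketch only: `ContextBundle`, its field names, and the section headings it prints are illustrative assumptions, not a format that any of the tools above expects.

```python
from dataclasses import dataclass, field


@dataclass
class ContextBundle:
    """Hypothetical container for the five layers plus the task itself."""
    task: str
    intent: str = ""
    code_files: dict[str, str] = field(default_factory=dict)  # path -> contents
    conventions: str = ""
    history: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)

    def missing_layers(self) -> list[str]:
        """Name every layer that is still empty, so gaps show up before prompting."""
        layers = {
            "code": self.code_files,
            "convention": self.conventions,
            "historical": self.history,
            "constraint": self.constraints,
            "intent": self.intent,
        }
        return [name for name, value in layers.items() if not value]

    def render(self) -> str:
        """Flatten the bundle into one plain-text block, context first, task last."""
        parts = []
        for path, source in self.code_files.items():
            parts.append(f"## Code: {path}\n{source}")
        if self.conventions:
            parts.append(f"## Conventions\n{self.conventions}")
        if self.history:
            parts.append("## History\n" + "\n".join(f"- {h}" for h in self.history))
        if self.constraints:
            numbered = "\n".join(f"C{i}: {c}" for i, c in enumerate(self.constraints, 1))
            parts.append(f"## Constraints\n{numbered}")
        parts.append(f"## Task\n{self.task}")
        if self.intent:
            parts.append(f"## Intent\n{self.intent}")
        return "\n\n".join(parts)
```

Checking `missing_layers()` before calling `render()` gives a quick read on whether a request is still closer to a bare prompt than to an engineered context.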
The Context Engineering Workflow
1. Identify the task
2. Gather code context (relevant files, tests, dependencies)
3. Gather convention context (CONVENTIONS.md, style guides)
4. Gather historical context (Honcho search, MEMORY.md, git log)
5. List explicit constraints
6. State the intent, not just the task
7. Provide the assembled context to the agent
8. Review output for context gaps
9. Refine context and iterate
Measuring Context Quality
You can put numbers on this, and the author suggests these targets (worth treating as sensible starting goals rather than benchmarked figures, since no study backs the specific thresholds):
- First-attempt success rate: the share of tasks completed correctly with no revision (target: >60%)
- Revision count: average back-and-forth turns to reach completion (target: <3)
- Constraint compliance: the share of explicit constraints respected in the output (target: >95%)
- Convention alignment: the share of output that matches team conventions (target: >90%)
Engineers who put the work into context engineering reportedly see something like 40-60% fewer revision cycles than they did with vibe coding, though that figure reads as an estimate rather than a measured result from any published source. The idea, at least, is straightforward: time spent gathering and structuring context up front gets paid back in fewer rounds of fixing things.
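Three of these metrics reduce to simple arithmetic over a log of finished tasks. The sketch below assumes each task is recorded by hand as a `TaskRecord` (a hypothetical structure) and treats a task with zero revisions as a first-attempt success. Convention alignment is left out because it needs a reviewer's judgment rather than a count.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    revisions: int          # back-and-forth turns after the first attempt
    constraints_total: int  # explicit constraints given to the agent
    constraints_met: int    # how many of those the output respected


def summarize(records: list[TaskRecord]) -> dict[str, float]:
    """Compute first-attempt rate, average revisions, and constraint compliance."""
    if not records:
        raise ValueError("need at least one task record")
    n = len(records)
    total = sum(r.constraints_total for r in records)
    met = sum(r.constraints_met for r in records)
    return {
        "first_attempt_rate": sum(1 for r in records if r.revisions == 0) / n,
        "avg_revisions": sum(r.revisions for r in records) / n,
        # With no constraints recorded there is nothing to violate.
        "constraint_compliance": met / total if total else 1.0,
    }
```

Comparing the summary for a batch of tasks before and after a change to CONVENTIONS.md or the constraint list is one way to see whether the extra context is paying off, with the caveat that small samples will be noisy.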
Context Engineering vs. Prompt Engineering
Prompt engineering tunes the query. Context engineering tunes the database. Both count, but context tends to win out, for a few reasons:
- A perfect prompt with thin context still fails
- An average prompt with strong context usually lands
- Context carries across many prompts; a prompt is tied to one task
The agents getting the most attention in 2026 are at heart context engines: Claude Code with its Task system, Hermes paired with Honcho, and OpenHuman with its Memory Trees (tinyhumansai/openhuman). Their job is to gather, structure, and surface the right context at the right moment. The engineers getting the best results are the ones who have worked this out and put their effort there.



