Briefing
System prompts are the most underrated part of agent performance. Practitioners often report that a well-written system prompt does more for output quality than swapping in a bigger model, and a sloppy one can make even a top-tier model like Claude Opus 4.8 turn out average work. What follows is drawn from running these agents in production over several months.
Here is the part most teams miss. When you wire up a coding agent or an internal assistant, the instinct is to reach for the most capable model and assume the rest takes care of itself. It usually does not. The system prompt, that block of standing instructions the agent reads before it sees your actual task, quietly shapes every decision it makes.
Get it right and the agent behaves like a senior hire who already knows your codebase and your standards. Get it wrong and you get a clever generalist who keeps guessing at things you never told it. The good news for Australian teams watching their tool spend: tuning a prompt costs nothing, takes minutes to test, and you can do it without touching your model contract.
This piece walks through the structure of a prompt that holds up under real work, the mistakes that quietly drag performance down, and a way to test prompts like you'd test code.
The Anatomy of an Effective System Prompt
A system prompt that earns its keep tends to have six sections.
1. Role Definition
Define what the agent is, not just what it does:
You are a senior TypeScript engineer specialising in API design. You value
type safety, explicit error handling, and composable architectures. You
dislike clever one-liners that sacrifice readability.This sets the persona, the expertise, and the values. It nudges the agent toward sensible choices without you having to spell out a rule for every situation.
2. Constraint Specification
Explicit constraints head off the usual failure modes:
CONSTRAINTS:
- Never use `any` type. Use `unknown` with type guards instead.
- Never use `eval()` or dynamic code execution.
- All database queries must use the repository layer in src/repositories/.
- All public functions must have JSDoc comments.
- Prefer early returns over nested conditionals.
- Use neverthrow for error handling, not try/catch in new code.Constraints work when they're specific and enforceable. "Write good code" is not a constraint. "Use neverthrow for error handling" is, because the agent can either follow it or not, and you can check.
3. Context Provision
Give the agent the context it needs to make good calls:
CONTEXT:
- This is a Fastify-based REST API using Prisma ORM.
- Authentication uses JWT tokens with refresh token rotation.
- The codebase supports Node 18+ and uses native fetch (not axios).
- We are migrating from REST to GraphQL (in progress, ~30% complete).
- Performance target: p95 response time < 200ms for all endpoints.Context stops the agent from falling back on defaults that clash with your setup. (Fastify and Prisma here are just stand-ins for whatever your real stack happens to be.)
4. Output Format Specification
Spell out exactly how you want the output formatted:
OUTPUT FORMAT:
For code changes:
1. Show the complete file content, not just the diff
2. Include JSDoc for all new or modified functions
3. Flag any breaking changes with [BREAKING] prefix
4. Suggest test cases for new functionality
For explanations:
1. Start with a one-sentence summary
2. Provide details in bullet points
3. Include code examples where relevant
4. Note any trade-offs or alternatives considered5. Error Handling Instructions
Tell the agent what to do when things go sideways:
ERROR HANDLING:
- If you cannot complete a task, explain why and what you tried
- If you are uncertain about a requirement, ask for clarification
- If you encounter a pattern that violates the constraints, flag it
- Never silently skip steps or ignore errors
- If a file is too large to read at once, read it in sections6. Chain-of-Thought Trigger
For complex work, tell the agent to reason through it step by step:
For tasks rated medium or high complexity:
1. First, analyse the codebase to understand current patterns
2. Second, identify the minimal set of changes needed
3. Third, consider edge cases and error scenarios
4. Fourth, implement the changes
5. Fifth, verify with tests
Show your reasoning at each step.Anti-Patterns
These habits reliably drag agent performance down:
- Over-constraining: Pile on too many rules and the agent starts optimising for compliance instead of quality
- Under-constraining: Too few rules and the output drifts from one run to the next
- Conflicting instructions: Rules that contradict each other just confuse the agent
- Vague language: "Be careful" and "do good work" give the agent nothing to act on
- All-caps shouting: AGENTS DO NOT NEED TO BE YELLED AT
- Negative framing: "Do not use X" lands weaker than "Use Y instead"
Testing System Prompts
Test your system prompts the way you test code. Keep a suite of representative tasks and score the output:
def test_system_prompt():
test_cases = [
"Add a new REST endpoint for user preferences",
"Refactor this callback-heavy code to use async/await",
"Fix this race condition in the caching layer",
]
for task in test_cases:
output = agent.run(task, system_prompt=candidate_prompt)
score = evaluate(output, criteria=[
"type_safety", "error_handling", "readability", "test_coverage"
])
assert score > 0.8, f"Failed on {task}: {score}"Prompt Length Trade-offs
A longer prompt carries more context but eats into your token budget. The practical ceiling is the model's context window minus the space the actual task needs. With Claude Opus 4.8, a roughly 2,000-token system prompt still leaves plenty of room for involved work. On smaller models, trim the prompt to 500-1,000 tokens by keeping the constraints and the output format and cutting the rest.
System prompt engineering is one of the highest-leverage moves available in agentic coding. Iterating costs nothing, a test cycle takes minutes, and teams that put the work in commonly report a noticeable lift in output quality. It's worth your time here before you reach for a bigger model or bolt on more tools.


