We are living in an era where AI agents write code and build infrastructure. Yet developers in the field still feel a sense of unease, because an agent that was performing perfectly a moment ago can suddenly give nonsensical answers or ignore the very tools it was explicitly instructed to use.
Recent experimental results from the Vercel AI SDK team are shocking. When AI agents are given the choice of tools—specifically "Skills"—the failure rate reaches a staggering 56%. This isn't a problem with the model's intelligence; it's evidence that the way we provide information to AI is fundamentally flawed. The secret to pushing an agent's success rate to 100% lies not in adding more tools, but in a persistent context strategy based on agents.md.
Many developers grant agents various tool-calling capabilities, expecting the AI to pull them out and use them whenever necessary. However, this approach has a fatal weakness: decision noise.
An AI model's context window is like human short-term memory. Once a conversation goes back and forth more than five times, the instructions written in the initial system prompt lose priority. This is known as context decay. Every moment, the agent wonders, "Should I use a tool now, or just answer based on what I know?" This decision point itself becomes a single point of failure that invites mistakes.
The solution to turning a 56% failure rate into 0% is simple. Instead of giving the agent a choice, fix the core rules and information of the project into the system prompt. The agents.md file is at the heart of this.
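The idea can be sketched in a few lines of TypeScript: read agents.md once and prepend it to the system prompt on every request, so the rules are always in context rather than hidden behind a tool call the agent may or may not make. The function and file names here are illustrative, not part of any SDK.

```typescript
import { readFileSync, existsSync } from "node:fs";

// Combine a base system prompt with the project's persistent rules
// so the agent never has to decide whether to load them.
function buildSystemPrompt(basePrompt: string, agentsMd: string): string {
  return [basePrompt, "## Project rules (from agents.md)", agentsMd.trim()].join("\n\n");
}

// Load agents.md from the project root if present (path is an assumption).
function loadAgentsMd(root: string = "."): string {
  const path = `${root}/agents.md`;
  return existsSync(path) ? readFileSync(path, "utf8") : "";
}
```

The resulting string would then be passed as the system prompt to whatever model call your stack uses; the point is that the rules travel with every request instead of depending on a tool-calling decision.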
According to Vercel's benchmarks, providing the same information as a tool resulted in a 79% success rate, but including it directly as an index in agents.md recorded a 100% pass rate.
| Analysis Metric | Tool Calling (Skills) | Persistent Context (agents.md) |
|---|---|---|
| Decision Making | Agent decides whether to load every time | Information always resides in the system |
| Reliability | Approx. 53% ~ 79% (Unstable) | Up to 100% achievable |
| Reasoning Load | High load due to decision noise | Low load by bypassing decision-making |
| Characteristics | On-demand approach | Always-loaded approach |
To maximize performance, you must design agents.md as a "README for machines" rather than just a simple text file.
Specific prohibitions improve an agent's output quality more immediately than abstract principles. You need concrete commands, such as "Use MUI v3 and always use Jotai for state management." Instructions like "Never use alert()" and "Leverage components from specific libraries" prevent the agent from straying.
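A rules section built from the article's own examples might look like this (the library names echo the examples above; swap in your own stack):

```markdown
# agents.md

## Rules
- Use MUI v3 for UI components; do not install other component libraries.
- Always use Jotai for state management.
- Never use `alert()`; use the shared toast component instead.
```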
You shouldn't let the agent waste tokens scanning the entire repository. Provide a mini-index of key file locations: spell out which package manager to use for builds (e.g., pnpm) and where route files and schema files live.
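Such a mini-index is just a short section in agents.md; the paths below are placeholders for your project's actual layout:

```markdown
## Project map
- Build: `pnpm build` (do not use npm or yarn)
- Routes: `app/**/route.ts`
- DB schema: `src/db/schema.ts`
- Shared UI: `src/components/`
```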
Performance actually drops if the file becomes too bloated. Vercel recommends compressing roughly 40KB of documentation into an 8KB index. The key is to optimize the paths to knowledge rather than spoon-feeding the knowledge itself to the agent.
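A minimal sketch of this "index, don't inline" idea: instead of pasting full documents into agents.md, emit one line per document (path plus a one-line summary) and let the agent open files on demand. The types and the first-line-as-title heuristic are assumptions for illustration.

```typescript
// A document the agent could read on demand from the repo.
interface Doc {
  path: string;    // repo-relative location
  content: string; // full markdown body
}

// Use the first heading (or first non-empty line) as a one-line summary.
function summarize(content: string): string {
  const line = content.split("\n").find((l) => l.trim().length > 0) ?? "";
  return line.replace(/^#+\s*/, "").trim();
}

// Build a compact index: the agent gets pointers, not the full text.
function buildIndex(docs: Doc[]): string {
  return docs.map((d) => `- ${d.path}: ${summarize(d.content)}`).join("\n");
}
```

The index stays small no matter how large the underlying documentation grows, which is what keeps the always-loaded context from bloating.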
Just as technical debt accumulates in code, "prompt debt" accumulates in AI utilization. If every team member gives different instructions to the agent, the consistency of the output collapses. By placing agents.md in the project root and managing it via Git, version control is integrated, and the team's standard guide can be applied identically regardless of which model is used.
In the era of AI agents, victory is determined by context engineering rather than model intelligence. Rather than waiting for agents to get smarter, the surest way to increase productivity is to build an environment where agents cannot make mistakes. Start right now by creating agents.md in your project root and codifying your team's rules.