Skills Had ONE Job (They Failed)

Better Stack

Transcript

00:00:00So it turns out skills might not be the best approach to give additional context to your agents, and you might actually have more luck going back to the agents.md file.
00:00:08This was the surprising result that Vercel found when they were testing the best method to provide coding agents with the Next.js documentation.
00:00:15So let's just jump straight in and break down what happened and why and what this teaches us about using coding agents effectively.
00:00:26So as I said, Vercel's goal here was to provide a coding agent with additional context, in this case the Next.js documentation, so that when you're using the agent to write Next.js, it knows about all of the new APIs, since some of them might not be in the training data yet. Or even the opposite:
00:00:41It might be an older version of Next.js and you want to make sure it's using only the methods available in that version.
00:00:47They wanted a system of version matched documentation that the agent could use.
00:00:51So to do that, they tested two common approaches.
00:00:54First, we have our skills.
00:00:56These have been quite popular lately, with loads of frameworks and tools releasing them.
00:01:01And ironically, Vercel is one of the companies helping make them popular, with their skills CLI and their skills repository.
00:01:08Highly recommend you check them out.
00:01:09Now, if you don't know what skills are, they're actually just an open standard from Anthropic and they're just modular bundles of instructions, scripts and contexts that an agent can load on demand to perform tasks more accurately.
00:01:20But that's the crucial detail. It's entirely up to the agent to decide when to load this information.
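To make that concrete, here's a rough sketch of what a skill bundle looks like. The layout and the frontmatter fields follow Anthropic's published skill format, but the skill name and contents here are hypothetical:

```markdown
nextjs-docs/
├── SKILL.md        ← required entry point
└── reference/      ← optional supporting files, loaded on demand

# SKILL.md (hypothetical example)
---
name: nextjs-docs
description: Version-matched Next.js documentation lookup. Use when
  answering questions about Next.js APIs or writing Next.js code.
---
Instructions the agent only reads after it decides to invoke the skill...
```

The `description` in the frontmatter is all the agent sees up front; everything below it stays out of context until the agent chooses to load the skill.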
00:01:26And that part seems to be their current downfall. When Vercel ran their evals, they found that 56 percent of the time, the skill was never invoked.
00:01:35The agent just decided not to use it.
00:01:37And surprisingly, providing the agent with the skill actually gave it zero improvement in the evals compared to an agent that didn't have the skill.
00:01:44And even more surprisingly, they actually found the skill might have a negative effect.
00:01:48It sometimes performed worse than the baseline when the skill wasn't used, which suggests that an unused skill might introduce some noise or distraction.
00:01:57So to fix this, they did try explicitly saying in the prompt, please use this skill.
00:02:02And that did help. It increased the skill trigger rate to 95 percent and boosted the eval pass rate to 79 percent.
00:02:09But it also came with its own problems. They actually discovered that different wordings produced drastically different results.
00:02:15For example, if you just said you must use the skill, it did that, but then it would skip the project context.
00:02:21So you had to say use both the skill and the project context.
00:02:24And Vercel just didn't like the fragility of the system, stating that if small wording tweaks produce large behavioral swings, the approach feels brittle for production use.
00:02:33So they needed a more reliable solution, perhaps one where the agent doesn't have to actually make that decision itself.
00:02:40This is when they tried the agents.md file.
00:02:42Now, this is actually an open format that loads of agents have adopted. And if you're a Claude fan, it's the exact same idea as the CLAUDE.md file.
00:02:49It's used to provide instructions to coding agents that are always included in the system prompt.
00:02:53So unlike skills, the agent is not in charge of deciding to fetch the information.
00:02:58It has it there already in its system prompt. But this could also create a problem of its own: context rot.
00:03:03This is where, as your context grows, your output gets worse.
00:03:06So you can't just dump the entire Next.js documentation into the agents.md file.
00:03:10So how do you do it? Well, to counteract this, Vercel just used a documentation index in the agents.md.
00:03:17It's simply a list of the file paths to the individual documentation files within your file system.
00:03:22Then the other crucial piece was adding an instruction saying: prefer retrieval-led reasoning over pre-training-led reasoning for any Next.js tasks.
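Putting those two pieces together, a minimal agents.md might look something like this. The file paths and headings are hypothetical placeholders, not Vercel's actual docs layout:

```markdown
# Next.js documentation (version-matched)

Prefer retrieval-led reasoning over pre-training-led reasoning for any
Next.js tasks. The docs for this project's Next.js version are indexed
below; read the relevant file before answering.

## Documentation index
- docs/getting-started/installation.mdx
- docs/app/building-your-application/routing.mdx
- docs/api-reference/functions/fetch.mdx
```

Because only the index lives in the system prompt, the context cost stays small, while the full documentation stays one file read away.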
00:03:31Now, personally, when I read this, I thought this would just lead to similar results as skills as it still has to go off and actually fetch the file to read that documentation.
00:03:38But when they tested this on their evals, the agents scored 100 percent on all of them and got perfect scores on the build, lint and test evals.
00:03:47So it is significantly more reliable and accurate than skills. It's classic software engineering:
00:03:53the dumber, simpler approach turns out to be the best one all along, and you don't have to over-engineer anything.
00:03:58But why is this the case? Why is the agents file better than skills? Well, this is actually pretty hard to tell.
00:04:03AI is a bit of a black box, but Vercel speculates this comes down to three factors and all of them are based around decision making.
00:04:10When you have that agents file, there is no decision point for the agent.
00:04:14We're telling it right at the start in the system prompt to use the documentation and exactly where each file is.
00:04:20So it makes the knowledge persistent context instead of having it on demand and letting the model decide whether it should use it or not.
00:04:27It's already there in the reasoning since we provided it in the system prompt.
00:04:31But this also doesn't mean that skills are completely useless. In fact, Vercel found that they actually complement each other.
00:04:36They said that skills work better for explicit user triggered workflows like saying upgrade my Next.js version,
00:04:41migrate to the app router or apply some framework best practices.
00:04:45But then if you want that general framework knowledge within your coding agent,
00:04:48that passive context with the agents MD is going to outperform skills, especially with today's models.
00:04:54I'm sure in the future models will be optimized for that skill based retrieval workflow, but we're not there yet.
00:04:59For now, Vercel's recommendations, especially for framework authors or those of you that are actually going to be writing skills or the agents MD,
00:05:06they say don't wait for the skills to improve. Compress your context as much as possible.
00:05:10Design for retrieval, not memory. And most importantly, always test everything with evals.
00:05:16And if you're just a user of these files, Vercel is actually providing a tool to download the documentation
00:05:21and the prebuilt agents.md file for your specific version of Next.js so you can take advantage of this new approach straight away.
00:05:29I'm pretty curious if other tools are going to take this approach as well. And I'm also curious what you think about this.
00:05:34Let me know in the comments down below what you think of agents and skills.
00:05:37And while you're there, subscribe. As always, see you in the next one.

Key Takeaway

Providing coding agents with passive, persistent context via an agents.md file is significantly more reliable than using on-demand skills which suffer from inconsistent invocation and decision-making noise.

Highlights

Vercel discovered that traditional AI "skills" often fail because agents choose not to invoke them 56% of the time.

The "agents.md" (or Claude.md) approach outperformed skills by providing persistent system-level instructions.

Using a documentation index in agents.md led to a 100% success rate on the build, lint, and test evals.

Timeline

The Failure of AI Skills in Context Provision

The speaker introduces a surprising discovery by Vercel regarding how to best provide coding agents with Next.js documentation. While modular "skills" based on Anthropic's open standard are popular, they rely on the agent's discretion to load information on demand. Vercel's testing aimed to find a version-matched documentation system to ensure agents don't use outdated APIs. The fundamental issue identified is that agents often fail at the first step of the "one job" they are given. This section sets the stage for a deeper look into the technical pitfalls of current agentic workflows.

The Problem of Skill Invocation and Fragility

Evals revealed that skills were never invoked 56% of the time, resulting in zero improvement over baseline models. Even worse, unused skills introduced noise that occasionally degraded performance compared to agents with no extra context. Attempting to force skill usage through prompting increased the trigger rate to 95%, but created a "brittle" system where minor wording changes caused unpredictable behavior. For instance, telling an agent it "must" use a skill caused it to ignore the broader project context entirely. Vercel concluded that this fragility makes forced skill invocation unsuitable for stable production environments.

The Agents.md Solution and Superior Performance

To solve the reliability issue, Vercel turned to the agents.md format, which integrates instructions directly into the system prompt. Instead of dumping entire documentation sets into the prompt, they used a documentation index consisting of file paths to specific local files. By adding an instruction to prefer retrieval-led reasoning over pre-training, the agent achieved a perfect 100% score on various technical evaluations. This "dumber, simpler" approach proved more effective because it removes the decision-making burden from the AI. It demonstrates that persistent context is currently superior to on-demand tool calling for general framework knowledge.

Why It Works and Future Recommendations

The success of the agents.md file is attributed to the lack of a decision point, making the knowledge part of the agent's core reasoning process. While skills are not useless, they are better suited for explicit tasks like "migrate to the app router" rather than passive knowledge retrieval. Vercel advises developers to compress context, design for retrieval rather than model memory, and prioritize testing with evals. The video concludes by mentioning a tool provided by Vercel to help users download prebuilt agents.md files for Next.js. This shift in strategy highlights the current limitations of model optimization for skill-based retrieval workflows.
