00:00:00So it turns out skills might not be the best approach to give additional context to your agents and you might actually have more luck going back to the agents.md file.
00:00:08This was actually the surprising result that Vercel found when they were testing the best method to provide coding agents with the Next.js documentation.
00:00:15So let's just jump straight in and break down what happened and why and what this teaches us about using coding agents effectively.
00:00:26So as I said, Vercel's goal here was to provide a coding agent with additional context, in this case the Next.js documentation, so that when you're using the agent to write Next.js, it knows all of the new APIs, as some of them might not be in the training data yet. Or it could even be the opposite.
00:00:41It might be an older version of Next.js and you want to make sure it's using only the methods available in that version.
00:00:47They wanted a system of version matched documentation that the agent could use.
00:00:51So to do that, they tested two common approaches.
00:00:54First, we have our skills.
00:00:56These have been quite popular lately with loads of frameworks and tools and loads more releasing them.
00:01:01And ironically, Vercel are one of the ones helping make this popular with their skills CLI and their skills repository.
00:01:08Highly recommend you check them out.
00:01:09Now, if you don't know what skills are, they're actually just an open standard from Anthropic and they're just modular bundles of instructions, scripts and context that an agent can load on demand to perform tasks more accurately.
00:01:20But that's the crucial detail. It's entirely up to the agent to decide when to load this information.
00:01:26And that part seems to be their current downfall. When Vercel ran their evals, they actually found that 56 percent of the time, the skill was never invoked.
00:01:35The agent just decided not to use it.
00:01:37And surprisingly, providing the agent with the skill actually gave it zero improvement in the evals compared to an agent that didn't have the skill.
00:01:44And even more surprisingly, they actually found the skill might have a negative effect.
00:01:48It sometimes performed worse than the baseline when the skill wasn't used, which suggests that an unused skill might introduce some noise or distraction.
00:01:57So to fix this, they tried explicitly saying in the prompt, please use this skill.
00:02:02And that did help. It increased the skill trigger rate to 95 percent and it boosted the eval pass rate to 79 percent.
00:02:09But it also came with its own problems. They actually discovered that different wordings produced drastically different results.
00:02:15For example, if you just said you must use the skill, it did that, but then it would skip the project context.
00:02:21So you had to say use both the skill and the project context.
00:02:24And Vercel just didn't like the fragility of the system, stating that if small wording tweaks produce large behavioral swings, the approach feels brittle for production use.
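To make that fragility concrete, here is a minimal sketch of how you might measure it yourself. It is not Vercel's eval harness: the `EvalRun` record, its fields and both helper functions are hypothetical names made up for illustration. The idea is simply to group eval runs by prompt wording, compute the skill trigger rate and the pass rate for each wording, and then look at how far apart the pass rates are.

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    """One eval run (hypothetical record shape, not Vercel's format)."""
    wording: str         # the prompt variant used for this run
    skill_invoked: bool  # did the agent actually load the skill?
    passed: bool         # did the run pass the eval?


def summarize(runs: list[EvalRun]) -> dict[str, dict[str, float]]:
    """Per wording, return the fraction of runs that invoked the skill and that passed."""
    grouped: dict[str, list[EvalRun]] = {}
    for run in runs:
        grouped.setdefault(run.wording, []).append(run)
    return {
        wording: {
            "trigger_rate": sum(r.skill_invoked for r in group) / len(group),
            "pass_rate": sum(r.passed for r in group) / len(group),
        }
        for wording, group in grouped.items()
    }


def pass_rate_spread(summary: dict[str, dict[str, float]]) -> float:
    """Gap between the best and worst wording; a large gap suggests a brittle setup."""
    rates = [stats["pass_rate"] for stats in summary.values()]
    return max(rates) - min(rates)
```

If the spread is large for wordings that mean roughly the same thing, you are seeing the same kind of brittleness described here.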
00:02:33So they needed a more reliable solution, perhaps one where the agent doesn't have to actually make that decision itself.
00:02:40This is when they tried the agents.md file.
00:02:42Now, this is actually an open format that loads of agents support. And if you're a Claude fan, this is essentially the same as the CLAUDE.md file.
00:02:49It's used to provide instructions to coding agents that are always included in the system prompt.
00:02:53So unlike skills, the agent is not in charge of deciding to fetch the information.
00:02:58It has it there already in its system prompt. But this could also create a problem of its own: context.
00:03:03This is where, when your context grows, your output gets worse.
00:03:06So you don't want to put the entire Next.js documentation into the agents.md file.
00:03:10So how do you do it? Well, to counteract this, Vercel actually just used a documentation index in the agents.md.
00:03:17It's simply just a list of the file paths to the individual documentation files within your file system.
00:03:22Then the other crucial piece was just adding an instruction saying prefer retrieval-led reasoning over pre-training-led reasoning for any Next.js tasks.
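As a rough illustration of that idea, here is a small sketch that builds such an index from a local docs folder. This is not Vercel's tool or their exact format: the function names, the bullet-list layout and the assumption that the docs are `.md` or `.mdx` files are all made up for the example. Only the instruction sentence comes from the video.

```python
from pathlib import Path

# The instruction quoted in the video. Everything else in this sketch
# (function names, index layout, file extensions) is an assumption.
RETRIEVAL_INSTRUCTION = (
    "Prefer retrieval-led reasoning over pre-training-led reasoning "
    "for any Next.js tasks."
)


def build_docs_index(docs_root: Path) -> str:
    """Return a sorted bullet list of doc file paths, relative to docs_root's parent."""
    paths = sorted(
        p.relative_to(docs_root.parent).as_posix()
        for p in docs_root.rglob("*")
        if p.is_file() and p.suffix in {".md", ".mdx"}
    )
    return "\n".join(f"- {path}" for path in paths)


def render_agents_section(docs_root: Path) -> str:
    """Combine the instruction and the path index into text for an agents.md file."""
    index = build_docs_index(docs_root)
    return f"{RETRIEVAL_INSTRUCTION}\n\nDocumentation index:\n{index}\n"
```

The point of the design is that the index stays tiny, just paths, so it can sit in the system prompt all the time, while the actual documentation is only read from disk when a task needs it.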
00:03:31Now, personally, when I read this, I thought this would just lead to similar results as skills as it still has to go off and actually fetch the file to read that documentation.
00:03:38But when they tested this on their evals, the agents scored 100 percent on all of them and got perfect scores on the build, lint and test evals.
00:03:47So it is significantly more reliable and accurate than skills. It's kind of classic software engineering.
00:03:53It's where the dumber, simpler approach turns out to be the best one all along. And you don't have to over engineer anything.
00:03:58But why is this the case? Why is the agents file better than skills? Well, this is actually pretty hard to tell.
00:04:03AI is a bit of a black box, but Vercel speculates this comes down to three factors and all of them are based around decision making.
00:04:10When you have that agents file, there is no decision point for the agent.
00:04:14We're telling it right at the start in the system prompt to use the documentation and exactly where each file is.
00:04:20So it makes the knowledge persistent context instead of having it on demand and letting the model decide whether it should use it or not.
00:04:27It's already there in the reasoning since we provided it in the system prompt.
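Here is a simplified sketch of that difference, under the assumption that an agent inlines the whole agents.md file into its system prompt but only lists each skill by name and description until it decides to load one. The `Skill` class and both functions are hypothetical and do not mirror any particular agent's internals.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    """Hypothetical skill record for illustration."""
    name: str
    description: str  # short summary the agent sees up front
    body: str         # full instructions, only loaded on demand


def build_system_prompt(base: str, agents_md: str, skills: list[Skill]) -> str:
    """agents.md content is always present; skills appear only as one-line summaries."""
    skill_lines = "\n".join(f"- {s.name}: {s.description}" for s in skills)
    return f"{base}\n\n{agents_md}\n\nAvailable skills:\n{skill_lines}"


def load_skill(skills: list[Skill], name: str) -> str:
    """The decision point: the skill body enters the context only if the agent calls this."""
    for skill in skills:
        if skill.name == name:
            return skill.body
    raise KeyError(name)
```

With agents.md there is no equivalent of `load_skill` for the model to skip. The content is in the prompt whether or not the model thinks it needs it.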
00:04:31But this also doesn't mean that skills are completely useless. In fact, Vercel found that they actually complement each other.
00:04:36They said that skills work better for explicit user triggered workflows like saying upgrade my Next.js version,
00:04:41migrate to the app router or apply some framework best practices.
00:04:45But then if you want that general framework knowledge within your coding agent,
00:04:48that passive context with the agents.md is going to outperform skills, especially with today's models.
00:04:54I'm sure in the future models will be optimized for that skill based retrieval workflow, but we're not there yet.
00:04:59For now, Vercel has some recommendations, especially for framework authors or those of you that are actually going to be writing skills or the agents.md.
00:05:06They say don't wait for skills to improve. Compress your context as much as possible.
00:05:10Design for retrieval, not memory. And most importantly, always test everything with evals.
00:05:16And if you're just a user of these files, Vercel is actually providing a tool to download the documentation
00:05:21and the prebuilt agents.md file for your specific version of Next.js so you can take advantage of this new approach straight away.
00:05:29I'm pretty curious if other tools are going to take this approach as well. And I'm also curious what you think about this.
00:05:34Let me know in the comments down below what you think of agents and skills.
00:05:37And while you're there, subscribe. As always, see you in the next one.