MCP Tools Just Got 10x Faster In Claude Code

Better Stack
Computers/Software · AI/Future Tech

Transcript

00:00:00The Claude Code team have just fixed the biggest issue with MCP by adding tool search, a way
00:00:05to reduce context by up to 95%, simply by searching for a tool name before using it instead of
00:00:11preloading all available tools into context, which could be tens of thousands of tokens
00:00:16used up even before writing your first prompt.
00:00:18But why wasn't this the way it worked before?
00:00:21And did they steal this technique from Cloudflare?
00:00:24Hit subscribe and let's get into it.
00:00:26MCP servers are absolutely everywhere, there's one for GitHub, Docker, Notion, there's
00:00:32even a better stack one which I've heard is really good.
00:00:35And with people using Claude Code and LLMs for everything other than code, it seems like
00:00:40MCP isn't going anywhere anytime soon.
00:00:43But it has its problems, naming collisions, command injections, and the biggest of all
00:00:48token inefficiency, because all the tools from a connected server typically get preloaded
00:00:53into the model's context window to give the model complete visibility.
00:00:57So tool names, tool descriptions, the full JSON schema documentation that contains optional
00:01:02and required parameters, their types, any constraints, basically a lot of data.
00:01:07The Redis team used 167 tools from four different servers, which took up over 60,000 tokens even
00:01:14before writing a prompt.
00:01:15Almost half of Opus' 200k context window, and this is even outside of skills and plugins.
00:01:21So if you have a lot of servers, that could take up a substantial amount of tokens.
00:01:25Yes, I know there are models out there, like Gemini, that have a 1 million token window,
00:01:31but models tend to perform worse the more things you add to their context.
00:01:35So what's the best way to fix this?
00:01:37Well, I've seen two popular paths online, the programmatic approach, which is what Cloudflare
00:01:42have done, and the search approach, which is what the Claude Code team have done.
00:01:46I'll talk about the programmatic approach a bit later, but first, let's talk about the search process,
00:01:52which works like this.
00:01:53First, Claude checks if preloaded MCP tools are more than 10% of the context.
00:01:59So that's 20k tokens if the context window is 200k tokens.
00:02:04If not, then no change happens, and the model uses the MCP tools as normal.
00:02:10But if yes, then Claude dynamically discovers the correct tools to use using natural language
00:02:17and loads in three to five of the most relevant tools based on the prompt.
00:02:22It will fully load just these tools into context for the model to use as normal.
00:02:27This was actually their most requested feature on GitHub, and it works similar to AgentSkills,
00:02:32which only loads skill names and descriptions into context, and when it finds a skill it
00:02:37thinks is relevant or a skill that was mentioned in the prompt, then it goes ahead and loads
00:02:42all of that specific skill into the context window.
00:02:46Progressive disclosure in a nutshell.
00:02:47Both Anthropic and Cursor have seen great benefits when it comes to using this approach for MCP tools.
00:02:53But what about the programmatic approach?
00:02:55This works by models orchestrating tools through code instead of making API calls.
00:03:01So for these three tools that need to work one after the other based on the previous response,
00:03:06instead of making individual API tool calls, Claude in particular can write a Python script
00:03:11to do all of this orchestration, then execute the code and present the result back to the model.
00:03:16Cloudflare have taken this one step further by getting the model to write TypeScript definitions
00:03:21for all the available tools and then running the code in a sandbox which is usually a worker.
00:03:27The Claude Code team actually tried the programmatic approach but found search to work better, which
00:03:32I find really hard to believe considering Claude is very good at writing code.
00:03:37And also, the agent browser CLI headless Chromium thing that Vercel have released works very well
00:03:44in Claude Code and I'm sure if you could convert all MCP tools into CLI commands using
00:03:50something like MCPorter, it would be much easier and context efficient for models to run a specific
00:03:56CLI command for a tool instead of loading things into context, but hey, that's just my opinion.
00:04:01Overall, I'm glad the issues with MCP servers are being looked into and maybe it might just
00:04:07convince me to have more than one server installed.

Key Takeaway

Claude Code solved MCP's biggest token inefficiency problem by implementing tool search that dynamically loads only 3-5 relevant tools instead of preloading all available tools, reducing context usage by up to 95%.

Highlights

Claude Code team implemented tool search to reduce MCP context usage by up to 95%

Previous MCP implementation preloaded all tools into context, consuming tens of thousands of tokens before any prompt

Redis team's setup used 167 tools from four servers, consuming over 60,000 tokens (nearly half of Opus' 200k context window)

Tool search activates when MCP tools exceed 10% of context window, dynamically loading only 3-5 most relevant tools

Two approaches to solving MCP inefficiency: search-based (Claude Code) vs programmatic/code-based (Cloudflare)

Claude Code team tested programmatic approach but found search method worked better despite Claude's strong coding capabilities

Progressive disclosure technique similar to AgentSkills - loading only tool names/descriptions first, then full details when relevant

Timeline

Introduction to MCP Tool Search Solution

The Claude Code team has addressed MCP's biggest issue by implementing tool search functionality that reduces context usage by up to 95%. Instead of preloading all available tools into the context window (which could consume tens of thousands of tokens), the system now searches for specific tool names before using them. This fundamental change solves the problem of token consumption occurring even before users write their first prompt. The video poses intriguing questions about why this wasn't the original implementation and whether the technique was inspired by Cloudflare's approach.

MCP Proliferation and Core Problems

MCP servers have become ubiquitous across platforms including GitHub, Docker, Notion, and BetterStack, with widespread adoption in Claude Code and LLMs beyond just coding use cases. However, this popularity has exposed three critical problems: naming collisions, command injections, and most significantly, token inefficiency. The traditional approach of preloading all tools from connected servers into the model's context window creates visibility but at a massive cost. This preloading includes tool names, descriptions, and complete JSON schema documentation with parameters, types, and constraints, resulting in substantial data consumption before any actual work begins.

Real-World Impact: Redis Team Case Study

The Redis team's configuration demonstrates the severity of the token inefficiency problem with concrete numbers. Their setup utilized 167 tools from four different servers, consuming over 60,000 tokens before writing a single prompt. This represented almost half of Claude Opus' 200,000 token context window, and this calculation doesn't even include skills and plugins. For users with multiple servers installed, the token consumption could become even more substantial, severely limiting the available context for actual prompts and responses. This case study illustrates why token efficiency became the most pressing concern for MCP implementations.

Context Window Limitations and Model Performance

While some models like Gemini offer larger context windows up to 1 million tokens, which might seem to solve the preloading problem, there's a critical performance trade-off. Models tend to perform worse as more content is added to their context window, regardless of the total available capacity. This means that simply having a larger context window doesn't eliminate the inefficiency problem. The degradation in model performance with increased context highlights why a more sophisticated approach to tool loading is necessary, setting up the explanation of potential solutions.

Two Approaches: Programmatic vs Search

Two primary solutions have emerged in the community to address MCP token inefficiency: the programmatic approach championed by Cloudflare and the search approach implemented by the Claude Code team. The search process works with a threshold-based system: Claude first checks if preloaded MCP tools exceed 10% of the context window (20k tokens for a 200k window). If below this threshold, the system operates normally with all tools preloaded. When the threshold is exceeded, Claude dynamically discovers the correct tools using natural language processing and loads only 3-5 of the most relevant tools based on the specific prompt, allowing these selected tools to function normally while drastically reducing overall context consumption.
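The threshold-plus-search flow described above can be sketched in a few lines of Python. This is a hypothetical illustration only: the function names, the keyword-overlap ranking, and the example tool index are my own stand-ins, not Claude Code's actual implementation (which presumably uses a far more capable relevance model).

```python
# Hypothetical sketch of threshold-gated tool search; names and the naive
# keyword-overlap ranking are illustrative, not Claude Code's real logic.

CONTEXT_WINDOW = 200_000
THRESHOLD = 0.10  # tool search kicks in past 10% of the window (20k of 200k)

def should_use_tool_search(preloaded_tool_tokens: int,
                           context_window: int = CONTEXT_WINDOW) -> bool:
    """Return True when preloaded MCP tool definitions exceed the threshold."""
    return preloaded_tool_tokens > THRESHOLD * context_window

def select_tools(query: str, tool_index: dict[str, str], k: int = 5) -> list[str]:
    """Rank tools by keyword overlap with the prompt and keep the top k.
    A real implementation would use embeddings or an LLM call instead."""
    words = set(query.lower().split())
    scored = sorted(
        tool_index.items(),
        key=lambda item: -len(words & set(item[1].lower().split())),
    )
    return [name for name, _ in scored[:k]]

# Toy index: only names and one-line descriptions stay in context.
tools = {
    "github_create_issue": "create a new issue in a github repository",
    "docker_list_containers": "list running docker containers",
    "notion_search_pages": "search notion pages by title or content",
}

print(should_use_tool_search(60_000))               # True: 60k > 20k threshold
print(select_tools("open a github issue", tools, k=1))
```

The key point the sketch captures is that the expensive part (full JSON schemas) never enters context until after the cheap ranking step has narrowed 167 tools down to a handful.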

Search Approach Details and AgentSkills Comparison

The tool search feature was the most requested enhancement on Claude Code's GitHub repository, and its implementation mirrors the progressive disclosure approach used in AgentSkills. Similar to how AgentSkills loads only skill names and descriptions initially, then fully loads specific skills when they're identified as relevant or mentioned in the prompt, the MCP tool search maintains minimal context overhead. Only when the system identifies tools as relevant does it load their complete specifications into the context window. Both Anthropic and Cursor have reported significant benefits from implementing this search-based approach, validating the technique's effectiveness across different platforms.
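Progressive disclosure, as described above, amounts to keeping a two-stage view of each tool. The sketch below is a hypothetical illustration of that split; the schema shape and function names are invented for the example and do not reflect any real MCP server's format.

```python
# Hypothetical two-stage view: stage 1 (names + descriptions) stays in context,
# stage 2 (full schema) is loaded only when a tool is selected as relevant.

FULL_SCHEMAS = {
    "github_create_issue": {
        "description": "create a new issue in a github repository",
        "parameters": {
            "repo": {"type": "string", "required": True},
            "title": {"type": "string", "required": True},
            "body": {"type": "string", "required": False},
        },
    },
}

def summary_view(schemas: dict) -> dict[str, str]:
    """Stage 1: the lightweight index that is always in context."""
    return {name: spec["description"] for name, spec in schemas.items()}

def load_full(name: str, schemas: dict) -> dict:
    """Stage 2: pull the complete schema on demand, once the tool is chosen."""
    return schemas[name]

print(summary_view(FULL_SCHEMAS))
print(load_full("github_create_issue", FULL_SCHEMAS)["parameters"].keys())
```

The description costs a sentence of context per tool; the full parameter schema, with types and required flags, only costs anything for the three to five tools that actually get selected.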

Programmatic Approach Explained

The programmatic approach represents a fundamentally different solution where models orchestrate tools through code execution rather than API calls. Instead of making individual API tool calls for sequential operations that depend on previous responses, Claude can write a Python script that handles all the orchestration logic. The script executes and presents results back to the model in a single operation. Cloudflare has advanced this concept further by having models write TypeScript definitions for all available tools, then executing the code in sandboxed environments (typically workers). This approach transforms the tool usage paradigm from declarative API calls to imperative code execution.
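The orchestration script described above can be sketched as follows. The three tool functions are stand-ins I've invented to mirror the "three tools that depend on the previous response" example; a real script would call actual MCP-backed functions, and Cloudflare's variant would be TypeScript running in a worker sandbox.

```python
# Hypothetical sketch of the programmatic approach: one script chains three
# dependent tool calls locally, so only the final result returns to the model
# instead of three separate tool-call round trips. Tool bodies are stand-ins.

def fetch_open_issues(repo: str) -> list[dict]:
    # stand-in for an MCP "list issues" tool
    return [{"id": 1, "title": "flaky test"}, {"id": 2, "title": "docs typo"}]

def summarize(issues: list[dict]) -> str:
    # stand-in for an MCP "summarize" tool
    return f"{len(issues)} open issues, first: {issues[0]['title']}"

def post_to_notion(page: str, text: str) -> str:
    # stand-in for an MCP "create page" tool
    return f"wrote '{text}' to {page}"

# Each step consumes the previous step's output, all inside one execution.
issues = fetch_open_issues("acme/app")
report = summarize(issues)
result = post_to_notion("Weekly Report", report)
print(result)
```

The context saving comes from the intermediate results (the issue list, the summary text) never passing through the model's context window at all.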

Claude Code Team's Decision and Alternative Perspectives

Interestingly, the Claude Code team tested the programmatic approach but ultimately found the search method to perform better, a surprising conclusion given Claude's well-known coding capabilities. The speaker expresses skepticism about this decision, noting that Vercel's agent browser CLI with headless Chromium works exceptionally well in Claude Code. The speaker suggests that converting all MCP tools into CLI commands using something like MCPorter could be more efficient and context-friendly, as models could run specific CLI commands for tools instead of loading tool specifications into context. Despite these alternative viewpoints, the speaker acknowledges the positive development that MCP server issues are being actively addressed, suggesting it might encourage broader adoption beyond single-server configurations.
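The speaker's CLI idea can be sketched with a thin subprocess wrapper. This is purely illustrative of the concept, using a standard shell command; it does not show MCPorter's actual interface, and the claim that this would be more context-efficient is the speaker's opinion, not a benchmarked result.

```python
# Hypothetical sketch of wrapping a tool as a CLI command: the model only
# needs the command name in context, not a full JSON schema. The command
# here is plain `echo`, standing in for an MCP tool exposed as a CLI.
import subprocess

def run_tool_cli(command: list[str]) -> str:
    """Run a tool as a shell command and return its trimmed stdout."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()

print(run_tool_cli(["echo", "hello"]))  # → hello
```

In this model, tool discovery becomes `--help` output read on demand, which is the same progressive-disclosure idea in CLI clothing.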
