I Stopped Using Grep and My Agent Got 10x Faster



Transcript

00:00:00So, there's this MCP plugin called Claude Context that indexes your entire codebase

00:00:06into a vector database, meaning your coding agent can get the exact code it needs quickly

00:00:11without guessing using grep or glob and hoping it finds the right file.

00:00:15It even parses your code into ASTs and uses a hybrid search approach, combining semantic

00:00:20vectors with keyword matching, which ends up using 40% less context.

00:00:24But it does need a Zilliz Cloud account and an OpenAI key for embeddings, even if you use Claude Code.

00:00:30So, is the extra effort and cost worth the token savings?

00:00:34Hit subscribe and let's find out.

00:00:35Okay, so Claude Context, not sure about the name, is made by Zilliz, which is a company

00:00:43created by the founders of Milvus, a very performant vector database.

00:00:47It connects to your agent via MCP, so this means it can work with any agent harness and

00:00:52not just Claude Code.

00:00:54But it does three pretty complex things to make your code easily findable.

00:00:58First, it uses Tree-sitter to parse all of the code, creating chunks of functions

00:01:03and classes, and this supports nine languages including TypeScript, Python, Rust and Go.

00:01:08Then it uses a custom Merkle DAG to hash each file with JSON snapshots, meaning it only re-indexes

00:01:15the files that have changed and not the whole codebase.
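The change-detection idea described here can be sketched in a few lines. This is a minimal illustration, assuming a flat SHA-256 hash per file kept in a JSON-serializable snapshot; it is not Claude Context's actual Merkle DAG implementation, and the names `snapshot`, `changed_files` and `root_hash` are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def snapshot(root: Path, pattern: str = "*.py") -> dict[str, str]:
    """Map each matching file (path relative to root) to the SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob(pattern))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots and report which files would need re-indexing."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "modified": sorted(f for f in set(old) & set(new) if old[f] != new[f]),
    }


def root_hash(snap: dict[str, str]) -> str:
    """One hash over all file hashes: if it is unchanged, no file changed."""
    return hashlib.sha256(json.dumps(snap, sort_keys=True).encode()).hexdigest()
```

Comparing the two root hashes first gives a cheap "nothing changed" check; only when they differ does the per-file comparison decide what to re-embed.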

00:01:18And then when you actually want to search through the code, it does two different types of searches

00:01:22at the same time, a vector search to find the semantic meaning of the code, and a BM25 index

00:01:29search for exact keyword matching.
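One common way to merge a semantic ranking with a keyword ranking is reciprocal rank fusion (RRF). The sketch below assumes RRF as the merge step; the video does not say which fusion method Claude Context uses, and the chunk names in the usage note are made up for illustration.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk ids into one ranking.

    Each list contributes 1 / (k + rank) per chunk, so chunks that rank
    well in both the vector list and the keyword list rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))
```

For example, with hypothetical vector hits `["openEditor", "createUntitled", "saveFile"]` and keyword hits `["createUntitled", "newUntitledFile", "openEditor"]`, the chunk `createUntitled` ends up first because it ranks near the top of both lists.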

00:01:31This all results in a context reduction of up to 40%, which is a lot for large codebases.

00:01:37In fact, let's see it in action by testing it against the VS Code codebase, which has

00:01:42around 1.5 million lines of code.

00:01:44So inside the cloned VS Code repo, I'm going to be using OpenCode with GLM5 Turbo because

00:01:50I don't want to burn through my Claude Pro weekly limits.

00:01:53And in order to get the MCP server set up, which you can see over here, I've already added

00:01:58the relevant information to my OpenCode JSON file.

00:02:01And for this information over here, you could run Milvus locally, but I've used Zilliz Cloud.

00:02:06So I've got my API key from over here, and I created a cluster.

00:02:10So this is an AWS cluster, and I got the public endpoint from here.

00:02:14Now while we're on clusters, I did try to use the free one first, but I kept on getting

00:02:19timeout issues.

00:02:20So I had to get a serverless one, which does cost money, but did work a lot better.

00:02:25Now once you have set up the MCP server, make sure you're running a version of Node.js below

00:02:2824, but above or equal to 20.

00:02:31I'm currently using version 22 just for this project.

00:02:34And that will give you access to four MCP tools: index code, search code, clear index, and get

00:02:39index status.

00:02:40Now the first thing you have to do is index the codebase and we can do that with this prompt.

00:02:44But before we hit enter, let's take a look at how much money we've already spent on embeddings

00:02:48from OpenAI, which is just one cent and was for me testing a 23,000 line codebase.

00:02:53We can also see in our cluster we already have information from the codebase that was indexed.

00:02:58So now if we index this codebase, it does take some time and starts the indexing in the

00:03:02background.

00:03:03At this point, I don't recommend doing any searches.

00:03:05Now because this is a large codebase, it will take a while to index.

00:03:09So I'll come back later when the indexing is done.

00:03:11And after about 50 minutes, the indexing is complete and we can see we have a new chunk

00:03:16in our cluster with over 223,000 loaded entries.

00:03:21And for reference, the codebase that I was testing with, which had 23,000 lines of code, has about

00:03:271,000 entries and took less than one minute to index.

00:03:32And with our OpenAI usage, we've gone from one cent to $1.06, which is a lot, but I don't

00:03:38imagine going through 1.5 million lines of code is something people will do regularly.
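To put those numbers in perspective, here is a small worked calculation using only the figures quoted above. The entry count is stated as "over 223,000", so the lines-per-entry value is an upper bound, and the variable names are just for illustration.

```python
# Figures quoted in the video; cents are used to keep the arithmetic exact.
spend_before_cents = 1       # OpenAI embedding spend before indexing VS Code
spend_after_cents = 106      # spend after indexing ($1.06)
lines_of_code = 1_500_000    # "around 1.5 million lines"
entries = 223_000            # "over 223,000 loaded entries" (a lower bound)

index_cost_cents = spend_after_cents - spend_before_cents
cents_per_1k_lines = index_cost_cents / (lines_of_code / 1_000)
lines_per_entry = lines_of_code / entries

print(f"indexing cost: ${index_cost_cents / 100:.2f}")
print(f"cost per 1,000 lines: {cents_per_1k_lines:.2f} cents")
print(f"average lines per entry: {lines_per_entry:.1f}")
```

On these figures, the embedding cost of the VS Code index works out to well under a tenth of a cent per thousand lines, which is why the one-off cost matters less than the 50-minute wait.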

00:03:42Okay, let's see how fast it is to make a search.

00:03:45Here we have one instance of OpenCode using the Claude Context MCP server, and here we

00:03:49have one with no MCP servers.

00:03:52So it'll be using the regular grep and glob tools to search through the code.

00:03:56And we'll give it a prompt of "use Claude Context to find the entry point of when this project

00:04:00starts up".

00:04:01Let's see how long this takes.

00:04:02Okay, so it's using the index tool and now it's using the search tool.

00:04:06And the whole thing took about 19 seconds to search through this whole project and find

00:04:10the main.ts file.

00:04:11And now we're going to give the other OpenCode instance a similar prompt and it finds the answer

00:04:15in 14 seconds.

00:04:16So it looks like for this query, using just regular GLM is a lot faster.

00:04:20Let's start a new session.

00:04:21And then I'm going to give it a new prompt of "what function opens a new untitled document".

00:04:26This one took a bit longer at 40 seconds and found the main function with a line number

00:04:30and used about 23K tokens.

00:04:32And the other instance did it in 12 seconds and used 18K tokens, but it looks like it found

00:04:37a different file.

00:04:38In fact, Claude Context gives way more information showing the code and other files related to

00:04:44opening the editor.

00:04:45So I'm going to ask them both to show me the exact code.

00:04:48And at this point, Claude Context responds in 23 seconds with the code and the non-Claude

00:04:52Context OpenCode responds in 49 seconds, more than double the amount of time.

00:04:56And it gets to the exact same code as Claude Context did, which gives me an idea.

00:05:00I'm going to give it a broader, more general prompt of "look through the code and tell me

00:05:04how this project works".

00:05:06Claude Context finishes in 49 seconds using 41K tokens, and the other instance finishes

00:05:11faster and uses fewer tokens.

00:05:13But if we have a look at the output produced, we can see there's much more detail from Claude

00:05:17Context giving the layered architecture and even some information about the Electron app

00:05:22and the processes it uses.

00:05:23And the non-Claude Context option does also give some architecture information, but it's

00:05:28not as detailed as the other one.

00:05:30In fact, I know it doesn't look like it, but I would say Claude Context is very good at

00:05:34getting code information upfront quickly in lots of detail.

00:05:37I mean, take a look at this.

00:05:38So from this prompt, I asked a follow-up prompt to tell me more information about the main

00:05:43process in the Electron app, which it stated up here.

00:05:47So after I asked this prompt, it took about 1 minute and 47 seconds, but look at all that

00:05:52detail.

00:05:53So it started out with the boot sequence and then phase one, so the service creation and

00:05:58service initialization.

00:05:59And we've got so much more from phase two, the code application app with all the references

00:06:04to the relevant files.

00:06:05So we've got app.ts line 185, and we could keep going on and on.

00:06:10Whereas without Claude Context, OpenCode is still going through all the files using multiple

00:06:15sub agents.

00:06:16And this is a bit deceptive because we can't see exactly how many tokens each sub agent

00:06:21is using.

00:06:22But if we wait for a bit and come back, we can see after about five minutes, OpenCode

00:06:26responds with a lot of information about the Electron process, but this isn't as much

00:06:31as what Claude Context provided, and it did take five times longer.

00:06:34Now, yes, maybe if I used a smarter model like Opus 4.7 with high effort, it would get

00:06:40more information, but it will still take a long time and use a lot of tokens.

00:06:44And these are the kind of differences, so five minutes and one minute that I was seeing when

00:06:48I was testing before recording with the 23K line code base.

00:06:51So in the end, it's difficult to say who is the clear winner.

00:06:54I mean, Claude Context did always provide more detail, but it wasn't always the fastest or

00:07:00the most token-efficient.

00:07:01And for large code bases, it did take a very long time to index.

00:07:05However, for average sized code bases, so 20 to 30,000 lines of code, the indexing time

00:07:10is really quick.

00:07:11And the difference in detail when it comes to results is very apparent.

00:07:14In fact, I would say I would much rather use Claude Context for average sized code bases

00:07:20than use it on large code bases.

00:07:22So that's something to think about.

00:07:23But to be honest, this is more of a great sales tool for Zilliz because before using Claude

00:07:27Context, I had never heard of them and now they have a new paying customer.

00:07:31But even though it did take a while to set up and indexing large code bases took a very

00:07:36long while,

00:07:37as someone who regularly goes through open source code bases and asks questions, I think

00:07:42this is a tool I'm going to be using a lot more.

00:07:44I mean, for an average sized code base, the serverless plan isn't too expensive and the

00:07:49OpenAI embeddings don't cost too much either.

00:07:52So I'm happy to take the hit.

00:07:53Speaking of data retrieval and AI,

00:07:55if you want to learn how to build a really good RAG system from scratch that actually

00:07:59works, then check out this video from Andris.

00:08:02And if you're a Star Wars fan, you're especially going to like this video.

Key Takeaway

Integrating Claude Context via MCP significantly improves code-searching precision and architectural detail by replacing traditional grep methods with a hybrid vector-keyword indexing system, particularly for average-sized codebases.

Highlights

Claude Context reduces context usage by up to 40% in large codebases by indexing code into a vector database.
The system uses Tree-sitter to chunk code across nine languages and employs a hybrid search combining semantic vectors with BM25 keyword matching.
Indexing a 1.5 million line codebase takes approximately 50 minutes and generates over 223,000 entries.
A 23,000-line codebase requires less than one minute to index, making the tool efficient for average-sized projects.
Complex queries for architectural details yield significantly more comprehensive responses using Claude Context compared to standard grep-based methods.
Claude Context is compatible with any agent harness supporting the Model Context Protocol (MCP).

Timeline

Architecture and Mechanisms of Claude Context

Claude Context utilizes a vector database to index entire codebases for faster, more accurate retrieval.
The system employs a hybrid search strategy, combining semantic vectors with BM25 keyword matching.
Code parsing occurs via Tree-sitter, supporting languages like TypeScript, Python, Rust, and Go.

The plugin reduces the agent's reliance on grep or glob searches by indexing code into a structure that coding agents can query directly. It parses code into function-level and class-level chunks using Tree-sitter. A custom Merkle DAG approach keeps re-indexing efficient by processing only the files that have changed.

Setup and Indexing Large Repositories

Testing involved the VS Code codebase containing 1.5 million lines of code.
The tool requires a Zilliz Cloud account and an OpenAI API key for embedding generation.
Indexing a small 23,000-line codebase costs approximately one cent in OpenAI fees.

Configuration requires creating a Zilliz cluster and adding its endpoint and API key to the agent's MCP settings. Large repositories take significant time to index (about 50 minutes for 1.5 million lines), while smaller projects finish in under a minute. The Node.js version must be at least 20 and below 24.
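The version constraint is easy to get wrong at the boundaries, so here is a small sketch of the check, assuming the range stated in the video (20 or above, below 24) and a version string in the `v22.3.0` form that `node --version` prints. The function name is hypothetical.

```python
def node_version_supported(version: str, minimum: int = 20, below: int = 24) -> bool:
    """Return True if a Node.js version string such as 'v22.3.0' is in range.

    The range is inclusive at the bottom and exclusive at the top,
    matching "below 24, but above or equal to 20".
    """
    major = int(version.strip().lstrip("v").split(".")[0])
    return minimum <= major < below
```

Passing the output of `node --version` to this function before starting the MCP server would catch an unsupported runtime early, rather than failing later during indexing.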

Search Performance and Output Quality

Claude Context consistently provides more detailed architecture explanations than standard grep-based search.
Complex deep-dive queries may take longer than basic keyword lookups but return richer file references.
Standard grep-based agents can take five times longer to synthesize architectural information compared to Claude Context.

Comparative testing shows that while standard agents may locate specific files faster for simple queries, Claude Context excels at broad, conceptual prompts. It produces detailed responses including layered architecture and process dependencies that standard tools miss. The performance gap becomes most visible when deep analysis of large-scale code structures is required.

Utility and Use Case Recommendations

Claude Context proves most effective and efficient for average-sized codebases of 20,000 to 30,000 lines.
The serverless indexing cost is manageable for developers frequently auditing open-source repositories.
The tool functions as an effective, performance-oriented interface for developers needing deep code insights.

While the indexing process for massive projects is time-intensive, the trade-off in output quality makes it a viable solution for code audits. Frequent users of open-source repositories benefit from the structured index, justifying the serverless infrastructure costs. The integration through MCP ensures flexibility across different AI agent implementations.
