Transcript
00:00:00So, there's this MCP plugin called Claude Context that indexes your entire codebase
00:00:06into a vector database, meaning your coding agent can get the exact code it needs quickly
00:00:11without guessing using grep or glob and hoping it finds the right file.
00:00:15It even parses your code into ASTs and uses a hybrid search approach, combining semantic
00:00:20vectors with keyword matching, which ends up using 40% less context.
00:00:24But it does need a Zilliz Cloud account and an OpenAI key for embeddings, even if you use Claude Code.
00:00:30So, is the extra effort and cost worth the token savings?
00:00:34Hit subscribe and let's find out.
00:00:35Okay, so Claude Context, not sure about the name, is made by Zilliz, which is a company
00:00:43created by the founders of Milvus, a very performant vector database.
00:00:47It connects to your agent via MCP, so this means it can work with any agent harness and
00:00:52not just Claude Code.
00:00:54But it does three pretty complex things to make your code easily findable.
00:00:58First, it uses Tree-sitter to parse all of the code, creating chunks of functions
00:01:03and classes, and this supports nine languages including TypeScript, Python, Rust and Go.
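The chunking step can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: it assumes Python's built-in `ast` module as a stand-in for Tree-sitter, and `chunk_source` and `SAMPLE` are hypothetical names made up for this example.

```python
import ast


def chunk_source(source: str) -> list[dict]:
    """Split Python source into one chunk per top-level function or class.

    Stand-in for Tree-sitter chunking: the built-in ast module only
    understands Python, whereas Tree-sitter handles many languages.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno and end_lineno are 1-based and inclusive
            text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
            chunks.append(
                {
                    "name": node.name,
                    "kind": type(node).__name__,
                    "start": node.lineno,
                    "end": node.end_lineno,
                    "text": text,
                }
            )
    return chunks


# Hypothetical sample file used only for this sketch
SAMPLE = '''
def open_untitled():
    return "untitled"

class Editor:
    def save(self):
        pass
'''

if __name__ == "__main__":
    for chunk in chunk_source(SAMPLE):
        print(chunk["kind"], chunk["name"], chunk["start"], chunk["end"])
```

Each chunk keeps its name and line range, which is what lets a search result point back to a specific function or class rather than a whole file.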
00:01:08Then it uses a custom Merkle DAG to hash each file with JSON snapshots, meaning it only re-indexes
00:01:15the files that have changed and not the whole codebase.
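The "only re-index what changed" idea can be sketched like this. It is a simplified illustration under stated assumptions: it uses a flat path-to-hash snapshot rather than a real Merkle DAG, and `take_snapshot` and `changed_paths` are hypothetical helper names, not the plugin's API.

```python
import hashlib
import json


def take_snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """Map each file path to the SHA-256 hex digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def changed_paths(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots and report what needs re-indexing or removal."""
    reindex = sorted(p for p in new if old.get(p) != new[p])  # added or modified
    remove = sorted(p for p in old if p not in new)  # deleted files
    return {"reindex": reindex, "remove": remove}


# Hypothetical in-memory "codebase" before and after an edit
before = take_snapshot({"a.ts": b"let a = 1", "b.ts": b"let b = 2"})
after = take_snapshot({"a.ts": b"let a = 1", "b.ts": b"let b = 3", "c.ts": b"let c = 4"})

# Snapshots are plain dicts, so they round-trip through JSON unchanged
saved = json.loads(json.dumps(before))

if __name__ == "__main__":
    print(changed_paths(saved, after))
```

Only `b.ts` (modified) and `c.ts` (new) would be re-embedded here, while the unchanged `a.ts` is skipped.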
00:01:18And then when you actually want to search through the code, it does two different types of searches
00:01:22at the same time, a vector search to find the semantic meaning of the code, and a BM25 index
00:01:29search for exact keyword matching.
00:01:31This all results in an up to 40% context reduction, which is a lot for large codebases.
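The video does not say how the two result lists are merged. One common option is reciprocal rank fusion (RRF), so the sketch below assumes that technique purely for illustration. The function name, file names, and the constant `k = 60` are all assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists into one by summing 1 / (k + rank).

    Items ranked highly by more than one search end up near the top.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical results from the two searches
vector_hits = ["a.ts", "b.ts", "c.ts"]  # semantic (embedding) search
keyword_hits = ["b.ts", "d.ts", "a.ts"]  # BM25 keyword search

if __name__ == "__main__":
    print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

In this example `b.ts` and `a.ts` rise to the top because both searches returned them, while files found by only one search rank lower.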
00:01:37In fact, let's see it in action by testing it against the VS Code codebase, which has
00:01:42around 1.5 million lines of code.
00:01:44So inside the cloned VS Code repo, I'm going to be using OpenCode with GLM5 Turbo because
00:01:50I don't want to burn through my Claude Pro weekly limits.
00:01:53And in order to get the MCP server set up, which you can see over here, I've already added
00:01:58the relevant information to my OpenCode JSON file.
00:02:01And for this information over here, you could run Milvus locally, but I've used Zilliz Cloud.
00:02:06So I've got my API key from over here, and I created a cluster.
00:02:10So this is an AWS cluster, and I got the public endpoint from here.
00:02:14Now while we're on clusters, I did try to use the free one first, but I kept on getting
00:02:19timeout issues.
00:02:20So I had to get a serverless one, which does cost money, but it did work a lot better.
00:02:25Now once you have set up the MCP server, make sure you're running a version of Node below
00:02:2824, but above or equal to 20.
00:02:31I'm currently using version 22 just for this project.
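The version requirement stated here (at least 20, below 24) can be written as a small check. This is only a sketch: `is_supported_node` is a hypothetical helper, and the version string is assumed to look like the output of `node --version`, for example `v22.3.0`.

```python
import re


def is_supported_node(version: str) -> bool:
    """Return True if the Node major version is >= 20 and < 24."""
    match = re.match(r"v?(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Node version string: {version!r}")
    major = int(match.group(1))
    return 20 <= major < 24


if __name__ == "__main__":
    for candidate in ["v18.19.0", "v20.0.0", "v22.3.0", "v24.0.0"]:
        print(candidate, is_supported_node(candidate))
```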
00:02:34And that will give you access to four MCP tools, index code, search code, clear index, and get
00:02:39index status.
00:02:40Now the first thing you have to do is index the codebase and we can do that with this prompt.
00:02:44But before we hit enter, let's take a look at how much money we've already spent on embeddings
00:02:48from OpenAI, which is just one cent and was for me testing a 23,000 line codebase.
00:02:53We can also see in our cluster we already have information from the codebase that was indexed.
00:02:58So now if we index this codebase, it does take some time and starts the indexing in the
00:03:02background.
00:03:03At this point, I don't recommend doing any searches.
00:03:05Now because this is a large codebase, it will take a while to index.
00:03:09So I'll come back later when the indexing is done.
00:03:11And after about 50 minutes, the indexing is complete and we can see we have a new chunk
00:03:16in our cluster with over 223,000 loaded entries.
00:03:21And for reference, the codebase that I was testing with that had 23,000 lines of code has about
00:03:271,000 entries and took less than one minute to index.
00:03:32And with our OpenAI usage, we've gone from one cent to $1.06, which is a lot, but I don't
00:03:38imagine going through 1.5 million lines of code is something people will do regularly.
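As a rough back-of-the-envelope check on those numbers, the sketch below uses only the figures quoted in the video and assumes embedding cost scales linearly with lines of code, which is a simplification since pricing is really per token.

```python
# Figures quoted in the video
spend_before_usd = 0.01  # OpenAI usage before indexing VS Code
spend_after_usd = 1.06  # OpenAI usage after indexing VS Code
vscode_lines = 1_500_000  # approximate size of the VS Code codebase
small_repo_lines = 23_000  # the smaller test codebase

# Cost attributable to indexing VS Code
indexing_cost_usd = spend_after_usd - spend_before_usd

# Assumption: cost scales linearly with line count
cost_per_million_lines = indexing_cost_usd / (vscode_lines / 1_000_000)
small_repo_estimate = cost_per_million_lines * (small_repo_lines / 1_000_000)

if __name__ == "__main__":
    print(f"indexing cost: ${indexing_cost_usd:.2f}")
    print(f"per million lines: ${cost_per_million_lines:.2f}")
    print(f"estimate for 23,000 lines: ${small_repo_estimate:.3f}")
```

Under that assumption, the estimate for the 23,000-line codebase comes out at under two cents, which is in the same range as the one cent reported earlier.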
00:03:42Okay, let's see how fast it is to make a search.
00:03:45Here we have one instance of OpenCode using the Claude Context MCP server, and here we
00:03:49have one with no MCP servers.
00:03:52So it'll be using the regular grep and glob tools to search through the code.
00:03:56And we'll give it a prompt of "use Claude Context to find the entry point of when this project
00:04:00starts up."
00:04:01Let's see how long this takes.
00:04:02Okay, so it's using the index tool and now it's using the search tool.
00:04:06And the whole thing took about 19 seconds to search through this whole project and find
00:04:10the main.ts file.
00:04:11And now we're going to give this other OpenCode instance a similar prompt, and it finds the answer
00:04:15in 14 seconds.
00:04:16So it looks like, for this query, using just regular GLM is a lot faster.
00:04:20Let's start a new session.
00:04:21And then I'm going to give it a new prompt of "what function opens a new untitled document?"
00:04:26This one took a bit longer at 40 seconds and found the main function with a line number
00:04:30and used about 23K tokens.
00:04:32And the other instance did it in 12 seconds and used 18K tokens, but it looks like it found
00:04:37a different file.
00:04:38In fact, Claude Context gives way more information showing the code and other files related to
00:04:44opening the editor.
00:04:45So I'm going to ask them both to show me the exact code.
00:04:48And at this point, Claude Context responds in 23 seconds with the code and the non-Claude
00:04:52Context OpenCode responds in 49 seconds, more than double the amount of time.
00:04:56And it gets to the exact same code as Claude Context did, which gives me an idea.
00:05:00I'm going to give it a broader, more general prompt of "look through the code and tell me
00:05:04how this project works."
00:05:06Claude Context finishes in 49 seconds using 41K tokens, and the other instance finishes
00:05:11faster and uses fewer tokens.
00:05:13But if we have a look at the output produced, we can see there's much more detail from Claude
00:05:17Context giving the layered architecture and even some information about the Electron app
00:05:22and the processes it uses.
00:05:23And the non-Claude Context option does also give some architecture information, but it's
00:05:28not as detailed as the other one.
00:05:30In fact, I know it doesn't look like it, but I would say Claude Context is very good at
00:05:34getting code information upfront quickly in lots of detail.
00:05:37I mean, take a look at this.
00:05:38So from this prompt, I asked a follow-up prompt to tell me more information about the main
00:05:43process in the Electron app, which it stated up here.
00:05:47So after I asked this prompt, it took about 1 minute and 47 seconds, but look at all that
00:05:52detail.
00:05:53So it started out with the boot sequence and then phase one, so the service creation and
00:05:58service initialization.
00:05:59And we've got so much more from phase two, the CodeApplication app with all the references
00:06:04to the relevant files.
00:06:05So we've got app.ts line 185, and we could keep going on and on.
00:06:10Whereas without Claude Context, OpenCode is still going through all the files using multiple
00:06:15sub agents.
00:06:16And this is a bit deceiving because we can't see exactly how many tokens each sub agent
00:06:21is using.
00:06:22But if we wait for a bit and come back, we can see after about five minutes, OpenCode
00:06:26responds with a lot of information about the Electron process, but this isn't as much
00:06:31as what Claude Context provided, and it did take five times longer.
00:06:34Now, yes, maybe if I used a smarter model like Opus 4.7 with high effort, it would get
00:06:40more information, but it will still take a long time and use a lot of tokens.
00:06:44And these are the kind of differences, so five minutes versus one minute, that I was seeing when
00:06:48I was testing before recording with the 23K line codebase.
00:06:51So in the end, it's difficult to say which is the clear winner.
00:06:54I mean, Claude Context did always provide more detail, but it wasn't always the fastest or
00:07:00the most token-efficient.
00:07:01And for large codebases, it did take a very long time to index.
00:07:05However, for average-sized codebases, so 20 to 30,000 lines of code, the indexing time
00:07:10is really quick.
00:07:11And the difference in detail when it comes to results is very apparent.
00:07:14In fact, I would say I would much rather use Claude Context for average-sized codebases
00:07:20than use it on large codebases.
00:07:22So that's something to think about.
00:07:23But to be honest, this is more of a great sales tool for Zilliz because before using Claude
00:07:27Context, I had never heard of them and now they have a new paying customer.
00:07:31But even though it did take a while to set up and indexing large codebases took a very
00:07:36long while,
00:07:37as someone who regularly goes through open source codebases and asks questions, I think
00:07:42this is a tool I'm going to be using a lot more.
00:07:44I mean, for an average-sized codebase, the serverless plan isn't too expensive, and the
00:07:49OpenAI embeddings don't cost too much either.
00:07:52So I'm happy to take the hit.
00:07:53Speaking of data retrieval and AI,
00:07:55if you want to learn how to build a really good RAG system from scratch that actually
00:07:59works, then check out this video from Andris.
00:08:02And if you're a Star Wars fan, you're especially going to like this video.