00:00:00If you've been coding in Claude Code, you've probably experienced context bloat. The problem
00:00:05is that every MCP tool call in Claude Code is ridiculously expensive, because every one of these
00:00:11calls dumps its full output directly into the model's 200k context window. And the more tools
00:00:17you have in your tool belt, the faster your context depletes. In some scenarios,
00:00:22you're looking at 30 minutes of active agent use before your context compacts. And that's
00:00:28when the AI starts forgetting files, tasks, and crucial decisions. Not to mention you're spending
00:00:34a lot of money on those tokens. But there's an MCP server out there that solves this crucial issue.
00:00:40It's called Context Mode. In today's video, we'll take a look at what Context Mode does,
00:00:44how it works, and try it out for ourselves with a little demo.
00:00:48It's going to be a lot of fun, so let's dive into it.
00:00:55To understand why this happens, let's look at the math. A single Playwright snapshot of a
00:01:00web page is about 56 kilobytes. Reading 20 GitHub issues is 59 kilobytes. If we do these operations
00:01:08in the planning phase multiple times in a session, you've probably eaten 70% of your window before the
00:01:14agent has even written a single line of code. Context Mode acts as a virtualization layer.
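As a quick sanity check on that math, here's a rough estimate. The ~4 characters per token figure is a common rule of thumb, and the call counts are illustrative assumptions, not measurements from the video:

```python
# Rough estimate of how quickly raw tool outputs fill a 200k-token window.
# Assumes ~4 characters per token; the byte sizes are the ones quoted above,
# and the call counts are made up for illustration.
CONTEXT_WINDOW = 200_000  # tokens

def tokens(size_bytes: int) -> int:
    """Approximate token count for a blob of tool output."""
    return size_bytes // 4

snapshot = tokens(56_000)  # one Playwright page snapshot (~56 KB)
issues = tokens(59_000)    # one batch of 20 GitHub issues (~59 KB)

# Say the planning phase takes six snapshots and four issue reads:
used = 6 * snapshot + 4 * issues
print(f"~{used:,} tokens used, {used / CONTEXT_WINDOW:.0%} of the window")
```

Under those assumptions, planning alone burns roughly 70% of the window, in line with the figure above.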
00:01:20Instead of the AI talking directly to your OS, it talks to a sandbox. And instead of dumping massive
00:01:26outputs, Context Mode indexes them in a local SQLite database using FTS5, SQLite's full-text search engine.
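The video doesn't show Context Mode's internal schema, but the core trick, indexing bulky output into a SQLite FTS5 table and handing the model only the matching rows, can be sketched like this (table and column names are made up for illustration):

```python
import sqlite3

# Index a large tool output into an FTS5 virtual table instead of pasting it
# into the conversation. Requires an SQLite build with FTS5 enabled (the
# default in most Python distributions).
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE tool_output USING fts5(source, content)")

# Stand-in for a bulky dump: 5,000 log lines, one of them an error.
# (Skip id 500 so a user id never collides with the status code token.)
lines = [f"GET /api/users/{i} 200" for i in range(5_000) if i != 500]
lines.append("GET /api/orders/7 500 Internal Server Error")
db.executemany(
    "INSERT INTO tool_output VALUES ('access.log', ?)",
    [(line,) for line in lines],
)

# The agent later searches the index and sees only the hits,
# not all 5,000 lines.
hits = db.execute(
    "SELECT content FROM tool_output WHERE tool_output MATCH '\"500\"'"
).fetchall()
print(hits)
```

Only the matching row travels back to the model; the other 4,999 lines stay in the database.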
00:01:34And the result is pretty significant. For example, that 56-kilobyte Playwright snapshot is reduced to 299
00:01:41bytes, a 99% reduction. And an analytics CSV is crunched down to 222 bytes,
00:01:49a near 100% reduction. But saving tokens is just one part of the fix. The real utility here
00:01:56is the session continuity. We've all seen an agent compact its history, and suddenly you lose track
00:02:03of the code it wrote 10 minutes earlier. But Context Mode uses hooks to monitor every file edit,
00:02:09git operation, and sub-agent task. When your conversation compacts, Context Mode builds a
00:02:15priority tiered snapshot, usually under 2 kilobytes, and injects it back in. It's essentially a save
00:02:22checkpoint for your coding session. So you could hypothetically extend your session time from 30
00:02:27minutes to approximately 3 hours. It also tracks decisions and errors. For example, if the AI tried
00:02:34a fix that failed 20 minutes ago, it won't repeat that mistake even after the context resets. And
00:02:40installing it is very straightforward. If you're on Claude Code, first add the Context Mode marketplace
00:02:46by running the following command, and then run the plugin install command. Once you've
00:02:53installed it, it handles the MCP server, the hooks, and the
00:02:57routing instructions automatically. If you're on Gemini CLI or VS Code Copilot, you can run
00:03:03npm install context-mode and add the config to your settings. Now let's see Context Mode in action. I
00:03:10have this simple Python command here that will create a dummy access.log file containing a
00:03:15bunch of dummy API requests and their status codes. Every hundredth line is a 500
00:03:22error log. Now we can fire up Claude and ask, hey, use Context Mode to index access.log. I want
00:03:30to find all the 500 error patterns and summarize the IP addresses associated with them. And in the
00:03:36background, Context Mode chunks the 5,000 lines of the access.log file into its own SQLite
00:03:44FTS5 database, and Claude only receives confirmation that the file is indexed, not the raw 5,000 lines
00:03:51of the file. And now Claude can intelligently search the indexed database to query the contents instead
00:03:57of parsing the whole file. And here we can see the findings returned by Claude. But more importantly,
00:04:02let's look at the cost savings. We can do this by running Context Mode's stats command, and we can
00:04:09check out how much data Context Mode saved in the current session. And you can see the results
00:04:15right here. Instead of dumping the entire 20 kilobytes into the conversation, Context Mode kept
00:04:21about 5 kilobytes of that raw data in the sandbox. And this result is pretty impressive for a small
00:04:27file. It spared about 1,200 tokens from entering the context window. So overall, we get a nice
00:04:3425% reduction running this little test. That may not sound like much, but keep in mind that
00:04:41in a standard Claude session, the data would just sit there forever, getting resent with every single
00:04:47message that you send. And by keeping it in the sandbox, we've already started to extend the life
00:04:53of this session. And this demo file is pretty small, but if you deal with larger files,
00:04:58the savings here could be massive. If you're running a massive repo research project or analyzing
00:05:03production-scale logs, that 1,200-token saving can easily turn into 100,000 tokens. But the goal here
00:05:11isn't just about saving money on API costs, though that is a nice bonus. It's also about maintaining
00:05:18the intelligence of the model. When you clear the noise out of the context window, you're leaving
00:05:24more room for actual reasoning. You're giving Claude the space it needs to be a better engineer.
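As a recap, the demo's savings can be checked with a little arithmetic, again assuming the rough 4 characters per token rule of thumb:

```python
# Recap of the demo's numbers: about 5 KB of the 20 KB raw file was kept
# in the sandbox, at ~4 characters per token.
total_bytes = 20_000
kept_in_sandbox = 5_000

tokens_saved = kept_in_sandbox // 4
reduction = kept_in_sandbox / total_bytes
print(tokens_saved, f"{reduction:.0%}")  # → 1250 25%
```

That works out to roughly the 1,200 tokens and 25% reduction reported in the session stats.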
00:05:30So if you're building complex projects with AI agents, give this tool a shot and see how
00:05:35much longer you can extend the sessions before the agent starts compacting and forgetting things.
00:05:41And if you enjoyed this technical breakdown, please let me know by smashing that like button
00:05:45underneath the video. And also don't forget to subscribe to our channel. This has been
00:05:50Andris from Better Stack, and I'll see you in the next video.