00:00:00Ever since models started getting powerful, many people have started building really cool products,
00:00:04integrating models into them and solving lots of problems for us. But these systems consume a lot
00:00:09of tokens, especially if you're actually integrating a model using an API. The solution to this is much
00:00:15simpler than you think. The best architecture is not some extreme pipeline or highly scaled tuning,
00:00:20but actually an old philosophy that forms the basis of Unix-based systems: that everything is a file.
00:00:25Now, I know they were talking about devices and files, not model costs. But
00:00:30surprisingly enough, the solution to this high-cost issue is the exact same principle. And this is
00:00:35exactly what a software engineer at Vercel wrote about. Before we explore why files are the solution,
00:00:41let's understand a few things about how these models actually work. Models have been trained
00:00:46on massive amounts of code. This is the exact reason why they're better at understanding code,
00:00:50directory structures and native bash scripts that developers use to navigate files and find what
00:00:56they need. When an agent uses grep and ls, it's not doing something new. It's simply doing something it
00:01:01already knows how to do, just in a more controlled way. This approach isn't limited to code: agents can
00:01:06navigate any directory containing anything, be it code or not, because they're already comfortable
00:01:11with shell commands and understand file systems. Whenever an agent needs something, it looks around the file
00:01:17system using native bash commands like ls and find. Once the agent finds the right file,
00:01:23it searches for relevant content within that file using pattern matching with grep and cat.
00:01:27Only a small relevant slice of information is sent to the model while the rest stays out of memory,
00:01:32keeping the context window clean. This means we're not burning through tokens on irrelevant
00:01:36data that the model doesn't need. Using this approach, the agent returns a structured output.
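That retrieval loop can be sketched in a few lines. This is a hypothetical Python illustration, not Vercel's actual tool; the file name, its contents, and the search term are all invented for the example:

```python
import os
import subprocess
import tempfile

# Invented sample data so the sketch is self-contained.
docs_dir = tempfile.mkdtemp()
with open(os.path.join(docs_dir, "leave_policy.md"), "w") as f:
    f.write("Employees get 20 paid off days per year.\n")

def run(cmd):
    # Run one read-only shell command and capture its stdout.
    return subprocess.run(cmd, capture_output=True, text=True).stdout

# 1. Look around: what files are there?
listing = run(["ls", docs_dir])

# 2. Narrow down: which files mention the term we need?
matching_files = run(["grep", "-rl", "off days", docs_dir])

# 3. Extract only the matching lines, not the whole files.
snippet = run(["grep", "-rn", "off days", docs_dir])

# Only this small slice goes into the model's context window;
# everything else stays out of memory.
print(snippet.strip())
```

The point is that steps 1 to 3 touch the full corpus, but only `snippet` ever reaches the model.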
00:01:41This pattern works so well that Vercel ended up open sourcing a bash tool built specifically
00:01:46around it, giving agents the ability to explore file systems the same way a developer would.
00:01:51When building large language model systems, there are two ways of providing the right information
00:01:56to the model, either through a detailed system prompt, hoping the agent actually follows it,
00:02:00or by feeding a lot of data into a vector database and using semantic search to extract it. But each
00:02:06approach has limitations. System prompts have a limited token window, which limits how much
00:02:10information we can send to the model at a time. To handle larger data sets, we use semantic search,
00:02:15which finds information based on matching meanings to the query. But vector search is used for
00:02:20semantic similarity rather than exact searches. It returns chunks of data that match the general
00:02:25context of the query, not necessarily the specific value we're looking for. This leaves extracting the
00:02:30right content from all the chunks to the model itself. File systems, however, offer a different
00:02:35approach. With file systems, the structure actually maps to your domain. The relationships
00:02:40between files and folders often mirror the relationships in your data. With file
00:02:45systems, you don't have to flatten these relationships into model-understandable vector chunks,
00:02:49which helps avoid missing relationships that are usually lost in semantic search. These hierarchical
00:02:54connections are preserved naturally, maintaining the organizational logic that already exists in
00:02:59your data. Another advantage is that retrieval is precise because grep and bash tools return exact
00:03:05matches, unlike vector search, which returns all chunks that loosely match the query and then leaves
00:03:10it to the model to decide which one to use. You get only the required value. The context is minimal
00:03:15when agents use bash tools because they receive the specific chunk they need, and many other chunks
00:03:20don't go into memory. This allows them to stay aligned and focused on the exact piece of information
00:03:25without getting lost in unrelated data. Now, this idea isn't something you're unfamiliar with. It's
00:03:30already used inside Claude Code and other CLI agents, which use bash commands to
00:03:36narrow down findings using pattern matching. We've already been using the file system and
00:03:41Claude Code's capabilities for research purposes for any idea that we evaluate. We usually pass
00:03:46the software tool we come across through this pipeline, which contains multiple phases with
00:03:51our evaluation criteria that the research must pass through. All of this is defined in a markdown file
00:03:56containing the requirements and objectives of the tool we're testing, how to write the final document,
00:04:01and all the information required for each phase. We also provide Claude with certain documents as
00:04:06samples, which act as a guide for style matching, and the final document is saved in a research
00:04:11results folder. To guide the research, we have a Claude.md file explaining how to pass the idea
00:04:17through each phase one by one, ultimately giving us a research document that meets all our checks.
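Our real files are longer, but the shape of such a setup is roughly this. The phase names and paths here are invented for illustration; they are not our exact pipeline:

```markdown
# CLAUDE.md — research pipeline (illustrative sketch)

For every idea or tool, run the phases below in order.
The full criteria for each phase live in phases/phase-N.md.

1. Understand the tool: what it does, who it's for
2. ... (phases 2–5: our evaluation criteria, one file per phase)
6. Final report: write the result to the research-results folder,
   matching the style of the sample documents provided
```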
00:04:22Whenever we have anything to research, I just go to Claude and tell it the idea or the tool to
00:04:27research. It then runs it through the six-phase validation process, first by understanding the
00:04:32tool or idea and then passing it through each phase one by one. Once the idea has gone through
00:04:37all the phases, Claude generates a final report which we can read to verify whether the idea has
00:04:42potential or not. This file system approach saves us a lot of time by automating a research process
00:04:47that we would otherwise have to do step by step. If you want to try out this pipeline for yourself
00:04:52for your own use case, you can get a ready to use template for creating your own research pipeline
00:04:57similar to ours in our recently launched community called AI Labs Pro. For this and for all the
00:05:03previous videos, you get ready to use templates, prompts, all the commands and skills that you can
00:05:08plug directly into your projects. If you found value in what we do and want to support the channel,
00:05:12this is the best way to do it. Links in the description. I was going through Vercel's case study,
00:05:17in which they explained how to build a sales summary agent using this architecture. They've
00:05:22also open-sourced it, and it gave me a really interesting idea that I wanted to try out on my
00:05:27own. I was actually building a company policy project where I had a lot of company data in the
00:05:32form of JSON, Markdown and TXT files, all separated by department. Normally, I would have implemented
00:05:39this system using a vector database like Chroma, but I decided to give this tool a shot. I went
00:05:44ahead and implemented this architecture. On the back end, I included the path to the document folder
00:05:49containing the company's data and gave the agent access to the ls, cat, grep and find commands,
00:05:55along with a guide on how to use the tool and when to use each command. I used the Gemini 2.5 Flash
00:06:01model, provided it with Vercel's bash tool and gave it the path to the documents inside the tool. And
00:06:06so, when I tested the agent by asking it any question related to the data, it basically
00:06:11answered based on the exact content from the company's policies, including the handbook and
00:06:16leave policy documents. To verify how it was working, I logged its tool usage on the terminal.
00:06:21The agent first used the ls command to see what documents were available and then used grep with
00:06:27pattern matching to look for "off days" or similar terms. This set of commands handled our query and
00:06:32gave us results with the same level of accuracy as a RAG system would. If you want the source code for
00:06:38this project, you can find it in our community, where you can download it and try it out for yourself.
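The heart of that back end is a small tool-dispatch layer. Here's a simplified, model-agnostic sketch of the idea; the whitelist approach and all names below are my own reconstruction for illustration, not Vercel's actual bash tool (the real project wires this into Gemini 2.5 Flash as a callable tool):

```python
import os
import shlex
import subprocess
import tempfile

# Hypothetical data folder with one sample policy file.
docs_path = tempfile.mkdtemp()
with open(os.path.join(docs_path, "handbook.md"), "w") as f:
    f.write("Employees receive 12 off days per year.\n")

# Read-only commands the agent is allowed to use.
ALLOWED = {"ls", "cat", "grep", "find"}

def run_agent_command(command: str) -> str:
    """Execute one whitelisted, read-only command inside the docs folder."""
    args = shlex.split(command)
    if not args or args[0] not in ALLOWED:
        return f"error: only {sorted(ALLOWED)} are allowed"
    result = subprocess.run(args, capture_output=True, text=True, cwd=docs_path)
    return result.stdout or result.stderr

# The model calls this as a tool; a typical exchange looks like:
print(run_agent_command("ls ."))                   # handbook.md
print(run_agent_command("grep -rn 'off days' ."))  # ./handbook.md:1:...
print(run_agent_command("rm handbook.md"))         # rejected by the whitelist
```

The whitelisted commands are read-only in normal use, but a sketch like this is not a security boundary on its own, which is why the sandboxing discussed next still matters.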
00:06:43Now, the first question that came to my mind while going through this tool was,
00:06:47is it really safe to equip agents to execute commands on the server? We literally saw a
00:06:52vulnerability in React Server Components this past December, which scored a 10.0, the highest on the
00:06:57scale, and it involved code being executed on the server. So this is a really powerful and potentially
00:07:03dangerous capability to give agents. So why did I actually trust this tool? It's because it runs in
00:07:08a sandbox and has isolation. It only accesses the specific directory we provide. It doesn't
00:07:14modify anything else. In the article, they also mentioned that the agent explores the files without
00:07:19access to the production system, so your production code remains safe even if the agent tries to run
00:07:24harmful commands on the server. It provides two types of isolation. The first is an in-memory
00:07:29environment. In this setup, it uses just the bash tool, which runs bash scripts only on the files it has
00:07:35access to, just like we did when creating our agent. The second type is a full
00:07:40sandbox environment offering virtual machine isolation using the Vercel Sandbox. We can choose
00:07:46either based on our needs. The in-memory approach is lighter and faster for simple use cases while
00:07:51the full VM isolation is better when you need stronger security guarantees. Even though this
00:07:56approach is really good for saving costs per model call, it's not the right approach for all kinds of
00:08:01problems. It's definitely not ideal if you need to match the meaning of words because bash tools are
00:08:06for exact matching. As we saw when we called our agent, it used specific keywords to locate the
00:08:11required data. It's also not suitable for unorganized file structures where the agent would have to
00:08:17struggle with multiple tool calls. A structure that the agent can easily navigate is much better. My
00:08:22personal suggestion is to use the bash tool when you have highly structured data and your requests
00:08:27are mostly clear in terms of what you want. Use RAG when you care more about the meaning of what's
00:08:32written in the files or when your queries are likely to be messy. Before we wrap up, here's a word from
00:08:37our sponsor. Brilliant. The best engineers don't just know syntax. They break down problems from
00:08:42first principles. That is why we've partnered with Brilliant. Their philosophy is that you learn best
00:08:47by doing. They prioritize active problem solving so you get hands-on with concepts instead of just
00:08:52memorizing. For example, in their course named "How AI Works", you don't just read, you manipulate the
00:08:57actual logic. You'll get hands-on with technicalities like calculating loss in the loss space and
00:09:02visualizing interpolation, building a deep intuition you just can't get from a video lecture. Through
00:09:07their interactive technical courses, you get the most effective way to truly master the concepts
00:09:12we talk about. You'll also get 20% off an annual premium subscription, unlocking their entire catalog
00:09:17of math, data, and CS courses, giving you a complete roadmap to upskill. Click the link in the description
00:09:22or scan the QR code on your screen to claim your free 30-day trial. That brings us to the end of
00:09:28this video. If you'd like to support the channel and help us keep making videos like this, you can
00:09:32do so by using the super thanks button below. As always, thank you for watching and I'll see you in the next one.