Big Projects Always Fail... Anthropic Is Fixing That

AI LABS

Transcript

00:00:00Nowadays shipping small projects has become really easy but agents start failing the moment
00:00:04the code base grows large and gets multiple dependencies. The issue gets even worse if
00:00:09you are working with unconventional languages where errors and issues become even harder to trace.
00:00:14What people miss is that you need to take proper steps before making the agents work on large code
00:00:19bases and this is exactly what Anthropic talks about here. They cover how to actually handle
00:00:23projects when they scale. It was really insightful because these are things we ourselves have been
00:00:28using in our own projects and have found pretty helpful. Before we go into detail on how to set
00:00:33up a project at a large scale let us first understand how the agents navigate around the
00:00:38code in general. There are two ways they do this. The first is RAG-based. This works by embedding
00:00:43the entire code base and retrieving the relevant chunks at query time. Based on your query it runs
00:00:48a semantic search which matches your query with the code in its database. From the similarity matches
00:00:53it loads that specific context for the model to analyze and work ahead from. This might work for
00:00:58small-scale apps but it does not hold up on large-scale ones. This is because there is a central
00:01:03database that maintains the data, and when it indexes a lot of files the semantic matching
00:01:08becomes unreliable. It is also one reason coding agents hallucinate modules that no longer exist.
00:01:14Because of these issues the RAG-based approach has largely been replaced in coding agents. The other
00:01:18type is file system based navigation which is what Claude Code and most other agents now use. This
00:01:24is similar to how software developers actually navigate. The agent uses bash tools, finds files
00:01:29with the ls command, then greps and narrows down to the exact code snippet it needs and loads that
00:01:34into context. Bash tools work because they do not pollute the context window with unnecessary
00:01:39snippets. So this mode handles all the ways RAG-based systems were failing and almost all
00:01:44coding agents now navigate this way. The point is that no matter how much models improve,
00:01:49the model alone does not determine how good the code you produce will be.
00:01:54What matters even more for a working system is the harness you use
00:01:59for coding. So whichever tool you use whether it is Claude Code, Codex or Gemini CLI the output you
00:02:05get is not solely defined by their powerful models. It also depends on the harness you combine with the
00:02:10model's capabilities. If the harness is weak, the strength of the model
00:02:15is largely wasted. Agents like Claude Code and Codex ship with strong built-in
00:02:19harnesses but this does not mean you have to rely on those entirely. You need to set up a harness
00:02:24tailored to your project directly so it fits your project better. There are also open source harnesses
00:02:29like Superpowers, and you can use any of those when you are building something, but when you are
00:02:33developing a large-scale project these harnesses might not hold up and you would need to set up
00:02:38your own anyway. Every agent harness, whether you build it yourself or pull it from a shared source, contains
00:02:42five pieces that configure the environment in which Claude's agentic loop runs.
00:02:47We will go through each. The first piece in the agent harness is the CLAUDE.md file, which is loaded
00:02:53at the start of the session and stays in context for the entire session. This file is really
00:02:57important because it gives Claude its knowledge of the code base. We have already done a separate
00:03:02video on how to write and structure a proper CLAUDE.md which you can check out on the channel.
00:03:07When your code base grows large CLAUDE.md becomes critical. If you do not spend time on it your
00:03:12project is bound to fail at scale. This file is for project conventions, code base knowledge
00:03:17and the do's and don'ts that apply across the entire code base, not just a single aspect. Putting everything in one file
00:03:22might be fine if your code base is small but it becomes a problem the moment you scale into multiple
00:03:26architectures. Stuffing every aspect of the code into one file is highly inefficient. It distracts
00:03:32the agent with information it does not need at the moment. That's why the CLAUDE.md should stay short,
00:03:37ideally around 300 lines, and if you are running a monorepo with multiple areas each subdirectory
00:03:43should have its own CLAUDE.md following the same rules. The agent progressively loads it when working
00:03:48in that directory so instead of pulling everything from the root file it gets more focused instructions
00:03:53from the subdirectory files. This file is not something you write once and rely on forever. We need to
00:03:58maintain it actively not only as the project evolves but also as model intelligence evolves.
00:04:04The principles that worked for Sonnet 4.5 will not necessarily apply to Opus. Newer models are
00:04:09trained to overcome failure patterns that earlier instructions had to work around. So giving the same
00:04:14instructions to every model just wastes tokens. Now hooks are
00:05:09another important thing that helps when working with these large code bases. They are basically
00:05:14scripts that let the agent take specific actions based on certain conditions. There are many types
00:05:19of hooks you can configure usually written as shell scripts that control the agent's behavior. For
00:05:23example you can configure a session start hook which loads the information you want at the start
00:05:28of each session like which files Claude should load for context. You can also have a hook exit with code 2
00:05:33so that its error message is fed back to Claude and it can iterate on that. Pre-tool-use hooks are another
00:05:38type. They run your commands just before the agent uses a tool you have configured the hook for.
00:05:43You can use it to prevent Claude from editing files you do not want it to touch. But one of the most
00:05:48important hooks is the stop hook, which runs when Claude finishes responding. This pushes Claude to reflect on
00:05:53what has been done so far. From that it can update the CLAUDE.md with the learnings from the session
00:05:58so the same issues do not happen again. You can also configure hooks for linting, running tests,
00:06:03and many other purposes. All of these strung together help a lot with large scale code bases.
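To make the pre-tool-use idea concrete, here is a minimal sketch of a hook script that blocks edits to protected files. It assumes the hook receives a JSON event on stdin with a `tool_input.file_path` field and that paths are relative to the project root; the `PROTECTED` patterns and the `decide` helper are hypothetical names for illustration. Check the hooks documentation for the exact event shape before relying on it.

```python
import json
import sys
from fnmatch import fnmatch

# Hypothetical patterns for files the agent must not edit.
PROTECTED = ["migrations/*", "*.lock", ".env"]


def decide(event: dict, protected: list[str] = PROTECTED) -> tuple[int, str]:
    """Return (exit_code, message). Exit code 2 means 'block this tool call'."""
    # Assumed event shape: {"tool_name": ..., "tool_input": {"file_path": ...}}
    path = event.get("tool_input", {}).get("file_path", "")
    for pattern in protected:
        if fnmatch(path, pattern):
            return 2, f"Blocked: {path} matches protected pattern {pattern!r}."
    return 0, ""


def main() -> int:
    """Entry point when run as a hook: the event JSON arrives on stdin."""
    code, message = decide(json.load(sys.stdin))
    if message:
        # The message goes to stderr so it can be fed back to the agent.
        print(message, file=sys.stderr)
    return code


# Demo with hand-written events instead of real hook input.
for sample in (
    {"tool_name": "Edit", "tool_input": {"file_path": "migrations/0001.sql"}},
    {"tool_name": "Edit", "tool_input": {"file_path": "src/app.py"}},
):
    print(sample["tool_input"]["file_path"], "->", decide(sample)[0])
```

To use it as an actual hook you would drop the demo loop and end the file with `sys.exit(main())`, then register the script for the edit tools in your hook configuration.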
00:06:08Hooks force the agent to do things it should be careful about, where instructions in CLAUDE.md
00:06:13alone may not suffice. Instructions in CLAUDE.md can get lost when the agent has
00:06:19too many things to attend to, but hooks actually force Claude to act. The third piece in the
00:06:23workflow is skills. Skills are a set of SKILL.md files and other grouped files that load on demand instead
00:06:29of being present in every session and bloating it unnecessarily. Skills are important because they
00:06:34use progressive disclosure and are tailored to perform a specific specialized task needed for the
00:06:40workflow. They expand the agent's knowledge of something it is already capable of doing. If you
00:06:44put these instructions in CLAUDE.md they just consume unnecessary tokens. Project specific
00:06:49instructions should go into skills because they load only when the agent actually needs them. You
00:06:54can also scope skills to specific paths so they only activate in the relevant part of the code
00:06:59and do not bloat the context outside of that. For example if you are working in the deployment area
00:07:04you can specify the path of that directory in the skill description so the skill is never loaded when
00:07:09you are working elsewhere. To configure skills you just invoke the skill creator that now comes built
00:07:14into Claude Code. Previously you had to get it from its open-source repository on GitHub. You then answer the questions
00:07:19it asks during the discussion session, and you end up with a skill tailored to your exact needs which
00:07:23you can access once you restart the session. Aside from skills you can also use plugins. Plugins are a
00:07:29bundle of skills, hooks and MCPs available as a single downloadable and distributable package. So
00:07:34whoever installs this plugin will have the exact same context and configurations made available
00:07:39for their use right away. So if you are working in a team creating your own plugins to distribute
00:07:44to teammates becomes really important. If you set up all your configs in one place that information
00:07:49can be distributed across the organization so your team members have the same context as you. You can
00:07:54do this by creating your own plugins and managing them by either manually uploading them or syncing
00:07:59with a GitHub repository. You can install any plugin using the /plugin command, and you can
00:08:03browse the marketplace and install whichever one you want. You can also add other marketplaces using
00:08:08the /plugin marketplace add command. Claude Code also comes bundled with multiple plugins like
00:08:13front-end design, code review, code simplifier, playwright and others all from the Claude official
00:08:18marketplace. You can use them directly in your workflow and you can create your own as well.
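Whether it ships inside a plugin or sits directly in the project, a skill is at its core a folder with a SKILL.md file in it. Here is a minimal sketch that writes one, assuming the frontmatter carries a `name` and a `description` and that project skills live under `.claude/skills/`. The skill name, description and body below are made up for illustration, and putting the path hint in the description follows what was said earlier about scoping.

```python
import json
import tempfile
from pathlib import Path


def render_skill(name: str, description: str, body: str) -> str:
    """Build SKILL.md text: YAML frontmatter followed by the instructions."""
    # json.dumps yields a double-quoted string, which is also valid YAML,
    # so a description containing ': ' cannot break the frontmatter.
    front = f"---\nname: {name}\ndescription: {json.dumps(description)}\n---\n"
    return f"{front}\n{body.strip()}\n"


def write_skill(skills_root: Path, name: str, description: str, body: str) -> Path:
    """Create <skills_root>/<name>/SKILL.md and return its path."""
    skill_dir = skills_root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    target = skill_dir / "SKILL.md"
    target.write_text(render_skill(name, description, body))
    return target


# Demo in a temporary directory standing in for a real project.
with tempfile.TemporaryDirectory() as tmp:
    path = write_skill(
        Path(tmp) / ".claude" / "skills",
        name="deploy-checklist",  # hypothetical skill name
        description="Use when working under infra/deploy: pre-release checks.",
        body="1. Run the smoke tests.\n2. Confirm the rollback plan.",
    )
    print(path.read_text())
```

Because only the short description is visible up front, the body is read when the skill is actually needed, which is the progressive disclosure described above.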
00:08:22Plugins matter especially for large scale projects because a lot of people work on the same project
00:08:27and distributing context among them is important. So instead of making each person download skills
00:08:32and other components separately they can install the plugin directly. Also if you are enjoying our
00:08:36content consider pressing the hype button because it helps us create more content like this and reach
00:08:41more people. Another thing that matters in agent harnesses but is not talked about enough is
00:08:46LSP. The Language Server Protocol, or LSP, is basically an integration that gives the agent the same kind
00:08:52of navigation a developer has in an IDE. There is a language server for almost any programming language and it
00:08:58might be unnecessary with popular ones but it becomes critical with unconventional ones. It
00:09:03gives the agent intelligence about the programming language so it can navigate the code base the way
00:09:08a human does. For example when a human wants to find a function they check where that function
00:09:13is imported from, go to that file and check that file for the function's definition. That is how
00:09:17they actually find the exact source they need. Without LSP the agent pattern matches based on
00:09:22text and is likely to land on the wrong symbol. As we mentioned Claude Code uses the file system
00:09:28based approach with bash commands so without LSP it is just pattern matching on file names and text
00:09:33not navigating with deeper intelligence. Now do not assume LSP is not needed just because your agent
00:09:38has not run into errors yet. Set up LSP even before you start working on the project. Configure it for
00:09:44all the languages you will use even before writing any code so the agent already has information on
00:09:49how to work with them. Instead of letting the agent guess patterns installing LSP lets it read and edit
00:09:55code the way a developer thinks about it not just as text. Now as you already know MCP is used to
00:10:00connect the agent to external tools but you can also connect your MCPs to your project's internal
00:10:05tools, data sources, APIs or other systems the agent otherwise cannot reach. For that you need
00:10:10to create your own MCPs and make them available so people on your team can use them easily. MCPs
00:10:16are basically an extension to the existing setup loaded whenever they are needed and the tools they
00:10:20provide are then available for the agent to use. If you are working on a large code base you can build
00:10:25MCPs that serve many purposes like acting as a documentation guide, retrieving analytics or even
00:10:31letting you make changes through them. These are helpful because if you have your own code base you
00:10:35can let the agent naturally interact with internal information call tools and make changes there
00:10:40instead of fumbling through huge documentation. This gives the agent more direct access to the
00:10:45information and systems it needs. But to configure an MCP the basic setup of the app needs to already
00:10:51be working. If you configure your MCP before that things can go wrong and the MCP implementation may
00:10:57fail. So first make sure your app is working properly then create the MCP and let the agent
00:11:02interact with your project with more intelligence and better information. Another thing you need to
00:11:06create is subagents. Subagents have isolated context windows of their own and do whichever task
00:11:12is delegated to them by the main orchestrator agent, then return only the final output to the parent.
00:11:17This is a key part of an agent harness because, used properly, subagents make context
00:11:22utilization much better: they do not fill the main agent's context
00:11:28with information it does not need. Subagents only run when invoked and then return their findings.
00:11:33Claude spins off subagents on its own but you can configure subagents yourself as well. You can
00:11:38configure whichever tools and models you want for them and provide instructions on how they should
00:11:42operate, creating specific agents for your own workflows. You can also override Claude's built-in
00:11:47agents. For example, you can create your own agent whose instructions override a built-in one like
00:11:52explore and describe how it should navigate around your directory. Claude's own
00:11:57explore agent is generalized for all kinds of code bases, but if you configure your own, the custom one
00:12:03overrides the default. This gives the agent more context on how the files in your project are
00:12:08structured so it does not waste tokens navigating files with only the information in CLAUDE.md to rely on.
00:12:13So you can make the main agent control the whole project execution and rely on subagents for the
00:12:18actual work. Subagents also help because you can parallelize their work through agent delegation
00:12:23which makes the workflow much smoother and faster than doing everything sequentially.
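Here is a toy sketch of that delegation pattern, not how Claude actually implements subagents. Each hypothetical `Subagent` keeps its intermediate steps in its own private context, the tasks run in parallel on a thread pool, and only the one-line final result reaches the orchestrator.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Subagent:
    """Toy stand-in for a subagent: it owns a private context list."""
    name: str
    context: list[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        # Intermediate work piles up in the subagent's own context only.
        for step in range(3):
            self.context.append(f"{self.name} step {step} on {task}")
        # Only a short final result goes back to the orchestrator.
        return f"{self.name}: finished {task} in {len(self.context)} steps"


def orchestrate(tasks: dict[str, str]) -> list[str]:
    """Run one subagent per task in parallel and collect only final outputs."""
    agents = [Subagent(name) for name in tasks]
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        # pool.map returns results in the same order as the agents list.
        return list(pool.map(lambda a: a.run(tasks[a.name]), agents))


main_context = orchestrate({
    "explorer": "map the auth module",  # hypothetical tasks
    "tester": "run unit tests",
})
print(main_context)
```

The main context ends up with two short strings instead of the six intermediate steps, which is the whole point of delegating.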
00:12:28There are a few more practices you need to follow when navigating around a large code base.
00:12:32This is important because Claude's ability to navigate a large code base is determined by
00:12:36whether it is able to find the right context. So ensuring Claude gets the right context is
00:12:41important so the agent does not get too little or too much and stays focused. Aside from separating
00:12:46the CLAUDE.md file you need to separate tests for each subdirectory instead of having them all in one
00:12:51place. This way they stay segmented, avoid timeout issues when a lot of tests run at once, and can
00:12:56be scoped more effectively. You can also create a separate code base map file that maps your project
00:13:01structure. If you are working with conventional apps like React or Next.js you can skip this
00:13:06because the agents have been trained extensively on those. But with unconventional languages like C++
00:13:12you need a code base map. It acts as a table of contents for the agent letting it know where
00:13:16each file lives so it does not have to run a lot of bash commands to narrow down to the right one. Lastly,
00:13:21and most importantly, review your setup every few months as the model evolves. Remove the instructions,
00:13:26hooks or anything else that the newer model no longer needs. Use .ignore files like .gitignore
00:13:32and .agentignore so the files you do not want the agent or version control to touch are left alone.
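The code base map mentioned a moment ago does not have to be written by hand. Here is a minimal sketch that generates an indented table of contents from the directory tree. The `IGNORED` set is a hypothetical stand-in for whatever your ignore files exclude, and where you save the result is up to you.

```python
import tempfile
from pathlib import Path

# Hypothetical folders to leave out of the map.
IGNORED = {".git", "node_modules", "build"}


def build_map(root: Path) -> str:
    """Return an indented listing of every file and folder under root."""
    lines = []
    # Sorting by path parts keeps each folder directly above its contents.
    for path in sorted(root.rglob("*"), key=lambda p: p.relative_to(root).parts):
        rel = path.relative_to(root)
        if any(part in IGNORED for part in rel.parts):
            continue
        indent = "  " * (len(rel.parts) - 1)
        suffix = "/" if path.is_dir() else ""
        lines.append(f"{indent}{rel.name}{suffix}")
    return "\n".join(lines)


# Demo on a throwaway project layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src" / "engine").mkdir(parents=True)
    (root / "src" / "engine" / "core.cpp").write_text("")
    (root / "src" / "main.cpp").write_text("")
    print(build_map(root))
```

A real map would usually add a one-line description next to each entry, since the descriptions are what save the agent from opening files just to find out what they contain.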
00:13:37This way your setup will hold up on large-scale apps. The resources for this
00:13:41video and for all our previous videos can be found in AI Labs Pro, where you can
00:13:46download and use them for your own projects. If you've found value in what we do and want to support the
00:13:51channel this is the best way to do it. The link is in the description. That brings us to the end of
00:13:55this video. If you'd like to support the channel and help us keep making videos like this you can
00:14:00do so by using the super thanks button below. As always thank you for watching and I'll see you in the next one.

Key Takeaway

Scaling coding agents requires building a custom project harness that uses file system-based navigation, segmented CLAUDE.md instructions, and automated hooks to maintain context focus.

Highlights

  • File system-based navigation replaces RAG-based approaches by using bash tools to load specific, relevant code snippets instead of relying on a central database.

  • A CLAUDE.md file should ideally be capped at around 300 lines to prevent distracting the agent with unnecessary information.

  • Subdirectories in monorepos should contain their own CLAUDE.md files to provide the agent with focused, directory-specific instructions.

  • Hooks, such as session start, pre-tool use, and stop, force specific agent behaviors that generalized instructions in CLAUDE.md cannot reliably enforce.

  • Plugins bundle skills, hooks, and Model Context Protocol (MCP) servers into single, distributable packages for team-wide consistency.

  • Language Server Protocol (LSP) integration lets agents navigate codebases with the same intelligence as a human developer, and matters most for unconventional languages.

  • Subagents allow for isolated context windows and parallelized task execution, preventing the main agent's context window from becoming bloated.

Timeline

Agent Navigation and Harnessing

  • File system-based navigation using bash tools is superior to RAG-based systems for large codebases.
  • A coding agent's performance is defined by the combination of its model and its specific project harness.
  • RAG-based systems often hallucinate by retrieving irrelevant or outdated context from a centralized database.

While RAG-based systems embed entire codebases and perform semantic searches, they fail at scale because of data pollution in the context window. File system-based navigation mimics human developers by using bash tools like 'ls' and 'grep' to load only the necessary code snippets. Regardless of model power, a robust, project-specific harness is required to effectively manage large-scale code development.
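The list, grep, load loop described here can be sketched in a few lines. This is an illustration of the idea in Python rather than the actual bash commands an agent runs; the helper names `find_files`, `grep` and `load_snippet` and the demo file names are hypothetical.

```python
import re
import tempfile
from pathlib import Path


def find_files(root: Path, suffix: str) -> list[Path]:
    """Rough equivalent of listing files: every file under root with the suffix."""
    return sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())


def grep(files: list[Path], pattern: str) -> list[tuple[Path, int]]:
    """Rough equivalent of grep -n: (file, 1-based line number) per match."""
    rx = re.compile(pattern)
    hits = []
    for path in files:
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if rx.search(line):
                hits.append((path, lineno))
    return hits


def load_snippet(path: Path, lineno: int, context: int = 2) -> str:
    """Load only the lines around a hit, not the whole file."""
    lines = path.read_text().splitlines()
    start = max(0, lineno - 1 - context)
    end = min(len(lines), lineno + context)
    return "\n".join(lines[start:end])


# Demo on a throwaway two-file project.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "billing.py").write_text("import os\n\ndef charge(user):\n    return 42\n")
    (root / "users.py").write_text("def load_user(uid):\n    return {'id': uid}\n")
    hits = grep(find_files(root, ".py"), r"def charge")
    print(load_snippet(*hits[0], context=1))
```

Only the few lines around the match enter the context, and they are read from the files as they exist right now, so there is no separate index that can go stale.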

CLAUDE.md and Behavioral Hooks

  • CLAUDE.md files should be kept short, ideally around 300 lines, to maintain agent focus.
  • Subdirectories in monorepos require localized CLAUDE.md files for progressive instruction loading.
  • Hooks act as mandatory scripts that enforce agent behavior beyond what static instructions can achieve.

The CLAUDE.md file provides essential project conventions and knowledge, but it becomes inefficient if it grows too large or is used for every sub-component of a project. Hooks allow developers to trigger actions at specific lifecycle points, such as session start, before a tool is used, or when the agent stops. These hooks, such as a stop hook that triggers reflection, force the agent to follow rules even when instructions might be ignored or blurred in the context.
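A small sketch can make the per-directory layout easier to picture. It collects the instruction files that apply to a working directory, from the repository root downward, and flags any that exceed the suggested cap. It assumes the file is named CLAUDE.md on disk; the helper names are hypothetical, and this illustrates the layout rather than the agent's actual loading logic.

```python
import tempfile
from pathlib import Path

MAX_LINES = 300  # rough cap suggested in the video


def instruction_files(repo_root: Path, working_dir: Path,
                      name: str = "CLAUDE.md") -> list[Path]:
    """Instruction files from repo_root down to working_dir, most general first."""
    dirs = [repo_root]
    for part in working_dir.relative_to(repo_root).parts:
        dirs.append(dirs[-1] / part)
    return [d / name for d in dirs if (d / name).is_file()]


def oversized(files: list[Path], max_lines: int = MAX_LINES) -> list[Path]:
    """Files whose line count exceeds the cap."""
    return [f for f in files if len(f.read_text().splitlines()) > max_lines]


# Demo on a throwaway monorepo layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    api = root / "packages" / "api"
    api.mkdir(parents=True)
    (root / "CLAUDE.md").write_text("Use tabs.\nRun tests before commits.\n")
    (api / "CLAUDE.md").write_text("API rule\n" * 400)
    files = instruction_files(root, api)
    print([str(f.relative_to(root)) for f in files])
    print([str(f.relative_to(root)) for f in oversized(files)])
```

Running a check like `oversized` in CI is one way to notice when a file has quietly grown past the point where it helps.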

Skills, Plugins, and LSP

  • Skills files should be used for specialized, path-scoped instructions to avoid context bloat.
  • Plugins provide a way to package and distribute skills, hooks, and MCPs across team members.
  • Language Server Protocol (LSP) provides agents with human-like understanding of code structure and dependencies.

Skills are loaded on demand and can be scoped to specific paths, which prevents unnecessary token usage. Plugins bundle these configurations into distributable packages, ensuring team-wide consistency on large projects. Furthermore, LSP integration is critical for agents to move beyond simple pattern matching on filenames and text and gain deep intelligence about code definitions and imports.
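The difference between text matching and symbol-aware navigation is easy to demonstrate. The sketch below uses Python's `ast` module as a stand-in for a language server; it is not LSP itself, and the `SOURCE` snippet is made up. It compares a grep-style search with a lookup that understands what a function definition is.

```python
import ast

# Hypothetical source file with several mentions of "save".
SOURCE = '''\
# TODO: call save() after validation
def save(record):
    return record

def save_all(records):
    return [save(r) for r in records]

message = "save failed"
'''


def text_matches(source: str, name: str) -> list[int]:
    """Line numbers where the name appears as plain text (grep-style)."""
    return [n for n, line in enumerate(source.splitlines(), start=1) if name in line]


def definition_lines(source: str, name: str) -> list[int]:
    """Line numbers where a function with exactly this name is defined."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name
    ]


print(text_matches(SOURCE, "save"))      # [1, 2, 5, 6, 8]
print(definition_lines(SOURCE, "save"))  # [2]
```

The text search returns a comment, a different function, a call site and a string literal alongside the real definition, while the syntax-aware lookup returns only the definition. A language server gives the agent that second kind of answer across files and languages.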

MCPs and Subagents

  • MCPs enable agents to interact directly with internal tools, data sources, and APIs.
  • Subagents operate in isolated context windows and only return final outputs to the main orchestrator.
  • Unconventional languages require a code base map to act as a table of contents for the agent.

Model Context Protocol (MCP) servers allow agents to reach systems they would otherwise be blocked from. Subagents improve performance by offloading tasks and preventing the main agent's context from overflowing. Final best practices include segmenting tests, utilizing .agentignore files, and regularly reviewing the harness setup to ensure it remains compatible with newer, more intelligent models.
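As a rough picture of what an internal MCP server does, the sketch below answers two JSON-RPC 2.0 requests, `tools/list` and `tools/call`, for a single documentation lookup tool. It is a simplified illustration under stated assumptions: it skips the initialization handshake and the transport layer, the `DOCS` data and the `lookup_docs` tool are hypothetical, and a real server would normally be built on an MCP SDK rather than written by hand.

```python
import json

# Hypothetical internal documentation the agent could not otherwise reach.
DOCS = {
    "deploy": "Run the release pipeline from the main branch.",
    "auth": "Tokens are issued by the internal gateway.",
}

# Tool metadata advertised to the agent.
TOOLS = {
    "lookup_docs": {
        "description": "Return the internal documentation entry for a topic.",
        "inputSchema": {
            "type": "object",
            "properties": {"topic": {"type": "string"}},
            "required": ["topic"],
        },
    }
}


def lookup_docs(topic: str) -> str:
    return DOCS.get(topic, f"No entry for {topic!r}.")


def handle(request: dict) -> dict:
    """Answer one JSON-RPC 2.0 request for tools/list or tools/call."""
    method = request.get("method")
    params = request.get("params", {})
    if method == "tools/list":
        result = {"tools": [{"name": n, **spec} for n, spec in TOOLS.items()]}
    elif method == "tools/call" and params.get("name") == "lookup_docs":
        text = lookup_docs(params.get("arguments", {}).get("topic", ""))
        result = {"content": [{"type": "text", "text": text}]}
    else:
        # Simplification: unknown methods and unknown tools share one error.
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}


call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
        "params": {"name": "lookup_docs", "arguments": {"topic": "deploy"}}}
print(json.dumps(handle(call), indent=2))
```

The agent first asks which tools exist, then calls one by name with structured arguments, so it gets a direct answer instead of searching through large documentation.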
