Big Projects Always Fail... Anthropic Is Fixing That


Transcript

00:00:00Nowadays, shipping small projects has become really easy, but agents start failing the moment

00:00:04the code base grows large and picks up multiple dependencies. The issue gets even worse if

00:00:09you are working with unconventional languages, where errors and issues become even harder to trace.

00:00:14What people miss is that you need to take proper steps before putting agents to work on large code

00:00:19bases, and this is exactly what Anthropic talks about here. They cover how to actually handle

00:00:23projects as they scale. It was really insightful, because these are things we ourselves have been

00:00:28using in our own projects and have found pretty helpful. Before we go into detail on how to set

00:00:33up a project at a large scale, let us first understand how agents navigate around the

00:00:38code in general. There are two ways they do this. The first is RAG-based. This works by embedding

00:00:43the entire code base and retrieving the relevant chunks at query time. Based on your query, it runs

00:00:48a semantic search that matches your query against the code in its database. From the similarity matches,

00:00:53it loads that specific context for the model to analyze and work from. This might work for

00:00:58small-scale apps, but it does not hold up on large-scale ones. This is because there is a central

00:01:03database that maintains the data, and when it holds a lot of files, the semantic matching

00:01:08becomes unreliable. It is also why coding agents hallucinate modules that no longer exist: the index can lag behind the files it describes.
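To make the RAG flow concrete, here is a toy Python sketch. It is not how any real agent implements retrieval: a bag-of-words vector stands in for a real embedding model, and the file names are hypothetical. It shows the staleness problem described above: the index is built once, so after a module is deleted, a query can still retrieve it.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(files: dict) -> dict:
    """Embed every file once, up front, into one central index."""
    return {path: embed(source) for path, source in files.items()}

def retrieve(index: dict, query: str, k: int = 1) -> list:
    """Return the k indexed paths most similar to the query."""
    q = embed(query)
    return sorted(index, key=lambda path: cosine(index[path], q), reverse=True)[:k]

repo = {
    "billing/legacy_invoice.py": "def render_invoice(order): return old_template(order)",
    "auth/session.py": "def create_session(user): return token_for(user)",
}
index = build_index(repo)
del repo["billing/legacy_invoice.py"]  # the module is deleted after the index was built
print(retrieve(index, "render_invoice order"))  # the stale index still points at the deleted file
```

Nothing in the index notices the deletion; keeping it fresh would require re-embedding on every change, which is part of why this approach struggles on large, fast-moving code bases.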

00:01:14Because of these issues, the RAG-based approach has largely been replaced. The other

00:01:18type is file-system-based navigation, which is what Claude Code and most other agents now use. This

00:01:24is similar to how software developers actually navigate. The agent uses bash tools: it finds files

00:01:29with the ls command, then greps to narrow down to the exact code snippet it needs, and loads only that

00:01:34into context. Bash tools work because they do not pollute the context window with unnecessary

00:01:39snippets. So this mode handles all the ways RAG-based systems were failing, and almost all

00:01:44coding agents now navigate this way. The point here is that no matter how much models improve

00:01:49on their own, the model alone does not determine how good the code you are able to produce will be.
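The file-system-based loop (list files, grep, then load only the matching snippet) can be mirrored in plain Python. This is a sketch under the assumption that the agent's bash calls behave like `ls` and `grep`; real agents shell out to those tools (or faster ones such as ripgrep) rather than running code like this, and the `billing/invoice.py` file is made up for the demo.

```python
import fnmatch
import os
import re
import tempfile

def list_files(root: str, pattern: str = "*.py"):
    """The 'ls' step: enumerate candidate files by name without reading them."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                yield os.path.join(dirpath, name)

def grep(root: str, regex: str, pattern: str = "*.py") -> list:
    """The 'grep' step: return (path, line_number, line) for matching lines only."""
    rx = re.compile(regex)
    hits = []
    for path in list_files(root, pattern):
        with open(path, encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if rx.search(line):
                    hits.append((path, lineno, line.rstrip("\n")))
    return hits

def read_snippet(path: str, lineno: int, context: int = 2) -> str:
    """The 'load' step: bring only the lines around a hit into context."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        lines = fh.readlines()
    start = max(lineno - 1 - context, 0)
    return "".join(lines[start:lineno + context])

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "billing"))
    with open(os.path.join(root, "billing", "invoice.py"), "w", encoding="utf-8") as fh:
        fh.write("import math\n\ndef render_invoice(order):\n    return str(order)\n")
    hits = grep(root, r"def render_invoice\b")
    snippet = read_snippet(hits[0][0], hits[0][1], context=1)
```

Only `snippet` (three lines) would reach the model; the rest of the file, and every file that did not match, never enters the context window.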

00:01:54Something that matters even more when it comes to working systems is which harness you use

00:01:59for coding. So whichever tool you use, whether it is Claude Code, Codex or Gemini CLI, the output you

00:02:05get is not defined solely by its powerful model. It also depends on the harness you combine with the

00:02:10model's capabilities. If the harness is weak, there is little point in the model

00:02:15being strong on its own. Now, we know agents like Claude Code and Codex have strong built-in

00:02:19harnesses, but this does not mean you have to rely on them entirely. You need to set up a harness

00:02:24tailored to your project so it fits your project better. There are also open-source harnesses

00:02:29like superpowers, and you can use any of those when you are building something, but when you are

00:02:33developing a large-scale project, these harnesses might not hold up, and you would need to set up

00:02:38your own anyway. Every agent harness, whether you build it yourself or pull it from shared sources, contains

00:02:42five pieces centered on how Claude's agentic loop is configured in its environment.

00:02:47We will go through each. The first piece in the agent harness is the Claude.md file, which is loaded

00:02:53at the start of the session and remains in memory for the entire session. This file is really

00:02:57important because it gives Claude its knowledge of the code base. We have already done a separate

00:03:02video on how to write and structure a proper Claude.md which you can check out on the channel.

00:03:07When your code base grows large, Claude.md becomes critical. If you do not spend time on it, your

00:03:12project is bound to fail at scale. This file is for project conventions, code base knowledge,

00:03:17and the do's and don'ts that apply across the entire code base, not just to a single part. Keeping everything in it

00:03:22might be fine if your code base is small, but it becomes a problem the moment you scale into multiple

00:03:26architectures. Stuffing every aspect of the code into one file is highly inefficient, because it distracts

00:03:32the agent with information it does not need at the moment. That's why Claude.md should stay short,

00:03:37ideally around 300 lines, and if you are running a monorepo with multiple areas, each subdirectory

00:03:43should have its own Claude.md following the same rules. The agent loads it progressively when working

00:03:48in that directory, so instead of pulling everything from the root file, it gets more focused instructions

00:03:53from the subdirectory files. This file is not something you write once and rely on forever. We need to

00:03:58maintain it actively, not only as the project evolves but also as model intelligence evolves.
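A rough sketch of the per-directory idea: given a file the agent is about to work on, collect the memory files from the repo root down to that file's directory, and flag any that exceed the roughly 300-line budget mentioned above. The on-disk file Claude Code reads is named `CLAUDE.md`; the walk-up-from-the-target logic here is a simplification for illustration, not Claude Code's actual loading algorithm, and the directory names are hypothetical.

```python
import tempfile
from pathlib import Path

MAX_LINES = 300  # the rough budget mentioned in the video

def applicable_memory_files(repo_root: Path, target: Path, name: str = "CLAUDE.md") -> list:
    """Memory files from the repo root down to the target's directory, outermost first."""
    repo_root = repo_root.resolve()
    directory = target.resolve().parent
    chain = []
    while True:
        candidate = directory / name
        if candidate.is_file():
            chain.append(candidate)
        if directory == repo_root or directory == directory.parent:
            break  # stop at the repo root (or the filesystem root as a safety net)
        directory = directory.parent
    return list(reversed(chain))

def over_budget(paths: list, max_lines: int = MAX_LINES) -> list:
    """Flag memory files that have grown past the line budget."""
    flagged = []
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            if sum(1 for _ in fh) > max_lines:
                flagged.append(path)
    return flagged

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "services" / "payments").mkdir(parents=True)
    (root / "CLAUDE.md").write_text("# Repo-wide conventions\n", encoding="utf-8")
    (root / "services" / "payments" / "CLAUDE.md").write_text("- rule\n" * 400, encoding="utf-8")
    chain = applicable_memory_files(root, root / "services" / "payments" / "charge.py")
    names = [p.relative_to(root).as_posix() for p in chain]
    too_long = [p.relative_to(root).as_posix() for p in over_budget(chain)]
```

A check like `over_budget` could run periodically as part of the maintenance the video recommends, so a subdirectory file that quietly grows to 400 lines gets noticed.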

00:04:04The principles that worked for Sonnet 4.5 will not necessarily apply to Opus. Newer models are

00:04:09trained to overcome failure patterns that earlier instructions were written to work around. So giving the same

00:04:14instructions to every model just wastes tokens. Now, hooks are

00:05:09another important thing that helps when working with these large code bases. They are basically

00:05:14scripts that make the agent take specific actions when certain conditions are met. There are many types

00:05:19of hooks you can configure, usually written as shell scripts, that control the agent's behavior. For

00:05:23example, you can configure a session start hook that loads the information you want at the start

00:05:28of each session, like which files Claude should load for context. You can also have a hook exit with code 2

00:05:33and feed its error message back to Claude so it can iterate on it. Pre-tool use hooks are another

00:05:38type. Whenever the agent uses a tool you have configured the hook for, the hook runs your commands.
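As a concrete, hedged example of a pre-tool use hook: as we understand Claude Code's hook contract, the hook receives the pending tool call as JSON on stdin (with `tool_name` and `tool_input` fields), and exiting with code 2 blocks the call while the stderr text is fed back to Claude. The protected patterns, the `--hook` flag, and the assumption that `Edit` and `Write` are the tools to guard are illustrative choices; check the hooks reference for the exact schema before relying on it. The script would be registered as a `PreToolUse` hook in the project's `.claude/settings.json`, with a matcher such as `Edit|Write`.

```python
import fnmatch
import json
import sys

# Hypothetical project policy: paths the agent must never modify.
PROTECTED = ["*.lock", "migrations/*", ".env*"]
# Assumed names of Claude Code's file-editing tools; adjust to the tools you actually guard.
EDIT_TOOLS = {"Edit", "Write"}

def guard_decision(event: dict) -> tuple:
    """Return (exit_code, message); exit code 2 blocks the tool call and the message goes back to Claude."""
    if event.get("tool_name") not in EDIT_TOOLS:
        return 0, ""
    path = event.get("tool_input", {}).get("file_path", "")
    for pattern in PROTECTED:
        # Match against the whole path, or against whatever follows any "/" in it.
        if fnmatch.fnmatch(path, pattern) or fnmatch.fnmatch(path, "*/" + pattern):
            return 2, f"Blocked: {path} matches protected pattern {pattern!r}. Ask the user first."
    return 0, ""

# Only read stdin when launched as the hook command (e.g. `python3 guard.py --hook`).
if __name__ == "__main__" and sys.argv[1:] == ["--hook"]:
    code, message = guard_decision(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

To try it by hand, pipe in a fake event, e.g. `echo '{"tool_name": "Edit", "tool_input": {"file_path": "db/migrations/0001.py"}}' | python3 guard.py --hook`, which should exit with status 2.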

00:05:43You can use them to prevent Claude from editing files you do not want it to touch. But one of the most

00:05:48important hooks is the stop hook, which runs when Claude finishes responding. This pushes Claude to reflect on

00:05:53what has been done so far. From that, it can update Claude.md with the learnings from the session

00:05:58so the same issues do not happen again. You can also configure hooks for linting, running tests

00:06:03and many other purposes. Strung together, all of these help a lot with large-scale code bases.
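The stop hook idea can be sketched the same way. This assumes Claude Code's `Stop` event, which fires when Claude finishes responding, and its `stop_hook_active` input flag (as we understand the hooks reference); exit code 2 keeps Claude working, with the stderr text as its next instruction. The reflection prompt wording and the `--hook` flag are hypothetical.

```python
import json
import sys

REFLECTION_PROMPT = (
    "Before you stop: list any mistakes, dead ends or surprises from this session, "
    "and add a short note to CLAUDE.md for any that future sessions should avoid."
)

def stop_decision(event: dict) -> tuple:
    """Return (exit_code, message) for a Stop hook."""
    # stop_hook_active is true when Claude is already continuing because of a Stop hook;
    # letting it stop on that second pass avoids an endless reflect-and-continue loop.
    if event.get("stop_hook_active"):
        return 0, ""
    # Exit code 2 asks Claude to keep going, with the message as its next instruction.
    return 2, REFLECTION_PROMPT

if __name__ == "__main__" and sys.argv[1:] == ["--hook"]:
    code, message = stop_decision(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

The `stop_hook_active` check is the important design choice: without it, every attempt to stop would trigger another round of reflection.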

00:06:08Hooks force the agent to do the things it needs to be careful about, where instructions in Claude.md

00:06:13alone may not suffice. Instructions in Claude.md can get blurred in the agent's attention because

00:06:19there are too many things to focus on, but hooks actually force Claude to act. The third piece in the

00:06:23workflow is skills: sets of SKILL.md files and other grouped files that load on demand instead

00:06:29of being present in every session and bloating it unnecessarily. Skills are important because they

00:06:34use progressive disclosure and are tailored to perform a specific, specialized task needed in the

00:06:40workflow. They extend the agent's knowledge of something it is already capable of doing. If you

00:06:44put these instructions in Claude.md, they just consume unnecessary tokens. Project-specific

00:06:49instructions should go into skills because they load only when the agent actually needs them. You

00:06:54can also scope skills to specific paths so they only activate in the relevant part of the code

00:06:59and do not bloat the context outside of it. For example, if you are working in the deployment area,

00:07:04you can specify the path of that directory in the skill description so the skill is never loaded when

00:07:09you are working elsewhere. To configure skills, you just invoke the skill creator that now comes built

00:07:14into Claude Code (previously you had to get it from GitHub, where it is open source) and answer the questions

00:07:19it asks during the discussion. You will then have a skill tailored to your exact needs, which

00:07:23you can access once you restart the session. Aside from skills, you can also use plugins. Plugins are a

00:07:29bundle of skills, hooks and MCPs packaged as a single downloadable, distributable unit. So

00:07:34whoever installs the plugin has the exact same context and configuration available

00:07:39for their use right away. If you are working in a team, creating your own plugins to distribute

00:07:44to teammates becomes really important. If you set up all your configs in one place, that information

00:07:49can be distributed across the organization so your team members have the same context as you. You can

00:07:54do this by creating your own plugins and managing them either by uploading them manually or by syncing

00:07:59with a GitHub repository. You can install any plugin using the /plugin command, and you can

00:08:03browse the marketplace and install whichever one you want. You can also add other marketplaces using

00:08:08the /plugin marketplace add command. Claude Code also comes bundled with multiple plugins, like

00:08:13front-end design, code review, code simplifier, playwright and others, all from the official Claude

00:08:18marketplace. You can use them directly in your workflow, and you can create your own as well.
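To connect plugins back to progressive disclosure, here is a sketch of how a bundled skill might be surfaced. It assumes a plugin directory containing `skills/<name>/SKILL.md` files whose front matter carries `name` and `description`; only that catalog would sit in context up front, and a skill's body would be read only once it is judged relevant. The plugin and skill names are hypothetical, the front-matter parser handles only simple `key: value` lines, and a real plugin also carries a manifest and may bundle hooks and MCP servers, which this sketch ignores.

```python
import tempfile
from pathlib import Path

def parse_skill(path: Path) -> tuple:
    """Split a SKILL.md into 'key: value' front matter and body (a minimal parser, not full YAML)."""
    text = path.read_text(encoding="utf-8")
    meta, body = {}, text
    if text.startswith("---\n"):
        header, _sep, body = text[4:].partition("\n---\n")
        for line in header.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    return meta, body

def skill_catalog(plugin_root: Path) -> dict:
    """What the agent sees up front: only name -> description for each bundled skill."""
    catalog = {}
    for skill_md in sorted(plugin_root.glob("skills/*/SKILL.md")):
        meta, _body = parse_skill(skill_md)
        catalog[meta.get("name", skill_md.parent.name)] = meta.get("description", "")
    return catalog

def load_skill_body(plugin_root: Path, name: str) -> str:
    """Read the full instructions only once a skill is judged relevant."""
    for skill_md in sorted(plugin_root.glob("skills/*/SKILL.md")):
        meta, body = parse_skill(skill_md)
        if meta.get("name", skill_md.parent.name) == name:
            return body
    raise KeyError(name)

with tempfile.TemporaryDirectory() as tmp:
    plugin = Path(tmp) / "team-plugin"
    (plugin / "skills" / "deploy").mkdir(parents=True)
    (plugin / "skills" / "deploy" / "SKILL.md").write_text(
        "---\nname: deploy\ndescription: Use only when editing files under infra/deploy/\n---\n"
        "# Deploy checklist\n1. Run the staging smoke test first.\n",
        encoding="utf-8",
    )
    catalog = skill_catalog(plugin)
    body = load_skill_body(plugin, "deploy")
```

Because everyone who installs the same plugin gets the same catalog and bodies, the team shares identical skill behavior without copying files around by hand.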

00:08:22Plugins matter especially for large-scale projects because a lot of people work on the same project,

00:08:27and distributing context among them is important. So instead of making each person download skills

00:08:32and other components separately, they can install the plugin directly. Another thing that matters in agent harnesses but is not talked about enough is

00:08:46LSP. A Language Server Protocol (LSP) integration gives the agent the same kind

00:08:52of navigation a developer has in an IDE. There is a language server for almost any programming language, and LSP

00:08:58might be unnecessary with popular ones, but it becomes critical with unconventional ones. It

00:09:03gives the agent intelligence about the programming language so it can navigate the code base the way

00:09:08a human does. For example, when a human wants to find a function, they check where that function

00:09:13is imported from, go to that file and look there for the function's definition. That is how

00:09:17they actually find the exact source they need. Without LSP, the agent pattern-matches on

00:09:22text and is likely to land on the wrong symbol. As we mentioned, Claude Code uses the

00:09:28file-system-based approach with bash commands, so without LSP it is just pattern matching on file names and text,

00:09:33not navigating with deeper intelligence. Now, do not assume LSP is unnecessary just because your agent

00:09:38has not run into errors yet. Set up LSP before you even start working on the project. Configure it for

00:09:44all the languages you will use, even before writing any code, so the agent already knows

00:09:49how to work with them. Instead of letting the agent guess at patterns, installing LSP lets it read and edit

00:09:55code the way a developer thinks about it, not just as text. Now, as you already know, MCP is used to

00:10:00connect the agent to external tools, but you can also use MCPs to connect it to your project's internal

00:10:05tools, data sources, APIs or other systems the agent otherwise cannot reach. For that, you need

00:10:10to create your own MCP servers and make them available so people on your team can use them easily. MCPs

00:10:16are basically an extension to the existing setup, loaded whenever they are needed, and the tools they

00:10:20provide are then available for the agent to use. If you are working on a large code base, you can build

00:10:25MCPs that serve many purposes, like acting as a documentation guide, retrieving analytics or even

00:10:31letting you make changes through them. These are helpful because, with your own code base, you

00:10:35can let the agent interact naturally with internal information, call tools and make changes there

00:10:40instead of fumbling through huge documentation. This gives the agent more direct access to the

00:10:45information and systems it needs. But to configure an MCP, the basic setup of the app needs to already

00:10:51be working. If you configure your MCP before that, things can go wrong and the MCP implementation may

00:10:57fail. So first make sure your app is working properly, then create the MCP and let the agent

00:11:02interact with your project with more intelligence and better information. Another thing you need to

00:11:06create is subagents. Subagents have isolated context windows of their own, do whichever task

00:11:12is delegated to them by the main orchestrator agent, and then return only the final output to the parent.

00:11:17This is a key part of an agent harness because using subagents properly keeps the context

00:11:22window lean and makes context utilization much better, since they do not fill the main agent's context

00:11:28with information it does not need. Subagents only run when invoked, and then they return their findings.

00:11:33Claude spins off subagents on its own, but you can configure subagents yourself as well. You can

00:11:38configure whichever tools and models you want for them and provide instructions on how they should

00:11:42operate, creating specific agents for your own workflows. You can also override Claude's existing

00:11:47agents. For example, you can create your own agent whose instructions override an existing one like

00:11:52Explore, and give it a description of how it should navigate your directory. Claude's own

00:11:57Explore agent is generalized for all kinds of code bases, but if you configure your own, the custom one

00:12:03overrides the default. This gives the agent more context on how the files in your project are

00:12:08structured, so it does not waste tokens navigating files with only the information in Claude.md to rely on.

00:12:13So you can have the main agent control the overall project execution and rely on subagents for the

00:12:18actual work. Subagents also help because you can parallelize their work through agent delegation,

00:12:23which makes the workflow much smoother and faster than doing everything sequentially.
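A toy model of the context isolation and parallel delegation described above. It is not how Claude Code implements subagents: each "subagent" is just a function with its own context list, its "work" is a substring search over hypothetical files, and a thread pool stands in for parallel delegation. The point is what ends up in the orchestrator's context: one summary line per task, not the file contents each worker read.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    context: list = field(default_factory=list)  # this agent's private context window

    def observe(self, text: str) -> None:
        self.context.append(text)

def run_subagent(task: str, files: dict) -> str:
    """Toy subagent: starts with a fresh context, reads what it needs, returns only a summary."""
    worker = Agent(name=f"sub:{task}")
    for path, source in files.items():
        worker.observe(f"{path}:\n{source}")  # bulky reads stay in the worker's context
    matches = [path for path, source in files.items() if task in source]
    return f"{task}: found in {', '.join(matches) or 'no files'}"

def orchestrate(tasks: list, files: dict) -> Agent:
    """Main agent: fans tasks out in parallel and keeps only the returned summaries."""
    main = Agent(name="orchestrator")
    with ThreadPoolExecutor(max_workers=4) as pool:
        for summary in pool.map(lambda task: run_subagent(task, files), tasks):
            main.observe(summary)
    return main

files = {
    "api/routes.py": "def login(): ...",
    "api/auth.py": "def login_required(): ...\ndef logout(): ...",
}
main = orchestrate(["login", "logout"], files)
print(main.context)
```

`pool.map` returns results in task order, so the orchestrator's context stays deterministic even though the workers run concurrently.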

00:12:28There are a few more practices you need to follow when navigating around a large code base.

00:12:32This is important because Claude's ability to navigate a large code base is determined by

00:12:36whether it is able to find the right context. Ensuring Claude gets the right context matters

00:12:41so the agent gets neither too little nor too much and stays focused. Aside from separating

00:12:46Claude.md files, you should separate tests for each subdirectory instead of having them all in one

00:12:51place. This way they stay segmented, avoid timeout issues when a lot of tests run at once and can

00:12:56be scoped more effectively. You can also create a separate code base map file that maps your project

00:13:01structure. If you are working with conventional stacks like React or Next.js, you can skip this

00:13:06because the agents have been trained extensively on them. But with unconventional languages like C++,

00:13:12you need a code base map. It acts as a table of contents for the agent, letting it know where

00:13:16each file lives instead of running a lot of bash commands to narrow down to the right one. Lastly,

00:13:21but most importantly, review your setup every few months as models evolve. Remove the instructions,

00:13:26hooks or anything else that the newer model no longer needs. Use ignore files like .gitignore

00:13:32and .agentignore so the files you do not want the agent or version control to touch are left alone.
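A code base map can be generated rather than written by hand. This sketch walks the repo, skips anything matched by `.gitignore` or `.agentignore` patterns, and prints one line per directory. It treats both ignore files as plain lists of glob patterns, which is a simplification (real `.gitignore` syntax also has negation and anchoring rules), and the C++ layout in the demo is hypothetical.

```python
import fnmatch
import tempfile
from pathlib import Path

def load_ignore_patterns(root: Path, names=(".gitignore", ".agentignore")) -> list:
    """Collect glob patterns from ignore files (simplified: no negation or anchoring rules)."""
    patterns = []
    for name in names:
        ignore_file = root / name
        if ignore_file.is_file():
            for line in ignore_file.read_text(encoding="utf-8").splitlines():
                line = line.strip()
                if line and not line.startswith("#"):
                    patterns.append(line.rstrip("/"))
    return patterns

def is_ignored(rel_path: str, patterns: list) -> bool:
    """True if the whole path or any single path component matches a pattern."""
    parts = rel_path.split("/")
    return any(fnmatch.fnmatch(rel_path, p) or any(fnmatch.fnmatch(part, p) for part in parts)
               for p in patterns)

def build_codebase_map(root: Path) -> str:
    """Render a table of contents: one line per directory with the files it holds."""
    patterns = load_ignore_patterns(root)
    by_dir = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        if path.is_file() and not is_ignored(rel, patterns):
            parent = path.parent.relative_to(root).as_posix()
            by_dir.setdefault(parent, []).append(path.name)
    return "\n".join(
        f"{'(root)' if d == '.' else d + '/'}: {', '.join(names)}" for d, names in sorted(by_dir.items())
    )

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src" / "engine").mkdir(parents=True)
    (root / "include").mkdir()
    (root / "build").mkdir()
    (root / ".gitignore").write_text("build/\n*.o\n", encoding="utf-8")
    for rel in ("src/engine/render.cpp", "src/engine/render.o", "include/render.h", "build/out.bin"):
        (root / rel).write_text("", encoding="utf-8")
    codebase_map = build_codebase_map(root)
print(codebase_map)
```

The resulting text could be saved to a file and referenced from Claude.md, so the agent can look up where things live instead of discovering the layout with repeated bash calls.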

00:13:37This way, your setup will hold up on large-scale apps. That brings us to the end of this video. As always, thank you for watching, and I'll see you in the next one.

Key Takeaway

Scaling coding agents requires building a custom project harness that uses file system-based navigation, segmented Claude.md instructions, and automated hooks to maintain context focus.

Highlights

File system-based navigation replaces RAG-based approaches by using bash tools to load specific, relevant code snippets instead of relying on a central database.
A Claude.md file should ideally stay around 300 lines to prevent distracting the agent with unnecessary information.
Subdirectories in monorepos should contain their own Claude.md files to provide the agent with focused, directory-specific instructions.
Hooks, such as session start, pre-tool, and stop, force specific agent behaviors that generalized instructions in Claude.md cannot reliably enforce.
Plugins bundle skills, hooks, and Model Context Protocol (MCP) servers into single, distributable packages for team-wide consistency.
Language Server Protocol (LSP) integration is necessary for agents to navigate codebases with the same intelligence as a human developer.
Subagents allow for isolated context windows and parallelized task execution, preventing the main agent's context window from becoming bloated.

Timeline

Agent Navigation and Harnessing

File system-based navigation using bash tools is superior to RAG-based systems for large codebases.
A coding agent's performance is defined by the combination of its model and its specific project harness.
RAG-based systems often hallucinate by retrieving irrelevant or outdated context from a centralized database.

While RAG-based systems embed entire codebases and perform semantic searches, they fail at scale because semantic matching over a large central index becomes unreliable and the index can go stale. File system-based navigation mimics human developers by using bash tools like 'ls' and 'grep' to load only the necessary code snippets. Regardless of model power, a robust, project-specific harness is required to effectively manage large-scale code development.

Claude.md and Behavioral Hooks

Claude.md files should be kept short, ideally around 300 lines, to maintain agent focus.
Subdirectories in monorepos require localized Claude.md files for progressive instruction loading.
Hooks act as mandatory scripts that enforce agent behavior beyond what static instructions can achieve.

The Claude.md file provides essential project conventions and knowledge, but it becomes inefficient if it grows too large or is used for every sub-component of a project. Hooks allow developers to trigger actions at specific lifecycle points, such as session start, tool usage, or session exit. These hooks, such as a stop hook that triggers reflection, force the agent to follow rules even when instructions might be ignored or blurred in the context.

Skills, Plugins, and LSP

Skills files should be used for specialized, path-scoped instructions to avoid context bloat.
Plugins provide a way to package and distribute skills, hooks, and MCPs across team members.
Language Server Protocol (LSP) provides agents with human-like understanding of code structure and dependencies.

Skills are loaded on demand when a task calls for them and can be scoped to specific paths, preventing unnecessary token usage. Plugins bundle these configurations into distributable packages, ensuring team-wide consistency on large projects. Furthermore, LSP integration is critical for agents to move beyond simple pattern matching on filenames and gain deep intelligence about code definitions and imports.
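To illustrate the gap between text matching and symbol-aware lookup, this sketch uses Python's `ast` module as a stand-in for a language server. It is not LSP: a real server answers requests such as `textDocument/definition` and resolves imports across files. On a small hypothetical snippet, a plain text search for `parse` returns five lines (a comment, a different function, a string, the real definition and a call site), while the syntax-aware lookup returns just the definition.

```python
import ast
import re
from typing import List, Optional

SOURCE = '''\
# TODO: parse() is slow, see parse_config below
def parse_config(path):
    return {"parse": path}

def parse(text):
    return text.split()

result = parse("a b")
'''

def grep_hits(source: str, name: str) -> List[int]:
    """Text matching: every line that mentions the name, whatever it refers to."""
    return [n for n, line in enumerate(source.splitlines(), start=1) if re.search(re.escape(name), line)]

def find_definition(source: str, name: str) -> Optional[int]:
    """Symbol-aware lookup: the line where `name` is defined as a function or class."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) and node.name == name:
            return node.lineno
    return None

print(grep_hits(SOURCE, "parse"))        # five candidate lines for the agent to sift through
print(find_definition(SOURCE, "parse"))  # one answer: the def on line 5
```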

MCPs and Subagents

MCPs enable agents to interact directly with internal tools, data sources, and APIs.
Subagents operate in isolated context windows and only return final outputs to the main orchestrator.
Unconventional languages require a code base map to act as a table of contents for the agent.

Model Context Protocol (MCP) servers allow agents to reach systems they would otherwise be blocked from. Subagents improve performance by offloading tasks and preventing the main agent's context from overflowing. Final best practices include segmenting tests, utilizing .agentignore files, and regularly reviewing the harness setup to ensure it remains compatible with newer, more intelligent models.
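A minimal sketch of what an internal MCP server exposes. MCP messages are JSON-RPC 2.0, and tools are advertised through `tools/list` and invoked through `tools/call`; this dispatcher handles only those two methods for one hypothetical `lookup_runbook` tool. It skips the initialization handshake and the stdio or HTTP transport a real server needs, so in practice you would build on an official MCP SDK rather than this code.

```python
import json

def lookup_runbook(service: str) -> str:
    """Hypothetical internal data source the agent could not otherwise reach."""
    runbooks = {"payments": "Restart the ledger worker before replaying events."}
    return runbooks.get(service, f"No runbook for {service!r}.")

TOOLS = {
    "lookup_runbook": {
        "description": "Return the on-call runbook entry for an internal service.",
        "inputSchema": {
            "type": "object",
            "properties": {"service": {"type": "string"}},
            "required": ["service"],
        },
        "handler": lookup_runbook,
    }
}

def handle(request: dict) -> dict:
    """Answer MCP's tools/list and tools/call methods as JSON-RPC 2.0 responses."""
    rid, method = request.get("id"), request.get("method")
    if method == "tools/list":
        tools = [{"name": name, "description": spec["description"], "inputSchema": spec["inputSchema"]}
                 for name, spec in TOOLS.items()]
        return {"jsonrpc": "2.0", "id": rid, "result": {"tools": tools}}
    if method == "tools/call":
        params = request.get("params", {})
        spec = TOOLS.get(params.get("name"))
        if spec is None:
            return {"jsonrpc": "2.0", "id": rid, "error": {"code": -32602, "message": "Unknown tool"}}
        text = spec["handler"](**params.get("arguments", {}))
        return {"jsonrpc": "2.0", "id": rid, "result": {"content": [{"type": "text", "text": text}]}}
    return {"jsonrpc": "2.0", "id": rid, "error": {"code": -32601, "message": "Method not found"}}

reply = handle({"jsonrpc": "2.0", "id": 2, "method": "tools/call",
                "params": {"name": "lookup_runbook", "arguments": {"service": "payments"}}})
print(json.dumps(reply))
```

The agent only ever sees the tool's name, description and input schema until it decides to call it, which mirrors the "loaded whenever needed" behavior described in the transcript.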
