Claude's Best Release Yet + 10 Tricks That Gave Me an Unfair Advantage

AAI LABS
Computers/Software · Entrepreneurship/Startups · AI/Future Technology

Transcript

00:00:00 Even though Claude Code is one of the most powerful tools for AI development,
00:00:03 why does it fall apart on certain tasks? And between the features Anthropic has been
00:00:08 dropping recently and the workflows we've been building around it, the way you're supposed to
00:00:12 use this thing looks completely different from a few weeks ago. Our team has been using Claude
00:00:16 Code every day, and not just for development but also for research, managing our production
00:00:21 pipeline, and automating tasks that have nothing to do with code. So let me show you everything
00:00:26 that we've figured out. Anthropic recently added the insights command for Claude Code. It analyzes
00:00:31 all your past Claude Code sessions over a certain time period and generates a report. The report
00:00:36 analyzes your working style, roasts your working patterns, highlights what you were doing right and
00:00:40 what you weren't, and tells you how to improve. The main thing we were interested in was identifying
00:00:45 where things went wrong, because that's where we can learn to improve. The report highlighted
00:00:49 the areas where we had the most friction and also suggested features we could add to make the
00:00:54 workflow better. For example, we remember a session where the main agent repeatedly polled the task
00:00:58 list for a long time when we were using agent teams. It caused the session to take too long, and
00:01:03 we had to end it ourselves. To prevent this from happening in the future, we can copy this prompt
00:01:07 into CLAUDE.md so that whenever we're using Claude Code with multiple agents, Claude doesn't poll
00:01:12 indefinitely and acts on it. We can import these tips into our projects for future workflows so that
00:01:17 our experience with Claude Code gets better over time. Our team has spent a lot of time working
00:01:22 with Claude Code, and the most important step is still how well you give context to the agent.
00:01:26 This can be project requirements broken down into sub-parts, or documentation for the frameworks and
00:01:30 libraries you're using, because when you give it the right context, errors basically drop to zero:
00:01:35 it knows what to act on. For project documentation, we prefer having Claude write it
00:01:39 rather than doing it ourselves. We gave Claude a specific prompt that contained all the information
00:01:44 needed to break the project's idea down into the required documents. We asked it to create
00:01:48 four documents, each focused on a specific aspect of the app. The most important one is the PRD,
00:01:53 which contains information about the project requirements and scope. Then there's architecture.md,
00:01:57 which has data formats, file structure, APIs, and all the architecture details written out.
00:02:02 Then decision.md, which contains all the decisions Claude made during the creation of the project,
00:02:08 acting as a reference for future use. And then there's feature.json, which contains
00:02:12 all the features in a specific JSON format. It has all the details about each feature in a token-
00:02:17 efficient way and contains criteria for what makes a feature complete, along with a passes key for
00:02:22 keeping track of what's been implemented and what hasn't. Now that your large task is split into
00:02:27 smaller sections, we need to provide documentation on the tools it needs for implementation through
00:02:31 the Context 7 MCP. It has documentation for all the libraries and frameworks and gets updated
00:02:36 frequently, so agents can pull the latest docs and close the gap between what the model knows and
00:02:41 what the current version actually is. Setting up the MCP only takes a few steps. Once installed, Claude
00:02:46 used the tools from Context 7 and fetched the library information directly. This lets it use
00:02:50 the latest documentation, prevent code errors caused by dependency mismatches, and get a more
00:02:55 accurate implementation. Now, hooks are another underutilized feature. Hooks in Claude Code
00:03:00 are shell commands that fire at specific points in the lifecycle. There are many types that trigger at
00:03:05 certain times, like session start, before any tool is used, or after a tool is used. But the most important
00:03:11 part is setting them up with specific exit codes. The exit codes tell Claude Code whether to proceed,
00:03:16 block, or ignore an action. An exit code of 0 means success. An exit code of 2 means a blocking error. So
00:03:22 whenever Claude tries to do something it shouldn't, it hits exit code 2, gets an error message back,
00:03:27 and can correct itself. Any other exit code is non-blocking: the error is shown in verbose mode
00:03:32 and execution continues. Exit code 2 matters because with it, you can control the
00:03:37 agent's behavior. If you've ever done test-driven development with Claude Code,
00:03:41 you might have noticed that it tends to modify the tests when it fails to pass them. To prevent
00:03:46 that, we set up a custom hook that triggers on PreToolUse. The hook protects the test scripts
00:03:50 from modification. If the path it's trying to work on is a test directory or contains the word test,
00:03:55 the hook prints an error message saying modifications to test folders are not allowed and returns
00:04:00 exit code 2. With this hook in place, when we gave Claude a prompt to run the tests and the tests
00:04:05 failed, it tried to modify the test files. But the script blocked it, and a "blocked from modification"
00:04:10 message appeared. This stopped Claude from editing files it shouldn't be editing. If you've worked
00:04:15 with MCPs, you know they bloat the context window. And when you're working on a large-scale project,
00:04:19 the number of connected MCPs grows, so all the MCP tools end up living in the context window
00:04:25 and it gets bloated. To solve exactly this, Claude Code has an experimental MCP CLI mode.
00:04:31 We set the experimental MCP CLI flag to true. Once we did, all the MCPs that had been showing up
00:04:36 in the context disappeared, and no context window was taken up by MCP tools. The question was
00:04:41 how to access the tools if they no longer exist in memory. Instead of loading all the
00:04:45 tool schemas up front, Claude Code uses MCP CLI info and MCP CLI call tools and runs all the connected
00:04:52 MCPs through these tools via bash. With the flag set, when we gave it a prompt, instead of calling
00:04:56 the MCP tool directly, it called them via the MCP CLI tools and ran them as bash commands rather than
00:05:03 MCP tools. This way, it only loaded the required tool on demand, preventing context bloat. Also,
00:05:08 if you are enjoying our content, consider pressing the hype button, because it helps us create more
00:05:13 content like this and reach more people. Now, in our previous videos, we have stressed using
00:05:18 git to keep all the agents' work tracked in version control. You can also revert if the agents
00:05:23 don't implement things correctly. We also made a video where we used git to run an agent on a long-
00:05:28 horizon task, which you can check out on the channel. We used parallel agents working in different
00:05:32 worktrees so they could build all of the project's features while staying isolated from each other.
00:05:37 This way, we could merge their output together later without interference, because agents working
00:05:41 on the same files cause conflicts. Branches aren't preferred for this: agents have
00:05:46 difficulty checking out different branches, since branches share the same working directory, but
00:05:50 worktrees don't. So we gave it a prompt where we provided multiple features that needed to be
00:05:55 implemented and specified that each agent should work in a separate worktree. It used a separate
00:05:59 agent for each worktree and implemented the features in isolation, even though their task
00:06:03 descriptions overlapped at certain points. After Claude implemented all the features correctly in
00:06:08 separate branches, we had it merge the output so we could get all the features in a single working
00:06:13 directory. Now, strict mode is essential for shifting the burden of error checking to the compiler. This is
00:06:18 something you should set up for whatever language you're using, because it catches bugs
00:06:22 at build time instead of when users hit them at runtime. Since our primary language is TypeScript,
00:06:26 we always set strict mode to true in our projects. This turns on checks for null values and implicit
00:06:31 types, enforces strict typing and null checks, and overall means fewer runtime errors. This matters
00:06:36 for AI agents because they don't have a built-in way to catch runtime errors. Strict mode minimizes
00:06:41 the chance of runtime failures and makes sure the compiler handles these issues instead. Agents can
00:06:46 rely on error logs in the terminal to apply known fixes. Instead of letting the project be tested
00:06:51 only by scripts, there's an additional layer of testing worth adding. You write user stories that
00:06:56 describe how the user interacts with the system, in order to guide the testing process once the
00:07:00 app is built. We actually define the user stories before implementing our projects, because this sets
00:07:05 a standard the implementation should follow. Using a prompt, Claude wrote multiple stories
00:07:10 into a folder, covering all the possible ways a user can interact with the system. Each story
00:07:15 covers a specific aspect of the app, its priority, and the acceptance criteria for the agent to test
00:07:21 against. The user stories covered all possible test scenarios, including best cases and edge cases. These
00:07:26 stories basically tell the agents how to interact with the system we just built, and with the right
00:07:31 instructions on how to interact with the system, any agent can apply the same principles to the
00:07:35 app it's building and better meet user expectations. With the stories documented, we asked
00:07:40 Claude to implement them one by one and prompted it to start with the optimal path listed in each
00:07:45 story, making sure all edge cases were covered. This way, the implementation had fewer gaps and
00:07:50 better user satisfaction overall. Now, all the tips we have been talking about are available in the
00:07:55 form of ready-to-use templates in AI Labs Pro. For those who don't know, it's our recently launched
00:08:00 community where you get ready-to-use templates, prompts, and all the commands and skills that you can
00:08:05 plug directly into your projects, for this video and all previous ones. If you found value in what
00:08:10 we do and want to support the channel, this is the best way to do it. Links in the description.
00:08:14 So we need to use parallelization as much as we can, because this is how the agent speeds up
00:08:20 its workflow and implements things that don't need to wait on each other. We know Claude automatically
00:08:25 detects whether a task can run in parallel or sequentially and decides on its own, but it
00:08:29 doesn't hurt to create agents ourselves. We also covered these agent capabilities in a previous
00:08:34 video, where we talked about how you can use agents to make your workflow faster, but this speed comes
00:08:39 at the cost of increased token usage. Still, the parallelization effort is worth it. At one point,
00:08:43 we were using Opus 4.6 to research the impact of its own improvements, and it kept
00:08:49 hallucinating facts even though we provided sources. It kept writing incorrect information, and we had to
00:08:54 correct it repeatedly. The research felt pointless because we had to keep fixing things
00:08:58 ourselves. To prevent this from happening again, we used parallel agents. We set up a research task
00:09:03 where we wanted to compare the agent-swarm capabilities of Kimi K2.5 and Claude.
00:09:09 We used two agents: one to do the research and another to fact-check the research agent. The key
00:09:14 idea was to have both agents communicate with each other to make sure the findings were accurate, so we
00:09:19 wouldn't have to do that ourselves. In this setup, one agent does the task while the other critically
00:09:24 analyzes it, giving them an adversarial way of working. The research agent started first, and the
00:09:28 fact-checker was blocked until the research agent produced the first draft. Once the first draft was
00:09:33 done, the fact-checker started verifying it. It immediately identified many inaccuracies in the
00:09:38 data the research agent had listed, and we no longer had to catch them manually. Both agents kept
00:09:43 communicating with each other and kept the fact-checking process tight: one agent dedicated to
00:09:47 calling the other out on wrong information. There are many tasks you can run in an adversarial setup
00:09:52 like this. Not just research, but development work too, where one agent implements a feature and
00:09:57 another reviews the implementation against the plan. According to Claude Code's creator,
00:10:02 an agent works better if it has some way to verify its own work. The core idea here is giving the
00:10:07 agent eyes, meaning the ability to check whether the implemented feature is correct and meets
00:10:12 expectations. Because these agents are terminal-based, they can't identify issues that happen at
00:10:17 runtime, especially on the client side. We use multiple ways to verify the agent's work. The
00:10:21 first is the Claude Chrome extension, which provides browser-centric tools like DOM capturing, console
00:10:26 log checking, and more. Another tool is the Puppeteer MCP. This one is useful because it runs in a
00:10:31 separate browser that doesn't contain your existing sessions, unlike Claude's Chrome extension. It's
00:10:36 isolated and doesn't interfere with any of your current sessions, so you get an extra layer of
00:10:41 privacy. But our preferred option is Vercel's agent browser. This isn't an MCP but a CLI tool that
00:10:46 gives agents browser-testing capabilities. It has tools for navigation, capturing screenshots, and
00:10:51 more. Unlike the other tools, it doesn't navigate based just on screenshots. Instead, it uses the
00:10:56 accessibility tree, where each element has a unique reference. This compacts the full DOM from thousands
00:11:01 of tokens down to around 200 to 400 tokens, so it's far more context-efficient. That was the main issue
00:11:07 with the Claude Chrome extension, which the agent browser solved: the extension loads the entire DOM into the
00:11:12 context window and exhausts it quickly. We also added instructions in CLAUDE.md to have Claude
00:11:17 rely on the agent browser first before falling back to MCP-based testing. So Claude uses the agent browser as
00:11:23 its primary verification method. But there's another angle here. Testing is always important, but there's
00:11:28 a way to reduce errors that doesn't involve tests or code reviews: we ask Claude to predict things
00:11:33 that haven't happened yet. We ask Claude to check the implementation and identify areas where the
00:11:38 app could fail. This works because we're giving Claude a chance to predict potential issues by
00:11:43 pattern-matching against failures that have already occurred in other apps, even if we haven't hit them
00:11:47 ourselves through testing yet. It pushes Claude to look at the code from a different angle than
00:11:52 before. When we asked it to do so, it identified critical gaps that had passed even our multi-layer
00:11:57 testing process and found 18 issues that could have been harmful in production. Our testing
00:12:01 processes didn't catch them; they could only be identified when we pushed Claude to look at the
00:12:06 project from another angle. That brings us to the end of this video. If you'd like to support the
00:12:10 channel and help us keep making videos like this, you can do so by using the Super Thanks button
00:12:15 below. As always, thank you for watching, and I'll see you in the next one.

Key Takeaway

Mastering Claude Code requires moving beyond basic prompts to a structured ecosystem of multi-agent workflows, specialized documentation, and automated guardrails that ensure accuracy and efficiency.

Highlights

Introduction of the new Insights command for Claude Code that roasts and analyzes working patterns to suggest workflow improvements.

The importance of structured project documentation like PRDs, architecture files, and feature.json to minimize AI agent errors.

Utilizing Model Context Protocol (MCP) and the experimental MCP CLI mode to prevent context window bloat and fetch real-time documentation.

Implementing lifecycle hooks with exit codes (specifically code 2) to block agents from modifying critical files like test scripts.

Adopting parallel agents and git worktrees to handle overlapping features in isolation without merge conflicts.

Using adversarial agent setups where one AI researches while another fact-checks to eliminate hallucinations and manual verification.

Advanced testing strategies using user stories and the Vercel agent browser for context-efficient, accessibility-tree-based verification.

Timeline

Optimizing Workflows with Insights and Documentation

The speaker describes how Claude Code workflows have evolved, and how modern workflows differ significantly from just a few weeks ago. A core new feature is the 'insights' command, which analyzes past sessions to identify friction points and suggests specific improvements to the user's working style. By reviewing these reports, the team discovered a persistent polling issue with multi-agent teams that they solved using prompt configurations in a CLAUDE.md file. This section emphasizes that the most critical step in AI development remains the quality of context provided to the agent. The overarching goal is to create a self-improving development environment where the AI learns from its past mistakes and the user's feedback loops.
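The video does not show the exact wording the team pasted into CLAUDE.md, so the snippet below is a hypothetical sketch of what an anti-polling instruction might look like:

```markdown
<!-- Hypothetical CLAUDE.md snippet; the exact wording is illustrative -->
## Multi-agent sessions
- When coordinating agent teams, do not poll the shared task list in a loop.
- Check the task list at most once per completed subtask; if no new work
  appears after two consecutive checks, stop and report back instead of waiting.
```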

Structured Files and Real-Time Documentation via MCP

To reduce errors to nearly zero, the team utilizes Claude to generate four essential project documents: a PRD, an architecture file, a decision log, and a feature.json. The feature.json is particularly notable as it uses a token-efficient format to track implementation progress and completion criteria. To bridge the gap between static training data and current software updates, they integrate the Context 7 MCP for real-time library documentation. Setting up this MCP allows the agent to fetch the latest framework details directly, preventing common errors caused by version mismatches. This systematic approach ensures the agent has a clear roadmap and the most accurate technical information available before it begins coding.
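The video does not show the exact feature.json schema, so the sketch below is an assumption: each entry carries an id, a description, completion criteria, and a `passes` flag the agent flips once the criteria are verified. Serializing without whitespace keeps the file token-efficient.

```python
import json

# Hypothetical feature.json content; field names are illustrative,
# not the video's actual schema.
features = {
    "features": [
        {
            "id": "auth-login",
            "description": "Email/password login flow",
            "criteria": [
                "form rejects malformed email addresses",
                "failed login shows a generic error message",
            ],
            "passes": False,  # the agent sets this to True once criteria pass
        }
    ]
}

# Compact serialization keeps the tracking file token-efficient.
print(json.dumps(features, separators=(",", ":")))
```

A schema like this lets the agent re-read one small file to know what is done and what remains, instead of re-deriving progress from the codebase.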

Controlling Agents with Hooks and MCP CLI Mode

The video delves into advanced technical controls like lifecycle hooks which use shell commands to trigger actions during a session. By using specific exit codes like 'code 2', developers can create blocking errors that prevent the agent from performing unauthorized actions, such as modifying test scripts during failed test runs. This section also introduces the experimental MCP CLI mode, which is a game-changer for managing large projects with multiple connected tools. Instead of bloating the context window with tool schemas, this mode loads tools on-demand via bash commands. These features collectively offer a higher degree of safety and resource management, allowing for complex development without exceeding token limits or losing control over the AI's behavior.

Parallelization, Git Worktrees, and Strict Mode

Managing multiple AI agents requires a sophisticated version control strategy to avoid the conflicts inherent in standard branching. The speaker recommends using git worktrees, which allow parallel agents to work in isolated directories on the same project without interfering with each other's files. Once features are completed in isolation, they are merged into the main directory to consolidate the output. Additionally, the video highlights the necessity of 'strict mode' in languages like TypeScript to catch null values and implicit types during the build process. This shifts the burden of error checking to the compiler, which provides clear logs that the AI can then use to self-correct runtime issues before they reach a user.
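For TypeScript, the strict-mode setup described above boils down to a tsconfig.json fragment like the following. Note that `"strict": true` already implies `strictNullChecks` and `noImplicitAny` (they are listed explicitly here only for clarity), while `noUncheckedIndexedAccess` is a separate, stricter check not covered by the umbrella flag:

```json
{
  "compilerOptions": {
    "strict": true,
    "strictNullChecks": true,
    "noImplicitAny": true,
    "noUncheckedIndexedAccess": true
  }
}
```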

Testing Strategies and Adversarial Multi-Agent Setups

Testing is expanded beyond simple scripts to include detailed user stories that define how a human interacts with the system. These stories include acceptance criteria and edge cases, serving as a standard that the AI implementation must follow to meet user expectations. To combat the issue of AI hallucinations during research or development, the team employs an adversarial setup where two agents work in tandem. One agent performs the primary task while the second agent acts as a dedicated fact-checker, blocking the workflow if inaccuracies are detected. This collaborative yet critical dynamic significantly improves the reliability of the output and reduces the need for manual oversight by the human developer.
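The researcher/fact-checker dynamic can be sketched as plain control flow. The two agent functions below are stubs standing in for real agent calls (real subagents would be driven through Claude Code, not Python), but the structure matches the setup above: the fact-checker waits for a draft, and the researcher revises until the critique comes back empty.

```python
def run_researcher(topic: str, feedback: list[str]) -> str:
    """Stub research agent: produces a draft, revising it if feedback exists."""
    draft = f"Findings on {topic}"
    if feedback:
        draft += " (revised: " + "; ".join(feedback) + ")"
    return draft

def run_fact_checker(draft: str) -> list[str]:
    """Stub fact-checker: returns a list of inaccuracies (empty = draft passes)."""
    return [] if "revised" in draft else ["claim 2 is missing a source"]

def adversarial_research(topic: str, max_rounds: int = 3) -> str:
    """One agent drafts, the other critiques, until the critique comes back clean."""
    feedback: list[str] = []
    draft = ""
    for _ in range(max_rounds):
        # The fact-checker is blocked until the researcher produces a draft.
        draft = run_researcher(topic, feedback)
        feedback = run_fact_checker(draft)
        if not feedback:  # no inaccuracies left; accept the draft
            break
    return draft
```

The `max_rounds` cap is the design choice worth keeping even with real agents: without it, two disagreeing agents can loop indefinitely, which is the same runaway-polling failure the insights report flagged.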

Advanced Verification and Predictive Failure Analysis

The final section covers how to give agents 'eyes' to verify runtime and client-side issues using browser-based tools. While the Claude Chrome extension and Puppeteer MCP are options, the speaker highly recommends the Vercel agent browser for its efficiency in using accessibility trees rather than full DOM snapshots. This method reduces token usage from thousands to just a few hundred while providing a precise reference for every UI element. Beyond reactive testing, the video suggests asking Claude to perform predictive analysis to identify potential failure points before they occur. This technique successfully identified 18 critical gaps in their project that traditional multi-layer testing had missed, proving that changing the AI's perspective is a powerful tool for quality assurance.
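The predictive-failure pass is a prompt rather than tooling, and the video doesn't show its exact wording, so something along these lines is an assumption:

```text
Review the implementation and predict where this app could fail in
production, including paths our current tests pass. For each potential
failure: name the component, describe the failure mode, rate its severity,
and suggest a mitigation. Pattern-match against failures you know from
similar apps rather than limiting yourself to what our test suite covers.
```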
