Even though Claude Code is one of the most powerful tools for AI development, why does it fall apart on certain tasks? Between the features Anthropic has been dropping recently and the workflows we've been building around it, the way you're supposed to use this thing looks completely different from a few weeks ago. Our team has been using Claude Code every day, not just for development but also for research, managing our production pipeline, and automating tasks that have nothing to do with code. So let me show you everything we've figured out.

Anthropic recently added the insights command for Claude Code. It analyzes all your past Claude Code sessions over a certain time period and generates a report. The report analyzes your working style, roasts your working patterns, highlights what you were doing right and what you weren't, and tells you how to improve. The main thing we were interested in was identifying where things went wrong, because that's where we can learn to improve. The report highlighted the areas where we had the most friction and suggested features we could add to make the workflow better. For example, we remember a session where the main agent repeatedly polled the task list for a long time while we were using agent teams. It caused the session to run too long, and we had to end it ourselves. To prevent this from happening in the future, we can copy the suggested prompt into CLAUDE.md so that whenever we're using Claude Code with multiple agents, Claude doesn't poll indefinitely and instead acts on the task. We can import these tips into our projects for future workflows, so our experience with Claude Code gets better over time.

Our team has spent a lot of time working with Claude Code, and the most important step is still how well you give context to the agent.
This can be project requirements broken down into sub-parts, or documentation for the frameworks and libraries you're using. When you give it the right context, errors basically drop to zero because it knows exactly what to act on. For project documentation, we prefer having Claude write it rather than doing it ourselves. We gave Claude a specific prompt containing all the information needed to break the project's idea down into the required documents. We asked it to create four documents, each focused on a specific aspect of the app. The most important one is the PRD, which contains the project requirements and scope. Then there's architecture.md, which has the data formats, file structure, APIs, and all the other architecture details written out. Then decision.md, which records all the decisions Claude made during the creation of the project, acting as a reference for later. And then there's feature.json, which lists all the features in a specific JSON format. It holds the details of each feature in a token-efficient way and contains the criteria for what makes a feature complete, along with a passes key for tracking what's been implemented and what hasn't.
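The video doesn't show the actual file, so here is a minimal sketch of what a feature.json like this could look like, and how an agent (or a script) could check which features still fail their criteria. The field names (`id`, `criteria`, `passes`) are our assumption, not the video's exact schema.

```python
import json

# Hypothetical feature.json contents -- the keys are illustrative,
# not the actual schema from the video.
features_json = """
{
  "features": [
    {
      "id": "auth-login",
      "description": "Users can log in with email and password",
      "criteria": ["form validates input", "session cookie is set"],
      "passes": true
    },
    {
      "id": "auth-reset",
      "description": "Users can reset a forgotten password",
      "criteria": ["reset email is sent", "token expires after 1h"],
      "passes": false
    }
  ]
}
"""

def remaining_features(raw: str) -> list[str]:
    """Return the ids of features whose acceptance criteria aren't met yet."""
    data = json.loads(raw)
    return [f["id"] for f in data["features"] if not f["passes"]]

print(remaining_features(features_json))  # ['auth-reset']
```

A structure like this is what makes the format token-efficient: the agent only needs to load the unfinished entries rather than re-reading the whole PRD each session.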
With the large task split into smaller sections, we need to give the agent documentation for the tools it will use during implementation, through the Context 7 MCP. It has documentation for all the major libraries and frameworks and gets updated frequently, so agents can pull the latest docs and close the gap between what the model knows and what's actually current. Setting up the MCP only takes a few steps. Once it was installed, Claude used the Context 7 tools and fetched library information directly. This lets it work from the latest documentation, prevent code errors caused by dependency mismatches, and produce a more accurate implementation.

Hooks are another underutilized feature. Hooks in Claude Code are shell commands that fire at specific points in the session lifecycle. There are many trigger types, such as session start, before any tool is used, or after a tool is used. The most important part is setting them up with specific exit codes. The exit codes tell Claude Code whether to proceed with, block, or ignore an action. An exit code of 0 means success. An exit code of 2 means a blocking error: whenever Claude tries to do something it shouldn't, it hits exit code 2, gets an error message back, and can correct itself. Any other exit code is non-blocking; it's shown in verbose mode and execution continues. Exit code 2 matters because it's how you control the agent's behavior.
If you've ever done test-driven development with Claude Code, you might have noticed that it tends to modify the tests when it fails to meet them. To prevent that, we set up a custom hook that triggers on pre-tool use and protects the test scripts from modification. If the path Claude is trying to work on is in a test directory or contains the word test, the hook prints an error message saying modifications to test folders are not allowed and returns exit code 2. With this hook in place, when we prompted Claude to run the tests and the tests failed, it tried to modify the test files, but the script blocked it and a "blocked from modification" message appeared. This stopped Claude from editing files it shouldn't be editing.
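A hook like this can be sketched as a short Python script. The stdin payload shape (`tool_input.file_path`) follows Claude Code's hook input format as we understand it; treat the exact keys as an assumption and check the hooks documentation for your version.

```python
import json
import sys

def check(payload: dict) -> int:
    """Return the exit code for a PreToolUse event.

    Assumes the hook payload carries the target path under
    tool_input.file_path -- verify against the hooks docs.
    """
    path = payload.get("tool_input", {}).get("file_path", "")
    # Crude substring match: block any write that targets a test path.
    # A real hook might match only dedicated test directories.
    if "test" in path.lower():
        print("Blocked: modifications to test folders are not allowed",
              file=sys.stderr)
        return 2  # 2 = blocking error; the stderr message goes back to Claude
    return 0      # 0 = success; the tool call proceeds

# In a real hook script you would end with:
#   sys.exit(check(json.load(sys.stdin)))
# and register the script under the PreToolUse event in your settings.
```

The important part is the return value: 0 lets the edit through, while 2 blocks it and feeds the error message back so Claude can self-correct instead of silently rewriting the tests.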
If you've worked with MCPs, you know they bloat the context window. When you're working on a large-scale project, the number of connected MCPs grows, so all the MCP tool schemas end up living in the context window and it gets bloated. For exactly this problem, Claude Code has an experimental MCP CLI mode. We set the experimental MCP CLI flag to true, and all the MCPs that had been showing up in the context disappeared; no context window was taken up by MCP tools anymore. The question was how to access the tools if they no longer exist in memory. Instead of loading all the tool schemas up front, Claude Code uses MCP CLI info and MCP CLI call commands and runs all the connected MCPs through them via Bash. With the flag set, when we gave it a prompt, instead of calling an MCP tool directly it invoked it through the MCP CLI and ran it as a Bash command. This way it only loads the required tool on demand, preventing the context bloat.
Also, if you're enjoying our content, consider pressing the hype button, because it helps us create more content like this and reach more people.

In our previous videos we've stressed using Git so that all the agents' work is tracked in version control, and you can revert if an agent doesn't implement things correctly. We also made a video where we used Git to run an agent on a long-horizon task, which you can check out on the channel. Here, we used parallel agents working in different worktrees so they could build all of the project's features while staying isolated from each other.
This way we could merge their output later without interference, because agents working on the same files cause conflicts. Plain branches aren't a good fit here: agents have difficulty checking out different branches, since branches share the same working directory, but worktrees don't. So we gave Claude a prompt listing several features to implement and specified that each agent should work in a separate worktree. It used a separate agent for each worktree and implemented the features in isolation, even though their task descriptions overlapped in places. After Claude implemented all the features correctly on separate branches, we had it merge the output so all the features ended up in a single working directory.
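The worktree setup the agents perform can be sketched as a small script: one worktree and one branch per feature, each in its own directory. The branch and path naming is illustrative, not a convention from the video.

```python
import subprocess
from pathlib import Path

def add_feature_worktrees(repo: str, features: list[str]) -> list[Path]:
    """Create one worktree (with its own branch) per feature, so parallel
    agents work in isolated directories instead of sharing one checkout.

    Naming scheme (feature/<name> branches, sibling directories) is an
    assumption for illustration.
    """
    repo_path = Path(repo).resolve()
    created = []
    for name in features:
        wt = repo_path.parent / f"{repo_path.name}-{name}"
        # `git worktree add -b <branch> <path>` checks out a new branch
        # into a separate working directory.
        subprocess.run(
            ["git", "-C", str(repo_path), "worktree", "add",
             "-b", f"feature/{name}", str(wt)],
            check=True, capture_output=True,
        )
        created.append(wt)
    return created
```

After the agents finish, merging the `feature/*` branches back into the main branch collects everything into a single working directory, which is the final step the video describes.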
Strict mode is essential for shifting the burden of error checking onto the compiler. This is something you should set up for whatever language you're using, because it catches bugs at build time instead of when users hit them at runtime. Since our primary language is TypeScript, we always set strict to true in our projects. This turns on checks for null values and implicit types, enforces strict typing and null checks, and overall means fewer runtime errors. This matters for AI agents because they don't have a built-in way to catch runtime errors. Strict mode minimizes the chance of runtime failures and makes the compiler surface these issues instead, so agents can rely on error logs in the terminal to apply known fixes.
Instead of letting the project be tested only by scripts, there's another layer of testing worth adding. You write user stories that describe how a user interacts with the system, to guide testing once the app is built. We actually define the user stories before implementing our projects, because this sets a standard the implementation should follow. Using a prompt, Claude wrote multiple stories into a folder, covering all the ways a user could interact with the system. Each story targets a specific aspect of the app and records its priority and the acceptance criteria the agent should test against. The stories covered all the relevant test scenarios, including the happy path and the edge cases. They basically tell the agents how to interact with the system we just built, and with the right instructions, any agent can apply the same principles to the app it's building and better meet user expectations. With the stories documented, we asked Claude to implement them one by one, prompting it to start with the optimal path listed in each story and making sure all the edge cases were covered. This way the implementation had fewer gaps and better user satisfaction overall.
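Stories with priorities and acceptance criteria can be represented as simple records, which also makes the "implement in priority order" instruction mechanical. The field names below are our assumption, not the exact format from the video.

```python
# Hypothetical user-story records -- the fields are illustrative.
stories = [
    {"title": "Browse catalog", "priority": 2,
     "acceptance": ["items load in under 2s", "empty state is shown"]},
    {"title": "Checkout", "priority": 1,
     "acceptance": ["card is validated", "order confirmation is sent"]},
    {"title": "Edit profile", "priority": 3,
     "acceptance": ["changes persist after reload"]},
]

def implementation_order(stories: list[dict]) -> list[str]:
    """Order stories so the agent tackles the highest-priority
    (lowest number) story first, as the prompting approach suggests."""
    return [s["title"] for s in sorted(stories, key=lambda s: s["priority"])]

print(implementation_order(stories))  # ['Checkout', 'Browse catalog', 'Edit profile']
```

Each story's `acceptance` list then doubles as the checklist the testing agent verifies against after implementation.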
All the tips we've been talking about are available as ready-to-use templates in AI Labs Pro. For those who don't know, it's our recently launched community where you get ready-to-use templates, prompts, and all the commands and skills you can plug directly into your projects, for this video and all previous ones. If you found value in what we do and want to support the channel, this is the best way to do it. Links in the description.
We need to make use of parallelization as much as we can, because this is how the agent speeds up its workflow and implements things that don't need to wait on each other. Claude automatically detects whether a task can run in parallel or sequentially and decides on its own, but it doesn't hurt to create agents ourselves. We covered these agent capabilities in a previous video about using agents to make your workflow faster; the speed comes at the cost of increased token usage, but the parallelization is still worth it. At one point we were researching the impact of Opus 4.6's improvements using the model itself, and it kept hallucinating facts even though we provided sources. It kept writing incorrect information and we had to correct it repeatedly, and the research felt pointless because we were fixing things ourselves. To prevent this from happening again, we used parallel agents. We set up a research task comparing the agent swarm capabilities of Kimi K2.5 and Claude's agent swarm.
We used two agents: one to do the research, and another to fact-check the research agent. The key idea was to have the two agents communicate with each other to make sure the findings were accurate, so we wouldn't have to do that ourselves. In this setup, one agent does the task while the other critically analyzes it, giving them an adversarial way of working. The research agent started first, and the fact-checker was blocked until the research agent produced a first draft. Once the first draft was done, the fact-checker began verifying it. It immediately identified many inaccuracies in the data the research agent had listed, and we no longer had to catch them manually. The two agents kept communicating, which kept the fact-checking process tight: one agent dedicated to calling out the other on wrong information. There are many tasks you can run in an adversarial setup like this, not just research but development work too, where one agent implements a feature and another reviews the implementation against the plan.
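The adversarial pattern boils down to a short loop: draft, critique, revise, repeat until the critic is satisfied. Here is a minimal sketch where both "agents" are stand-in functions, not real Claude Code agents; in practice you would substitute actual agent calls.

```python
def research_agent(draft: str, issues: list[str]) -> str:
    """Stand-in researcher: revise the draft by replacing each flagged claim."""
    for issue in issues:
        draft = draft.replace(issue, "[claim removed pending re-verification]")
    return draft

def fact_checker(draft: str, known_errors: set[str]) -> list[str]:
    """Stand-in critic: flag any claim matching a known inaccuracy."""
    return [e for e in known_errors if e in draft]

def adversarial_loop(draft: str, known_errors: set[str],
                     max_rounds: int = 5) -> str:
    """Alternate between the two agents until the critic finds nothing."""
    for _ in range(max_rounds):
        issues = fact_checker(draft, known_errors)
        if not issues:      # the critic is satisfied, stop
            break
        draft = research_agent(draft, issues)
    return draft

result = adversarial_loop(
    "Model X scored 99% on every benchmark.",
    known_errors={"99% on every benchmark"},
)
print(result)  # the flagged claim has been revised out of the draft
```

The `max_rounds` cap matters: it's the same kind of guard that prevents the indefinite-polling problem mentioned earlier, so the pair of agents can't argue forever.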
According to Claude Code's creator, the agent works better if it has some way to verify its own work. The core idea is giving the agent eyes: the ability to check whether an implemented feature is correct and meets expectations. Because these agents are terminal-based, they can't spot issues that happen at runtime, especially on the client side. We use multiple ways to verify the agent's work. The first is the Claude Chrome extension, which provides browser-centric tools like DOM capture, console log checking, and more. Another is the Puppeteer MCP. It's useful because it runs in a separate browser that doesn't contain your existing sessions, unlike Claude's Chrome extension; it's isolated, doesn't interfere with anything you have open, and gives you an extra layer of privacy. But our preferred option is Vercel's agent browser. This isn't an MCP but a CLI tool that gives agents browser-testing capabilities, with tools for navigation, capturing screenshots, and more. Unlike the other tools, it doesn't navigate based only on screenshots; it uses the accessibility tree, where each element has a unique reference. This compacts the full DOM from thousands of tokens down to around 200 to 400, so it's far more context-efficient. That was the main issue with the Claude Chrome extension, which loads the entire DOM into the context window and exhausts it quickly; the agent browser solves it. We also added instructions in CLAUDE.md telling Claude to rely on the agent browser first before falling back to MCP-based testing, so it uses the agent browser as the primary verification method.
But there's another angle. Testing is always important, but there's a way to reduce errors that doesn't involve tests or code reviews: we ask Claude to predict things that haven't happened yet. We ask it to examine the implementation and identify areas where the app could fail. This works because it gives Claude a chance to surface potential issues by pattern-matching against failures that have already occurred in other apps, even if we haven't hit them ourselves in testing yet. It pushes Claude to look at the code from a different angle. When we asked it to do this, it identified critical gaps that had slipped past even our multi-layer testing process and found 18 issues that could have been harmful in production. Our testing hadn't caught them; they only surfaced when we pushed Claude to look at the project from another angle.

That brings us to the end of this video. If you'd like to support the channel and help us keep making videos like this, you can do so with the Super Thanks button below. As always, thank you for watching, and I'll see you in the next one.