00:00:00 The main problem with AI agents is the limited context window,
00:00:03 which restricts what they remember from previous actions.
00:00:06 When we give Claude Code a larger task,
00:00:08 it compacts multiple times while attempting a single feature,
00:00:11 forgetting the main task it was asked to implement,
00:00:14 which makes it less effective for long-running tasks.
00:00:17 Anthropic just released a solution based on how real teams work
00:00:20 in an actual engineering environment.
00:00:22 They identified two key reasons why agents fail on long tasks.
00:00:26 Many of us have tried to one-shot entire applications
00:00:29 or big features,
00:00:30 and doing too much causes the model to run out of context.
00:00:34 After repeated compaction,
00:00:35 the context window is refreshed with the feature only half implemented
00:00:39 and no memory of its progress,
00:00:41 which leads to an incomplete implementation.
00:00:43 The second issue is that, due to its limited testing capabilities,
00:00:46 Claude marks untested features as completed.
00:00:49 It assumes a feature is complete even if it doesn't actually work properly.
00:00:53 Their solution was an initializing agent and a coding agent working in harmony,
00:00:57 inspired by how real software teams operate.
00:00:59 This workflow is originally meant for agents you build yourself,
00:01:02 but I realized it could apply to Claude Code instances as well.
00:01:06 The first agent focuses on properly initializing your coding agent,
00:01:09 and you have to be patient here because it takes a little time.
00:01:12 I have an empty Next.js project and I want to build an online Python compiler.
00:01:16 Before starting, create a CLAUDE.md file using the /init command.
00:01:20 This file documents your codebase, sits at the root of your project,
00:01:24 and contains an overview and all the important information.
00:01:27 Next, generate the feature-list JSON in the project root.
00:01:30 It should list every feature along with its testing steps,
00:01:33 with all tests initially marked as failing, so Claude is forced to run them.
00:01:38 We use JSON instead of Markdown
00:01:40 because JSON files are easier to manage in the context.
00:01:43 Since Claude can only test the code, not the interface we see in the browser,
00:01:46 I connected Puppeteer for browser testing.
00:01:49 After that, create an init script to guide starting the dev server,
00:01:52 and a progress-tracking file so the system can keep track of the project's completion status.
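A minimal init script might look like the following. The file names, the port, and the check used are assumptions for this particular Next.js setup, not a fixed convention:

```shell
#!/usr/bin/env sh
# Sketch of an init script; file names and port 3000 are assumptions
# for this Next.js setup, not a fixed convention.

# Start the dev server in the background unless port 3000 is already in use.
if ! nc -z localhost 3000 2>/dev/null; then
  npm run dev > dev-server.log 2>&1 &
fi

# Seed the progress file on first run so every session can append to it.
[ -f progress.md ] || printf '# Progress\n\n' > progress.md
```

Because the script is idempotent, Claude can run it at the start of every session without spawning duplicate servers or clobbering the progress file.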
00:01:57 For guidelines, Claude needs to update progress.md after each run
00:02:02 and test each feature after implementation.
00:02:04 The most important practice is committing to Git.
00:02:07 We underestimate how crucial it is to commit in a mergeable state.
00:02:10 Git commits with clear logs show what's completed
00:02:13 and let you revert if an implementation fails.
00:02:15 Finally, Claude should not change the feature list
00:02:18 beyond marking features as implemented.
00:02:20 With the environment ready, we move to the coding part.
00:02:23 The idea is to implement each feature one by one from the features JSON.
00:02:27 Claude also wrote descriptive commit messages after each tested feature
00:02:31 and launched the browser when needed.
00:02:33 Once it verified the app was working,
00:02:35 it updated the JSON fields from false to true
00:02:37 and updated progress.md with what had been completed so far.
00:02:41 Finally, it committed the changes and verified the commit was successful.
00:02:45 The advantage of this incremental approach is that even if the session terminates,
00:02:49 you can resume exactly where you left off.
00:02:51 Everything is tracked in the Git logs,
00:02:53 so you don't have to worry about breaking code.
00:02:55 Claude can understand the project from the Git logs and progress file,
00:02:59 not from the code itself, so you can resume the session easily.
00:03:02 Your next prompt is simply to implement the next feature still marked as failing.
00:03:06 This approach also reduces Claude's tendency
00:03:09 to mark features complete without proper testing.
00:03:11 Each iteration ensures the app is built end-to-end with real testing,
00:03:15 helping identify bugs that are not obvious from code alone.
00:03:19 We repeat this cycle until all features are marked true.
00:03:22 You might think this is similar to the BMAD method.
00:03:24 It shares similarities, but I think Claude's workflow is better in some ways.
00:03:28 It was easier since you don't call agents separately,
00:03:31 and context utilization was better too.
00:03:33 After implementing so many features,
00:03:35 it had only used 84% of the context,
00:03:37 whereas BMAD would have already compacted twice
00:03:40 because of the large stories it generates.
00:03:42 That said, BMAD is a full out-of-the-box system,
00:03:45 while this is still an idea you have to implement yourself.
00:03:48 But BMAD could borrow some things from this, such as the Git workflow.
00:03:51 After teaching millions of people how to build with AI,
00:03:54 we started implementing these workflows ourselves.
00:03:57 We discovered we could build better products faster than ever before.
00:04:00 We help bring your ideas to life, whether it's apps or websites.
00:04:04 Maybe you've watched our videos thinking,
00:04:06 "I have a great idea, but I don't have a tech team to build it."
00:04:08 That's exactly where we come in.
00:04:10 Think of us as your technical co-pilot.
00:04:12 We apply the same workflows we've taught millions of people directly to your project,
00:04:17 turning concepts into real, working solutions
00:04:19 without the headaches of hiring or managing a dev team.
00:04:22 Ready to accelerate your idea into reality?
00:04:25 Reach out at hello@autometer.dev
00:04:27 That brings us to the end of this video.
00:04:29 If you'd like to support the channel and help us keep making videos like this,
00:04:33 you can do so by using the Super Thanks button below.
00:04:36 As always, thank you for watching, and I'll see you in the next one.