Claude Just Leveled Up Their Agents Game

AAI LABS
Computing / Software Management / Internet Technology

Transcript

00:00:00Was Opus 4.6 the only upgrade from Anthropic?
00:00:03You already know about sub-agents, where each agent operates as an individual entity with
00:00:07its own context window.
00:00:09But these sub-agents failed when there was a task that required coordination between them.
00:00:13In those cases, the orchestrator had to step in, taking responses from one agent and delegating
00:00:17them to another, or the agents had to rely on notes in the project folder.
00:00:21Because of this communication gap, even simple tasks became overcomplicated.
00:00:25To deal with this, Anthropic released a new upgrade to sub-agents and named them Agent-Teams.
00:00:30They've been launched alongside Opus 4.6.
00:00:33Although this is still an experimental feature, we've implemented it in multiple workflows,
00:00:37and the biggest improvement was a sharp reduction in the time these tasks took.
00:00:41But it is experimental for a reason and still has some rough edges, and we found small
00:00:44fixes for those problems as well.
00:00:47Agent-Teams is the idea of having multiple Claude Code instances working together.
00:00:51Each member of the team works on isolated tasks and has centralized management controlled
00:00:55by one agent.
00:00:56Now, you might think this sounds really similar to the already existing Claude sub-agents because
00:01:00both run in parallel and split up tasks, but they're not the same.
00:01:03This is because Agent-Teams solved the one problem the sub-agent framework has.
00:01:08Sub-agents are not able to communicate with each other and have to rely on the orchestrator
00:01:12agent to act as a medium of communication for them.
00:01:15Team members, on the other hand, are able to communicate with each other.
00:01:18The core idea behind Agent-Teams is having multiple Claude Code sessions working together.
00:01:22One session acts as a team leader, coordinating work, assigning tasks, and synthesizing results,
00:01:27while the teammates work independently in their own context windows.
00:01:31Sub-agents have their own context window, and they report the result back to the caller.
00:01:34But for teams, it works differently.
00:01:36Each member of the agent team is a fully independent terminal session.
00:01:40They're not restricted or coordinated by an orchestrator that just divides tasks.
00:01:43Instead, these terminal sessions are opened and closed by the main team lead.
00:01:47They are able to work across tasks that require discussion and collaboration between agents
00:01:52because of their ability to communicate.
00:01:54So an agent team essentially consists of a team lead and teammates.
00:01:57The team lead is the main agent that creates the team and coordinates their work.
00:02:01The teammates are the workers who actually perform the tasks.
00:02:03Each teammate receives a task list, which is a shared list of items.
00:02:07Each member identifies what it needs to do from this list and executes it.
00:02:10To communicate, they also have a shared mailbox that allows them to send messages to each other.
00:02:15Now the question was how this actually works if each team member is independent.
00:02:19How do they know what the other members are doing?
00:02:21This works because all the information regarding the team, the members, and the tasks each member
00:02:26is working on is stored locally in the .claude folder and identified by the task name.
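The coordination model described above, a shared task list plus a shared mailbox, can be sketched roughly as follows. The actual on-disk format Claude Code uses is not documented here, so these structures are illustrative only:

```python
from dataclasses import dataclass, field
from queue import Queue
from typing import Optional

@dataclass
class Task:
    name: str
    status: str = "pending"      # pending -> claimed -> done
    owner: Optional[str] = None

@dataclass
class Team:
    # Shared task list: each teammate claims the next unowned item.
    tasks: list = field(default_factory=list)
    # Shared mailbox: teammates post messages addressed to one another.
    mailbox: Queue = field(default_factory=Queue)

    def claim(self, member: str):
        for task in self.tasks:
            if task.status == "pending":
                task.status, task.owner = "claimed", member
                return task
        return None

team = Team(tasks=[Task("review auth module"), Task("fix reported issues")])
task = team.claim("reviewer")
team.mailbox.put(("reviewer", "fixer", f"claimed: {task.name}"))
```

The point of the sketch is the shape of the coordination, not the storage format: members pull work from one list and push messages to one queue, so no orchestrator has to relay state between them.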
00:02:30This feature is still experimental and disabled by default, so there are going to be some bugs
00:02:34in teammate handling during this phase.
00:02:36In order to try it out, we had to enable it manually.
00:02:38We did this by setting Claude Code's experimental agent teams flag to 1.
00:02:43With the flag set, agent teams became available in subsequent sessions.
00:02:47From there, we could access the teams feature directly in Claude Code.
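Enabling an experimental Claude Code feature of this kind is done through an environment variable before launching the CLI. The exact variable name below is our assumption based on the description in the video, not confirmed documentation:

```shell
# NOTE: assumed variable name -- check the current Claude Code docs for
# the exact experimental flag; it may differ between releases.
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
claude   # sessions started from this shell can now create agent teams
```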
00:02:51Since this is an experimental feature, we needed to use specific wording that tells
00:02:55Claude we want to use the agent team for a certain job.
00:02:58Our team has started using this feature to parallelize code review, letting code issues
00:03:02be identified and fixed at the same time.
00:03:04To do this, we asked Claude to use one team member to find issues in the code base and
00:03:08another to fix the issues identified by the first member.
00:03:11We had to be detailed in the prompt to make it follow the right direction.
00:03:15Now, if sub-agents were handling this, they would be writing a report to a file
00:03:19on disk to let the other agents know what to fix.
00:03:21But here we wanted to speed up the review process by letting this happen without the overhead
00:03:26of writing to a local file.
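A prompt along these lines captures the setup described above. This is an illustrative paraphrase, not the exact prompt used in the video:

```
Create an agent team with two teammates.
Teammate 1 (reviewer): scan the codebase for bugs and security issues,
critical ones first. Send each confirmed finding to teammate 2 as a
message immediately; do not batch them into a report.
Teammate 2 (fixer): implement a fix for each finding as it arrives,
then reply with what you changed.
Communicate only through messages, not files. Wait for both teammates
to finish before summarizing the results.
```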
00:03:27When we gave the prompt to Claude Code, the team members spawned, each controlled by the
00:03:31team lead.
00:03:32The lead agent gave the prompt to individual agents, letting them know what task to perform.
00:03:36Now the first code reviewer agent started working, and after analyzing the task, it shared messages
00:03:40with the code fixer bug by bug.
00:03:42This agent was prioritizing critical security issues, and once the code fixer received the
00:03:47messages from the code reviewer, it started implementing the fixes while the code reviewer
00:03:51continued looking for more issues.
00:03:53Similarly, they kept talking to each other and reporting back the changes that were implemented.
00:03:57Once the critical issues were completed, the two agents moved towards fixing the medium
00:04:01priority issues.
00:04:02The code review and code fixing were happening simultaneously, which saved a lot of time.
00:04:06The good thing about this is that you can also assign or modify any task for a team member.
00:04:10With this enabled, you can steer the direction of the work of that specific team member.
00:04:14Once the agents were done working, control was handed back to the main agent, which is
00:04:18responsible for making sure the required changes are implemented correctly and for shutting
00:04:22these agents down gracefully, ensuring their exit does not cause errors later on.
00:04:26You've probably noticed we build a lot in these videos.
00:04:28All the prompts, the code, the templates, you know, the stuff you'd normally have to
00:04:32pause and copy from the screen, it's all in our community, this video, and every video
00:04:36before it too.
00:04:37Links in the description.
00:04:38Finding and fixing at scale is a really good thing, but there are often cases where you get issues
00:04:43and just can't figure out what's causing them.
00:04:45In those cases, we can use an agent team to test multiple perspectives of the same app
00:04:49and work progressively toward the bug.
00:04:51This way, team members can communicate their findings to each other and move forward together.
00:04:55We asked Claude to find a bug in the code base and specified using multiple team members,
00:04:59letting them approach the problem from different perspectives.
00:05:02It then spawned four teammates, each focused on a different perspective of the same app.
00:05:06They received similar prompts from the team lead and investigated the errors based on
00:05:09their specific aspect of the application, while the main lead waited for them to finish and
00:05:14then analyzed the findings from their research.
00:05:16Without teams, we would have had a single thread, which would have taken much longer.
00:05:19But with these agents, the process was much faster.
00:05:22The investigation completed quickly, and all of the research by the agents was done in approximately
00:05:272 to 3 minutes, which is a significant improvement compared to linear checking, which would have
00:05:31easily taken 5 to 10 minutes.
00:05:33One thing to watch out for is that this approach burns a lot of tokens, because each agent has
00:05:37its own context window, so we need to be careful about that.
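The speed-up here comes from plain fan-out parallelism: the four independent investigations overlap in time instead of running one after another. A minimal sketch of the pattern, with simulated investigations standing in for real Claude sessions:

```python
from concurrent.futures import ThreadPoolExecutor

PERSPECTIVES = ["frontend state", "API layer", "data model", "build config"]

def investigate(perspective: str) -> str:
    # Stand-in for one teammate's independent investigation of the app.
    return f"{perspective}: suspect a stale value captured in an effect hook"

# All four investigations overlap in time, each with its own "context".
with ThreadPoolExecutor(max_workers=4) as pool:
    findings = list(pool.map(investigate, PERSPECTIVES))

# The team lead then synthesizes: here, every perspective reports the
# same root cause, mirroring the convergence described in the video.
causes = {finding.split(": ", 1)[1] for finding in findings}
```

The token cost follows directly from this shape: four workers each carry a full context, so you pay roughly four contexts for one answer.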
00:05:40Once the agents returned their output and were shut down, the team lead also verified the
00:05:45results by checking itself.
00:05:46All four agents converged on the same bug, and they correctly pointed out the issue with
00:05:50a stale closure in a useEffect hook.
00:05:52This exact part was flagged by all four agents.
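The bug class the agents flagged, a stale closure, happens when a function captures a value at creation time and keeps reading that outdated copy after the surrounding state has changed. The flagged code was in a React useEffect; the same pattern can be shown in Python, used here only as an analog of the bug class, not the actual code from the video:

```python
count = 0

def make_reader():
    snapshot = count            # value captured once, like a render-time closure
    return lambda: snapshot     # keeps returning the captured copy

read = make_reader()
count = 5                       # later update, like a state change
stale = read()                  # still sees the old value: 0, not 5
```

In React the fix is usually to list the captured value in the effect's dependency array so the closure is recreated when it changes.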
00:05:54Also, if you are enjoying our content, consider pressing the hype button, because it helps
00:05:59us create more content like this and reach out to more people.
00:06:02This agent framework has changed how we work on long-horizon tasks, because with these abilities,
00:06:07agents no longer have to rely solely on documenting their progress.
00:06:10With agent teams, we can handle different aspects of an application in parallel, and
00:06:14also have a member dedicated to handling research.
00:06:16When we gave Claude the prompt, it spawned 6 agents.
00:06:19Two were working on research and laying the foundations, while the rest were for building
00:06:23the pages.
00:06:24The builder agents were blocked by the agent laying the foundation, because it was responsible
00:06:28for installing required packages and making the environment ready with all the dependencies.
00:06:32Each agent received a specific prompt defining their job.
00:06:35The blocked agents kept waiting for the unblocked signal from the team lead.
00:06:38Once the research and foundations were complete, the remaining agents were unblocked and started
00:06:43implementing their respective parts of the application side by side.
00:06:46They kept communicating with each other for consistency between each component.
00:06:49The team lead kept coordinating with the agents, and once any agent finished, the team lead
00:06:53sent a shutdown message to that agent, handling its exit gracefully.
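The blocking behavior described above is a dependency gate: the builder agents start but immediately wait on a signal that the foundation step sets once the environment is ready. A sketch of that gate with standard threading primitives, with simulated work standing in for real agents:

```python
import threading

env_ready = threading.Event()        # the "unblocked" signal from the lead
built, lock = [], threading.Lock()

def foundation():
    # Stand-in for installing packages and preparing dependencies.
    env_ready.set()                  # unblock the waiting builder agents

def builder(page: str):
    env_ready.wait()                 # blocked until the foundation finishes
    with lock:
        built.append(page)

builders = [threading.Thread(target=builder, args=(p,))
            for p in ("home", "dashboard", "settings", "profile")]
for t in builders:
    t.start()                        # builders start first, but all wait
lead = threading.Thread(target=foundation)
lead.start()
for t in builders:
    t.join()
lead.join()
```

The design point is that the builders block on the event rather than polling, so the lead only has to fire the signal once and every waiting agent proceeds.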
00:06:57This whole process consumed around 170k tokens of the context window, but in the end, we
00:07:02got the app built exactly as we wanted, all from a single prompt.
00:07:05As we mentioned in the video, when our team was testing this, we came across multiple
00:07:09ways to make agent teams work better for us, and again, these best practices are available
00:07:13in AI Labs Pro, so you can try them out for yourself.
00:07:16The first recommendation is generally applicable to all agents, and not only limited to the
00:07:20agent team feature.
00:07:21You need to explicitly specify the scope of where the agent should be working.
00:07:25You can do this either by defining it in the prompt, specifying which files to look for
00:07:29in order to perform the task, or by creating documents in the project containing individual
00:07:33tasks as we did for our workflow, where we prepared a proper task document for each assignment
00:07:38so that the agent can work independently and within the right scope.
00:07:41Another thing to keep in mind is that each of these agents should be working on independent
00:07:45tasks from each other, because if they are editing the same file at the same time, it
00:07:49would create a conflict and might lead to overwriting the content.
00:07:52Aside from this, there were times when we found that the main agent would get impatient
00:07:56if any agent takes a long time to complete a task and start implementing the task itself
00:08:00instead of letting teammates complete it, so it's important to remind the main agent
00:08:04to wait for teammates to complete before proceeding.
00:08:06You also need to size tasks properly.
00:08:08If you assign tasks that are too small, it creates coordination overhead.
00:08:11If tasks are too large, it increases the risk of wasted effort, so tasks need to be balanced
00:08:16and self-contained.
00:08:17Finally, you need to monitor the agent's work.
00:08:19If any agent is not performing as expected, you can halt its execution and give it new
00:08:23instructions on what it should be doing.
00:08:25Following these practices makes using this experimental feature much more effective.
00:08:29That brings us to the end of this video.
00:08:31If you'd like to support the channel and help us keep making videos like this, you can do
00:08:35so by using the super thanks button below.
00:08:38As always, thank you for watching and I'll see you in the next one.

Key Takeaway

Anthropic's new Agent-Teams feature evolves AI orchestration by allowing multiple Claude instances to communicate and collaborate autonomously in parallel, significantly increasing efficiency for complex coding and research tasks.

Highlights

Anthropic has introduced an experimental feature called "Agent-Teams" alongside the Opus 4.6 model upgrade.

Unlike traditional sub-agents that require an orchestrator to mediate communication, team members can message each other directly through a shared mailbox and coordinate through a shared task list.
Timeline

Introduction to Agent-Teams and Opus 4.6

The speaker introduces Agent-Teams as a major upgrade released alongside Anthropic's Opus 4.6 model. While previous sub-agent frameworks existed, they often failed at tasks requiring complex coordination because each agent operated in a silo. This section explains that the communication gap in older models forced the orchestrator to manually move data between agents, overcomplicating simple workflows. The new experimental feature aims to reduce task completion time by allowing multiple Claude instances to work together more naturally. This evolution represents a shift from individual entities to a cohesive, centralized management structure.

Architectural Differences: Sub-Agents vs. Teams

This segment clarifies the technical distinction between standard sub-agents and the new Agent-Teams architecture. Traditional sub-agents report results back to a caller and cannot talk to one another, whereas team members utilize a shared mailbox and a centralized task list. Each teammate operates as a fully independent terminal session that is opened and closed by a designated team lead. The system stores team data locally in a .claude folder, identifying specific tasks by name to keep all members synchronized. Users must manually enable this experimental feature by setting a specific flag for Claude Code.

Case Study: Parallelized Code Review

The speaker demonstrates a practical application of Agent-Teams by parallelizing a code review and bug-fixing process. In this workflow, one agent is tasked with finding security vulnerabilities while a second agent begins fixing them immediately upon receipt of a message. This simultaneous execution removes the overhead of writing intermediate reports to physical files, which was a bottleneck in older frameworks. The Team Lead monitors the progress and ensures that control is handed back gracefully once the agents complete their specific assignments. This real-time collaboration illustrates how agents can steer the direction of a project through active communication.

Bug Hunting and Efficiency Gains

The analysis shifts to a debugging scenario where four agents were deployed to investigate a single application from different perspectives. By approaching the problem from multiple angles simultaneously, the team identified a stale closure bug in a useEffect hook in just 2 to 3 minutes. This is a significant improvement over the 5 to 10 minutes required for linear, single-threaded checking. However, the speaker warns that this method burns a high volume of tokens because each agent maintains its own context window. Despite the cost, the convergence of all four agents on the same root cause proved the reliability of the multi-perspective approach.

Building Applications and Long-Horizon Tasks

This section explores how Agent-Teams handle complex, long-horizon tasks such as building an entire application from a single prompt. The example shows a six-agent team where two agents handle research and foundation while four others focus on building specific pages. The system manages dependencies automatically, where builder agents wait for an 'unblocked' signal from the lead once the environment is ready. Throughout the process, which consumed 170k tokens, the agents communicated to ensure consistency across different UI components. This capability allows developers to generate high-fidelity results that align perfectly with their initial requirements.

Best Practices and Performance Optimization

The video concludes with essential recommendations for successfully implementing Agent-Teams in professional workflows. Users should explicitly define the scope for each agent using task documents to prevent overlapping work or file-writing conflicts. A critical tip is to remind the Main Agent to wait for teammates, as it can sometimes become 'impatient' and try to take over their tasks. Task sizing is also vital; tasks must be large enough to justify the coordination overhead but small enough to be self-contained and manageable. Following these guidelines, such as monitoring and halting underperforming agents, makes the experimental feature significantly more effective for users.
