This Huge Update Changed The Way I Use Claude Code

AAI LABS
Computing/Software, Small Business/Startups, Internet Technology

Transcript

00:00:00No single Claude model is enough on its own. Opus has the reasoning but burns through your
00:00:04limits. Sonnet is fast but hits a wall on harder decisions. And the answer isn't picking one over
00:00:10the other. It's using all of them together. Now Claude Code already does this to some extent.
00:00:14It orchestrates between models on its own. But Anthropic just released something that
00:00:18not only saves tokens but also makes smaller models almost as capable as the larger ones.
00:00:23Now when building with Claude you might have noticed this. Whenever you hand Opus a task
00:00:28and it determines that it doesn't need that much effort, it hands it off to Sonnet or Haiku and
00:00:32delegates tasks to the smaller models in order to manage token usage properly. But there's a problem
00:00:37with this approach. As we mentioned in our previous video, Anthropic has been lowering the rate limits
00:00:42so during peak hours your 5 hour window fills up faster. And on top of that, Opus consumes a lot
00:00:47of tokens even on simple tasks which means using Opus means your context limit fills up faster.
00:00:52Anthropic decided to flip the script on this and they came out with something called the
00:00:55Advisor strategy. The way this strategy works is that you give the role of executor to the Sonnet
00:01:00model and use Opus purely as an advisor that only gets consulted when the executor actually needs
00:01:05it. There are two agents involved. The executor is your main agent running on Sonnet and it handles
00:01:10all tool calls, code changes and user facing output. The advisor runs on Opus and its only
00:01:15job is to guide the executor when it gets stuck. The advisor never writes code or makes any changes.
00:01:20When Anthropic experimented with this approach, they found that the combination outperformed Sonnet
00:01:25alone on SWE-bench, doing better than Sonnet by itself in terms of both performance and cost.
00:01:31And it costs significantly less than running Opus as the main agent because Opus only gets invoked
00:01:36when it actually matters, not for every single iteration. Now you might think that we already
00:01:40have a lot of frameworks for building apps that are better and ready to use so why bother with this
00:01:45setup? The reason is that most existing frameworks are not built with cost and token efficiency in
00:01:50mind. Even though they get the job done, they fall short when it comes to making Claude run longer
00:01:54and more efficiently because they are primarily focused on building the app rather than optimizing
00:01:59for token usage. With this setup, you can build a working app using a weaker model, making the
00:02:04whole process far more token efficient. And that connects back to the limits problem we mentioned
00:02:09earlier. We already made a video on Claude's limits and told you to switch to a smaller model to make
00:02:13it last longer. Here's how it connects. Sonnet consumes way fewer tokens and requires less effort
00:02:19than Opus to perform the same task. Opus is a very large and powerful model so it consumes a lot of
00:02:24tokens even for simple tasks. Sonnet is able to handle many of those tasks more efficiently. So
00:02:30using Opus only to bridge the performance gap on harder decisions is where the real impact comes in.
00:02:35You're only invoking that power when you actually need it, not for every single task. This makes the
00:02:40overall usage more token efficient and lets you get more done within the same limits. We share
00:02:45everything we find on building products with AI on this channel so if you want more videos on that
00:02:50subscribe and keep an eye out for future videos. So we wanted to test how this actually plays out on an
00:02:55app that was already built using Sonnet. To use the strategy inside Claude Code we set the advisor
00:03:00command with Opus 4.6 as the advisor model. Our main agent was the executor which I had already
00:03:05set to Sonnet since I built the app using it. The app was supposed to have real-time sync and while
00:03:10moving and resizing elements synced perfectly across sessions deletion wasn't syncing at all. We tried
00:03:16debugging this multiple times with Sonnet on its own but the issue kept persisting no matter how
00:03:20much it tried to fix the issues. So after turning on Opus as the advisor we gave Claude the prompt
00:03:25describing the problem and because Sonnet had already failed multiple times instead of taking
00:03:30another shot on its own it decided to invoke the advisor this time. The advisor reviewed the
00:03:34conversation so far to assess the situation. It provided the exact changes that needed to be made pinpointing
00:03:40where the sync logic was breaking and what specifically needed to be restructured. The executor model took
00:03:45in that advice and it applied those fixes directly without any additional back and forth. We tested it
00:03:50across multiple devices to test the sync and found that the issue was resolved. Both ends were
00:03:55reflecting deletions properly as intended, even if the user had selected the item at one end while it
00:04:00was being deleted at the other, which wasn't the case previously. If we had tried fixing this using Sonnet
00:04:05alone it would have taken more rounds of back and forth prompting because Sonnet inherently is a
00:04:09weaker model and not capable enough to handle complex logic by itself. On the other hand using Opus alone
00:04:15would have consumed far more tokens and likely wouldn't have been this fast. Using Sonnet with Opus
00:04:20as an advisor made the process much more efficient. So overall this strategy helped debug syncing issues
00:04:25much faster than before. But before we move forward, let's have a word from our sponsor, Junie by JetBrains.
00:04:30If you're a developer you know the struggle: context switching between your terminal, IDE, and CI pipelines
00:04:36just to get stuff done. Most coding agents lock you into one environment or one specific LLM and
00:04:41call it a day. Junie CLI is different. It's an LLM-agnostic coding agent that works everywhere. Your
00:04:47terminal, your IDE, GitHub, CI/CD pipelines, even your task manager. One agent everywhere. Delegate
00:04:54real work to it. Writing tests, building backends, refactoring, automating code reviews on every commit.
00:04:59Right now JetBrains is running a free early access program including $50 in Gemini credits to test the
00:05:04agent plus BYOK support so you can use any model you prefer. Full access to all features, early access
00:05:10to new ones and direct support from the dev team shaping the product. It's simply better with Junie.
00:05:15Click the link in the pinned comment to join for free. Now we wanted to test whether Sonnet actually
00:05:20consults the advisor for major UI changes. We had a previously built application and we wanted to
00:05:25transform its UI to a different library. On top of that we wanted to make multiple UI changes in one
00:05:31go which isn't normally recommended but we wanted to see how the smaller model performs in coordination
00:05:36with the larger one on a bigger task. It first accessed the current UI using the Playwright MCP.
00:05:41Once it understood the layout instead of jumping straight into code changes it consulted the advisor
00:05:46to determine the best approach because it was a major critical change and might break the app if
00:05:50handled wrongly. The advisor reported that the new library we chose and the one already
00:05:55used in the project had version issues, so before any UI work could start Claude needed to
00:06:00resolve them. Sonnet handled those first, ran multiple commands to make sure the dependencies
00:06:04were properly applied then checked the current state of the UI through Playwright to confirm the app was
00:06:09still running correctly with no client side issues. Once the dependencies were sorted it started making
00:06:14the changes as the advisor suggested working through each component one by one and effectively
00:06:18redesigning the app as a whole. The UI it created was much more interactive and looked significantly
00:06:23more polished than before. It still had some issues but the overall improvement was clear. But here's
00:06:27where the limitation showed up. The entire process took around 31 minutes. Opus on its own would have
00:06:32done this much faster because it's better at orchestrating tasks by identifying what can run in
00:06:37parallel and executing them at the same time. Sonnet being a smaller model handled everything sequentially
00:06:43without breaking any of the work into parallel sub-agents. For an app that wasn't even that complex
00:06:4831 minutes is longer than it should have been. It also handles smaller changes on its own without
00:06:53involving the advisor which is the right behavior for minor tweaks. But for large scale changes across
00:06:58an entire app like this you're better off using Opus directly because that will save you significantly
00:07:03more time and effort. Now we wanted to test whether it implements a completely new feature on an
00:07:08existing code base properly. We had an app already built and wanted to add another page with a
00:07:13different feature to it. We gave it a prompt describing what we wanted and this time we fully
00:07:17expected it to use the advisor because it wasn't a simple task but it went ahead and implemented
00:07:22the changes entirely on its own without consulting the advisor at all. It treated the whole thing as
00:07:27routine implementation work which it clearly wasn't given the scope of the feature. When we tested the
00:07:31application we found multiple issues. If we modified something and pressed the run button changes like
00:07:37heading updates or color adjustments were also reflected in components outside the preview pane
00:07:41which shouldn't happen. On top of that we wanted it to sync directly instead of requiring us to press
00:07:46run again after every change. So we prompted it again and told it to use the advisor to fix
00:07:51these issues. Upon our prompt it first invoked the advisor agent. The advisor looked at the
00:07:56implementation and identified what was actually causing both problems. That being the wrong
00:08:00component choice. It laid out what needed to change and why the original approach had introduced those
00:08:06issues in the first place. The executor took that guidance and applied it across the app. When we
00:08:10tested it again streaming worked correctly. All changes reflected immediately as we edited without
00:08:16needing to press run after every modification. The issue of changes bleeding across components
00:08:20was also resolved and everything updated properly within the right boundaries. So there are times
00:08:25when it works exactly as intended but other times the executor assumes a task is small enough and
00:08:30decides not to consult the advisor. In those cases you often have to nudge it yourself so it follows
00:08:35the intended workflow. The model doesn't always judge the complexity of a task the same way you
00:08:40do and when it misjudges you end up with bugs that the advisor would have caught from the start. Also
00:08:44if you are enjoying our content consider pressing the hype button because it helps us create more
00:08:49content like this and reach out to more people. With real-time distributed state involved this
00:08:54approach still needed multiple rounds of prompting before everything was working correctly. The
00:08:58strategy helped but it has a ceiling you should understand before committing to it for a project.
00:09:02For simpler to medium scale applications the advisor strategy can save you several rounds
00:09:07of back and forth that you'd otherwise spend trying to push Sonnet past its limits on its
00:09:11own. If what you're building requires occasional deep reasoning but mostly straightforward
00:09:16implementation this is a genuinely good structure for it. You can build more within your token limits
00:09:20without having to babysit the model through every decision or fall back to Opus for the whole session.
00:09:25For complex apps with many connected dependencies or multiple failure points you're better off just
00:09:30using Opus directly as your main agent. Even when Sonnet follows the advisor's guidance correctly
00:09:36it can still choose the wrong implementation path because it doesn't have the reasoning depth to
00:09:40evaluate multiple approaches at once and weigh the downstream consequences. The advisor helps close
00:09:45that gap but it doesn't fully close it. In those cases the back and forth can cost you more time
00:09:50than running Opus from the start would have. So this strategy is useful when you're working within
00:09:54tight token limits and the application doesn't require Opus-level reasoning at every step. If
00:09:58both of those conditions are true for what you're building it's worth setting up. That brings us to
00:10:03the end of this video. If you'd like to support the channel and help us keep making videos like this
00:10:08you can do so by using the super thanks button below. As always thank you for watching and I'll
00:10:12see you in the next one.

Key Takeaway

The Advisor strategy optimizes performance and cost by using Sonnet for execution and Opus for high-level reasoning, though it is slower than running Opus directly on large-scale tasks that benefit from parallel execution.

Highlights

The Advisor strategy uses Sonnet as the main executor and Opus (Opus 4.6 in the video's test) exclusively as a consultant, so that sessions use fewer tokens and last longer within rate limits.

This dual-agent setup outperformed Sonnet alone on SWE-bench in both performance and cost efficiency, according to Anthropic's experiments.

Opus consumes significantly more tokens than Sonnet even for simple tasks, making it inefficient for full-session usage.

A complex UI migration project using this strategy took 31 minutes because Sonnet executes tasks sequentially rather than in parallel.

The Advisor strategy resolved a real-time sync deletion bug that Sonnet alone failed to fix after multiple debugging attempts.

Sonnet sometimes misjudges task complexity and fails to invoke the Advisor for new feature implementations unless manually prompted by the user.

Timeline

The inefficiency of single-model workflows

  • Opus possesses superior reasoning but exhausts token limits and 5-hour rate windows quickly.
  • Standard orchestration has Opus delegate work to Sonnet or Haiku, but Opus still consumes many tokens even on simple tasks.
  • Anthropic has been lowering rate limits, forcing a shift toward more token-efficient strategies.

Relying on a single model creates a trade-off between reasoning depth and token cost. While Opus handles harder decisions, its high token consumption fills the limit even during routine tasks. This problem is exacerbated by lowered rate limits, which make the 5-hour window fill up faster during peak hours.

The Advisor strategy mechanics and benefits

  • The executor role belongs to Sonnet for tool calls and code writing, while Opus acts as a non-coding advisor.
  • Combined use of Sonnet and Opus outperforms Sonnet alone on SWE-bench.
  • Existing app-building frameworks often prioritize getting the app built over token and cost efficiency.

The Advisor strategy flips the traditional hierarchy by making the smaller model the primary agent. The executor manages all tool calls, code changes, and user-facing output, and consults the advisor only when it gets stuck on a harder decision. The advisor never writes code or makes changes itself. This configuration maintains high performance while significantly lowering the cost of long development sessions.
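
The division of labour described above can be sketched as a small control loop. This is only an illustration under stated assumptions, not Anthropic's implementation or the Claude Code advisor command: `run_executor`, `ask_advisor`, and the `max_solo_attempts` threshold are hypothetical names, and both models are stubbed as plain Python callables so the routing logic can be run without any API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    succeeded: bool
    notes: str


@dataclass
class Session:
    # Hypothetical stand-ins for the two models; neither is a real API call.
    run_executor: Callable[[str, Optional[str]], Attempt]  # (task, advice) -> Attempt
    ask_advisor: Callable[[str, List[str]], str]           # (task, history) -> advice
    max_solo_attempts: int = 2                              # assumed threshold
    history: List[str] = field(default_factory=list)
    advisor_calls: int = 0

    def solve(self, task: str) -> Attempt:
        # The executor tries alone first; the advisor is never asked to edit code.
        for _ in range(self.max_solo_attempts):
            attempt = self.run_executor(task, None)
            self.history.append(attempt.notes)
            if attempt.succeeded:
                return attempt
        # Stuck: consult the advisor once, passing the conversation so far.
        advice = self.ask_advisor(task, self.history)
        self.advisor_calls += 1
        attempt = self.run_executor(task, advice)
        self.history.append(attempt.notes)
        return attempt


def demo() -> Tuple[Session, Attempt]:
    # Stub executor: it only succeeds once it has been given advice.
    def executor(task: str, advice: Optional[str]) -> Attempt:
        if advice is None:
            return Attempt(False, f"tried '{task}' alone and failed")
        return Attempt(True, f"applied advice: {advice}")

    # Stub advisor: returns guidance text only, never a code change.
    def advisor(task: str, history: List[str]) -> str:
        return f"restructure the logic for '{task}' ({len(history)} failed tries reviewed)"

    session = Session(run_executor=executor, ask_advisor=advisor)
    return session, session.solve("fix deletion sync")
```

In this sketch the expensive model is called once per stuck task rather than once per step, which is the property the strategy relies on for its token savings.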

Debugging complex state synchronization

  • Sonnet failed to resolve a real-time deletion sync issue despite multiple independent debugging rounds.
  • The Advisor pinpointed the exact breakdown in sync logic and provided a restructure plan for the executor.
  • The combined approach resolved the bug across multiple devices where deletions previously failed during item selection.

Testing the strategy on a real-time sync application revealed that Sonnet struggles with complex logic gaps on its own. After activating the Advisor, the Opus model reviewed the full conversation context to identify the root cause of the sync failure. The executor then applied the specific logic changes without further back-and-forth prompting.

Limitations in large-scale UI transformations

  • A full UI library migration took 31 minutes because Sonnet lacks the parallel orchestration capabilities of Opus.
  • The Advisor correctly identified version dependency conflicts before the UI work began.
  • Directly using Opus is more efficient for large-scale changes across an entire application codebase.

During a UI library overhaul, the Advisor strategy was hindered by Sonnet's sequential execution style. While the advisor accurately spotted library version issues, the executor could not break the workload into parallel sub-tasks. For projects involving interconnected dependencies and massive code changes, the time saved by Opus's speed outweighs the token savings of the Advisor strategy.
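
The gap between sequential and parallel execution can be illustrated with a toy timing comparison. This is a sketch, not a model of how either Claude model schedules work: the component names and the 0.05 second duration are hypothetical, and `asyncio.sleep` stands in for one independent unit of migration work.

```python
import asyncio
import time
from typing import Any, Coroutine, List, Tuple


async def migrate_component(name: str, seconds: float) -> str:
    # Stand-in for one independent unit of work (e.g. one component migration).
    await asyncio.sleep(seconds)
    return f"{name} migrated"


async def run_sequentially(names: List[str], seconds: float) -> List[str]:
    # One task at a time: total time grows with the number of tasks.
    return [await migrate_component(n, seconds) for n in names]


async def run_in_parallel(names: List[str], seconds: float) -> List[str]:
    # Independent tasks run concurrently: total time is roughly the slowest task.
    results = await asyncio.gather(*(migrate_component(n, seconds) for n in names))
    return list(results)


def timed(coro: Coroutine[Any, Any, List[str]]) -> Tuple[List[str], float]:
    # Run a coroutine to completion and report its wall-clock duration.
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


COMPONENTS = ["header", "sidebar", "canvas", "toolbar"]  # hypothetical names
sequential_result, sequential_time = timed(run_sequentially(COMPONENTS, 0.05))
parallel_result, parallel_time = timed(run_in_parallel(COMPONENTS, 0.05))
```

Both runs produce the same results, but the parallel run finishes sooner. The saving only applies to tasks that do not depend on each other; work such as resolving the dependency issues first still has to happen before the component changes.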

Failures in complexity assessment and implementation

  • Sonnet occasionally treats large feature implementations as routine work and fails to consult the Advisor.
  • Manual user intervention is required when the executor misjudges the reasoning depth needed for a task.
  • The strategy serves medium-scale applications well but has a ceiling for complex apps with many failure points.

In a test adding a new feature page, the executor attempted the build without help, resulting in UI bugs where changes bled across unrelated components. The user had to manually 'nudge' the model to use the advisor, which then identified the wrong component choice as the root cause. This highlights that while the strategy closes the reasoning gap, it does not replace the need for human oversight in complex environments.
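
One way to think about the manual nudge is as an explicit escalation rule that overrides the executor's own judgment. The sketch below is an assumption for illustration, not how Claude Code decides when to call the advisor: `TaskSignals`, its fields, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskSignals:
    # All fields are hypothetical signals chosen for illustration.
    files_touched: int
    prior_failed_attempts: int
    adds_new_feature: bool
    user_requested_advisor: bool = False


def should_consult_advisor(signals: TaskSignals,
                           file_threshold: int = 5,
                           failure_threshold: int = 2) -> bool:
    # An explicit user request always wins: this is the manual nudge.
    if signals.user_requested_advisor:
        return True
    # Repeated failures suggest the executor is stuck.
    if signals.prior_failed_attempts >= failure_threshold:
        return True
    # New features and wide-reaching edits are treated as non-routine.
    return signals.adds_new_feature or signals.files_touched >= file_threshold


minor_tweak = TaskSignals(files_touched=1, prior_failed_attempts=0, adds_new_feature=False)
new_page = TaskSignals(files_touched=3, prior_failed_attempts=0, adds_new_feature=True)
nudged = TaskSignals(files_touched=1, prior_failed_attempts=0, adds_new_feature=False,
                     user_requested_advisor=True)
```

Under this rule the new feature page would have been escalated from the start, while a heading or color tweak would stay with the executor unless the user asks otherwise.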
