00:00:00No single Claude model is enough on its own. Opus has the reasoning but burns through your
00:00:04limits. Sonnet is fast but hits a wall on harder decisions. And the answer isn't picking one over
00:00:10the other. It's using all of them together. Now Claude Code already does this to some extent.
00:00:14It orchestrates between models on its own. But Anthropic just released something that
00:00:18not only saves tokens but also makes smaller models almost as capable as the larger ones.
00:00:23Now when building with Claude you might have noticed this. Whenever you hand Opus a task
00:00:28and it determines that the task doesn't need that much effort, it delegates it to Sonnet or Haiku
00:00:32in order to manage token usage properly. But there's a problem
00:00:37with this approach. As we mentioned in our previous video, Anthropic has been lowering the rate limits
00:00:42so during peak hours your 5 hour window fills up faster. And on top of that, Opus consumes a lot
00:00:47of tokens even on simple tasks, which means your limits fill up even faster when you use it.
00:00:52Anthropic decided to flip the script on this and they came out with something called the
00:00:55Advisor strategy. The way this strategy works is that you give the role of executor to the Sonnet
00:01:00model and use Opus purely as an advisor that only gets consulted when the executor actually needs
00:01:05it. There are two agents involved. The executor is your main agent running on Sonnet and it handles
00:01:10all tool calls, code changes and user facing output. The advisor runs on Opus and its only
00:01:15job is to guide the executor when it gets stuck. The advisor never writes code or makes any changes.
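To make the two roles concrete, here is a minimal Python sketch of the control flow. It is not Anthropic's implementation. The `Session` class, the `fake_executor` and `fake_advisor` stand-ins, and the `max_solo_attempts` threshold are all hypothetical names chosen for illustration. In the real setup the executor decides for itself when to consult the advisor; the fixed failure threshold here is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StepResult:
    success: bool
    output: str


@dataclass
class Session:
    # Hypothetical stand-ins: the executor plays the Sonnet role (does all the work),
    # the advisor plays the Opus role (returns guidance only, never edits anything).
    executor: Callable[[str, Optional[str]], StepResult]
    advisor: Callable[[str, List[str]], str]
    max_solo_attempts: int = 2  # assumed threshold before escalating
    history: List[str] = field(default_factory=list)
    advisor_calls: int = 0

    def run(self, task: str) -> StepResult:
        advice: Optional[str] = None
        result = StepResult(False, "")
        # Allow the solo attempts plus one final attempt with advice.
        for attempt in range(self.max_solo_attempts + 1):
            result = self.executor(task, advice)
            self.history.append(result.output)
            if result.success:
                return result
            # Escalate once, after the executor has failed enough times alone.
            if attempt + 1 >= self.max_solo_attempts and advice is None:
                advice = self.advisor(task, self.history)
                self.advisor_calls += 1
        return result


def fake_executor(task: str, advice: Optional[str]) -> StepResult:
    # Toy behavior: fails without advice, succeeds once advice is supplied.
    if advice is None:
        return StepResult(False, "attempted a fix, problem persists")
    return StepResult(True, f"applied advice: {advice}")


def fake_advisor(task: str, history: List[str]) -> str:
    # Toy behavior: reviews the history and returns guidance as plain text.
    return f"after {len(history)} failed attempts, restructure the failing logic"
```

With these stand-ins, `Session(fake_executor, fake_advisor).run("fix the bug")` makes two solo attempts, consults the advisor once, and succeeds on the third attempt. The expensive model is called a single time instead of on every iteration.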
00:01:20When Anthropic experimented with this approach, they found that the combination came out ahead of
00:01:25Sonnet alone on SWE-bench in terms of both performance and cost.
00:01:31And it costs significantly less than running Opus as the main agent because Opus only gets invoked
00:01:36when it actually matters, not for every single iteration. Now you might think that we already
00:01:40have a lot of frameworks for building apps that are better and ready to use so why bother with this
00:01:45setup? The reason is that most existing frameworks are not built with cost and token efficiency in
00:01:50mind. Even though they get the job done, they fall short when it comes to making Claude run longer
00:01:54and more efficiently because they are primarily focused on building the app rather than optimizing
00:01:59for token usage. With this setup, you can build a working app using a weaker model, making the
00:02:04whole process far more token efficient. And that connects back to the limits problem we mentioned
00:02:09earlier. We already made a video on Claude's limits and told you to switch to a smaller model to make
00:02:13it last longer. Here's how it connects. Sonnet consumes way fewer tokens and requires less effort
00:02:19than Opus to perform the same task. Opus is a very large and powerful model so it consumes a lot of
00:02:24tokens even for simple tasks. Sonnet is able to handle many of those tasks more efficiently. So
00:02:30using Opus only to bridge the performance gap on harder decisions is where the real impact comes in.
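The arithmetic behind that claim can be sketched in a few lines of Python. Every number below is a hypothetical placeholder: the prices are relative units rather than Anthropic's actual pricing, and the token counts are invented to show the shape of the comparison, not measured results.

```python
# Hypothetical relative prices per million tokens (illustrative units only,
# not Anthropic's actual pricing).
EXECUTOR_PRICE = 1.0  # stands in for Sonnet
ADVISOR_PRICE = 5.0   # stands in for Opus


def session_cost(executor_tokens: int, advisor_tokens: int) -> float:
    """Total cost in relative units for one session."""
    total = executor_tokens * EXECUTOR_PRICE + advisor_tokens * ADVISOR_PRICE
    return total / 1_000_000


# Scenario A (assumed): the large model does all the work, 1M tokens.
opus_only = session_cost(executor_tokens=0, advisor_tokens=1_000_000)

# Scenario B (assumed): the small model does the work, and the large model
# is consulted for a tenth of that volume.
with_advisor = session_cost(executor_tokens=1_000_000, advisor_tokens=100_000)

# Fraction saved by scenario B relative to scenario A.
savings = 1 - with_advisor / opus_only
```

Under these made-up numbers the advisor setup costs 1.5 units against 5.0 for the large model alone, roughly 70% less. The real saving depends on how often the advisor is actually consulted and on how many tokens each model uses for the same work.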
00:02:35You're only invoking that power when you actually need it, not for every single task. This makes the
00:02:40overall usage more token efficient and lets you get more done within the same limits. We share
00:02:45everything we find on building products with AI on this channel so if you want more videos on that
00:02:50subscribe and keep an eye out for future videos. So we wanted to test how this actually plays out on an
00:02:55app that was already built using Sonnet. To use the strategy inside Claude Code, we used the advisor
00:03:00command to set Opus 4.6 as the advisor model. Our main agent was the executor, which we had already
00:03:05set to Sonnet since we built the app with it. The app was supposed to have real-time sync, and while
00:03:10moving and resizing elements synced perfectly across sessions deletion wasn't syncing at all. We tried
00:03:16debugging this multiple times with Sonnet on its own, but the issue persisted no matter how
00:03:20many fixes it attempted. So after turning on Opus as the advisor we gave Claude the prompt
00:03:25describing the problem and because Sonnet had already failed multiple times instead of taking
00:03:30another shot on its own it decided to invoke the advisor this time. The advisor reviewed the
00:03:34conversation so far to assess the situation. It provided the exact changes that needed to be made pinpointing
00:03:40where the sync logic was breaking and what specifically needed to be restructured. The executor model took
00:03:45in that advice and it applied those fixes directly without any additional back and forth. We tested it
00:03:50across multiple devices and found that the issue was resolved. Both ends were
00:03:55reflecting deletions properly as intended, even when a user had the item selected at one end while
00:04:00it was being deleted at the other, which wasn't the case previously. If we had tried fixing this using Sonnet
00:04:05alone it would have taken more rounds of back and forth prompting because Sonnet inherently is a
00:04:09weaker model and not capable enough to handle complex logic by itself. On the other hand using Opus alone
00:04:15would have consumed far more tokens and likely wouldn't have been this fast. Using Sonnet with Opus
00:04:20as an advisor made the process much more efficient. So overall this strategy helped debug syncing issues
00:04:25much faster than before. But before we move forward, let's have a word from our sponsor, Junie by JetBrains.
00:04:30If you're a developer you know the struggle: context switching between your terminal, IDE, and CI pipelines
00:04:36just to get stuff done. Most coding agents lock you into one environment or one specific LLM and
00:04:41call it a day. Junie CLI is different. It's an LLM-agnostic coding agent that works everywhere. Your
00:04:47terminal, your IDE, GitHub, CI/CD pipelines, even your task manager. One agent everywhere. Delegate
00:04:54real work to it. Writing tests, building backends, refactoring, automating code reviews on every commit.
00:04:59Right now JetBrains is running a free early access program including $50 in Gemini credits to test the
00:05:04agent plus BYOK support so you can use any model you prefer. Full access to all features, early access
00:05:10to new ones and direct support from the dev team shaping the product. It's simply better with Junie.
00:05:15Click the link in the pinned comment to join for free. Now we wanted to test whether Sonnet actually
00:05:20consults the advisor for major UI changes. We had a previously built application and we wanted to
00:05:25migrate its UI to a different library. On top of that we wanted to make multiple UI changes in one
00:05:31go which isn't normally recommended but we wanted to see how the smaller model performs in coordination
00:05:36with the larger one on a bigger task. It first accessed the current UI using the Playwright MCP.
00:05:41Once it understood the layout instead of jumping straight into code changes it consulted the advisor
00:05:46to determine the best approach because it was a major critical change and might break the app if
00:05:50handled wrongly. The advisor reported that the new library we had chosen and the one
00:05:55already used in the project had version issues, so before any UI work could start Claude needed to
00:06:00resolve them. Sonnet handled those first, ran multiple commands to make sure the dependencies
00:06:04were properly applied then checked the current state of the UI through Playwright to confirm the app was
00:06:09still running correctly with no client side issues. Once the dependencies were sorted it started making
00:06:14the changes as the advisor suggested working through each component one by one and effectively
00:06:18redesigning the app as a whole. The UI it created was much more interactive and looked significantly
00:06:23more polished than before. It still had some issues but the overall improvement was clear. But here's
00:06:27where the limitation showed up. The entire process took around 31 minutes. Opus on its own would have
00:06:32done this much faster because it's better at orchestrating tasks by identifying what can run in
00:06:37parallel and executing them at the same time. Sonnet being a smaller model handled everything sequentially
00:06:43without breaking any of the work into parallel sub-agents. For an app that wasn't even that complex
00:06:4831 minutes is longer than it should have been. Sonnet also handles smaller changes on its own without
00:06:53involving the advisor which is the right behavior for minor tweaks. But for large scale changes across
00:06:58an entire app like this you're better off using Opus directly because that will save you significantly
00:07:03more time and effort. Now we wanted to test whether it implements a completely new feature on an
00:07:08existing code base properly. We had an app already built and wanted to add another page with a
00:07:13different feature to it. We gave it a prompt describing what we wanted and this time we fully
00:07:17expected it to use the advisor because it wasn't a simple task but it went ahead and implemented
00:07:22the changes entirely on its own without consulting the advisor at all. It treated the whole thing as
00:07:27routine implementation work which it clearly wasn't given the scope of the feature. When we tested the
00:07:31application we found multiple issues. If we modified something and pressed the run button changes like
00:07:37heading updates or color adjustments were also reflected in components outside the preview pane
00:07:41which shouldn't happen. On top of that we wanted it to sync directly instead of requiring us to press
00:07:46run again after every change. So we prompted it again and told it to use the advisor to fix
00:07:51these issues. Upon our prompt it first invoked the advisor agent. The advisor looked at the
00:07:56implementation and identified what was actually causing both problems: the wrong
00:08:00component choice. It laid out what needed to change and why the original approach had introduced those
00:08:06issues in the first place. The executor took that guidance and applied it across the app. When we
00:08:10tested it again streaming worked correctly. All changes reflected immediately as we edited without
00:08:16needing to press run after every modification. The issue of changes bleeding across components
00:08:20was also resolved and everything updated properly within the right boundaries. So there are times
00:08:25when it works exactly as intended but other times the executor assumes a task is small enough and
00:08:30decides not to consult the advisor. In those cases you often have to nudge it yourself so it follows
00:08:35the intended workflow. The model doesn't always judge the complexity of a task the same way you
00:08:40do and when it misjudges you end up with bugs that the advisor would have caught from the start. Also
00:08:44if you are enjoying our content consider pressing the hype button because it helps us create more
00:08:49content like this and reach more people. With real-time distributed state involved, this
00:08:54approach still needed multiple rounds of prompting before everything was working correctly. The
00:08:58strategy helped but it has a ceiling you should understand before committing to it for a project.
00:09:02For simple to medium-scale applications the advisor strategy can save you several rounds
00:09:07of back and forth that you'd otherwise spend trying to push Sonnet past its limits on its
00:09:11own. If what you're building requires occasional deep reasoning but mostly straightforward
00:09:16implementation this is a genuinely good structure for it. You can build more within your token limits
00:09:20without having to babysit the model through every decision or fall back to Opus for the whole session.
00:09:25For complex apps with many connected dependencies or multiple failure points you're better off just
00:09:30using Opus directly as your main agent. Even when Sonnet follows the advisor's guidance correctly
00:09:36it can still choose the wrong implementation path because it doesn't have the reasoning depth to
00:09:40evaluate multiple approaches at once and weigh the downstream consequences. The advisor narrows
00:09:45that gap but doesn't fully close it. In those cases the back and forth can cost you more time
00:09:50than running Opus from the start would have. So this strategy is useful when you're working within
00:09:54tight token limits and the application doesn't require Opus-level reasoning at every step. If
00:09:58both of those conditions are true for what you're building it's worth setting up. That brings us to
00:10:03the end of this video. If you'd like to support the channel and help us keep making videos like this
00:10:08you can do so by using the super thanks button below. As always thank you for watching and I'll
00:10:12see you in the next one.