Claude's New Advisor Mode: Better Results + CHEAPER

Chase AI
Computers/Software, Management/Leadership, AI/Future Technology

Transcript

00:00:00Anthropic just released the advisor strategy,
00:00:02which allows us to not only get better performance
00:00:05from our Anthropic models, but do it at a lower cost.
00:00:09And the way it works is pretty simple.
00:00:10It pairs Opus as an advisor
00:00:12with Sonnet or Haiku as an executor.
00:00:15So Opus is coming up with a plan
00:00:17and the cheaper model does all the work.
00:00:19So this is very similar to when we're using Claude Code
00:00:22and have Opus run the plan mode,
00:00:24but have the actual execution pass off to Sonnet.
00:00:27The difference is with the advisor strategy,
00:00:30this is all done automatically via an API.
00:00:32So this is perfect if you're working on things
00:00:34outside of Claude Code.
00:00:35So if you have any sort of web application
00:00:38that uses Anthropic APIs under the hood,
00:00:41this is a no-brainer.
00:00:42You're going to get more effective outputs for cheaper.
00:00:46And it's actually a bit more sophisticated
00:00:48than what we do in Claude Code with Opus planning
00:00:50and then Sonnet executing.
00:00:52Because this advisor executor relationship
00:00:55is constantly in flux and it isn't a one-time thing
00:00:58where Opus advises one time and then Sonnet executes.
00:01:01It actually goes back and forth.
00:01:02Like it states here, when the executor,
00:01:04so Sonnet or Haiku, hits a decision
00:01:06it can't reasonably solve,
00:01:08it consults Opus for guidance as the advisor.
00:01:11Opus has full context of what Sonnet is doing.
00:01:15And so it isn't just like plan mode
00:01:16where it gives it one strategy and then it goes.
00:01:19It's as if you did that and Sonnet goes and tries to execute.
00:01:22It hits a stumbling block, then it's going to go back to Opus.
00:01:24So there's a constant back and forth.
00:01:26Furthermore, to keep costs low,
00:01:28Opus isn't doing any tool calls at any point in time.
00:01:30The only tool calls are being done by that smaller LLM,
00:01:34in this case, Sonnet or Haiku.
00:01:35But Opus does retain that full shared context.
00:01:39And like I mentioned in the intro,
00:01:40this gives us better results for less.
00:01:43So right here, it's comparing Sonnet 4.6 high
00:01:46with Opus advisor versus Sonnet 4.6 high on its own.
00:01:50Sonnet scored higher on SWE-bench at 74.8 versus 72.1,
00:01:55and it came in cheaper.
00:01:56So it was just over 96 cents per agentic task
00:02:00versus almost a dollar nine cents, which is significant.
00:02:03And you see the same thing play out in other benchmarks
00:02:06like BrowseComp and Terminal-Bench.
00:02:08So 60.4 versus 58.1, and it's cheaper.
00:02:12The cheaper thing is great because as we all know,
00:02:14the Anthropic APIs are awesome,
00:02:16but they're so damn expensive.
00:02:19And oftentimes you feel like you want something
00:02:21in between Sonnet and Opus, but that just doesn't exist.
00:02:24So this gives us a middle ground
00:02:26in terms of Sonnet and Opus performance,
00:02:28but with a cost that is cheaper than normal Sonnet.
00:02:31So what's not to love?
00:02:32Like I said before, this is an API thing,
00:02:33not necessarily a Claude Code thing.
00:02:35So to use this, you're just going to have to adjust your code
00:02:38and how it's actually making those API calls.
00:02:41Specifically, you have to call out the type to be advisor,
00:02:45as well as the max uses.
00:02:47Now the max uses being the number of times
00:02:48it's going to go back to Opus
00:02:50to get advice on a particular issue.
00:02:52So to sum it up, this is an amazing upgrade.
00:02:54If you're someone who uses Anthropic's API
00:02:56in actual projects outside of the Claude Code ecosystem,
00:03:00we're getting better results for cheaper.
00:03:03Because as you know, oftentimes Opus is just overkill
00:03:06for the vast majority of things,
00:03:08yet sometimes you want something a little better than Sonnet.
00:03:10And here we go, this is the perfect middle ground.

Key Takeaway

Anthropic's advisor strategy raises Sonnet's SWE-bench score by 2.7 points (72.1 to 74.8) while lowering the cost per agentic task from about $1.09 to about $0.96, using Opus as a strategic guide and Sonnet as a tool-calling executor.

Highlights

The advisor strategy pairs Opus as a planner with Sonnet or Haiku as the active executor for API-based tasks.

On the SWE-bench benchmark, the advisor strategy achieves a score of 74.8 compared to 72.1 for standalone Sonnet.

Using the advisor configuration reduces the cost per agentic task from almost $1.09 to just over $0.96.

Smaller executor models handle all tool calls while Opus provides guidance only when the executor encounters a decision it cannot solve.

Developers must specify the 'advisor' type and 'max_uses' parameter within the API call to limit how many times the executor consults Opus.

The video reports the same pattern on other benchmarks, naming BrowseComp and Terminal-Bench and citing 60.4 vs 58.1 at a lower cost.

Timeline

The Advisor-Executor Architecture

  • Opus functions as an advisor that creates plans for smaller models to execute.
  • Sonnet and Haiku act as executors that perform the actual labor and tool usage.
  • Automation via API distinguishes this strategy from manual planning modes in Claude Code.

The system creates a hierarchical workflow where high-level reasoning and execution tasks are split between models. This approach is intended for applications that call Anthropic's API directly, such as web applications, rather than for work done inside Claude Code. By offloading the bulk of the work to cheaper models, the strategy gets the benefit of high-end reasoning without paying for it on every step.

Dynamic Interaction and Context Sharing

  • The relationship between models is fluid and involves multiple back-and-forth consultations.
  • Opus retains full shared context of the executor's actions without making tool calls itself.
  • Executor models consult the advisor only when they encounter a stumbling block or complex decision.

The interaction is more sophisticated than a one-time handoff. If an executor model like Sonnet reaches a point of uncertainty, it triggers a consultation with Opus to receive updated guidance. This iterative process ensures the system can overcome obstacles that would typically cause a standalone smaller model to fail while keeping Opus from wasting resources on simple execution steps.
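The back-and-forth described above can be illustrated with a toy simulation. This is only a sketch of the control flow, not Anthropic's implementation: the class `AdvisorLoop`, the `fake_advisor` function, the `hard` flag, and the tool names are all hypothetical stand-ins, and no real API is called.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AdvisorLoop:
    """Toy model of the executor/advisor exchange (all names hypothetical)."""
    advise: Callable[[List[str]], str]  # stands in for Opus: reads context, returns guidance
    max_uses: int                       # cap on advisor consultations
    uses: int = 0
    context: List[str] = field(default_factory=list)  # shared log both roles can see

    def run(self, steps: List[dict]) -> List[str]:
        for step in steps:
            if step["hard"] and self.uses < self.max_uses:
                # Executor is stuck: consult the advisor, which sees the full context.
                guidance = self.advise(self.context)
                self.uses += 1
                self.context.append(f"advisor: {guidance}")
            # Only the executor ever performs a tool call.
            self.context.append(f"executor: ran {step['tool']}")
        return self.context


def fake_advisor(context: List[str]) -> str:
    # Placeholder for a real model call; it only reports how much context it saw.
    return f"guidance after {len(context)} context entries"


steps = [
    {"tool": "read_file", "hard": False},
    {"tool": "edit_file", "hard": True},
    {"tool": "run_tests", "hard": True},
    {"tool": "edit_file", "hard": True},
]
loop = AdvisorLoop(advise=fake_advisor, max_uses=2)
log = loop.run(steps)
for entry in log:
    print(entry)
```

With `max_uses=2`, the third hard step proceeds without advice, which is the budget-limiting behavior the video attributes to the max uses setting.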

Performance Benchmarks and Cost Efficiency

  • The advisor strategy outperforms standalone Sonnet 4.6 on SWE-bench, with similar gains reported on other benchmarks.
  • Costs per task drop by approximately 13 cents when using the advisor configuration.
  • This configuration fills the performance gap between Sonnet and Opus at a lower price point than normal Sonnet.

The reported figures show that adding Opus as an advisor makes the whole process cheaper than running Sonnet alone, even though Opus is the more expensive model. The video notes that costs are kept low partly because Opus makes no tool calls itself. On SWE-bench, the score rises from 72.1 to 74.8. This provides a middle ground for developers who need more intelligence than Sonnet offers but find standalone Opus prohibitively expensive.
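The size of these differences can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the video; the dollar amounts are approximate, since the speaker says "just over 96 cents" and "almost a dollar nine cents".

```python
# Figures quoted in the video (cost values are rounded approximations).
sonnet_alone = {"swe_bench": 72.1, "cost_usd": 1.09}
with_advisor = {"swe_bench": 74.8, "cost_usd": 0.96}

# Absolute gain in benchmark score, in points.
score_gain_points = round(with_advisor["swe_bench"] - sonnet_alone["swe_bench"], 1)

# Absolute and relative saving per agentic task.
cost_saving_usd = round(sonnet_alone["cost_usd"] - with_advisor["cost_usd"], 2)
cost_saving_pct = round(100 * cost_saving_usd / sonnet_alone["cost_usd"], 1)

print(score_gain_points, cost_saving_usd, cost_saving_pct)  # 2.7 0.13 11.9
```

So the quoted numbers correspond to a 2.7-point score gain and a saving of about 13 cents, or roughly 12% per task.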

API Implementation Requirements

  • Integration requires specific adjustments to the API call structure.
  • The 'max_uses' parameter controls the frequency of consultations with the advisor model.
  • The advisor mode serves as a middle ground for cases where Opus is overkill but Sonnet alone falls short.

Users must modify their existing code so that the API request sets the type to 'advisor' and specifies 'max_uses'. Setting 'max_uses' keeps the budget under control by limiting how many times the executor can ask Opus for help on a particular issue. This setup is aimed at projects outside the Claude Code ecosystem where reliability and cost are competing priorities.
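As a rough illustration of the two settings mentioned above, the sketch below builds a configuration dictionary. The video only states that a type of advisor and a max uses value must be given; the helper `build_advisor_config`, the `model` field, the exact type string, and the placeholder model ID are assumptions made for illustration, and where this entry belongs in a real request should be taken from Anthropic's API documentation.

```python
def build_advisor_config(max_uses: int, advisor_model: str) -> dict:
    """Return a hypothetical advisor entry; field names are illustrative only."""
    if max_uses < 1:
        raise ValueError("max_uses must be at least 1")
    return {
        "type": "advisor",       # the video says the type must be set to advisor
        "model": advisor_model,  # assumption: where the advisor model is named
        "max_uses": max_uses,    # cap on how often the executor may consult the advisor
    }


# Placeholder model ID; substitute a real one from the official documentation.
config = build_advisor_config(max_uses=3, advisor_model="ADVISOR_MODEL_ID")
print(config)
```

A lower `max_uses` trades some guidance for a tighter cost ceiling, which matches the budget-control role the video describes.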
