00:00:00So Opus 4.7 just released, and by the numbers,
00:00:04this is a massive upgrade. So let's dive in. So first things first,
00:00:08the benchmarks. Now they do show Mythos over here on the right,
00:00:12just to tease us about things that do exist.
00:00:15But what I really want to pay attention to is 4.7 versus 4.6, because who knows
00:00:20when Mythos is going to be available, and by the numbers,
00:00:23this is a very solid leap forward, especially in things like coding.
00:00:28If we take a look at agentic coding, we see a jump from 53 to 64,
00:00:32from 80 to 87,
00:00:34and then from 65 to 69 on the three big tests, being SWE-bench
00:00:39Pro, SWE-bench Verified, and Terminal-Bench 2.0.
00:00:42The only places that we see Opus 4.7 benchmarks
00:00:46not on top of all the other models,
00:00:49except for Mythos, is agentic search, where we look at GPT-5.4.
00:00:54It's at 89.3 versus Opus 4.7,
00:00:57which oddly enough has dropped versus 4.6, which, you know,
00:01:01when you see things like that,
00:01:02where they show benchmarks where it's gone down from Opus 4.6,
00:01:06you wonder if they kind of just insert those. It's like, Oh no,
00:01:08these benchmarks are actually legit guys. We wouldn't lie about this. See,
00:01:11see this thing. Um,
00:01:12but 5.4 is ahead in agentic search and you also see it ahead in graduate level
00:01:17reasoning. Now, another area we see a massive improvement is visual reasoning.
00:01:21So we jump from 69 to 82,
00:01:25and that might have something to do with the fact that this model has way better
00:01:29vision.
00:01:29So they are telling us that the images that you put into Opus 4.7 are 3x
00:01:34the resolution now, which is huge
00:01:36if you're doing anything with, like, diagrams or small text,
00:01:38and we see those same sort of numbers reflected here in these graphs.
00:01:42So improvements in knowledge work, vision, huge jump in document reasoning,
00:01:4657.1 to 80.6, which is a huge plus
00:01:50if you're someone who uses something like Cowork,
00:01:52you're using this in an office scenario, and all you do all day is feed it
00:01:55documents. Long context reasoning is also a big one.
00:01:57We constantly harp on this channel about context rot and the idea that we need to
00:02:02be very focused on session management. I don't think that changes at all. I mean,
00:02:07going from 71 to 75 is great.
00:02:09I don't think you should change how aggressively you clear, i.e., anytime you're at 20%
00:02:13or 25% of the context window, you should be clearing, but this is an improvement.
00:02:17We'd love to see this. And this one is also interesting.
00:02:19This coding benchmark that has to do with multimodal. So they're coding,
00:02:22but this also includes things where they're throwing it context that has stuff
00:02:25like images. And I don't think this is any surprise.
00:02:28And I think a lot of that has to do with the resolution.
00:02:30Now, besides the model itself, they did a few more updates.
00:02:32The biggest one is more effort control. So now there is a level, X high,
00:02:37probably stole that from OpenAI, between high and max.
00:02:40And on top of that, Claude Code now defaults to extra high.
00:02:44I think that's probably in response to a lot of people claiming that Opus 4.6 was
00:02:48nerfed. And then Boris Cherny, the creator of Opus, well, not creator of Opus,
00:02:52creator of Claude Code, came out and said, well,
00:02:54actually we moved the default reasoning level, the default effort level,
00:02:58to medium. So the fact that they came out with X high,
00:03:01I think is a response to that in order to make it quote unquote better and
00:03:05try harder yet not pushing people to max because then it swings to the other side
00:03:10and everyone complains that their usage is filling up. And remember,
00:03:12if you want to change that,
00:03:13all you need to do is do forward slash effort and then set your level.
00:03:16The higher resolution is also on the API.
00:03:19And then they've also released the new forward slash ultra review slash command.
00:03:24So it gets a dedicated review session on top of that.
00:03:28They've extended auto mode as well. And if you don't know about auto mode,
00:03:31it's basically just an alternative to --dangerously-skip-permissions. Now,
00:03:34one thing they note here is that Opus 4.7 is going to use more tokens
00:03:39than 4.6.
00:03:40So they explicitly state that Opus 4.7 uses an updated tokenizer that improves how
00:03:45it processes text, but that increases the number of tokens on the input,
00:03:50roughly 1 to 1.35 times, depending on the content type.
00:03:54And then secondly, Opus 4.7 thinks more at higher effort levels.
00:03:58So remember that, because they're setting the default effort to extra high
00:04:03when before it was on medium and Opus 4.7 uses more tokens.
00:04:07So if you've been on medium this whole time,
00:04:09you never changed it, and you were already hitting usage limits on
00:04:134.6, be wary of this. Understand that you could definitely run into usage issues
00:04:18if you're someone who's already doing that,
00:04:19because now it's going to use even more tokens.
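To put a rough number on that tokenizer change, here is a minimal sketch of the arithmetic. It assumes only the roughly 1 to 1.35 times input multiplier quoted above; the function name `estimate_input_tokens` is made up for illustration and is not part of any official tooling, and it does not model the extra thinking tokens at higher effort levels.

```python
def estimate_input_tokens(
    old_tokens: int, low: float = 1.0, high: float = 1.35
) -> tuple[int, int]:
    """Return a (best case, worst case) input token estimate.

    old_tokens is what a prompt cost under the previous tokenizer.
    low and high are the quoted multiplier range (assumed, content dependent).
    """
    if old_tokens < 0:
        raise ValueError("old_tokens must be non-negative")
    # round() avoids off-by-one results from floating point noise
    return round(old_tokens * low), round(old_tokens * high)


# Example: a prompt that used to cost 100,000 input tokens
best_case, worst_case = estimate_input_tokens(100_000)
print(f"Expect between {best_case:,} and {worst_case:,} input tokens")
```

So a session that used to sit at 100,000 input tokens could land anywhere up to about 135,000 before any extra thinking is counted, which is why it is worth watching your usage if you were already close to the limit.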
00:04:21What's also interesting is that they've removed extended thinking as well.
00:04:25And if you want to read more and get kind of a deep dive on this migration,
00:04:28they put out an entire thing in the documentation.
00:04:30So all in all, this looks like a really solid upgrade.
00:04:32And I'm excited to jump in there and test it myself.