00:00:00Minimax just dropped M2.5, a coding model that nearly matches Claude Opus 4.6, but costs one tenth as much.
00:00:07It launched just the other day, it has open weights, 230 billion parameters, and it's built for agent workflows.
00:00:14If you're building AI agents, co-pilots, or automation tools, this will change your costs overnight.
00:00:19And the wild parts are not only the benchmarks, but also the price.
00:00:23We have videos coming out all the time, be sure to subscribe.
00:00:31Minimax M2.5 is a mixture of experts model that has 230 billion total parameters, but only 10 billion are active when it runs.
00:00:39So you get a huge model without paying for the whole thing every time.
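That "huge model, small active slice" idea is easy to see in code. Here's a toy sketch of mixture-of-experts routing, a minimal illustration with made-up shapes and a simple top-k gate, not MiniMax's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, experts, gate_w, k=2):
    """Toy mixture-of-experts step: only the top-k scored experts run."""
    logits = x @ gate_w                                # one gating score per expert
    top = np.argsort(logits)[-k:]                      # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                           # softmax over the chosen experts
    # Only the selected experts do any compute -- the rest sit idle,
    # which is how a 230B-parameter model can run ~10B at a time.
    return sum(w * experts[i](x) for i, w in zip(top, weights))

# Eight tiny "experts", each just a random linear map for illustration.
experts = [lambda x, W=rng.standard_normal((4, 4)): x @ W for _ in range(8)]
gate_w = rng.standard_normal((4, 8))

out = moe_forward(rng.standard_normal(4), experts, gate_w, k=2)
print(out.shape)  # (4,)
```

With k=2 of 8 experts active, each token pays for a quarter of the parameters, which is the same economics the video describes at a much larger scale.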
00:00:43It's built for real-world development workflows, using Python, Java, Rust, multi-file refactors, tool calling loops, even Word and Excel automation.
00:00:53Now there are two versions of this: Standard, which runs at 50 tokens per second, and Lightning, which runs at 100 tokens per second.
00:01:01It's multilingual, and it's fully open weights on Hugging Face.
00:01:05That means you can fine-tune it, run it on-prem, and avoid lock-in, and this is where things start to get interesting for agents.
00:01:12I ran the same prompt on both Opus and Minimax to build out a full-stack Kanban board.
00:01:18Nothing too crazy here, just enough to really get them to build something to see how they compare.
00:01:23The exact prompt that I used I put in the description if you guys want to read over it, but first we're going to look here at the Opus version, which took about 4 minutes to run.
00:01:31We get what we would expect; I didn't have to prompt it again, this was the final output.
00:01:37Everything here is super smooth, it works really well, and the UI looks pretty good for a starter.
00:01:44Drag and drop works as it should, and editing tasks works as it should. I really like this little label here with the correct folder, and it changes as we drag them. That's a cool bonus.
00:01:55All in all, Opus did a really good job here, that's kind of what I expected going into this.
00:02:00Now, on to Minimax. This did take about 8 minutes to finish, maybe because I imported it into Cursor instead of running it on their site, but I wanted it in Cursor.
00:02:10While it did take longer, it cost one tenth of the price, so I'm not going to argue with that.
00:02:14All in all, it did a really good job off only one prompt. The UI lacks a bit compared to Opus, but we still have the same functionality.
00:02:22I can create tasks, drag and drop them into the correct column, so all that works great.
00:02:27The only thing it did not do is add that little label that I liked onto each card as Opus did.
00:02:33Another point it didn't get right was the ability to edit the description of the box.
00:02:38If I edit the description, you see here, nothing changes.
00:02:42So I would have to run this a second time to get that to do what it needs to do, basically.
00:02:48Now that's still okay, because again, one tenth the cost.
00:02:51Now let's talk about what actually matters to developers. M2.5 uses reinforcement learning for task decomposition.
00:02:58So it breaks problems down better, which leads to 20% fewer tool calls and 5% less token waste.
00:03:06If you've built agents before, you know tool calls are where things start to get expensive and they can lead to a mess.
00:03:13It also handles multi-file edits, run, debug, fix loops, those type of things, switching between tools without actually falling apart.
00:03:21On search benchmarks, it reduces search rounds by 20% compared to their previous M2.1.
00:03:27It supports caching too, which means repeated queries can cost less over time.
00:03:32You can plug it right into Ollama, local clusters, GitHub automations, or your CI pipelines.
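As a sketch of what the local Ollama integration looks like, here's how you might build a non-streaming call against Ollama's default local REST endpoint. The model tag `minimax-m2.5` is a placeholder assumption, check `ollama list` for the real name on your machine:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "minimax-m2.5"  # placeholder tag -- substitute the tag from `ollama list`

def build_request(prompt: str) -> request.Request:
    """Build a non-streaming /api/generate call for a local Ollama server."""
    body = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    return request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )

req = build_request("Refactor this function to use pathlib.")
print(req.full_url)
# Send with request.urlopen(req) once the Ollama server is running locally.
```

The same request shape drops straight into a CI step or a GitHub automation, since it's just JSON over localhost.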
00:03:37Now benchmarks, right? I'm comparing this to Opus here.
00:03:40Well, on SWE bench verified, M2.5 scored over 80%.
00:03:45Claude Opus 4.6 scored slightly higher, also just over 80%. That's a really small gap here.
00:03:52On the multi-SWE bench, it scores over 51% topping other open models.
00:03:58And on DROID, it actually beats Opus by just 0.2%, so it depends on where you look here.
00:04:05Now speed. It's 37% faster than their previous model. It still took 8 minutes here, okay?
00:04:11Opus 4.6 averages a slightly faster speed, but the two get nearly identical when you run M2.5 in the right setup.
00:04:18So what does this mean for you? Well, it could mean a few things.
00:04:20It could mean fewer retries, cleaner CI runs, less token churn, and more merged pull requests.
00:04:26And in agentic task performance, it's matching things like GPT-5 or Gemini 3 Pro territory,
00:04:32but with open weights, right? So now let's talk about the part that changes things,
00:04:37which really here, even if it took longer, is the pricing.
00:04:40M2.5 standard costs $0.15 per million input tokens and $1.20 per million output tokens.
00:04:47Lightning is double that: $0.30 per million input tokens and $2.40 per million output tokens.
00:04:53Running lightning at 100 tokens per second for an hour, that's about a dollar.
00:04:56If you run standard, which I actually did here, it's about 30 cents per hour.
00:05:00Now compare that to Claude Opus 4.6. Huge difference.
00:05:04$5 per million input tokens, $25 per million output tokens.
00:05:09Per SWE task, it's roughly 10% of Opus costs, helped by efficiency and fewer tool calls.
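Those per-hour figures are easy to sanity-check. Here's the back-of-envelope math using the rates quoted above; this counts output tokens only, so a real bill with input tokens lands a bit higher:

```python
# Pricing quoted in the video, in USD per 1M tokens, plus generation speed.
M25_STANDARD = {"in": 0.15, "out": 1.20, "tok_per_s": 50}
M25_LIGHTNING = {"in": 0.30, "out": 2.40, "tok_per_s": 100}
OPUS = {"in": 5.00, "out": 25.00}

def hourly_output_cost(tier: dict) -> float:
    """Cost of one hour of continuous generation at the tier's token speed."""
    tokens = tier["tok_per_s"] * 3600          # tokens produced in an hour
    return tokens / 1_000_000 * tier["out"]    # scale by the per-1M output rate

print(f"Lightning: ${hourly_output_cost(M25_LIGHTNING):.2f}/hr")   # ~ $0.86, "about a dollar"
print(f"Standard:  ${hourly_output_cost(M25_STANDARD):.2f}/hr")    # ~ $0.22, "about 30 cents"
print(f"Opus output rate is {OPUS['out'] / M25_STANDARD['out']:.0f}x Standard's")
```

On raw output rates Opus is roughly 20x more expensive, and the video's "roughly 10% per SWE task" figure folds in the efficiency gains on top of that.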
00:05:15There's also the free API tier, which is live right now. I did pay for this,
00:05:20okay, but they do have that. And that's where the economics really start to shift.
00:05:24So should you switch from Opus 4.6? Well, performance wise, they're nearly identical.
00:05:30Took a bit longer, right? I was on the standard, not lightning, but they're kind of identical.
00:05:34Task completion time is basically the same. Reasoning depth was comparable.
00:05:39Cost wise, though, that's massively cheaper. So you tell me there.
00:05:43It also uses 20% fewer tool calls and wastes fewer tokens, like I said earlier.
00:05:47So flexibility wise, it's open weights, which means you can deploy it locally and fine-tune it.
00:05:52And then Opus still does have an edge at the very top end of premium intelligence.
00:05:57So, right, Opus is still the premium model we're comparing against here.
00:06:00Now here's why this matters, because now you can run agents at scale without that price burden.
00:06:05Because M2.5 has a 59% win rate on advanced agent benchmarks, you can build autonomous
00:06:12repo bots, run persistent coding agents, automate enterprise workflows, right? It's not perfect,
00:06:17but it's really, really good for what we saw here. And the pricing is going to allow you to really
00:06:22experiment and put it to the full test. And Minimax is shipping fast, moving on a weeks
00:06:27rather than months cadence here. Ollama and GitHub integrations are already ramping up.
00:06:32Minimax M2.5 delivers Opus-level coding performance at a budget price with open weights. That
00:06:38combination is rare, but in 2026, who knows what we're going to see. You can test it out for free over on
00:06:43Minimax or run it on Ollama or pick up an API key like I did. Is this the new default model for
00:06:48developer agents? I guess we're going to see how that plays out. We'll see you in another video.