00:00:00Wait, I just noticed.
00:00:01The report is based on publicly available information as of January 2025.
00:00:06Oh no, oh no, 2025, no, that's not what I asked for.
00:00:15Moonshot AI has released their newest AI model, Kimi K2.5, and it's been all the rage this
00:00:22week around the internet.
00:00:24Some people are even going as far as saying this might be their favorite model yet.
00:00:29So naturally I had to test it out to see what the fuss is all about and determine if this
00:00:34really is something fresh or is it just another model that's hyped up by flashy marketing.
00:00:39So that's what we're going to find out in today's video.
00:00:42It's going to be a lot of fun, so let's dive into it.
00:00:49So Kimi K2.5 is the latest open source model developed by a Chinese company called Moonshot
00:00:55AI.
00:00:56Just six months ago Richard already covered K2 in great detail and today we are back and
00:01:01we're looking at what's new in K2.5.
00:01:05So what's the big deal about this model?
00:01:06How is it different from the thousand other new models out there coming out almost daily?
00:01:12Well there's two things.
00:01:13First of all it claims to be really good at vision and coding.
00:01:17They even go as far as labeling it open source SOTA.
00:01:21SOTA.
00:01:22Do you realize what that means?
00:01:24Actually, I had to look up what it means myself; I don't actually know.
00:01:27Oh okay so it means state of the art.
00:01:30Okay well today I learned.
00:01:32So anyway it is full on state of the art on agentic benchmarks and vision and coding.
00:01:37And the second thing that stands out in this model is that it has a new functionality called
00:01:42Agent Swarm.
00:01:44It is capable of spinning up as many as a hundred subagents and 1,500 tool calls and running them
00:01:51concurrently, resulting in 4.5 times faster performance.
00:01:55For this model they used a new training method called parallel agent reinforcement learning
00:02:00or PARL.
00:02:01And this means that the model can self-direct the whole agent swarm by creating a trainable
00:02:06orchestrator agent which is basically running the show by decomposing tasks into parallelizable
00:02:12subtasks and is keeping an eye on all of these agents to ensure that the whole operation doesn't
00:02:18fall into a serial collapse, which tends to happen with these multi-agent workflows.
00:02:23The way they solved this is by giving each subagent rewards at separate critical step
00:02:28stages and this whole system lets K2.5 achieve noticeable performance gains.
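To make the orchestrator idea a bit more concrete, here is a minimal sketch of the fan-out pattern only: one coordinator splits a task into independent subtasks and runs them concurrently under a cap. This is not PARL (which is a training method) and not Moonshot's implementation. The names `run_subagent`, `orchestrate`, and `max_agents` are hypothetical, and the "subagent" just sleeps instead of calling a model or a tool.

```python
import asyncio


async def run_subagent(agent_id: int, subtask: str) -> str:
    # Hypothetical stand-in for a real subagent: it sleeps briefly
    # instead of calling a model or a tool.
    await asyncio.sleep(0.01)
    return f"agent-{agent_id}: done {subtask}"


async def orchestrate(task: str, parts: int, max_agents: int = 100) -> list[str]:
    # Decompose the task into independent subtasks (here: numbered pieces).
    subtasks = [f"{task} [part {i + 1}/{parts}]" for i in range(parts)]
    # Cap how many subagents are allowed to run at the same time.
    limit = asyncio.Semaphore(max_agents)

    async def guarded(i: int, sub: str) -> str:
        async with limit:
            return await run_subagent(i, sub)

    # gather() runs the subagents concurrently and returns results in input order.
    return await asyncio.gather(*(guarded(i, s) for i, s in enumerate(subtasks)))


if __name__ == "__main__":
    for line in asyncio.run(orchestrate("research coding models", parts=5)):
        print(line)
```

If every subtask ends up waiting on the previous one, the concurrency buys nothing, which is roughly the serial collapse mentioned above.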
00:02:34So we're definitely going to test it out.
00:02:35Now I'm not going to go into too much detail about all of the different benchmarks because
00:02:40honestly every video I see now is always praising these numbers and I don't even think we can
00:02:44trust these numbers anymore to be honest.
00:02:47They can't even line up their benchmark graphics properly, come on.
00:02:51So instead I'm going to be focusing on the two things they claim this model seems to be
00:02:55good at.
00:02:56Vision and coding and the new agent swarm functionality.
00:03:00So let's put it to the test.
00:03:02They've also got their own CLI tool called Kimi CLI.
00:03:06So that's what I'm going to be using today to conduct my tests.
00:03:09So one of the most impressive features they claim to have is the ability to take a video
00:03:13recording of a UX of a particular website and replicate that in code.
00:03:19That's pretty impressive.
00:03:20So to test it out, I made a video recording of Apple's iPad Air product page with all their
00:03:25fancy animations and transitions.
00:03:28And I created a folder that contains only the file of this recording.
00:03:32And now I'm going to prompt K2.5 to make a promotional website for the iPad Air product
00:03:38based on this video.
00:03:39And before executing shell commands, it will ask if we want to allow it, so I'll allow it
00:03:44for this session.
00:03:46And it's now running.
00:03:48And this is interesting.
00:03:49It detected that the file was too large.
00:03:51So it went ahead and used FFmpeg to compress it on its own.
00:03:56And I was really curious to know how this model processes and understands a video file.
00:04:01It turns out that it takes the video file and once again, it's using FFmpeg to extract key
00:04:06frames from the video to use as visual aid for building the website.
00:04:11So it took the model roughly five and a half minutes to finish the task.
00:04:15So it's not the fastest model out there for sure.
00:04:18And mind you, I am using their own APIs to call the model, not a local version.
00:04:23But anyway, once that is done, we can see here that it gives us a detailed overview of what
00:04:28it did.
00:04:29So now let's view the site itself.
00:04:30Oh, wow.
00:04:31Look at that.
00:04:32It nailed the whole Apple design aesthetic and it even created this 3D floating iPad in
00:04:38the middle.
00:04:39And it seems to be responding to mouse movements as well.
00:04:42That's pretty cool.
00:04:43Then we get this nice carousel section with different windows, but unfortunately it does
00:04:48not respond when I click on the dots, but it's still quite elegant.
00:04:52Then we get another section with some animations.
00:04:55Oh, and here we actually get a navigable carousel with different designs.
00:05:00That's pretty cool.
00:05:01And then we get a couple more sections, which all feel very similar to Apple's aesthetic.
00:05:06Honestly, this is pretty good.
00:05:07The fact that it was able to produce a nice looking website with all these animations just
00:05:12from a reference video and a short prompt is kind of cool.
00:05:16All right.
00:05:17But Apple is a well-known brand.
00:05:18I'm sure the design aesthetic is definitely part of the model's training data.
00:05:23So this is probably a straightforward task for the model.
00:05:26Now let's try something more interesting and a little bit quirky.
00:05:29I've created another folder with a single image of Mr. Burns from The Simpsons.
00:05:34Let's see how creative Kimi K2.5 can get.
00:05:37I've added this prompt: Mr. Burns is running for president.
00:05:40I want you to create a presidential campaign website for Mr. Burns, which includes his policies
00:05:45and political agenda based on this character's traits and motivation.
00:05:49Let's see how that does.
00:05:51Once it starts the reasoning process, we can see how it thinks about the design.
00:05:55The asset is clear.
00:05:56Montgomery Burns in his signature dark green suit and peach tie.
00:06:01This is the key visual reference for the campaign's aesthetic.
00:06:05Pretty cool.
00:06:06And this section actually took even longer to finish.
00:06:08This was around six minutes in total.
00:06:11But now that that's done, again, we see a detailed overview of what was produced and we can see
00:06:16here it added a vision section, a policy section, promotional materials, et cetera.
00:06:22And look at that.
00:06:23It even added a hidden Easter egg just for fun.
00:06:26Now that is super cool.
00:06:27Now let's see how the website looks.
00:06:29Wow.
00:06:30Look at that.
00:06:31Excellence in governance.
00:06:33I'm making this country great again for me.
00:06:36Oh, and there's a little nuclear button over there.
00:06:40What happens when I click it?
00:06:41Smithers gave me a coffee.
00:06:43That's cool.
00:06:44And there's even a detailed about page.
00:06:46And then there's prosperity.
00:06:49And the animations are so slick.
00:06:50Wow.
00:06:51So I guess Kimi K2.5 really knows how to create punchy graphics.
00:06:55It's obviously a lot better than all of those purplish slop designs that we've seen other
00:07:01models produce.
00:07:02And look at that.
00:07:04Policies for the elite.
00:07:05Oh, my God.
00:07:06There's so many good cheeky jokes here.
00:07:08That's amazing.
00:07:10Health care vouchers redeemable only at Burns Medical Centers.
00:07:14Organ transplant waiting lists sorted by net wealth.
00:07:18Border wall made of gold.
00:07:21What are people saying?
00:07:22OK, here we get some Simpsons character quotes and the contact form and the campaign donation
00:07:29page.
00:07:30It even added a merch shop.
00:07:31OK, but that section is coming soon.
00:07:33Yeah, because this is a static HTML page.
00:07:35All right.
00:07:36Now I want to trigger that Easter egg.
00:07:38How do I do that?
00:07:39It says I have to input the Konami code.
00:07:43What is a Konami code?
00:07:45Oh, OK.
00:07:46The Konami code is a famous video game cheat code.
00:07:49Wow.
00:07:50I didn't know this.
00:07:51Once again, today I learned.
00:07:52So it's up, up, down, down, left, right, left, right, A, B. Oh, OK.
00:07:58There we go.
00:07:59We now get the big ha ha ha text over the page and the slogan turns to excellent.
00:08:06That's pretty cute.
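For anyone curious how an Easter egg like this tends to work, here is a small sketch of a key-sequence detector: it remembers the last few key presses and fires when they match the target sequence. The generated page's actual code is not shown in the video, so this is an assumption about the general technique, and `SequenceDetector` is a hypothetical name. The default sequence uses the commonly cited order, which ends in B then A; pass a different sequence if a page expects another order.

```python
from collections import deque

# Commonly cited Konami code order (ends in B, then A).
KONAMI = ("up", "up", "down", "down", "left", "right", "left", "right", "b", "a")


class SequenceDetector:
    def __init__(self, sequence=KONAMI):
        self.sequence = tuple(sequence)
        # Keep only as many recent keys as the target sequence is long.
        self.recent = deque(maxlen=len(self.sequence))

    def press(self, key: str) -> bool:
        # Record the key and report whether the most recent keys match the sequence.
        self.recent.append(key.lower())
        return tuple(self.recent) == self.sequence


if __name__ == "__main__":
    detector = SequenceDetector()
    for key in KONAMI:
        unlocked = detector.press(key)
    print("Easter egg unlocked:", unlocked)
```

Because the buffer only ever holds the latest keys, stray presses before the sequence do not block it from triggering.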
00:08:07But honestly, there are so many cool nuggets here that I'm just going to leave a link in
00:08:10the description for this home page so you can check it out for yourself later.
00:08:14The Simpsons fans might really appreciate this.
00:08:17This is really impressive, honestly.
00:08:19I didn't expect it to create such a fun website from just a single image and a short text prompt.
00:08:24All right.
00:08:25But now I want to try the agent swarm function everybody's been raving about.
00:08:29So looking at their own examples, apparently the swarm feature is very good for tasks like
00:08:33gathering research for a certain topic or any action, really, where you want a multi-threaded
00:08:39approach.
00:08:40But to test out this feature in all of its glory, it's best to use the official Kimi page
00:08:46and run it in their chatbot because they have also added a bunch of cool visual elements
00:08:50and animations that really make the swarm process look very cool.
00:08:54You'll see that in a second.
00:08:56So for this test, I'm going to ask the agent swarm to gather as much information as it can
00:09:00about different models, which ones are the most used, and I'll ask K2.5 to gather all
00:09:06this information and consolidate it in a well formatted PDF document.
00:09:10And also if you do want the model to use the swarm, it is useful to ask it to do so because
00:09:16in one of my previous tests, I asked it to do a task and K2.5 concluded on its own that
00:09:23it didn't need to employ the swarm and gave me back some token credits.
00:09:27So if you really want to activate the swarm, just be sure to let it know.
00:09:31All right.
00:09:32So let's launch our task.
00:09:33And as soon as it starts, we can see these cool animations Kimi has on their chatbot interface.
00:09:39And this is honestly something that I've noticed Moonshot AI is very good at.
00:09:43They really excel at having a very playful, very gamified user experience, which kind
00:09:49of makes the whole process of using their tools much more fun.
00:09:52And again, Kimi is being cheeky here about the whole process as the model assigns the
00:09:57agents.
00:09:58And it even gives ID badges to each of them.
00:10:01And we can also track their task completion statuses in real time.
00:10:05And as the agents are completing the tasks, we can also follow their progress on the main
00:10:10window.
00:10:11We can see the web pages they are visiting and the code that they are producing.
00:10:15And at this point, you can also place your bets as to which agent will complete its task
00:10:20the fastest.
00:10:21Once the agent completes the task, you can see a little bubble pop up above their avatar.
00:10:26So roughly 10 and a half minutes later, my swarm has finished the given task and we get
00:10:31this PDF document as a result.
00:10:33It appears that there's text here, but I can't quite seem to see it.
00:10:39OK, so I had to copy paste it somewhere to understand it.
00:10:43OK, so it says coding models, comparative analysis.
00:10:46OK, OK.
00:10:47Well, very bad design choice right from the start.
00:10:50But OK, let's not jump to conclusions.
00:10:53Let's look at the rest of the report.
00:10:55OK, we have an executive summary here.
00:10:58Major findings.
00:10:59Eighty one percent of devs use or plan to use AI.
00:11:03Fifty nine percent of devs run three AI tools in parallel.
00:11:06OK, OK, interesting.
00:11:08And we see here that Claude Code Opus 4.5 dominates the charts.
00:11:13And then we see market trends here.
00:11:16Forty six percent of devs actively distrust AI outputs.
00:11:20And wow, this is surprising.
00:11:22GitHub Copilot is the market leader with 42 percent of market share.
00:11:26Wow.
00:11:27Llama 4 Scout seems to have the largest context window with 10 million tokens.
00:11:31That is pretty impressive.
00:11:32OK, here it comes.
00:11:33The juicy parts.
00:11:34Key takeaways.
00:11:35OK, let's see.
00:11:36No single winner.
00:11:37Oh, come on.
00:11:39How lame.
00:11:41Forty five percent of AI generated code has vulnerabilities.
00:11:43Yeah, that is something to worry about for sure.
00:11:46Wait, I just noticed the report is based on publicly available information as of January
00:11:522025.
00:11:54Oh, no.
00:11:56Oh, no.
00:11:572025.
00:11:59No, that's not what I asked for.
00:12:02I specifically asked it for information about the currently most used models currently.
00:12:09Why didn't you use data from January 2026?
00:12:14You're absolutely right.
00:12:15I should have researched data from 2025 and January 2026.
00:12:21Typical LLM behavior.
00:12:23I am very disappointed in you, Kimi.
00:12:25I just wasted a bunch of tokens and 10 minutes of my time for outdated information.
00:12:30Oh, well.
00:12:31So there you have it.
00:12:32That is Kimi K2.5.
00:12:35Despite my utter disappointment in its ability to follow orders in the last test, I still
00:12:40think it's a pretty good model.
00:12:42I wouldn't say it's groundbreaking or state of the art, but it does have its upsides.
00:12:47I would certainly recommend it if you want to make a truly beautiful website, you know,
00:12:51something that you can display on awwwards.com.
00:12:55Then I would definitely go with K2.5 as opposed to one of the Claude Code models, for instance.
00:13:01And I got to be honest, the swarm feature looks very cool and it's definitely fun to
00:13:06use.
00:13:07But did you know that you can get the same feature using Claude Code?
00:13:10Richard just did a great video exploring that topic, so be sure to check that video out as
00:13:14well.
00:13:15And folks, if you found this video useful or at least entertaining, then let me know by
00:13:19smashing that like button underneath the video.
00:13:22And also be sure to subscribe to our channel so you don't miss out on any of our future
00:13:26technical breakdown videos.
00:13:28This has been Andris from Better Stack and I will see you in the next videos.