00:00:00Can you actually train a model to be a better manager?
00:00:02Moonshot recently released Kimi 2.5 and called it the most powerful open-source model to date.
00:00:08That claim is already off because it's open-weight, not open-source.
00:00:11There's a difference, but that's not the point here.
00:00:13Kimi 2.5 makes two claims that are actually worth testing.
00:00:17First, it says it was trained from the ground up to orchestrate agent swarms,
00:00:21with up to 100 sub-agents running in parallel.
00:00:23The reinforcement learning setup doesn't just reward correct answers,
00:00:27but also how effectively the model distributes work across agents.
00:00:30Second, it claims that it has visual agentic intelligence,
00:00:33and said that it generated extremely high-quality animations with just a single prompt.
00:00:37Now, instead of people claiming they built it in one shot, it's the creators themselves claiming it.
00:00:42So, we had one of our team members test both.
00:00:44Some of what we found lived up to the hype, some of it didn't.
00:00:48As I mentioned, Kimi 2.5 is billed as an open-source model.
00:00:51In reality, it isn't.
00:00:54According to the definition given by the Open Source Initiative,
00:00:57an open-source model's code, training data, and methodologies should be publicly available,
00:01:02allowing anyone to inspect, modify, and distribute them.
00:01:05But for this model, it's just an open-weight model.
00:01:07An open-weight model only makes the final weights available,
00:01:10meaning neither the training code nor the training dataset is publicly released.
00:01:14It only contains the weights, which are released so others can fine-tune, adapt, or deploy the model for their own projects.
00:01:20Now, this model's architecture is very similar to DeepSeek's mixture-of-experts architecture.
00:01:25It contains 1 trillion parameters, with only 32 billion activated per token.
00:01:30Does that mean we're not using the model at full capacity?
00:01:33Not really. It answers with roughly the same accuracy as a full 1-trillion-parameter model would,
00:01:36but with much lower processing power and cost.
00:01:39This difference between the total parameters and the activated parameters
00:01:43is the key reason why this model is claimed to be one of the fastest open-weight models out there.
00:01:47Fewer activated parameters means only a fraction of the network runs per query,
00:01:52and this significantly speeds up the model.
00:01:54This is the core reason why it's so cheap compared to other models.
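To make the activated-versus-total distinction concrete, here's a toy mixture-of-experts sketch in Python. The expert count, dimensions, and top-k value are made-up illustration numbers, not Kimi 2.5's actual architecture; the point is just that the router only runs a small subset of experts per token.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer: many experts exist, but the router
# activates only the top-k per token, so most weights stay idle.
NUM_EXPERTS = 8    # stand-in for the model's full expert pool
TOP_K = 2          # experts that actually run per token
DIM = 16           # toy hidden dimension

router_w = rng.normal(size=(DIM, NUM_EXPERTS))
experts = [rng.normal(size=(DIM, DIM)) for _ in range(NUM_EXPERTS)]

def moe_forward(x):
    scores = x @ router_w                      # router logits, one per expert
    top = np.argsort(scores)[-TOP_K:]          # indices of the top-k experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                   # softmax over chosen experts only
    # Only the chosen experts' weights are touched for this token.
    out = sum(w * (x @ experts[i]) for i, w in zip(top, weights))
    return out, top

x = rng.normal(size=DIM)
y, used = moe_forward(x)
print(f"experts used: {sorted(used.tolist())} of {NUM_EXPERTS}")
```

Scaling the same idea up, a 1-trillion-parameter pool with 32 billion active per token means only about 3% of the network does work on any given step, which is where the speed and cost savings come from.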
00:01:57They say this is a native multimodal model and delivers state-of-the-art coding and vision capabilities.
00:02:03But this is the same claim every model makes about being state-of-the-art, better than others, and all that.
00:02:08So our team had to test it to verify for ourselves, and we'll show you what we found.
00:02:12But before we move ahead to its actually unique capabilities, let's have a word from the sponsor.
00:02:16Opera Neon. This is Opera's first agentic browser,
00:02:19designed specifically for power users ready to experience the future.
00:02:23Neon uses Tasks, which replaces chaotic tabs with focused workspaces
00:02:27where the AI can analyze and act across multiple tabs within the same context.
00:02:32Imagine needing a quick utility for work.
00:02:34Instead of opening an IDE, simply use Neon Make.
00:02:37Type a prompt like "Make a Cyberpunk Pomodoro Timer"
00:02:40and the browser spawns a virtual machine to generate the agenda,
00:02:43write the code, and deploy the app instantly.
00:02:45It's a massive time saver for daily workflows, allowing you to prototype concepts
00:02:50or automate research via Neon Do without ever breaking your flow.
00:02:53It acts like a junior developer built directly into the interface.
00:02:56I'll definitely be using these Neon cards to automate my prompts.
00:02:59You can subscribe to Opera Neon today. Don't just watch the agentic shift.
00:03:03Be a part of it. The link is in the description.
00:03:05The Kimi model is able to direct a swarm of agents, coordinating tasks amongst them.
00:03:10Now you might think that Claude also does that and spawns multiple sub-agents based on the required task.
00:03:15But here's how this model is different.
00:03:17Kimi 2.5 as a model has learned to self-direct an agent swarm of up to 100 sub-agents,
00:03:23executing parallel workflows across 1,500 coordinated steps via parallel-agent reinforcement learning.
00:03:29For those who don't know, reinforcement learning is a process where the model is rewarded
00:03:33when it performs well and penalized when it strays from the objective.
00:03:36Most models are rewarded based on performance alone.
00:03:39But in this case, the model is also rewarded based on how well it can parallelize steps
00:03:43and act as an orchestrator.
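To illustrate the idea, here's a toy sketch of what a shaped reward like that could look like. The terms and the alpha weighting are assumptions for illustration, not Moonshot's published training setup.

```python
def swarm_reward(correct: bool, total_steps: int,
                 critical_path_steps: int, alpha: float = 0.3) -> float:
    """Toy shaped reward: a base reward for task correctness, plus a
    bonus for how much of the work ran off the critical path (i.e. in
    parallel). alpha is an assumed weighting, not a published value."""
    base = 1.0 if correct else 0.0
    if total_steps == 0:
        return base
    # 0 when fully serial (critical path == all steps),
    # approaching 1 as more steps run in parallel.
    parallelism = 1.0 - critical_path_steps / total_steps
    return base + alpha * parallelism

# A correct run that squeezed 1,500 steps down to a 300-step
# critical path scores higher than a fully serial correct run.
print(round(swarm_reward(True, 1500, 300), 2))   # parallelized run
print(round(swarm_reward(True, 1500, 1500), 2))  # fully serial run
```

Under a reward like this, two runs that both produce the right answer are no longer equal: the one that distributed work across agents earns more, which is the behavior the transcript describes being trained in.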
00:03:44To put it simply, the Kimi model is trained to be an orchestrator.
00:03:48Its success criterion is its ability to create sub-agents and assign tasks.
00:03:53The orchestrator has built-in tools for creating sub-agents, assigning tasks, and other related functions.
00:03:58It creates sub-agents for various tasks, assigns them those tasks,
00:04:02receives results from them, and then coordinates everything into a final result.
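That create-assign-collect loop is essentially a fan-out/fan-in pattern. Here's a minimal sketch of it using Python's standard concurrent.futures; the sub-agent here is a placeholder function, not Kimi's internal tooling.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def sub_agent(task: str) -> str:
    # Placeholder for a real sub-agent call (e.g. an LLM request).
    return f"done: {task}"

def orchestrate(tasks: list[str], max_agents: int = 5) -> list[str]:
    """Fan the tasks out to parallel sub-agents, then gather and
    merge their results: the create/assign/collect loop in code."""
    results = []
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        futures = {pool.submit(sub_agent, t): t for t in tasks}
        for fut in as_completed(futures):
            results.append(fut.result())   # collect each agent's result
    return sorted(results)                 # coordinate into one final answer

pages = ["login", "signup", "forgot-password", "dashboard", "settings"]
print(orchestrate(pages))
```

The interesting part of Kimi 2.5's claim is not this pattern itself, which any framework can implement, but that the model was trained to decide the task split and the agent count on its own.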
00:04:06According to them, they used this swarm method to improve performance on complex tasks.
00:04:11And in internal evaluations, it resulted in an 80% reduction in end-to-end runtime.
00:04:16This means they were able to execute much more complex, long-horizon tasks.
00:04:20They compared it with the best models for long-range tasks,
00:04:23namely Opus 4.5 and Kimi 2.5 without the swarm,
00:04:26and found that the Kimi 2.5 agent swarm surpassed all models across their benchmarks.
00:04:32They were also able to save considerable time by using parallel agents instead of a single agent.
00:04:36So those were all claims based on what they said.
00:04:39To test these claims, we installed the KimiCode CLI,
00:04:42which is a new coding agent that was released with this model.
00:04:45We had already built a UI and wanted to migrate it to a different component structure.
00:04:49The UI was built using ShadCN, and we wanted to rebuild it using Material UI.
00:04:53The project had multiple pages,
00:04:55so we asked Kimi to change the UI of the entire project from ShadCN to Material UI,
00:05:00and to use agents to handle each page,
00:05:02so that this migration could be done faster in parallel.
00:05:05It started exploring the directory, similar to how Claude Code does.
00:05:08It created a to-do list containing every page that needed to be converted to Material UI.
00:05:13It grouped similar pages together,
00:05:15such as auth pages like signup, login, and forgot password to handle them more efficiently.
00:05:20However, it appeared to spawn more agents than we were expecting,
00:05:23which we later found out was a display bug in the CLI.
00:05:26It had actually used just five agents to perform the task,
00:05:28the kind of rough edge you'd expect from a new product.
00:05:30It took around 15 minutes to complete the task,
00:05:32longer than we expected, given that parallel agents were supposed to cut that down.
00:05:35It finished by verifying and cleaning everything.
00:05:38Some components were no longer being used after the migration,
00:05:41and it cleaned those up as well.
00:05:43It made sure all dependencies were installed and up to date,
00:05:45including those in the test files, and validated the rest.
00:05:48Once that was done, it ensured that all dependencies required for ShadCN were removed,
00:05:53leaving the project without any unused dependencies,
00:05:55which most agents tend to forget and end up bloating the project unnecessarily.
00:05:59It tweaked the UI slightly.
00:06:01For example, the hero section originally had text and visuals side by side,
00:06:05but it changed them to be stacked vertically.
00:06:07Other than that, everything looked almost exactly the same,
00:06:10with just the components switched.
00:06:12Even though it was a big task, it only used 25% of the context window,
00:06:16meaning it can handle long-running agent tasks effectively.
00:06:19So the agent swarm works, but it's not always faster
00:06:22and will take longer on a large-scale codebase.
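One way to see why more agents doesn't always mean faster: any serial work, like exploring the repo up front or merging and verifying results at the end, caps the speedup. A quick Amdahl's-law calculation, with the serial fraction as an assumed illustrative number:

```python
def max_speedup(serial_fraction: float, n_agents: int) -> float:
    """Amdahl's law: the best possible speedup when everything
    except the serial fraction is split across n_agents workers."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_agents)

# If 40% of a migration is inherently serial (exploring the codebase,
# then merging and verifying results), even 100 agents barely pass 2.4x.
for n in (1, 5, 100):
    print(n, "agents ->", round(max_speedup(0.4, n), 2), "x")
```

That matches what we saw: the swarm helps, but past a handful of agents the serial coordination work dominates, so a large codebase doesn't scale linearly with agent count.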
00:06:24You've probably noticed we build a lot in these videos.
00:06:27All the prompts, the code, the templates, you know,
00:06:29the stuff you'd normally have to pause and copy from the screen.
00:06:32It's all in our community, for this video and every video before it, too.
00:06:35Links in the description.
00:06:37The key selling point of Kimi 2.5 is its visual agentic intelligence.
00:06:41It's claimed to be particularly strong in front-end capabilities.
00:06:44It can implement interactive layouts and rich animations,
00:06:48such as scrolling text effects.
00:06:50They provided multiple examples of animations which were all created well.
00:06:53Here's where it really stands out.
00:06:55Kimi 2.5 excels at coding with vision, going beyond just text and image prompts.
00:07:00It can even take videos as input and generate code,
00:07:03making it one of the first models able to do so.
00:07:06This made explaining code flows much easier.
00:07:08This multimodal capability was not added later after training.
00:07:12It was integrated during model training.
00:07:14Most models incorporate additional capabilities
00:07:16only after their text capabilities are strong enough,
00:07:19which often leads to a trade-off between vision and text abilities.
00:07:23But with Kimi 2.5's training methodology,
00:07:25this trade-off disappears and both capabilities improve together.
00:07:29Now, we had to test it ourselves.
00:07:30We screen-recorded navigating around Notion's new-page interface and using slash commands.
00:07:35We kept the recording small because the documentation mentions that videos are limited to 40 megabytes.
00:07:40We provided the path to the Notion recording and asked it to clone the website shown in the video.
00:07:45We didn't specifically tell it in the prompt what the recording was,
00:07:48so it used the read media file tool to analyze the video.
00:07:52It concluded that the interface was Notion-like, identified all the features,
00:07:56and determined it was a Notion clone with a macOS-style window.
00:07:59Once it had listed what was in the file, it started implementing it.
00:08:02If you are using video processing in your own projects, remember this.
00:08:06Videos and images can exhaust the context window quickly,
00:08:09so be careful with large files and watch out for context bloating.
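One concrete guard: check media size before handing a file to the model. The 40 MB cap below comes from the video limit the documentation mentions; the check itself is a generic sketch, not part of the Kimi CLI.

```python
import os
import tempfile

MAX_VIDEO_BYTES = 40 * 1024 * 1024  # 40 MB video limit per the docs

def check_media(path: str) -> int:
    """Return the file size in bytes, refusing oversized videos
    before they reach the model and bloat the context window."""
    size = os.path.getsize(path)
    if size > MAX_VIDEO_BYTES:
        raise ValueError(f"{path}: {size / 2**20:.1f} MB exceeds the 40 MB limit")
    return size

# Stand-in for a screen recording: a small temporary file.
with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as f:
    f.write(b"\x00" * 1024)
    clip = f.name
print(check_media(clip))  # prints 1024
```

Keeping recordings short and low-resolution, as we did with the Notion clip, serves the same purpose: fewer bytes in means more context budget left for the actual implementation work.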
00:08:12When it replicated the interface, it was accurate.
00:08:15The UI was editable, including page icons and features from Notion,
00:08:18even though some weren't fully functional at first.
00:08:21The slash commands weren't working yet, but the overall UI was accurate.
00:08:25It would have been better if the slash commands were implemented, as that's a key part of the workflow.
00:08:29But this was a minor issue that could be fixed through iteration.
00:08:32So we gave it a prompt, asking it to fix the issues we were having with the implementation.
00:08:37From there, it self-iterated, implementing fixes, checking the results,
00:08:41and ensuring the feature worked correctly without needing any additional prompt from us.
00:08:46This iteration eventually fixed the slash command issue,
00:08:49making the whole interface feel like a functional Notion clone.
00:08:52So it does live up to the model's claims.
00:08:54After working through a few issues, we think it could be a cheaper alternative to Claude Code,
00:08:58given Claude's plans are known to be expensive, and Kimi's plans are lower priced.
00:09:03That brings us to the end of this video.
00:09:05If you'd like to support the channel and help us keep making videos like this,
00:09:08you can do so by joining AI Labs Pro.
00:09:10As always, thank you for watching, and I'll see you in the next one.