I Have Never Seen Anything Like This

AI LABS
Computers / Software Entrepreneurship / Startup Management / Leadership / AI / Future Technology

Transcript

00:00:00 Can you actually train a model to be a better manager?
00:00:02 Moonshot recently released Kimi 2.5 and called it the most powerful open-source model to date.
00:00:08 That claim is already off because it's open-weight, not open-source.
00:00:11 There's a difference, but that's not the point here.
00:00:13 Kimi 2.5 makes two claims that are actually worth testing.
00:00:17 First, it says it was trained from the ground up to orchestrate agent swarms,
00:00:21 with up to 100 sub-agents running in parallel.
00:00:23 The reinforcement learning setup doesn't just reward correct answers;
00:00:27 it also rewards how effectively the model distributes work across agents.
00:00:30 Second, it claims to have visual agentic intelligence,
00:00:33 and Moonshot says it generated highly polished animations from just a single prompt.
00:00:37 Now, instead of people claiming they built something in one shot, it's the creators themselves claiming it.
00:00:42 So, we had one of our team members test both.
00:00:44 Some of what we found lived up to the hype; some of it didn't.
00:00:48 As I mentioned, Kimi 2.5 claims to be an open-source model.
00:00:51 Actually, Kimi 2.5 is not an open-source model.
00:00:54 According to the definition given by the Open Source Initiative,
00:00:57 an open-source model's code, training data, and methodology must all be publicly available,
00:01:02 allowing anyone to inspect, modify, and distribute them.
00:01:05 Kimi 2.5, by contrast, is an open-weight model.
00:01:07 An open-weight model makes only the final weights available,
00:01:10 meaning neither the training code nor the training dataset is publicly released.
00:01:14 The weights are released so others can fine-tune, adapt, or deploy the model for their own projects.
00:01:20 Now, this model's architecture is very similar to DeepSeek's mixture-of-experts (MoE) architecture.
00:01:25 It contains 1 trillion total parameters, with only 32 billion activated per query.
00:01:30 Does that mean we're not using the model at full capacity?
00:01:33 No: it reaches roughly the accuracy you'd expect from a 1 trillion parameter model,
00:01:36 but with much lower processing power and cost.
00:01:39 This gap between total parameters and activated parameters
00:01:43 is the key reason this model is claimed to be one of the fastest open-weight models out there.
00:01:47 Activating only a fraction of the parameters means far less compute per query,
00:01:52 and this significantly speeds up the model.
00:01:54 It's also the core reason it's so cheap compared to other models.
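The activated-versus-total distinction described above can be sketched with a toy top-k routing layer. This is an illustrative simplification, not Kimi's actual architecture: the tiny dimensions, random weights, and NumPy implementation are all assumptions made for the sake of the example.

```python
import numpy as np

def moe_forward(x, experts, gate_w, k=2):
    """Toy mixture-of-experts layer: route the input to its top-k experts.

    Only the selected experts run, so compute scales with k rather than
    with the total expert count -- the idea behind "1T total parameters,
    32B activated".
    """
    scores = x @ gate_w                  # routing logits, one per expert
    top_k = np.argsort(scores)[-k:]      # indices of the k highest scores
    weights = np.exp(scores[top_k])
    weights /= weights.sum()             # softmax over the chosen experts only
    # Weighted sum of just the active experts' outputs.
    return sum(w * experts[i](x) for w, i in zip(weights, top_k))

rng = np.random.default_rng(0)
dim, num_experts = 4, 8
# Eight toy "experts", each a small linear map; only two run per input.
mats = [rng.normal(size=(dim, dim)) for _ in range(num_experts)]
experts = [lambda v, m=m: v @ m for m in mats]
gate_w = rng.normal(size=(dim, num_experts))

out = moe_forward(rng.normal(size=dim), experts, gate_w, k=2)
print(out.shape)
```

The point of the sketch is only that six of the eight experts contribute zero compute to this query, which is why per-query cost tracks the activated parameter count.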
00:01:57 They say this is a native multimodal model that delivers state-of-the-art coding and vision capabilities.
00:02:03 But every model claims to be state-of-the-art and better than the rest.
00:02:08 So our team had to test it to verify for ourselves, and we'll show you what we found.
00:02:12 But before we move on to its genuinely unique capabilities, a word from the sponsor.
00:02:16 Opera Neon. This is Opera's first agentic browser,
00:02:19 designed specifically for power users ready to experience the future.
00:02:23 Neon uses Tasks, which replaces chaotic tabs with focused workspaces
00:02:27 where the AI can analyze and act across multiple tabs within the same context.
00:02:32 Imagine needing a quick utility for work.
00:02:34 Instead of opening an IDE, simply use Neon Make.
00:02:37 Type a prompt like "Make a Cyberpunk Pomodoro Timer"
00:02:40 and the browser spawns a virtual machine to generate the agenda,
00:02:43 write the code, and deploy the app instantly.
00:02:45 It's a massive time saver for daily workflows, allowing you to prototype concepts
00:02:50 or automate research via Neon Do without ever breaking your flow.
00:02:53 It acts like a junior developer built directly into the interface.
00:02:56 I'll definitely be using these Neon cards to automate my prompts.
00:02:59 You can subscribe to Opera Neon today. Don't just watch the agentic shift.
00:03:03 Be a part of it. The link is in the description.
00:03:05 The Kimi model is able to direct a swarm of agents, coordinating tasks among them.
00:03:10 Now, you might think Claude also does that, spawning multiple sub-agents based on the task at hand.
00:03:15 But here's how this model is different.
00:03:17 Kimi 2.5 has learned to self-direct an agent swarm of up to 100 sub-agents,
00:03:23 executing parallel workflows across 1,500 coordinated steps, via parallel-agent reinforcement learning.
00:03:29 For those who don't know, reinforcement learning is a process where the model is rewarded
00:03:33 when it performs well and penalized when it strays from the objective.
00:03:36 Most models are rewarded based on task performance alone.
00:03:39 But here, the model is also rewarded for how well it can parallelize steps
00:03:43 and act as an orchestrator.
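A reward of that shape can be sketched as a simple function. Moonshot has not published its actual objective, so everything here is hypothetical: the correctness gate, the `alpha` weight, and the critical-path heuristic are all assumptions made to illustrate the idea of rewarding parallelization on top of correctness.

```python
def swarm_reward(task_correct, total_steps, longest_chain, alpha=0.5):
    """Hypothetical reward: a correctness gate plus a parallelism bonus.

    longest_chain is the critical path (steps that had to run in
    sequence); total_steps / longest_chain approximates how much of
    the work was parallelized. The shape and alpha are illustrative.
    """
    if not task_correct:
        return 0.0                        # wrong answers earn nothing
    parallelism = total_steps / max(longest_chain, 1)
    return 1.0 + alpha * (parallelism - 1.0)

# Fanning 1,500 steps across agents (critical path of 100) scores higher
# than doing the same 1,500 steps strictly in sequence.
print(swarm_reward(True, 1500, 100))    # parallel run
print(swarm_reward(True, 1500, 1500))   # sequential run
```

Under a reward like this, two runs that produce the same correct answer are scored differently depending on how much of the work was fanned out, which is the behavioral shift the video describes.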
00:03:44 To put it simply, the Kimi model is trained to be an orchestrator.
00:03:48 Its success criterion is its ability to create sub-agents and assign tasks.
00:03:53 The orchestrator comes with built-in tools for creating sub-agents, assigning tasks, and related functions.
00:03:58 It creates sub-agents for various tasks, assigns them those tasks,
00:04:02 receives results from them, and then coordinates everything into a final result.
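The create-assign-gather loop just described can be sketched with ordinary worker threads. The `sub_agent` stand-in, the task names, and the five-worker cap are assumptions for illustration, not Kimi's real tooling:

```python
from concurrent.futures import ThreadPoolExecutor

def sub_agent(task):
    """Stand-in for one spawned sub-agent completing its assigned task."""
    return f"done: {task}"

def orchestrate(tasks, max_agents=5):
    """Minimal orchestrator loop: create workers, assign one task each,
    gather their results, and combine them into a single final answer."""
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        results = list(pool.map(sub_agent, tasks))  # fan out, then gather
    return results

pages = ["auth pages", "dashboard", "settings", "landing"]
print(orchestrate(pages))
```

`pool.map` preserves task order when gathering, which keeps the final combination step simple even though the work itself ran in parallel.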
00:04:06 According to Moonshot, they used this swarm method to improve performance on complex tasks,
00:04:11 and in internal evaluations it resulted in an 80% reduction in end-to-end runtime.
00:04:16 This means they were able to execute much more complex, long-horizon tasks.
00:04:20 They compared it with the best models for long-range tasks,
00:04:23 namely Opus 4.5 and Kimi 2.5 without the swarm,
00:04:26 and found that the Kimi 2.5 agent swarm surpassed all of them on their benchmarks.
00:04:32 They also saved considerable time by using multiple agents instead of a single agent.
00:04:36 So those were all claims based on what they said.
00:04:39 To test these claims, we installed the KimiCode CLI,
00:04:42 which is a new coding agent that was released with this model.
00:04:45 We had already built a UI and wanted to migrate it to a different component structure.
00:04:49 The UI was built using ShadCN, and we wanted to rebuild it using Material UI.
00:04:53 The project had multiple pages,
00:04:55 so we asked Kimi to change the UI of the entire project from ShadCN to Material UI,
00:05:00 and to use agents to handle each page,
00:05:02 so that this migration could be done faster in parallel.
00:05:05 It started exploring the directory, similar to how Claude Code does.
00:05:08 It created a to-do list containing every page that needed to be converted to Material UI.
00:05:13 It grouped similar pages together,
00:05:15 such as the auth pages (signup, login, and forgot password), to handle them more efficiently.
00:05:20 However, the CLI reported spawning more agents than we were expecting,
00:05:23 which we later found out was a display bug in the CLI.
00:05:26 In reality, it had used just five agents to perform the task,
00:05:28 which is forgivable for a new product.
00:05:30 It took around 15 minutes to complete the task,
00:05:32 longer than we expected given the parallel agents.
00:05:35 It finished by verifying and cleaning everything.
00:05:38 Some components were no longer being used after the migration,
00:05:41 and it cleaned those up as well.
00:05:43 It made sure all dependencies were installed and updated,
00:05:45 including test files, and validated the rest.
00:05:48 Once that was done, it ensured that all dependencies required for ShadCN were removed,
00:05:53 leaving the project without any unused dependencies,
00:05:55 which most agents tend to forget, bloating the project unnecessarily.
00:05:59 It tweaked the UI slightly.
00:06:01 For example, the hero section originally had text and visuals side by side,
00:06:05 but it changed them to be stacked vertically.
00:06:07 Other than that, everything looked almost exactly the same,
00:06:10 with just the components switched.
00:06:12 Even though it was a big task, it only used 25% of the context window,
00:06:16 meaning it can sustain long-running agent sessions effectively.
00:06:19 So the agent swarm works, but it's not always faster,
00:06:22 and it can take longer on a large-scale codebase.
00:06:24 You've probably noticed we build a lot in these videos.
00:06:27 All the prompts, the code, the templates, you know,
00:06:29 the stuff you'd normally have to pause and copy from the screen.
00:06:32 It's all in our community, for this video and every video before it, too.
00:06:35 Links in the description.
00:06:37 The key selling point of Kimi 2.5 is its visual agentic intelligence.
00:06:41 It's claimed to be particularly strong in front-end work.
00:06:44 It can implement interactive layouts and rich animations,
00:06:48 such as scrolling text effects.
00:06:50 They provided multiple examples of animations, all of which were created well.
00:06:53 Here's where it really stands out.
00:06:55 Kimi 2.5 excels at coding with vision, going beyond text and image prompts.
00:07:00 It can even take videos as input and generate code,
00:07:03 making it one of the first models able to do so.
00:07:06 This makes explaining code flows much easier.
00:07:08 This multimodal capability was not bolted on after training;
00:07:12 it was integrated during model training.
00:07:14 Most models incorporate additional modalities
00:07:16 only after their text capabilities are strong enough,
00:07:19 which often leads to a trade-off between vision and text abilities.
00:07:23 But with Kimi 2.5's training methodology,
00:07:25 this trade-off disappears and both capabilities improve together.
00:07:29 Now, we had to test it ourselves.
00:07:30 We screen-recorded navigating around Notion's new-page interface and using slash commands.
00:07:35 We kept the recording small because the documentation says videos are limited to 40 megabytes.
00:07:40 We provided the path to the Notion recording and asked it to clone the website shown in the video.
00:07:45 We didn't specifically tell it in the prompt what the recording was,
00:07:48 so it used the read-media-file tool to analyze the video.
00:07:52 It concluded that the interface was Notion-like, identified all the features,
00:07:56 and determined it was a Notion clone with a macOS-style window.
00:07:59 Once it had listed what was in the file, it started implementing it.
00:08:02 If you are using video processing in your own projects, remember this:
00:08:06 videos and images can exhaust the context window quickly,
00:08:09 so be careful with large files and watch out for context bloat.
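A simple guard of this kind, checking the file size before sending a recording to the model, might look like the following. The 40 MB figure comes from the documentation mentioned above; the helper name and the stub file standing in for a real screen recording are illustrative.

```python
import os
import tempfile

MAX_VIDEO_BYTES = 40 * 1024 * 1024  # the 40 MB limit the docs mention

def video_ok(path, limit=MAX_VIDEO_BYTES):
    """Return True if the recording fits under the upload limit."""
    return os.path.getsize(path) <= limit

# Demo with a tiny placeholder file standing in for a screen recording.
with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as f:
    f.write(b"\x00" * 1024)  # 1 KB stub
ok = video_ok(f.name)
os.remove(f.name)           # clean up the stub
print(ok)
```

Rejecting oversized files locally avoids a failed upload and keeps large media from eating into the context window in the first place.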
00:08:12 When it replicated the interface, it was accurate.
00:08:15 The UI was editable, including page icons and features from Notion,
00:08:18 even though some weren't fully functional at first.
00:08:21 The slash commands weren't working yet, but the overall UI was accurate.
00:08:25 It would have been better if the slash commands had worked, as they're a key part of the workflow.
00:08:29 But this was a minor issue that could be fixed through iteration.
00:08:32 So we gave it a prompt asking it to fix the issues we were having with the implementation.
00:08:37 From there, it self-iterated, implementing fixes, checking the results,
00:08:41 and ensuring the feature worked correctly without needing any additional prompts from us.
00:08:46 This iteration eventually fixed the slash-command issue,
00:08:49 making the whole interface feel like a functional Notion clone.
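The fix-and-verify behavior described here can be sketched as a generic loop. The callables and the round budget are placeholders, not the agent's real interface; in the actual run, the "fix" is a code edit and the "check" is re-testing the feature.

```python
def self_iterate(implement, verify, max_rounds=5):
    """Generic fix-and-verify loop: apply a fix, re-check, and repeat
    until the check passes or the round budget runs out."""
    for round_no in range(1, max_rounds + 1):
        implement()                  # attempt a fix
        if verify():                 # re-check the result
            return round_no          # rounds it took to pass
    return None                      # gave up within the budget

# Toy stand-in: the "fix" only succeeds on the third attempt.
state = {"tries": 0}
def fix():
    state["tries"] += 1
def check():
    return state["tries"] >= 3

rounds = self_iterate(fix, check)
print(rounds)
```

The round budget matters in practice: without it, an agent that never converges would loop forever, so real tools cap the number of self-correction passes.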
00:08:52 So it does live up to the model's claims.
00:08:54 After working through a few issues, we think it could be a cheaper alternative to Claude Code,
00:08:58 given that Claude's plans are known to be expensive, while Kimi's plans are lower priced.
00:09:03 That brings us to the end of this video.
00:09:05 If you'd like to support the channel and help us keep making videos like this,
00:09:08 you can do so by joining AI Labs Pro.
00:09:10 As always, thank you for watching, and I'll see you in the next one.

Key Takeaway

Kimi 2.5 represents a significant advancement in agentic AI by integrating native multimodal vision and a specialized reinforcement learning framework designed to orchestrate massive parallel agent swarms for complex coding tasks.

Highlights

Moonshot released Kimi 2.5, an open-weight model featuring a 1 trillion parameter mixture-of-experts (MoE) architecture with 32 billion active parameters.

The model is specifically trained using parallel reinforcement learning to orchestrate agent swarms of up to 100 sub-agents.

Kimi 2.5 demonstrates native multimodal capabilities, allowing it to process video inputs to generate functional code and complex UI animations.

Internal testing showed an 80% reduction in end-to-end runtime for complex tasks by using the agent swarm methodology.

A real-world test successfully migrated a UI project from ShadCN to Material UI, including automatic dependency cleanup and verified page-by-page conversion.

The model offers a cost-effective alternative to competitors like Claude, maintaining high accuracy with lower processing requirements.

Timeline

Introduction and Open-Weight Architecture

The speaker introduces Kimi 2.5 from Moonshot, clarifying that it is an open-weight model rather than strictly open-source because the training data remains private. The architecture utilizes a Mixture of Experts (MoE) design with 1 trillion total parameters, though only 32 billion are activated per query to ensure high speed and low cost. This section highlights two primary claims: the ability to orchestrate 100 parallel agents and the inclusion of visual agentic intelligence. The speaker emphasizes that the model was trained from the ground up to handle these specific tasks. This technical foundation sets the stage for the practical performance tests that follow in the video.

Sponsor Segment: Opera Neon Browser

This segment features a sponsored look at Opera Neon, described as the first agentic browser designed for power users. It introduces features like 'Tasks' for focused workspaces and 'Neon Make' for instant app deployment via virtual machines. The speaker demonstrates how a user can prompt the browser to create tools, such as a Cyberpunk Pomodoro Timer, without leaving the interface. This section connects the broader theme of AI agents to the tools users interact with daily. It positions the browser as a 'junior developer' built directly into the web navigation experience.

The Mechanics of Agent Swarms

The video explains how Kimi 2.5 uses parallel reinforcement learning to reward the model for effective work distribution across sub-agents. Unlike standard models, Kimi is penalized if it fails to orchestrate tasks efficiently, aiming for up to 1,500 coordinated steps. Moonshot claims this approach leads to an 80% reduction in runtime for long-horizon tasks compared to running a single agent. The orchestrator is equipped with built-in tools to create, assign, and receive results from these sub-agents. This section highlights the shift from performance-only metrics to organizational and management-based AI training.

Coding Test: UI Migration and CLI Performance

The team tests Kimi 2.5 by using its CLI to migrate a web project from ShadCN to Material UI across multiple pages. The model successfully identifies the file structure, creates a to-do list, and groups related authentication pages to process them in parallel. Although a minor CLI bug misreported the number of spawned agents, it completed the full migration and cleanup in approximately 15 minutes. Notably, the model removed unused dependencies and verified the final build, which is a common failure point for other AI agents. The demonstration shows that the agent swarm can handle large-scale codebase modifications with minimal human intervention, though not always faster than a single agent.

Visual Agentic Intelligence and Video Analysis

This section explores Kimi 2.5's ability to take video files as input to generate functional code, a feature integrated during the initial training phase. The presenters test this by providing a screen recording of the Notion interface and asking the model to clone the UI. Kimi accurately identifies the macOS-style window and Notion-like features, then self-iterates to fix non-functional slash commands. The speaker warns that video inputs can quickly bloat the context window, but notes the model's high accuracy in replicating complex layouts. The test concludes that Kimi is a viable, lower-priced alternative to expensive tools like Claude for front-end development.

Conclusion and Community Support

The video wraps up by summarizing the impressive capabilities of Kimi 2.5 in both parallel processing and visual intelligence. The speaker invites viewers to join their AI Labs Pro community to access the prompts, code, and templates used during the testing process. This final segment serves as a call to action for viewers interested in implementing these agentic workflows themselves. It reinforces the channel's focus on practical, hands-on AI testing and resource sharing. The host thanks the audience and signals the end of the technical deep dive.
