Kimi K2.6 vs Claude Code: Why I’m Switching to the $39 Plan (Full Model Breakdown)
Better Stack
Transcript
00:00:00So, as you can see by this example, you can basically kickstart your own little web agency
00:00:04business in 40 minutes using this Kimi feature.
00:00:07Well folks, Moonshot AI is back with another update on their flagship model Kimi.
00:00:13Kimi K2.6 is their latest model, which promises state-of-the-art coding,
00:00:18long-horizon execution, and agent swarm capabilities.
00:00:22In today's video, we're gonna take a look at this new model, see how it performs on different
00:00:27agentic tasks, and find out if it's really as impressive as advertised.
00:00:32It's gonna be a lot of fun, so let's dive into it.
00:00:34A few months ago, I reviewed Kimi K2.5 and it did really well on front-end design.
00:00:44And I was genuinely impressed by their agent swarm feature.
00:00:48I also loved the fact that they put so much effort into UX on their own chatbot page.
00:00:54So, in this next iteration, Kimi promises some pretty massive
00:00:57jumps in how we actually use AI agents in a production environment.
00:01:02First up, the agent swarm has basically tripled.
00:01:05In K2.5, we were looking at about 100 sub-agents, but K2.6 scales this horizontally
00:01:12to 300 specialized agents that can execute up to 4,000 coordinated steps.
00:01:18So, this is a pretty massive update.
00:01:20So, now you can run more parallel tasks at the same time.
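As a rough illustration of what running many sub-agents in parallel under a cap looks like, here is a minimal TypeScript sketch. It is not Kimi's implementation or API: `runSwarm`, `SubTask`, and `maxAgents` are hypothetical names, and each worker simply stands in for one sub-agent pulling work from a shared queue.

```typescript
// Hypothetical illustration: none of these names come from Kimi's actual API.
type SubTask<T> = () => Promise<T>;

// Run every task, keeping at most `maxAgents` of them in flight at once.
// Results come back in the same order as the input tasks.
async function runSwarm<T>(tasks: SubTask<T>[], maxAgents: number): Promise<T[]> {
  const results: T[] = new Array<T>(tasks.length);
  // Shared queue of pending work; each entry remembers its original position.
  const queue = tasks.map((task, index) => ({ task, index }));

  // Each worker stands in for one sub-agent and pulls jobs until the queue is empty.
  async function worker(): Promise<void> {
    for (let job = queue.shift(); job !== undefined; job = queue.shift()) {
      results[job.index] = await job.task();
    }
  }

  const workerCount = Math.max(1, Math.min(maxAgents, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Calling `runSwarm(tasks, 300)` would keep at most 300 tasks in flight, mirroring the sub-agent ceiling quoted above. The 4,000-step limit is not modeled here.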
00:01:24They've also added a new preserved thinking mode,
00:01:26which keeps the model's reasoning trace consistent across multi-turn tasks.
00:01:31It stops the memory drift that usually happens when you're deep into a complex workflow.
00:01:36And then we have the long-horizon tasks.
00:01:39And in their own tests, it managed a 13-hour engineering task with a 185% throughput gain.
00:01:46And when it comes to aesthetics, it has moved into coding-driven design.
00:01:51Instead of just making a pretty landing page, K2.6 uses their own native vision encoder
00:01:57called MoonViT to reason about UI and UX structures at a deeper level.
00:02:03It can now handle full-stack workflows from authentication to database logging,
00:02:08turning a single visual reference or a prompt into a fully functional interactive prototype
00:02:14with GSAP animations, scroll-triggered effects, and all sorts of goodies.
00:02:19And by the way, all of this is open source, including the vision encoder.
00:02:23So, if you wanted to, you could actually run it standalone,
00:02:26detached from Kimi's architecture if you get the model from Hugging Face.
00:02:30So, all of that sounds very impressive, but let's test it out and see how it actually performs.
00:02:35And Kimi's models are open source, so you could theoretically use them in any setup you prefer.
00:02:40But in order to test out their Agent Swarm, I'm going to be using their own chatbot interface
00:02:46for the best results. First off, I want to try the new Agent Swarm feature.
00:02:50Looking at their examples on their site, one thing really caught my attention.
00:02:54It was this section where K2.6 was used to identify 30 retail stores in Los Angeles without
00:03:01official websites from Google Maps and generate high-converting landing pages for each of them.
00:03:06I've seen this trend floating around Instagram, so I want to try it out and see if we can actually
00:03:11create our own little web agency. So, for this test, I decided to do something similar.
00:03:16So, I live in Toronto and what I've noticed is that a lot of notaries around my area have either
00:03:21non-existent or very outdated websites. So, I thought it would be a cool idea to identify
00:03:2620 notaries around the Greater Toronto Area and look them up either on Google Maps or on the
00:03:32Canadian Yellow Pages and generate high-converting landing pages for each of them.
00:03:37And for this task, Kimi spun up five sub-agents, each dedicated to one of the sub-tasks.
00:03:43And it was interesting to see how the sub-agents actually navigated the web and visited the
00:03:48websites. And they even tried to estimate whether the website is outdated or not using their own
00:03:53judgment. In total, it took Kimi roughly 40 minutes to finish the entire workflow.
00:03:58But by the end of it, Kimi produced this very detailed analysis report on all the findings
00:04:03about each of the websites. And it even generated a sample outreach email I could send to prospective
00:04:09clients for website proposals, along with another report of the overall market size and revenue
00:04:16impact potential. And alongside it, of course, we also got all of the files generated. And there's
00:04:22also a dedicated page that Kimi generated where I can preview each of the landing pages. And I do
00:04:27have to say all of these landing pages look exactly the same, which is kind of disappointing. But maybe
00:04:32that was my mistake of not providing a detailed enough prompt, although their own website did have
00:04:38the same prompt I used. But maybe they had some pre-configured harnesses. So as a follow-up task,
00:04:43I asked Kimi to go through each of these pages and apply a unique style to each of them, and also add
00:04:49some images to make the landing pages more exciting. And here we can see that Kimi actually generated
00:04:55custom images for each of the sites. And as a funny side note, I also noticed how Kimi likes
00:05:00to praise itself. For example, here: "All the 20 images are stunning. Now I'll build 20 completely
00:05:06unique landing pages." I mean, okay, Kimi, but let me be the judge of that. But anyway, it took Kimi
00:05:12around 17 minutes to finish this follow-up task. And once again, we see here Kimi praising itself
00:05:17that the landing pages look fantastic. Okay, Kimi, okay. So now we got a new preview page,
00:05:23but this one is weirdly broken. Probably there was a CSS styling issue or something, but that's okay.
00:05:28I'll disregard that. I'm more interested in the web pages themselves. So the new pages do look a lot
00:05:34better because now we got these nice looking CSS animations, and each web page now has an image
00:05:40header, AI-generated, but nonetheless, it's a nice header. But I also noticed that each of the pages
00:05:45still follows the same pattern. We have the same sections, the same site structure. So although
00:05:51all of these pages do look different now, they still have the same boilerplate, which is kind
00:05:56of disappointing because I was really hoping for a more unique approach. But this is a good start
00:06:02nonetheless. So as you can see by this example, you can basically kickstart your own little web agency
00:06:07business in 40 minutes using this Kimi feature. Just ask Kimi's agent swarm to go through your
00:06:13local businesses and contact each of them with a custom-tailored website and a custom-tailored
00:06:18outreach email and you've basically got a good side gig going. I imagine after this video, every little
00:06:25local business is going to start getting like hundreds of these proposal emails with custom-tailored
00:06:29AI-generated websites. But hey, it is what it is. And I do have to note that to use this agent
00:06:36swarm feature, you have to be on their Allegretto plan. But also I do have to say that I'm pretty
00:06:41sure it's a lot cheaper than asking Claude Code to do the same task. It's just a shame that Kimi's
00:06:46usage stats don't provide us more details on how many tokens were spent on this gigantic 40-minute
00:06:53task. But I have a feeling that I would certainly have burned through all of my usage limits by now
00:06:58if I used Claude to do the same thing. So anyway, that is the new improved agent swarm. And by the
00:07:04way, if you recently used Kimi's agent swarm to conduct some interesting experiments, share your
00:07:09findings in the comments down below. Now I want to test how Kimi has improved in terms of coding.
00:07:14So they claim that K2.6 has seen strong improvements in long-horizon coding tasks with reliable
00:07:20generalization. So for this task, I decided to ask Kimi to create a simple web app with a front end
00:07:26and a back end interface that also handles web scraping. So we've probably all heard how insanely
00:07:31high RAM prices have become in recent months. So I thought it would be a cool idea to
00:07:36create a price comparison website that actively scrapes price data for various RAMs and gives you
00:07:42a comparison table to find the cheapest options out there. So it took Kimi roughly 12 minutes to
00:07:47finish this task. And I can see that they've now actually added a token counter in their newest
00:07:52CLI version. So we can now keep track of our actual token spend, which is pretty cool. So here's the
00:07:59end result. And as you can see here, it shows a nice dark theme for the site. And we can toggle
00:08:05through individual brands. And we can also see different price options from different stores for
00:08:11each of the RAMs. And what's even cooler is that we can trigger a live refresh, which actively
00:08:16re-scrapes the store data. It's a shame it couldn't fetch most of the product images,
00:08:21but most of the functionality is there. And it also has a comparison section. But there's no way to add
00:08:27anything to it. So then I had to give Kimi a follow-up task to fix this issue. And now we get this nice
00:08:34"Add to compare" button. And if we move to the compare tab, we now get this cool comparison table
00:08:41of all the RAMs selected. So that's pretty good. And looking at the code, I see that it chose to
00:08:46build the site using bare-bones Node.js and Express. And it didn't even use React but instead opted for
00:08:53this vanilla JavaScript version where every change sets an element's innerHTML directly, which is
00:08:59an interesting choice. But hey, if it works, I can't complain. And lastly, it even added these nice
00:09:05scraper functions that use axios and cheerio to scrape Amazon, Newegg, and Best Buy. So that's
00:09:13pretty cool. So there you have it, folks, that is the new Kimi K2.6 model. And to be honest,
00:09:19judging by all of the tests we've done today, I wouldn't say it's a massive leap forward from K2.5.
00:09:25But there are some really nice quality of life improvements. And I appreciate the fact that
00:09:30Moonshot AI keeps improving their platform a lot. And I also love the fact that they provide a solid,
00:09:36cheaper alternative to some of the more expensive behemoths out there like Claude Code. So overall,
00:09:43great job, Moonshot AI, keep up the good work. And I'm certainly excited to see how Kimi improves in
00:09:48the future. And folks, if you found this video useful or informative, please let me know by
00:09:53smashing that like button underneath the video. And also be sure to subscribe to our channel so you
00:09:58don't miss out on any of our future technical breakdown videos. This has been Andres from
00:10:04Better Stack and I will see you in the next videos.
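The comparison step from the RAM price test (collect offers from several stores, then surface the cheapest one per product) can be sketched as follows. This is an assumed reconstruction in TypeScript, not the code Kimi generated: the `Offer` shape, `parsePrice`, and `cheapestByProduct` are hypothetical, and the real app fetched its data with axios and cheerio, which this sketch leaves out.

```typescript
// Hypothetical data shape; the transcript does not show the app's real schema.
interface Offer {
  product: string; // e.g. "DDR5 32GB"
  store: string; // e.g. "Newegg"
  priceUsd: number;
}

// Turn a scraped price label such as "$1,299.99" into a number.
// Returns null when the label contains no usable number (e.g. "Out of stock").
function parsePrice(label: string): number | null {
  const value = parseFloat(label.replace(/[^0-9.]/g, ""));
  return isNaN(value) ? null : value;
}

// Keep only the cheapest offer for each product across all stores.
function cheapestByProduct(offers: Offer[]): Map<string, Offer> {
  const best = new Map<string, Offer>();
  for (const offer of offers) {
    const current = best.get(offer.product);
    if (current === undefined || offer.priceUsd < current.priceUsd) {
      best.set(offer.product, offer);
    }
  }
  return best;
}
```

Keeping the parsing and the comparison as pure functions means the "cheapest option" logic can be checked without hitting any store, which is the part of a scraper that tends to break first.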