00:00:00The guys over at ZAI just dropped GLM 4.7, and at $29 a year, this is absurdly cheap for a model they claim hits 73% on SWE-bench, right up there with Sonnet 4.5.
00:00:11The timing isn't random. They're going public and need to show western traction.
00:00:15They even did a live Q&A on Reddit, which I've never seen a Chinese AI lab do.
00:00:19But 4.6 had real problems. Is 4.7 actually fixed?
00:00:23Hey everyone, if you're new here, this is AI Labs, and welcome to another episode of Debunked,
00:00:27a series where we actually take AI tools and AI models, strip away the marketing hype, and show what they can actually do with real testing and honest results.
00:00:35The new model is mainly improved through post-training, not architecture change.
00:00:40It's heavily optimized for Claude Code, and the ZAI team explicitly said this is their priority framework.
00:00:46Currently, it's actually beating a lot of the top tier models, including GPT-5, especially on coding benchmarks.
00:00:52In all of their coding plans, one additional thing they've added is a set of new MCP tools, which are not integrated directly into the model.
00:00:58They're separate MCP servers, and they've listed three right now.
00:01:01Each one just needs an API key to work, which is why they're included with the plan but kept separate from the model.
00:01:07As far as the usage limits go, they're pretty much the same as they were on 4.6.
00:01:11But if you don't know what they were before, I actually generated a report on that.
00:01:15What's funny is I first tried to generate it with Gemini 3, and for some reason it wasn't able to give me a proper comparison of the plans.
00:01:21I went to Claude again, and it researched it nicely.
00:01:24Basically, all you need to know is that on the entry-level plan, you get 10 to 40 prompts in Claude Code,
00:01:29while on the GLM Coding plan, you're getting 120 prompts for just $3, which is a huge difference.
00:01:34This only increases as you go into the higher tiers, where the $200 plan gets you up to 800 prompts in that 5-hour window with Claude,
00:01:42while $30 gets you 2,400.
00:01:44All of these rates are discounted for the first month, then they double.
00:01:47But if you're on the yearly plan, it's much more affordable.
00:01:50Another significant benchmark was Humanity's Last Exam.
00:01:53For those who don't know, it's one of those unsaturated benchmarks,
00:01:56and most newer models still score low on it because it's genuinely difficult.
00:02:00To actually test the UI, we do have this prompt, which doesn't really focus on the architecture.
00:02:05It mainly focuses on the design logic the model is supposed to implement, while also providing some design options.
00:02:11We can then see, based on the company I'm proposing, which in this case is an AI-powered code review platform, what it makes.
00:02:18We also subscribed to the MAX plan, and there are two ways you can connect it with Claude Code.
00:02:22In both cases you edit a settings.json file: the one in your home directory (~/.claude/settings.json) changes the global settings,
00:02:29while the one inside your project (.claude/settings.json) only applies to that project.
00:02:33We did this so we could actually compare it with Sonnet 4.5.
00:02:36This is what Sonnet 4.5 came up with.
00:02:38The prompt is actually pretty good, and we've been using it to see how each of these models builds UI and how creative they are in doing that.
00:02:45It's simple vanilla JS, so we're not looking at the architecture right now, just the design.
00:02:49This is what GLM 4.7 came up with.
00:02:52In terms of the design, it's pretty good, but it did make an error here where it didn't really account for the length, which is why the artifacts are breaking up a little bit.
00:02:59Other than that, the design is solid, but I do not like these emojis at all.
00:03:02Sonnet did not use any emojis, which is good and does match the design language.
00:03:06To actually test them both out, I have this premade Next.js project, initialized with context telling the model it needs to build a scalable, backend-ready UI.
00:03:15This part is important because, as I'm going to evaluate the reasons why GLM surprisingly performed better, it's going to come back to this point.
00:03:22Framer Motion and ShadCN components have been pre-installed for it to build the UI.
00:03:27Both of them have been asked to build the main browser page for a Netflix-like streaming platform.
00:03:31They were given a spec of what to build and what needs to be on the page.
00:03:35As for the usability of the GLM model inside Claude Code, one problem with GLM 4.6 was that it was extremely slow at code generation.
00:03:43Here, in my experience, that issue has not been solved: it's still extremely slow.
00:03:48But there is one change. With GLM 4.6, the model's thinking never showed, meaning you couldn't see it reason inside Claude Code.
00:03:54The detailed transcript you get here clearly shows thinking, which 4.6 never displayed.
00:03:59You can clearly see that the 4.7 model does surface its thinking, so that's been fixed.
00:04:04Other than that, there are some quirks you need to know. GLM 4.7 is not that autonomous.
00:04:09I found this during my testing. As you can see here, this GLM folder already has a UI benchmark folder in which it needs to implement the app, but it chose to ignore that.
00:04:18Although it was clearly written inside the context, it went ahead and made another Next.js app on its own.
00:04:22It didn't even initialize it, it just started writing code. Sometimes it does act really dumb.
00:04:27But after I corrected it and steered it in the right direction, in terms of the implementation, this is what Claude created.
00:04:32Again, being the higher model, it's pretty good at UI.
00:04:35This is what GLM 4.7 created. Claude obviously created a better UI because, in our opinion, it's still better at design.
00:04:42For the price, that is okay. But after I dug into the code, since both were told the UI was supposed to be backend-ready and to use mock data for now,
00:04:50the GLM model actually implemented the better architecture by placing all the mock data in one file.
00:04:56When we need to swap it out, we just change that one file, because all the imports point there, as opposed to Claude's implementation, where every component carries its own mock data import.
00:05:05When we actually implement the backend, we'll have to change all of those files one by one.
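The centralized pattern GLM used can be sketched like this. The file and type names here are hypothetical, purely to illustrate the idea; the real project's shapes will differ.

```typescript
// mock/data.ts (hypothetical): the single source of mock content.
// When a real backend arrives, only this module's internals change;
// every component keeps importing the same names.
export interface Title {
  id: number;
  name: string;
  genre: string;
}

export const trendingTitles: Title[] = [
  { id: 1, name: "Example Show", genre: "Drama" },
  { id: 2, name: "Example Film", genre: "Sci-Fi" },
];

// Components call this accessor; later it can become a fetch to the
// backend with the same signature, and no component needs to change.
export function getTrending(): Title[] {
  return trendingTitles;
}

// components/TrendingRow.tsx would then import { getTrending } from
// "../mock/data" instead of declaring its own inline mock array, which
// is the scattered pattern Claude produced here.
```

The design point is that the swap cost is concentrated in one module instead of being spread across every component that renders data.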
00:05:09In terms of basic architecture and code quality, GLM actually did pretty well, and it surprised me because 4.6 wasn't this good in my testing.
00:05:17With 4.6, the plan wasn't really justified by how much I had to steer the model and how many mistakes it made, but this one is definitely a huge leap.
00:05:24Those benchmarks are definitely justified by the testing I've done.
00:05:27I've also looked at a few other small things in the code, and GLM 4.7 is actually a good model.
00:05:32Given these unexpected results, we're honestly recommending that everyone get the $29 per year plan.
00:05:38If you already have the $20 Claude plan, this is basically nothing in comparison.
00:05:42That said, it's still not a model you'd use for completely autonomous coding.
00:05:46Even though Claude really messed up the architecture here, it's good enough that it can correct and improve on that later.
00:05:52But with the small quirks GLM still has, we don't think it's a good idea to solely depend on it.
00:05:57That brings us to the end of this video.
00:05:58If you'd like to support the channel and help us keep making videos like this, you can do so by using the super thanks button below.
00:06:05As always, thank you for watching and I'll see you in the next one.