GLM 5.2 is my new favorite model...

Better Stack

Transcript

00:00:00best open model in the world right now isn't from a company called openai it's of course from a
00:00:04chinese lab and this one is glm 5.2 from z.ai this model is seriously impressive matching gpt 5.5 on
00:00:10certain benchmarks and there's even a category where it appears to be beating fable all while
00:00:15being mit licensed open weights let's check it out so glm 5.2 is a 744 billion total parameter model
00:00:26with 40 billion active parameters and it's actually the same size as its predecessor glm 5.1
00:00:31which is why it's very impressive that they made such a leap on the intelligence index
00:00:35from artificial analysis this is a combined score across a bunch of benchmarks so reasoning coding
00:00:40science the whole lot glm 5.2 here got a score of 51 which is 11 ahead of its previous iteration
00:00:45and the top open model by a pretty healthy margin you can see qwen 3.7 is next then minimax m3
00:00:51followed by kimi k2.6 this actually places it in the same realm as gemini 3.5 flash and gpt 5.4 on
00:00:57max effort which is pretty insane and on a few of the benchmarks included in this index like gdpval it
00:01:03actually outscores gpt 5.5 if we focus on coding specifically it's still great on the coding index
00:01:09it scores the same as gemini 3.1 pro and actually beats sonnet 4.6 and it isn't even that far off the
00:01:14top frontier models it's also a fair bit ahead of kimi k2.7 code which is their newest model which i know a
00:01:19lot of people myself included are big fans of i've just always found the kimi models have a really
00:01:23nice feel to them outside of the coding index another benchmark people seem to like a lot these
00:01:27days is deep swe so if we take a look there it actually outscores opus 4.7 on a medium effort
00:01:33that is genuinely super impressive it is worth noting here though that not every single model has
00:01:38been tested on this one and the harness used was actually claude code you just do a bit of api
00:01:42trickery to point to z.ai instead of anthropic the final set of benchmarks i like are design arena's
00:01:47and this is where things get interesting glm 5.2 just took first place overall on design arena's
00:01:53single turn html web design leaderboard becoming the first model ever to beat the claude line
00:01:58including fable 5 it seems this may have been a focus area of the model as a further investigation
00:02:02by design arena seems to show that glm 5.2 has a strong set of expert templates that avoid common
00:02:08ai anti-patterns so you should get less purple gradients and it also seems to work really well
00:02:12with common libraries like chart.js, three.js and tailwind it does come with a little trade-off that
00:02:18it's a bit slower but i'll come back to that later it's also not number one everywhere on design arena
00:02:22it sits second on game dev, data viz and 3d and fourth when it comes to ui components but that is
00:02:28still super impressive i thought i'd try this out on a few demo apps then and the first one was actually
00:02:32recreating linear but one of the annoying things about glm 5.2 which is a bit of a disadvantage
00:02:37is it only accepts text modalities so you can't upload a screenshot and say recreate this
00:02:42so what i actually did was sent a screenshot to claude and said give me a prompt to recreate this
00:02:46and that is the prompt i've ended up giving glm 5.2 regardless of that the results i got back were super
00:02:51impressive on the left here i've got the real linear web page and on the right here we have the glm
00:02:55recreation you can see it got the overall elements right and for the screenshot here actually just
00:02:59recreated the ui which i think was very cool as we scroll down you can see that it got overall the
00:03:04feel of the linear website and i do think this looks really good so it does have some strong ui design
00:03:09skills obviously it's not perfect since it couldn't take in a screenshot so it's sort of doing this as
00:03:14a recreation of that text prompt that i showed you but this web page looks really nice for comparison
00:03:19on the left here i have what claude opus 4.8 gave me with the exact same prompt and this one is
00:03:23kimi k2.7 code and again they all did a pretty good job of recreating the website just from that
00:03:29prompt and i actually think i might like kimi k2.7 code's the most it just has sort of the overall
00:03:34best feel and it looks the most complete in my opinion next up i then thought it'd be good to
00:03:38give these models a new website that it probably hasn't seen before as linear is probably in the
00:03:42training data of a lot of these models so i just said design and build a beautiful single page website
00:03:46for a fictional product called north star it's an ai powered personal planning app you can see
00:03:50there's also some design direction down here like we want a hero section some social proof pricing
00:03:56section all of the usual things and down here the design direction is clean premium saas aesthetic
00:04:00soft gradient strong typography rounded cards and so on this is the result i got back from two of the
00:04:06models and i'll tell you which is which at the end here but you can see as we scroll down i think this
00:04:10looks really nice and i think it's done a pretty good job it's a pretty basic startup website with your
00:04:15normal pricing section and so on and same on the right over here i do maybe like this style a little
00:04:20bit more but you can see it has gone for that sort of purple gradient ai look but i think there's just
00:04:25something about this website that looks a little cleaner and more complete to me but that is
00:04:29completely opinionated if you have a favorite one let me know in the comments below and also subscribe
00:04:33while you're there the one on the left here was actually glm 5.2 and this one was claude opus 4.8
00:04:39for completion this is what kimi k2.7 code gave me and i do think this one does fall into that sort
00:04:43of ai look and feel with these purple gradients it's a little similar to the claude one just with fewer
00:04:48animations and less polish i also quickly wanted to see here what glm 5.2 would do if i gave it no
00:04:53design direction so i've just given it the initial part of the prompt and i don't think
00:04:56the output looks bad but i'm not sure that i can agree with design arena that this doesn't have the
00:05:01usual ai look this is really using those purple gradients to the max for the next test i then
00:05:05thought i'd test them out on one-shotting three.js applications and i simply said build a three.js game
00:05:10where i can race an f1 car around silverstone you can see this one got to work here and this took
00:05:15overall about 10 minutes if we scroll all the way down to the bottom it used 40,000 tokens and cost 32
00:05:20cents this is the output that glm 5.2 gave us then you can see it says silverstone f1 and start your
00:05:25engine by the way lewis hamilton just won for ferrari that's absolutely awesome i'm glad to see we've got
00:05:30a red car here as ferrari as well although we're definitely a little slower than i would like to be
00:05:35and one thing i'm noticing here is if i press a i seem to go right and d left so the controls are
00:05:40inverted but not on the arrow keys it seems and this definitely isn't the speed that i would like
00:05:45a ferrari to go around silverstone at but i mean it's it's not too bad for a first pass actually
00:05:51seems i go quicker if i reverse so maybe if i just reverse around the track that'll be better i tried
00:05:55the same test with kimi k2.7 code but i didn't actually get back a working example in a single
00:05:59prompt somewhere down here i had a few console errors that were constantly looping so i did have
00:06:04to tell it that i had a few errors but then it did fix those in the second prompt and you can see
00:06:08this one actually used more tokens at 110,000 and cost 81 cents the result i got back was also
00:06:14a little less playable it seems we have a little more speed but our turning circle is terrible i
00:06:19don't think i've ever seen an f1 driver turn like this and we can also drive through a few buildings
00:06:23here it's cool they got the names of the corners at silverstone but there's also no track it's
00:06:27seemingly just bollards the final one then is claude opus 4.8 and this one is a little more playable
00:06:33besides the fact that i don't think there's just trees in the middle of the silverstone track i mean
00:06:37last time i checked there wasn't and yeah it's overall a fairly good game we've got some camera
00:06:42controls here my wheels probably wouldn't like them if i was an f1 driver but it seems to be handling
00:06:47all right and the track itself though is also one of the most confusing tracks that i think i've ever
00:06:52seen anyone race around there's a lot of overlapping going on here and i don't actually know which way to
00:06:57go but i would say that opus 4.8 gave us the most playable demo in a single prompt the final test i did
00:07:02is a little more involved it's a front end and a back end from scratch of a personal finance management
00:07:07dashboard with a few features that you can see listed here and just the general idea here is to
00:07:11see what stack it picks when it starts brand new and also if it can link up a front end and a back end
00:07:16all in that single prompt without any errors here's glm 5.2's attempt and i've got to say yeah it's a
00:07:22pretty basic looking dashboard there's nothing fancy but there's also not too many fancy things that you can
00:07:26do with sort of the prompt that i gave it everything appears to be working i've added things to the database
00:07:32i paid for my fable 5 subscription here all of these pages are clickable and everything does transfer
00:07:37between them when i click on these i have tested it so it seems to have done a very good job from
00:07:41that single prompt i'm always curious what stack it picked as well and this one went with a next.js
00:07:46application and it used prisma for the database and we can see that in here we also have a development
00:07:50database i probably would have preferred that it used drizzle and maybe tanstack but i can't really
00:07:55complain i gave it no direction this is actually what kimi k2.7 code gave me and you can see it's
00:07:59almost the exact same application it's just i would say not as fancy they've definitely got some of
00:08:04the same templates in their training somewhere that looks exactly like this and again yeah i can't
00:08:09complain too much about this but it's missing sort of all of the extras with the buttons to be able
00:08:13to transfer i've got the add account features and add transactions they do work but i'd just say the
00:08:18overall ui of this and the user experience is a little worse since it doesn't have that information
00:08:23clickable up here the default stack it chose i would also argue is a little worse it used react here with
00:08:28just a normal vite setup and react router which i have no issue with but the back end it did go with
00:08:33express and if we take a look at the actual database file it's just using node sqlite to write to it and
00:08:39writing in the schemas in the text here which i think is going to be a little less scalable if i was
00:08:43completely vibe coding and didn't know anything about the stack i would probably want glm 5.2 but if i was
00:08:48using kimi k2.7 code i probably would have given it directions to use drizzle, next.js and various
00:08:53other things as well so it just varies based on what you like talking of opinionated as well this
00:08:58is actually what claude opus 4.8 gave me it definitely went with a completely different style
00:09:03to the ones that we've seen before but it's sort of this style of text that claude seems to like at
00:09:07the moment it's definitely what they put in the training data or are pushing it towards and all of
00:09:11this works really well and yeah i think it looks really good i'd probably prompt this to maybe use
00:09:16different fonts and a different color scheme but sort of the overall base is very good it didn't
00:09:20actually do separate pages for this it just did separate sections so maybe that's worse but again
00:09:25that comes down to the prompt all of the features and everything like that is working taking a look
00:09:29at the actual code that opus gave me i actually think glm 5.2 may have won this one what opus
00:09:34did is it just used a normal react application it didn't even bother with react router since it was
00:09:38all on that single page there and it also went with express for its back end but then it didn't
00:09:43actually do any connection to a database all of it is actually just an in-memory store that we can see
00:09:48here where it seeds the data and it just runs all of it off of a javascript object which again probably
00:09:53isn't what i want if i'm going to be scaling this in the future but does come down to the prompt i think
00:09:58that's sort of my key takeaway when testing this model over the last few days i think for a lot of
00:10:02tasks you could secretly swap glm 5.2 in the place of sonnet or even opus for simpler tasks and i
00:10:07probably wouldn't notice it is a really capable model and if you give it the right steering you get
00:10:12really good results it's one of the first open models that i haven't felt like i'm fighting to
00:10:16use and also one of the first open models where using it i haven't had that feeling of i know claude
00:10:21could do this better or faster the last things to mention then to round this out are tokens cost and
00:10:25speed one of the downsides of glm 5.2 could be that it's a little more token hungry when compared to
00:10:31other models in its class it used an average of 43,000 tokens a task which is more than kimi k2.6
00:10:37minimax and deepseek but the good news is it doesn't actually cost that much depending on the
00:10:41provider it's around $1.40 for a million input tokens and $4.40 for a million output tokens and on the
00:10:47benchmarks of artificial analysis it actually cost around 50 cents a task and you can see this is a
00:10:52pretty good spot when we do cost versus intelligence ignore the gemini label here it's actually this blue
00:10:57dot and you can see it's quite a crowded chart but what this actually shows is at its intelligence
00:11:02level glm 5.2 is the cheapest model although i will say here if you can take a hit to the intelligence
00:11:07i do think minimax and especially deepseek v4 are very good for that price when it comes to speed
00:11:12glm 5.2 is actually not bad at all it outperformed most of the open models near its intelligence level
00:11:17so deepseek v4, kimi k2.7 code and minimax and it's a bit behind a frontier model like gemini 3.1 pro
00:11:24which has the same intelligence level but that is a frontier model and i'd also love to see gemini
00:11:283.5 pro added to this list google please release that when it comes to speed as well design arena
00:11:33actually seemingly got a bit of a different result where they say that glm 5.2 scores the highest on
00:11:38user preference of the design but it was also the slowest out of the top models although it is also
00:11:42worth noting there that all of those top models are frontier ones and not open ones overall it really
00:11:47feels like we're at a point where these open models are let's say four to six months behind so
00:11:51perhaps too optimistically we could be looking at a fable model by next year and i mean they themselves
00:11:56are actually promising by q1 and i hate to agree with this next person on anything but he does make
00:12:01a good point here that maybe on the benchmarks they could catch fable but actual usefulness does feel
00:12:06a little bit different and this is what anthropic is very good at it's very rare to see him actually
00:12:10giving them a compliment there but i do have to agree with that sentiment where actually using
00:12:14these models feels a little bit different but i think glm 5.2 is one of the first ones that's broken
00:12:19that cycle for me i think if you told me a year ago that these open models would be anywhere near
00:12:23this good i would have been absolutely shocked and probably not believed you and i'm not actually
00:12:27a doomsday prepper but i feel like with the recent fable ban i just want to download glm 5.2 and store
00:12:31it on an ssd just in case i need it later let me know what you think of this model in the comments
00:12:36down below and also tell me what your favorite open model is to use while you're there subscribe
00:12:40and as always see you in the next one
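
The Three.js F1 test in the transcript quotes token counts and costs for two of the runs. The short Python sketch below normalizes those reported figures into an implied blended price per million tokens and a GLM-to-Kimi ratio. It assumes each reported token count covers both input and output across every prompt in the run, which the video does not spell out; the `Run` type and helper names are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    model: str
    tokens: int      # total tokens reported for the run (assumed input + output)
    cost_usd: float  # total cost reported for the run
    prompts: int     # prompts needed to get a working game


# Figures as reported in the video for the Three.js F1 test.
RUNS = [
    Run("GLM 5.2", 40_000, 0.32, 1),
    Run("Kimi K2.7 Code", 110_000, 0.81, 2),
]


def usd_per_million_tokens(run: Run) -> float:
    """Blended price per million tokens implied by the run's reported totals."""
    return run.cost_usd / run.tokens * 1_000_000


def relative(a: Run, b: Run) -> tuple[float, float]:
    """Return (token ratio, cost ratio) of run a relative to run b."""
    return a.tokens / b.tokens, a.cost_usd / b.cost_usd


if __name__ == "__main__":
    for run in RUNS:
        print(f"{run.model}: ${usd_per_million_tokens(run):.2f} per 1M tokens, "
              f"{run.prompts} prompt(s)")
    token_ratio, cost_ratio = relative(RUNS[0], RUNS[1])
    print(f"GLM 5.2 used {token_ratio:.0%} of the tokens and {cost_ratio:.0%} of the cost")
```

On these numbers the implied rates are similar (about $8.00 versus $7.36 per million tokens), so most of GLM 5.2's saving in this test comes from using roughly a third of the tokens rather than from cheaper tokens.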

Key Takeaway

GLM 5.2 establishes itself as a highly capable, MIT-licensed open-weight model that scores in the same range as GPT-5.4 (max effort) and Gemini 3.5 Flash on the Artificial Analysis Intelligence Index while remaining cost-efficient.

Highlights

  • GLM 5.2 contains 744 billion total parameters with 40 billion active parameters.

  • The model scored 51 on the Artificial Analysis Intelligence Index, 11 points ahead of its predecessor, GLM 5.1.

  • GLM 5.2 holds the top overall position on the Design Arena single-turn HTML web design leaderboard.

  • On Artificial Analysis's benchmarks, GLM 5.2 costs approximately 50 cents per task, making it the cheapest model at its intelligence level.

  • The model is released as open weights under the MIT license.

Timeline

GLM 5.2 Model Overview and Benchmarks

  • GLM 5.2 utilizes 744 billion total parameters with 40 billion active parameters.
  • The model reached an intelligence index score of 51, marking an 11-point improvement over version 5.1.
  • It outscores GPT-5.5 on GDPval and Claude Opus 4.7 (medium effort) on Deep SWE, though not every model has been tested on Deep SWE.

This model represents a significant jump in capabilities despite retaining the same parameter count as its predecessor. It is the top open model on the combined index, which spans reasoning, coding and science, by a clear margin. On the coding index it matches Gemini 3.1 Pro and beats Claude Sonnet 4.6.
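
The 744B total / 40B active split is the usual way mixture-of-experts models are described: only a subset of the weights runs for each token. The Python sketch below works out what that split implies. The 2-FLOPs-per-active-parameter figure is a common rule of thumb for a forward pass, not something stated in the video, and it ignores attention over the context and routing overhead.

```python
# Figures from the video: GLM 5.2 has 744B total and 40B active parameters.
TOTAL_PARAMS = 744e9
ACTIVE_PARAMS = 40e9


def active_fraction(active: float, total: float) -> float:
    """Share of the weights a mixture-of-experts model uses for each token."""
    return active / total


def approx_forward_flops_per_token(active: float) -> float:
    """Rule-of-thumb estimate: about 2 FLOPs per active parameter per token.

    Ignores attention over the context and router overhead.
    """
    return 2 * active


if __name__ == "__main__":
    print(f"active share: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
    print(f"~{approx_forward_flops_per_token(ACTIVE_PARAMS):.1e} FLOPs per token")
```

The active share comes out at roughly 5.4% of the weights, which is why per-token compute sits far below what the 744B headline suggests, even though serving typically still requires holding all of the weights in memory.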

Web Design and UI Performance

  • GLM 5.2 ranked first on the Design Arena single-turn HTML web design leaderboard.
  • The model demonstrates proficiency with libraries including Chart.js, Three.js, and Tailwind CSS.
  • Recreation tests show the model produces high-quality UI results despite lacking direct image input capabilities.

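Design capabilities are a primary strength; Design Arena's analysis suggests the model uses expert templates that avoid common AI aesthetic anti-patterns, though in testing without design direction it still leaned heavily on purple gradients. While it requires text-based prompts rather than screenshot inputs, it effectively recreates complex UI structures. In side-by-side comparisons it was competitive with Claude Opus 4.8 and Kimi K2.7 Code, though the presenter slightly preferred Kimi K2.7 Code's Linear recreation.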

Application Development and Coding Capabilities

  • The model produced a playable Three.js racing game and a working full-stack finance dashboard, each from a single prompt.
  • Development tasks involving Next.js and Prisma demonstrate the model's ability to handle integrated backend and frontend logic.
  • On the Artificial Analysis coding index it matches Gemini 3.1 Pro and scores ahead of Kimi K2.7 Code.
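
The stack comparison in the transcript turns on where the data lives: GLM 5.2 used Prisma with a development database, Kimi K2.7 Code wrote to SQLite directly, and Claude Opus 4.8 kept everything in a seeded in-memory object. As an illustration in Python (rather than the JavaScript stacks the models chose), the hypothetical `InMemoryStore` and `SqliteStore` classes below expose the same interface; only the second persists data beyond the running process when given a file path.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass


@dataclass
class Transaction:
    account: str
    amount_cents: int
    note: str


class InMemoryStore:
    """Mirrors the 'seed a plain object' approach: data lives only as long as the process."""

    def __init__(self, seed: list[Transaction] | None = None) -> None:
        self._rows: list[Transaction] = list(seed or [])

    def add(self, tx: Transaction) -> None:
        self._rows.append(tx)

    def all(self) -> list[Transaction]:
        return list(self._rows)


class SqliteStore:
    """Persists to a SQLite database file, so data survives a restart."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS transactions ("
            "id INTEGER PRIMARY KEY, account TEXT NOT NULL, "
            "amount_cents INTEGER NOT NULL, note TEXT NOT NULL)"
        )
        self._conn.commit()

    def add(self, tx: Transaction) -> None:
        # Parameterized query: values are never spliced into the SQL string.
        self._conn.execute(
            "INSERT INTO transactions (account, amount_cents, note) VALUES (?, ?, ?)",
            (tx.account, tx.amount_cents, tx.note),
        )
        self._conn.commit()

    def all(self) -> list[Transaction]:
        rows = self._conn.execute(
            "SELECT account, amount_cents, note FROM transactions ORDER BY id"
        ).fetchall()
        return [Transaction(*row) for row in rows]

    def close(self) -> None:
        self._conn.close()
```

Swapping one store for the other behind the same `add`/`all` interface is roughly what moving from an in-memory prototype to a real database looks like, which is the scaling concern raised about the Opus version.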

Tests reveal that the model handles complex coding tasks, such as building a finance dashboard, without errors in a single prompt. Left without direction, it chose a sensible stack (Next.js with Prisma), though users may want to specify library preferences such as Drizzle or TanStack. In the game test its A and D keys steered in the wrong direction (the arrow keys did not), and Claude Opus 4.8 produced the most playable one-shot demo, while Kimi K2.7 Code needed a second prompt to fix looping console errors.
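
The inverted steering in GLM 5.2's game (A turned right and D turned left, while the arrow keys behaved) is plausibly the result of two key sets being wired through separate code paths or sign conventions. The minimal Python sketch below shows one way to rule that out: route every key through a single binding table and apply one documented sign convention when updating the heading. The key names, the table, and the counter-clockwise-positive heading convention are assumptions for illustration, not taken from the generated game's code.

```python
from __future__ import annotations

from enum import Enum


class Steer(Enum):
    LEFT = -1
    STRAIGHT = 0
    RIGHT = 1


# Hypothetical bindings: letter keys and arrow keys share one table,
# so they cannot drift into opposite directions.
STEER_KEYS: dict[str, Steer] = {
    "a": Steer.LEFT,
    "ArrowLeft": Steer.LEFT,
    "d": Steer.RIGHT,
    "ArrowRight": Steer.RIGHT,
}


def steering_input(pressed: set[str]) -> int:
    """Combine held keys into -1 (left), 0 (straight) or +1 (right)."""
    total = sum(STEER_KEYS[key].value for key in pressed if key in STEER_KEYS)
    # Left and right together cancel out; two keys for the same side clamp to one unit.
    return max(-1, min(1, total))


def update_heading(heading: float, steer: int, turn_rate: float, dt: float) -> float:
    """Assumed convention: heading is counter-clockwise positive (radians),
    so steering right (+1) must decrease it."""
    return heading - steer * turn_rate * dt


def bindings_consistent() -> bool:
    """Check for the bug seen in the video: 'a' and 'ArrowLeft' must steer the same way."""
    return (STEER_KEYS["a"] is STEER_KEYS["ArrowLeft"]
            and STEER_KEYS["d"] is STEER_KEYS["ArrowRight"])
```

Keeping a single table plus a check like `bindings_consistent()` means a sign error shows up for both key sets at once instead of only one.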

Efficiency, Cost, and Future Outlook

  • Operating costs average 50 cents per task, making it the most cost-efficient model at its intelligence tier.
  • Token consumption is higher than Kimi K2.6, MiniMax and DeepSeek, averaging 43,000 tokens per task.
  • The model maintains competitive speeds compared to other open-source models near its performance level.

Data indicates that GLM 5.2 provides a high intelligence-to-cost ratio. While it is more token-hungry than some peers, the presenter found it could stand in for Claude Sonnet, or even Opus on simpler tasks, without a noticeable drop in quality. The availability of this model under an MIT license provides a significant resource for those seeking high-tier performance without reliance on restricted frontier providers.
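
The prices quoted in the transcript ($1.40 per million input tokens and $4.40 per million output tokens, depending on provider) make per-task cost easy to estimate once the input/output split is known. The Python sketch below is a minimal calculator; the default prices come from the video, while the function name is illustrative.

```python
# Prices per million tokens as quoted in the video; they vary by provider.
INPUT_USD_PER_M = 1.40
OUTPUT_USD_PER_M = 4.40


def task_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price: float = INPUT_USD_PER_M,
    output_price: float = OUTPUT_USD_PER_M,
) -> float:
    """Cost in USD for the given token counts at per-million-token prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    avg_tokens = 43_000  # reported average per task; the input/output split is unknown
    print(f"all input:  ${task_cost_usd(avg_tokens, 0):.2f}")
    print(f"all output: ${task_cost_usd(0, avg_tokens):.2f}")
```

Because the 43,000-token average is not broken down into input and output, the sketch prints the two extremes: about $0.06 if every token were input and about $0.19 if every token were output. Both sit below the roughly 50 cents per task quoted from Artificial Analysis, so that figure presumably counts tokens beyond this average (for example, re-sent context) or uses different provider pricing; the video does not say which.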
