GLM 5.2 is my new favorite model...



Transcript

00:00:00 The best open model in the world right now isn't from a company called OpenAI. It's, of course, from a

00:00:04 Chinese lab, and this one is GLM 5.2 from Z.ai. This model is seriously impressive, matching GPT-5.5 on

00:00:10 certain benchmarks, and there's even a category where it appears to be beating Fable, all while

00:00:15 being MIT-licensed open weights. Let's check it out. So GLM 5.2 is a 744-billion-total-parameter model

00:00:26 with 40 billion active parameters, and it's actually the same size as its predecessor, GLM 5.1,

00:00:31 which is why it's very impressive that they made such a leap on the Intelligence Index

00:00:35 from Artificial Analysis. This is a combined score across a bunch of benchmarks, so reasoning, coding,

00:00:40 science, the whole lot. GLM 5.2 here got a score of 51, which is 11 ahead of its previous iteration

00:00:45 and the top open model by a pretty healthy margin. You can see Qwen 3.7 is next, then MiniMax M3,

00:00:51 followed by Kimi K2.6. This actually places it in the same realm as Gemini 3.5 Flash and GPT-5.4 on

00:00:57 max effort, which is pretty insane, and on a few of the benchmarks included in this index, like GDPval, it

00:01:03 actually outscores GPT-5.5. If we focus on coding specifically, it's still great. On the Coding Index

00:01:09 it scores the same as Gemini 3.1 Pro and actually beats Sonnet 4.6, and it isn't even that far off the

00:01:14 top frontier models. It's also a fair bit ahead of Kimi K2.7 Code, which is their newest model and which I know a

00:01:19 lot of people, myself included, are big fans of. I've just always found the Kimi models have a really

00:01:23 nice feel to them. Outside of the Coding Index, another benchmark people seem to like a lot these

00:01:27 days is Deep SWE, so if we take a look there, it actually outscores Opus 4.7 on medium effort.

00:01:33 That is genuinely super impressive. It is worth noting here, though, that not every single model has

00:01:38 been tested on this one, and the harness used was actually Claude Code. You just do a bit of API

00:01:42 trickery to point it at Z.ai instead of Anthropic. The final set of benchmarks I like are Design Arena's,

00:01:47 and this is where things get interesting. GLM 5.2 just took first place overall on Design Arena's

00:01:53 single-turn HTML web design leaderboard, becoming the first model ever to beat the Claude line,

00:01:58 including Fable 5. It seems this may have been a focus area of the model, as a further investigation

00:02:02 by Design Arena seems to show that GLM 5.2 has a strong set of expert templates that avoid common

00:02:08 AI anti-patterns, so you should get fewer purple gradients, and it also seems to work really well

00:02:12 with common libraries like Chart.js, Three.js, and Tailwind. It does come with a little trade-off:

00:02:18 it's a bit slower, but I'll come back to that later. It's also not number one everywhere on Design Arena;

00:02:22 it sits second on game dev, data viz, and 3D, and fourth when it comes to UI components, but that is

00:02:28 still super impressive. I thought I'd try this out on a few demo apps, then, and the first one was actually

00:02:32 recreating Linear. But one of the annoying things about GLM 5.2, which is a bit of a disadvantage,

00:02:37 is that it only accepts text input, so you can't upload a screenshot and say "recreate this."

00:02:42 So what I actually did was send a screenshot to Claude and say, "Give me a prompt to recreate this,"

00:02:46 and that is the prompt I ended up giving GLM 5.2. Regardless of that, the results I got back were super

00:02:51 impressive. On the left here I've got the real Linear web page, and on the right here we have the GLM

00:02:55 recreation. You can see it got the overall elements right, and for the screenshot here it actually just

00:02:59 recreated the UI, which I think was very cool. As we scroll down, you can see that it got the overall

00:03:04 feel of the Linear website, and I do think this looks really good, so it does have some strong UI design

00:03:09 skills. Obviously it's not perfect, since it couldn't take in a screenshot, so it's sort of doing this as

00:03:14 a recreation from that text prompt that I showed you, but this web page looks really nice. For comparison,

00:03:19 on the left here I have what Claude Opus 4.8 gave me with the exact same prompt, and this one is

00:03:23 Kimi K2.7 Code. Again, they all did a pretty good job of recreating the website just from that

00:03:29 prompt, and I actually think I might like Kimi K2.7 Code's version the most. It just has the best overall

00:03:34 feel, and it looks the most complete, in my opinion. Next up, I thought it'd be good to

00:03:38 give these models a new website they probably haven't seen before, as Linear is probably in the

00:03:42 training data of a lot of these models, so I just said: design and build a beautiful single-page website

00:03:46 for a fictional product called North Star. It's an AI-powered personal planning app. You can see

00:03:50 there's also some design direction down here, like we want a hero section, some social proof, a pricing

00:03:56 section, all of the usual things, and down here the design direction is clean, premium SaaS aesthetic,

00:04:00 soft gradients, strong typography, rounded cards, and so on. This is the result I got back from two of the

00:04:06 models, and I'll tell you which is which at the end, but you can see as we scroll down, I think this

00:04:10 looks really nice, and I think it's done a pretty good job. It's a pretty basic startup website with your

00:04:15 normal pricing section and so on. Same on the right over here. I do maybe like this style a little

00:04:20 bit more. You can see it has gone for that sort of purple-gradient AI look, but I think there's just

00:04:25 something about this website that looks a little cleaner and more complete to me, but that is

00:04:29 completely opinionated. If you have a favorite one, let me know in the comments below, and also subscribe

00:04:33 while you're there. The one on the left here was actually GLM 5.2, and this one was Claude Opus 4.8.

00:04:39 For completeness, this is what Kimi K2.7 Code gave me, and I do think this one does fall into that sort

00:04:43 of AI look and feel with these purple gradients. It's a little similar to the Claude one, just with fewer

00:04:48 animations and less polish. I also quickly wanted to see what GLM 5.2 would do if I gave it no

00:04:53 design direction, so I've just given it the initial part of the prompt, and I don't think

00:04:56 the output looks bad, but I'm not sure that I can agree with Design Arena that this doesn't have the

00:05:01 usual AI look. This is really using those purple gradients to the max. For the next test, I

00:05:05 thought I'd test them on one-shotting Three.js applications, and I simply said: build a Three.js game

00:05:10 where I can race an F1 car around Silverstone. You can see this one got to work here, and this took

00:05:15 about 10 minutes overall. If we scroll all the way down to the bottom, it used 40,000 tokens and cost 32

00:05:20 cents. This is the output that GLM 5.2 gave us, then. You can see it says Silverstone F1 and Start Your

00:05:25 Engine. By the way, Lewis Hamilton just won for Ferrari; that's absolutely awesome. I'm glad to see we've got

00:05:30 a red car here, a Ferrari as well, although we're definitely a little slower than I would like to be,

00:05:35 and one thing I'm noticing here is that if I press A, I seem to go right, and D goes left, so the controls are

00:05:40 inverted, but not on the arrow keys, it seems. This definitely isn't the speed that I would like

00:05:45 a Ferrari to go around Silverstone at, but I mean, it's not too bad for a first pass. It actually

00:05:51 seems I go quicker if I reverse, so maybe if I just reverse around the track that'll be better. I tried

00:05:55 the same test with Kimi K2.7 Code, but I didn't actually get back a working example in a single

00:05:59 prompt. Somewhere down here I had a few console errors that were constantly looping, so I did have

00:06:04 to tell it that I had a few errors, but then it did fix those in the second prompt, and you can see

00:06:08 this one actually used more tokens, at 110,000, and cost 81 cents. The result I got back was also

00:06:14 a little less playable. It seems we have a little more speed, but our turning circle is terrible. I

00:06:19 don't think I've ever seen an F1 driver turn like this, and we can also drive through a few buildings

00:06:23 here. It's cool that it got the names of the corners at Silverstone, but there's also no track; it's

00:06:27 seemingly just bollards. The final one, then, is Claude Opus 4.8, and this one is a little more playable,

00:06:33 besides the fact that I don't think there are trees in the middle of the Silverstone track. I mean,

00:06:37 last time I checked there weren't. And yeah, it's overall a fairly good game. We've got some camera

00:06:42 controls here. My wheels probably wouldn't like them if I was an F1 driver, but it seems to be handling

00:06:47 all right. The track itself, though, is also one of the most confusing tracks that I think I've ever

00:06:52 seen anyone race around. There's a lot of overlapping going on here, and I don't actually know which way to

00:06:57 go, but I would say that Opus 4.8 gave us the most playable demo in a single prompt. The final test I did

00:07:02 is a little more involved: a front end and a back end, from scratch, of a personal finance management

00:07:07 dashboard with a few features that you can see listed here. The general idea is to

00:07:11 see what stack it picks when it starts brand new, and also whether it can link up a front end and a back end

00:07:16 all in that single prompt without any errors. Here's GLM 5.2's attempt, and I've got to say, yeah, it's a

00:07:22 pretty basic-looking dashboard. There's nothing fancy, but there also aren't too many fancy things that you can

00:07:26 do with the sort of prompt that I gave it. Everything appears to be working. I've added things to the database;

00:07:32 I paid for my Fable 5 subscription here. All of these pages are clickable, and everything does transfer

00:07:37 between them when I click on these. I have tested it, so it seems to have done a very good job from

00:07:41 that single prompt. I'm always curious what stack it picked as well, and this one went with a Next.js

00:07:46 application, and it used Prisma for the database, and we can see that in here we also have a development

00:07:50 database. I probably would have preferred that it used Drizzle and maybe TanStack, but I can't really

00:07:55 complain; I gave it no direction. This is actually what Kimi K2.7 Code gave me, and you can see it's

00:07:59 almost the exact same application. It's just, I would say, not as fancy. They've definitely got some of

00:08:04 the same templates in their training somewhere that look exactly like this. And again, yeah, I can't

00:08:09 complain too much about this, but it's missing sort of all of the extras, with the buttons to be able

00:08:13 to transfer. I've got the add account features and add transactions. They do work, but I'd just say the

00:08:18 overall UI of this and the user experience are a little worse, since it doesn't have that information

00:08:23 clickable up here. The default stack it chose, I would also argue, is a little worse. It used React here with

00:08:28 just a normal Vite setup and React Router, which I have no issue with, but for the back end it went with

00:08:33 Express, and if we take a look at the actual database file, it's just using Node SQLite to write to it and

00:08:39 writing the schemas as text in here, which I think is going to be a little less scalable. If I was

00:08:43 completely vibe coding and didn't know anything about the stack, I would probably want GLM 5.2, but if I was

00:08:48 using Kimi K2.7 Code, I probably would have given it directions to use Drizzle, Next.js, and various

00:08:53 other things as well, so it just varies based on what you like. Talking of opinionated, as well, this

00:08:58 is actually what Claude Opus 4.8 gave me. It definitely went with a completely different style

00:09:03 to the ones that we've seen before, but it's sort of this style of text that Claude seems to like at

00:09:07 the moment. It's definitely what they put in the training data or are pushing it towards, and all of

00:09:11 this works really well, and yeah, I think it looks really good. I'd probably prompt this to maybe use

00:09:16 different fonts and a different color scheme, but the overall base is very good. It didn't

00:09:20 actually do separate pages for this; it just did separate sections, so maybe that's worse, but again

00:09:25 that comes down to the prompt. All of the features and everything like that are working. Taking a look

00:09:29 at the actual code that Opus gave me, I actually think GLM 5.2 may have won this one. What Opus

00:09:34 did is it just used a normal React application. It didn't even bother with React Router, since it was

00:09:38 all on that single page there, and it also went with Express for its back end, but then it didn't

00:09:43 actually do any connection to a database. All of it is actually just an in-memory store that we can see

00:09:48 here, where it seeds the data, and it just runs all of it off of a JavaScript object, which again probably

00:09:53 isn't what I want if I'm going to be scaling this in the future, but that does come down to the prompt. I think

00:09:58 that's sort of my key takeaway from testing this model over the last few days. I think for a lot of

00:10:02 tasks you could secretly swap GLM 5.2 in place of Sonnet, or even Opus for simpler tasks, and I

00:10:07 probably wouldn't notice. It is a really capable model, and if you give it the right steering, you get

00:10:12 really good results. It's one of the first open models that I haven't felt like I'm fighting to

00:10:16 use, and also one of the first open models where, using it, I haven't had that feeling of "I know Claude

00:10:21 could do this better or faster." The last things to mention, then, to round this out are tokens, cost, and

00:10:25 speed. One of the downsides of GLM 5.2 could be that it's a little more token-hungry compared to

00:10:31 other models in its class. It used an average of 43,000 tokens a task, which is more than Kimi K2.6,

00:10:37 MiniMax, and DeepSeek, but the good news is it doesn't actually cost that much. Depending on the

00:10:41 provider, it's around $1.40 per million input tokens and $4.40 per million output tokens, and on the

00:10:47 Artificial Analysis benchmarks it actually cost around 50 cents a task. You can see this is a

00:10:52 pretty good spot when we plot cost versus intelligence. Ignore the Gemini label here; it's actually this blue

00:10:57 dot. You can see it's quite a crowded chart, but what this actually shows is that at its intelligence

00:11:02 level, GLM 5.2 is the cheapest model. Although I will say here, if you can take a hit to the intelligence,

00:11:07 I do think MiniMax and especially DeepSeek V4 are very good for the price. When it comes to speed,

00:11:12 GLM 5.2 is actually not bad at all. It outperformed most of the open models near its intelligence level,

00:11:17 so DeepSeek V4, Kimi K2.7 Code, and MiniMax, and it's a bit behind a frontier model like Gemini 3.1 Pro,

00:11:24 which has the same intelligence level, but that is a frontier model. I'd also love to see Gemini

00:11:28 3.5 Pro added to this list. Google, please release that. When it comes to speed, as well, Design Arena

00:11:33 actually seemingly got a bit of a different result, where they say that GLM 5.2 scores the highest on

00:11:38 user preference for design, but it was also the slowest of the top models, although it is also

00:11:42 worth noting there that all of those top models are frontier ones and not open ones. Overall, it really

00:11:47 feels like we're at a point where these open models are, let's say, four to six months behind, so

00:11:51 perhaps, too optimistically, we could be looking at a Fable model by next year, and I mean, they themselves

00:11:56 are actually promising that by Q1. And I hate to agree with this next person on anything, but he does make

00:12:01 a good point here, that maybe on the benchmarks they could catch Fable, but actual usefulness does feel

00:12:06 a little bit different, and this is what Anthropic is very good at. It's very rare to see him actually

00:12:10 giving them a compliment there, but I do have to agree with that sentiment, where actually using

00:12:14 these models feels a little bit different. But I think GLM 5.2 is one of the first ones that's broken

00:12:19 that cycle for me. I think if you told me a year ago that these open models would be anywhere near

00:12:23 this good, I would have been absolutely shocked and probably not believed you. And I'm not actually

00:12:27 a doomsday prepper, but I feel like with the recent Fable ban, I just want to download GLM 5.2 and store

00:12:31 it on an SSD just in case I need it later. Let me know what you think of this model in the comments

00:12:36 down below, and also tell me what your favorite open model is to use. While you're there, subscribe,

00:12:40 and as always, see you in the next one.

Key Takeaway

GLM 5.2 establishes itself as a highly capable, MIT-licensed open-weight model. On Artificial Analysis's Intelligence Index it lands in the same range as GPT-5.4 (max effort) and Gemini 3.5 Flash, and it remains cost-efficient.

Highlights

GLM 5.2 contains 744 billion total parameters with 40 billion active parameters.
The model achieved a score of 51 on the Artificial Analysis Intelligence Index, 11 points ahead of its predecessor, GLM 5.1.
GLM 5.2 holds the top overall position on the Design Arena single-turn HTML web design leaderboard.
On Artificial Analysis's benchmarks, GLM 5.2 costs approximately 50 cents per task, which makes it the cheapest model at its intelligence level.
The model is MIT licensed, enabling open access for developers.

Timeline

GLM 5.2 Model Overview and Benchmarks

GLM 5.2 utilizes 744 billion total parameters with 40 billion active parameters.
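The model reached a score of 51 on the Artificial Analysis Intelligence Index, an 11-point improvement over GLM 5.1. This places it in the same range as Gemini 3.5 Flash and GPT-5.4 (max effort).
On individual benchmarks it goes further. It outscores GPT-5.5 on GDPval and Claude Opus 4.7 (medium effort) on Deep SWE, although not every model has been tested on Deep SWE.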
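The parameter figures above allow some quick back-of-the-envelope arithmetic. The Python sketch below computes three things: the share of weights active per token, the approximate forward-pass compute per token using the common rule of thumb of 2 FLOPs per active parameter, and the raw weight storage at a few precisions. The last figure is relevant to the closing remark about keeping GLM 5.2 on an SSD. These are rough estimates under stated assumptions. They cover weights only, ignore quantization scales, KV cache, and file-format metadata, and treat "744 billion" and "40 billion" as exact.

```python
# Back-of-the-envelope arithmetic for a mixture-of-experts (MoE) model.
# Assumption: parameter counts are taken as exactly 744B total / 40B active.
TOTAL_PARAMS = 744e9
ACTIVE_PARAMS = 40e9


def active_fraction(total: float, active: float) -> float:
    """Share of all weights that participate in computing a single token."""
    return active / total


def forward_flops_per_token(active: float) -> float:
    """Rule of thumb: ~2 FLOPs (one multiply, one add) per active parameter per token."""
    return 2 * active


def weight_storage_gb(total: float, bits_per_param: int) -> float:
    """Raw weight size in decimal gigabytes; ignores KV cache, quantization scales, metadata."""
    return total * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    print(f"active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    print(f"forward pass: ~{forward_flops_per_token(ACTIVE_PARAMS) / 1e9:.0f} GFLOPs per token")
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: ~{weight_storage_gb(TOTAL_PARAMS, bits):,.0f} GB")
```

Under these assumptions, about 5.4% of the weights are active per token, which is roughly 80 GFLOPs per token. The full set of weights still has to be stored: roughly 1,488 GB at 16-bit, 744 GB at 8-bit, and 372 GB at 4-bit. This is the usual MoE trade-off. Per-token compute scales with the active parameters, while storage and memory scale with the total.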

This model represents a significant jump in capabilities despite retaining the same parameter count as its predecessor. On the combined index, which spans reasoning, coding, and science, it leads the open models by a clear margin, ahead of Qwen 3.7, MiniMax M3, and Kimi K2.6. It also competes with proprietary models on coding: on Artificial Analysis's Coding Index it ties Gemini 3.1 Pro and beats Claude Sonnet 4.6.
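The transcript notes that the Deep SWE result was produced with Claude Code as the harness, with "a bit of API trickery" to point it at Z.ai instead of Anthropic. Below is a minimal Python sketch of that redirection. It assumes Claude Code's `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` environment variables and an Anthropic-compatible Z.ai endpoint. The URL is an assumption to verify against Z.ai's documentation, and the video does not say which model name or other settings were used.

```python
import os
import subprocess

# Assumption: Z.ai's Anthropic-compatible base URL; confirm it in Z.ai's docs before use.
ZAI_ANTHROPIC_BASE_URL = "https://api.z.ai/api/anthropic"


def claude_code_env(api_key: str, base_url: str = ZAI_ANTHROPIC_BASE_URL) -> dict[str, str]:
    """Return a copy of the current environment with Claude Code redirected to base_url."""
    env = dict(os.environ)  # copy, so the parent process environment is untouched
    env.pop("ANTHROPIC_API_KEY", None)  # avoid sending an Anthropic key to a third-party endpoint
    env["ANTHROPIC_BASE_URL"] = base_url  # endpoint Claude Code sends API requests to
    env["ANTHROPIC_AUTH_TOKEN"] = api_key  # sent in the Authorization header as a bearer token
    return env


def launch_claude_code(api_key: str) -> int:
    """Start the `claude` CLI (must be installed and on PATH) with the redirected environment."""
    return subprocess.run(["claude"], env=claude_code_env(api_key)).returncode
```

Building a copied environment, rather than exporting variables globally, keeps the redirect scoped to one Claude Code session. Your regular Anthropic setup stays unchanged for other sessions.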

Web Design and UI Performance

GLM 5.2 ranked first on the Design Arena single-turn HTML web design leaderboard.
The model demonstrates proficiency with libraries including Chart.js, Three.js, and Tailwind CSS.
Recreation tests show the model produces high-quality UI results despite lacking direct image input capabilities.

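Design appears to be a focus area. Design Arena's analysis suggests the model's expert templates avoid common AI aesthetic anti-patterns. However, in the video's own test without design direction, it still leaned heavily on purple gradients. Because it accepts only text, recreating a UI from a screenshot requires first turning the image into a text prompt; the video used Claude for this step. In side-by-side comparisons its results were competitive with Claude Opus 4.8 and Kimi K2.7 Code. It also does not rank first in every Design Arena category: it is second in game dev, data viz, and 3D, and fourth in UI components.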
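Because GLM 5.2 only accepts text, the video's Linear recreation went through two models. A vision-capable model turned the screenshot into a prompt, and GLM 5.2 built the page from that prompt. The Python sketch below shows that hand-off with both model calls abstracted as plain callables. The `describe_screenshot` and `generate_page` parameters are hypothetical stand-ins for whatever client libraries you use, not real APIs.

```python
from typing import Callable

# Hypothetical interfaces: plug in real clients for a vision model and a text-only model.
DescribeScreenshot = Callable[[bytes, str], str]  # (image bytes, instruction) -> text prompt
GeneratePage = Callable[[str], str]               # (text prompt) -> HTML/code

HANDOFF_INSTRUCTION = "Give me a prompt to recreate this."


def recreate_from_screenshot(
    screenshot: bytes,
    describe_screenshot: DescribeScreenshot,
    generate_page: GeneratePage,
) -> str:
    """Two-stage workaround for a text-only model."""
    # Stage 1: the multimodal model converts the image into a detailed text prompt.
    prompt = describe_screenshot(screenshot, HANDOFF_INSTRUCTION)
    if not prompt.strip():
        raise ValueError("the vision model returned an empty prompt")
    # Stage 2: the text-only model never sees the image, only the prompt.
    return generate_page(prompt)
```

The second model can only use what the first model puts into its prompt. As the transcript notes, this is why the result is a recreation of the text prompt rather than a pixel-accurate copy of the screenshot.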

Application Development and Coding Capabilities

The model produced a working Three.js racing game and a working full-stack web dashboard, each from a single prompt.
In the full-stack test it chose Next.js with Prisma and connected the frontend and backend without errors.
In single-prompt tests it outperformed Kimi K2.7 Code, though Claude Opus 4.8 produced the most playable game.

In testing, the model handled complex coding tasks, such as building a personal finance dashboard, without errors from a single prompt. Left to choose its own stack, it picked a database-backed setup. By contrast, Kimi K2.7 Code chose Express with hand-written SQLite schemas, and Claude Opus 4.8 used only an in-memory store. Users with specific library preferences, such as Drizzle or TanStack, still need to state them. Its F1 game had inverted A/D steering and a low top speed, but it worked on the first attempt. Kimi K2.7 Code needed a second prompt to fix looping console errors.
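The stack comparison turns partly on persistence. Claude Opus 4.8's backend kept everything in a seeded JavaScript object, while the other models wrote to a database. The Python sketch below illustrates that difference using the standard-library `sqlite3` module. The schema, class names, and integer-cents representation are illustrative assumptions. None of the models wrote this code, and in a real Next.js or Express app, Prisma, Drizzle, or a similar library would play the `SqliteStore` role.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Transaction:
    account: str
    amount_cents: int  # store money as integer cents to avoid float rounding
    description: str


class InMemoryStore:
    """Data lives in a Python list, so it disappears when the process restarts."""

    def __init__(self) -> None:
        self._rows: list[Transaction] = []

    def add(self, tx: Transaction) -> None:
        self._rows.append(tx)

    def balance(self, account: str) -> int:
        return sum(t.amount_cents for t in self._rows if t.account == account)


class SqliteStore:
    """Same interface, but rows are written to a SQLite file and survive restarts."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS transactions ("
            " id INTEGER PRIMARY KEY,"
            " account TEXT NOT NULL,"
            " amount_cents INTEGER NOT NULL,"
            " description TEXT NOT NULL)"
        )
        self._conn.commit()

    def add(self, tx: Transaction) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT INTO transactions (account, amount_cents, description) VALUES (?, ?, ?)",
                (tx.account, tx.amount_cents, tx.description),
            )

    def balance(self, account: str) -> int:
        row = self._conn.execute(
            "SELECT COALESCE(SUM(amount_cents), 0) FROM transactions WHERE account = ?",
            (account,),
        ).fetchone()
        return row[0]

    def close(self) -> None:
        self._conn.close()
```

Both classes expose the same `add` and `balance` methods, so swapping the in-memory store for a persistent one later is mostly a change at construction time. That is the scaling concern raised about the Opus 4.8 backend in the transcript.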

Efficiency, Cost, and Future Outlook

Operating costs average about 50 cents per task on Artificial Analysis's benchmarks, which makes it the cheapest model at its intelligence level.
Token consumption, averaging 43,000 tokens per task, is higher than that of Kimi K2.6, MiniMax, and DeepSeek.
It is faster than most open models near its intelligence level (DeepSeek V4, Kimi K2.7 Code, MiniMax). However, Design Arena found it the slowest among its top-ranked models, all of which are frontier models.

Data indicates that GLM 5.2 provides a high intelligence-to-cost ratio, with provider pricing around $1.40 per million input tokens and $4.40 per million output tokens. While it is more token-hungry than some peers, the author found it could stand in for Sonnet, or even Opus on simpler tasks, without a noticeable drop in quality. The availability of this model under an MIT license provides a significant resource for those seeking high-tier performance without reliance on restricted frontier providers.
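The prices and token counts quoted in the video support a simple cost model. The Python sketch below has two functions. `list_price_cost` applies the quoted per-million prices when input and output tokens are known separately. `blended_price_per_m` back-computes an effective rate from the observed totals in the F1 test. The prices are one provider's, as stated in the video. The video does not split the F1 totals into input and output tokens, and Kimi K2.7 Code's total covers two prompts.

```python
# Prices quoted in the video for one provider (USD per million tokens); other providers differ.
INPUT_PRICE_PER_M = 1.40
OUTPUT_PRICE_PER_M = 4.40


def list_price_cost(input_tokens: int, output_tokens: int,
                    input_price_per_m: float = INPUT_PRICE_PER_M,
                    output_price_per_m: float = OUTPUT_PRICE_PER_M) -> float:
    """Cost in USD when input and output token counts are known separately."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


def blended_price_per_m(total_cost: float, total_tokens: int) -> float:
    """Effective USD per million tokens from an observed total cost and token count."""
    return total_cost / total_tokens * 1_000_000


if __name__ == "__main__":
    # Observed totals from the F1 test in the video (input/output split not shown there).
    glm = blended_price_per_m(0.32, 40_000)
    kimi = blended_price_per_m(0.81, 110_000)  # two prompts, at Kimi's own pricing
    print(f"GLM 5.2: ${glm:.2f}/M blended, Kimi K2.7 Code: ${kimi:.2f}/M blended")
    print(f"GLM 5.2 used {40_000 / 110_000:.0%} as many tokens as Kimi K2.7 Code")
```

Per token, the two runs come out similar: about $8.00 versus $7.36 per million, blended. The cost gap in that test came mostly from GLM 5.2 using roughly 36% as many tokens. Both blended figures sit above the $4.40 list output price, so the token counter shown in the video and these list prices probably do not describe the same billing. Treat the comparison as rough.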
