Transcript
00:00:00 The best open model in the world right now isn't from a company called OpenAI. It's, of course, from a
00:00:04 Chinese lab, and this one is GLM 5.2 from Z.ai. This model is seriously impressive, matching GPT-5.5 on
00:00:10 certain benchmarks, and there's even a category where it appears to be beating Fable, all while
00:00:15 being MIT-licensed open weights. Let's check it out. So GLM 5.2 is a model with 744 billion total parameters
00:00:26 and 40 billion active parameters, and it's actually the same size as its predecessor, GLM 5.1,
00:00:31 which is why it's so impressive that they made such a leap on the Intelligence Index
00:00:35 from Artificial Analysis. This is a combined score across a bunch of benchmarks, so reasoning, coding,
00:00:40 science, the whole lot. GLM 5.2 got a score of 51 here, which is 11 points ahead of its previous iteration
00:00:45 and the top open model by a pretty healthy margin. You can see Qwen 3.7 is next, then MiniMax M3,
00:00:51 followed by Kimi K2.6. This actually places it in the same realm as Gemini 3.5 Flash and GPT-5.4 at
00:00:57 max effort, which is pretty insane, and on a few of the benchmarks included in this index, like GDPval, it
00:01:03 actually outscores GPT-5.5. If we focus on coding specifically, it's still great. On the Coding Index
00:01:09 it scores the same as Gemini 3.1 Pro and actually beats Sonnet 4.6, and it isn't even that far off the
00:01:14 top frontier models. It's also a fair bit ahead of Kimi K2.7 Code, which is their newest model, which I know a
00:01:19 lot of people, myself included, are big fans of. I've just always found the Kimi models have a really
00:01:23 nice feel to them. Outside of the Coding Index, another benchmark people seem to like a lot these
00:01:27 days is Deep SWE, and if we take a look there, it actually outscores Opus 4.7 on medium effort.
00:01:33 That is genuinely super impressive. It is worth noting here, though, that not every single model has
00:01:38 been tested on this one, and the harness used was actually Claude Code; you just do a bit of API
00:01:42 trickery to point it at Z.ai instead of Anthropic. The final set of benchmarks I like are Design Arena's,
00:01:47 and this is where things get interesting. GLM 5.2 just took first place overall on Design Arena's
00:01:53 single-turn HTML web design leaderboard, becoming the first model ever to beat the Claude line,
00:01:58 including Fable 5. It seems this may have been a focus area for the model, as a further investigation
00:02:02 by Design Arena seems to show that GLM 5.2 has a strong set of expert templates that avoid common
00:02:08 AI anti-patterns, so you should get fewer purple gradients, and it also seems to work really well
00:02:12 with common libraries like Chart.js, Three.js and Tailwind. It does come with a small trade-off in that
00:02:18 it's a bit slower, but I'll come back to that later. It's also not number one everywhere on Design Arena:
00:02:22 it sits second in game dev, data viz and 3D, and fourth when it comes to UI components, but that is
00:02:28 still super impressive. I thought I'd try this out on a few demo apps, then, and the first one was
00:02:32 recreating Linear. But one of the annoying things about GLM 5.2, which is a bit of a disadvantage,
00:02:37 is that it only accepts text input, so you can't upload a screenshot and say "recreate this".
00:02:42 So what I actually did was send a screenshot to Claude and say "give me a prompt to recreate this",
00:02:46 and that is the prompt I ended up giving GLM 5.2. Regardless of that, the results I got back were super
00:02:51 impressive. On the left here I've got the real Linear web page, and on the right we have the GLM
00:02:55 recreation. You can see it got the overall elements right, and for the screenshot here it actually just
00:02:59 recreated the UI, which I think was very cool. As we scroll down, you can see that it got the overall
00:03:04 feel of the Linear website, and I do think this looks really good, so it does have some strong UI design
00:03:09 skills. Obviously it's not perfect, since it couldn't take in a screenshot, so it's sort of doing this as
00:03:14 a recreation from that text prompt I showed you, but this web page looks really nice. For comparison,
00:03:19 on the left here I have what Claude Opus 4.8 gave me with the exact same prompt, and this one is
00:03:23 Kimi K2.7 Code. Again, they all did a pretty good job of recreating the website just from that
00:03:29 prompt, and I actually think I might like Kimi K2.7 Code's the most. It just has the overall
00:03:34 best feel, and it looks the most complete in my opinion. Next up, I thought it'd be good to
00:03:38 give these models a new website they probably haven't seen before, as Linear is probably in the
00:03:42 training data of a lot of these models. So I just said: design and build a beautiful single-page website
00:03:46 for a fictional product called North Star, an AI-powered personal planning app. You can see
00:03:50 there's also some design direction down here: we want a hero section, some social proof, a pricing
00:03:56 section, all of the usual things, and down here the design direction is a clean, premium SaaS aesthetic,
00:04:00 soft gradient, strong typography, rounded cards and so on. This is the result I got back from two of the
00:04:06 models, and I'll tell you which is which at the end, but you can see as we scroll down, I think this
00:04:10 looks really nice, and I think it's done a pretty good job. It's a pretty basic startup website with your
00:04:15 normal pricing section and so on. Same with the one on the right over here. I do maybe like this style a little
00:04:20 bit more, but you can see it has gone for that sort of purple-gradient AI look. Still, I think there's just
00:04:25 something about this website that looks a little cleaner and more complete to me, but that is
00:04:29 completely subjective. If you have a favorite, let me know in the comments below, and also subscribe
00:04:33 while you're there. The one on the left here was actually GLM 5.2, and this one was Claude Opus 4.8.
00:04:39 For completeness, this is what Kimi K2.7 Code gave me, and I do think this one falls into that sort
00:04:43 of AI look and feel with the purple gradients. It's a little similar to the Claude one, just with fewer
00:04:48 animations and less polish. I also quickly wanted to see what GLM 5.2 would do if I gave it no
00:04:53 design direction, so I've just given it the initial part of the prompt, and I don't think
00:04:56 the output looks bad, but I'm not sure I can agree with Design Arena that this doesn't have the
00:05:01 usual AI look. This is really using those purple gradients to the max. For the next test, I
00:05:05 thought I'd test them on one-shotting Three.js applications, and I simply said: build a Three.js game
00:05:10 where I can race an F1 car around Silverstone. You can see this one got to work here, and this took
00:05:15 about 10 minutes overall. If we scroll all the way down to the bottom, it used 40,000 tokens and cost 32
00:05:20 cents. This is the output GLM 5.2 gave us. You can see it says "Silverstone F1" and "start your
00:05:25 engine". By the way, Lewis Hamilton just won for Ferrari, which is absolutely awesome. I'm glad to see we've got
00:05:30 a red car here as a Ferrari as well, although we're definitely a little slower than I would like to be.
00:05:35 One thing I'm noticing here is that if I press A I seem to go right, and D goes left, so the controls are
00:05:40 inverted, though not on the arrow keys, it seems. This definitely isn't the speed I would like
00:05:45 a Ferrari to go around Silverstone at, but it's not too bad for a first pass. It actually
00:05:51 seems I go quicker if I reverse, so maybe if I just reverse around the track that'll be better. I tried
00:05:55 the same test with Kimi K2.7 Code, but I didn't actually get back a working example in a single
00:05:59 prompt. Somewhere down here I had a few console errors that were constantly looping, so I did have
00:06:04 to tell it that I had a few errors, but it fixed those in the second prompt, and you can see
00:06:08 this one actually used more tokens, at 110,000, and cost 81 cents. The result I got back was also
00:06:14 a little less playable. It seems we have a little more speed, but our turning circle is terrible. I
00:06:19 don't think I've ever seen an F1 driver turn like this, and we can also drive through a few buildings
00:06:23 here. It's cool that it got the names of the corners at Silverstone, but there's also no track; it's
00:06:27 seemingly just bollards. The final one, then, is Claude Opus 4.8, and this one is a little more playable,
00:06:33 besides the fact that I don't think there are trees in the middle of the Silverstone track. I mean,
00:06:37 last time I checked there weren't. Overall it's a fairly good game. We've got some camera
00:06:42 controls here; my wheels probably wouldn't like them if I were an F1 driver, but it seems to be handling
00:06:47 all right. The track itself, though, is also one of the most confusing tracks I think I've ever
00:06:52 seen anyone race around. There's a lot of overlapping going on here, and I don't actually know which way to
00:06:57 go, but I would say Opus 4.8 gave us the most playable demo from a single prompt. The final test I did
00:07:02 is a little more involved: a front end and back end, from scratch, for a personal finance management
00:07:07 dashboard with a few features that you can see listed here. The general idea is to
00:07:11 see what stack it picks when it starts brand new, and also whether it can link up a front end and a back end
00:07:16 all in that single prompt without any errors. Here's GLM 5.2's attempt, and I've got to say, yeah, it's a
00:07:22 pretty basic-looking dashboard. There's nothing fancy, but there also aren't many fancy things you can
00:07:26 do with the prompt I gave it. Everything appears to be working: I've added things to the database,
00:07:32 I paid for my Fable 5 subscription here, all of these pages are clickable, and everything does transfer
00:07:37 between them when I click on these. I have tested it, so it seems to have done a very good job from
00:07:41 that single prompt. I'm always curious what stack it picked as well, and this one went with a Next.js
00:07:46 application and used Prisma for the database, and we can see that in here we also have a development
00:07:50 database. I probably would have preferred that it used Drizzle and maybe TanStack, but I can't really
00:07:55 complain, since I gave it no direction. This is what Kimi K2.7 Code gave me, and you can see it's
00:07:59 almost the exact same application, just, I would say, not as fancy. They've definitely got some of
00:08:04 the same templates somewhere in their training data that look exactly like this. Again, I can't
00:08:09 complain too much about this, but it's missing all of the extras, like the buttons to be able
00:08:13 to transfer. I've got the add account and add transaction features, and they do work, but I'd just say the
00:08:18 overall UI and user experience are a little worse, since it doesn't have that information
00:08:23 clickable up here. I would also argue the default stack it chose is a little worse. It used React here with
00:08:28 a plain Vite setup and React Router, which I have no issue with, but for the back end it went with
00:08:33 Express, and if we take a look at the actual database file, it's just using SQLite directly from Node and
00:08:39 writing the schemas as raw text here, which I think is going to be a little less scalable. If I was
00:08:43 completely vibe coding and didn't know anything about the stack, I would probably want GLM 5.2, but if I was
00:08:48 using Kimi K2.7 Code, I probably would have given it directions to use Drizzle, Next.js and various
00:08:53 other things as well. So it just varies based on what you like. Speaking of opinionated, this
00:08:58 is what Claude Opus 4.8 gave me. It definitely went with a completely different style
00:09:03 from the ones we've seen before, but it's this style of text that Claude seems to like at
00:09:07 the moment; it's definitely what they put in the training data or are pushing it towards. All of
00:09:11 this works really well, and yeah, I think it looks really good. I'd probably prompt it to maybe use
00:09:16 different fonts and a different color scheme, but the overall base is very good. It didn't
00:09:20 actually do separate pages for this, just separate sections, so maybe that's worse, but again
00:09:25 that comes down to the prompt. All of the features are working. Taking a look
00:09:29 at the actual code Opus gave me, I actually think GLM 5.2 may have won this one. What Opus
00:09:34 did is just use a normal React application. It didn't even bother with React Router, since it was
00:09:38 all on that single page, and it also went with Express for its back end, but then it didn't
00:09:43 actually connect to a database at all. All of it is just an in-memory store, which we can see
00:09:48 here where it seeds the data, and it runs everything off a JavaScript object, which again probably
00:09:53 isn't what I want if I'm going to scale this in the future, but that does come down to the prompt. I think
00:09:58 that's my key takeaway from testing this model over the last few days. I think for a lot of
00:10:02 tasks you could secretly swap GLM 5.2 in place of Sonnet, or even Opus for simpler tasks, and I
00:10:07 probably wouldn't notice. It is a really capable model, and if you give it the right steering you get
00:10:12 really good results. It's one of the first open models that I haven't felt like I'm fighting to
00:10:16 use, and also one of the first open models where, using it, I haven't had that feeling of "I know Claude
00:10:21 could do this better or faster". The last things to mention, then, to round this out are tokens, cost and
00:10:25 speed. One of the downsides of GLM 5.2 could be that it's a little more token-hungry compared to
00:10:31 other models in its class. It used an average of 43,000 tokens per task, which is more than Kimi K2.6,
00:10:37 MiniMax and DeepSeek, but the good news is it doesn't actually cost that much. Depending on the
00:10:41 provider, it's around $1.40 per million input tokens and $4.40 per million output tokens, and on the
00:10:47 Artificial Analysis benchmarks it cost around 50 cents a task. You can see this is a
00:10:52 pretty good spot when we plot cost versus intelligence. Ignore the Gemini label here; it's actually this blue
00:10:57 dot. It's quite a crowded chart, but what it shows is that at its intelligence
00:11:02 level GLM 5.2 is the cheapest model. I will say, though, if you can take a hit on intelligence,
00:11:07 I do think MiniMax and especially DeepSeek V4 are very good for the price. When it comes to speed,
00:11:12 GLM 5.2 is actually not bad at all. It outperformed most of the open models near its intelligence level,
00:11:17 so DeepSeek V4, Kimi K2.7 Code and MiniMax, and it's a bit behind a frontier model like Gemini 3.1 Pro,
00:11:24 which has the same intelligence level, but that is a frontier model. I'd also love to see Gemini
00:11:28 3.5 Pro added to this list. Google, please release that. On speed, Design Arena
00:11:33 actually seems to have gotten a different result: they say that GLM 5.2 scores the highest on
00:11:38 user preference for design, but it was also the slowest of the top models. It is also
00:11:42 worth noting there that all of those top models are frontier ones, not open ones. Overall, it really
00:11:47 feels like we're at a point where these open models are, let's say, four to six months behind, so
00:11:51 perhaps too optimistically, we could be looking at a Fable-level open model by next year, and they themselves
00:11:56 are actually promising one by Q1. I hate to agree with this next person on anything, but he does make
00:12:01 a good point here: maybe on the benchmarks they could catch Fable, but actual usefulness does feel
00:12:06 a little bit different, and this is what Anthropic is very good at. It's very rare to see him actually
00:12:10 giving them a compliment, but I do have to agree with that sentiment that actually using
00:12:14 these models feels a little bit different. Still, I think GLM 5.2 is one of the first ones that's broken
00:12:19 that cycle for me. If you had told me a year ago that these open models would be anywhere near
00:12:23 this good, I would have been absolutely shocked and probably wouldn't have believed you. I'm not actually
00:12:27 a doomsday prepper, but with the recent Fable ban I just want to download GLM 5.2 and store
00:12:31 it on an SSD, just in case I need it later. Let me know what you think of this model in the comments
00:12:36 down below, and also tell me what your favorite open model is. While you're there, subscribe,
00:12:40 and as always, see you in the next one.