00:00:00Earlier this month Alibaba released Qwen 3.5 with a 400 billion parameter model and
00:00:05a max thinking one that claims to have better benchmarks than Opus 4.5 with beefy requirements
00:00:11to run locally.
00:00:12But just this week they released the Medium Series Qwen 3.5 models that are almost as
00:00:17powerful as their max ones and have the ability to run locally on a modern MacBook Pro, claiming
00:00:22to also have better benchmarks than Sonnet 4.5, which I don't believe, so hit subscribe
00:00:27and let's put these two models to the test.
00:00:31Most developers will admit that Sonnet 4.5 is a great model, working well with Claude
00:00:35Code, Cowork and the whole Anthropic suite, making the experience feel premium.
00:00:40But you have to be online for these models to work and they're not that cheap.
00:00:44Qwen 3.5's Medium Series aims to change all of that by making it possible to run a
00:00:49model as good as Sonnet 4.5 locally and people on Twitter are going crazy.
00:00:54But I'm not convinced it's actually as good as Sonnet 4.5.
00:00:58So I'm going to test both these models on an easy, medium and hard task and see which
00:01:02one performs better.
00:01:04But before we get into the testing, I have a small confession to make.
00:01:07I'm not actually going to run Qwen 3.5 locally because my measly M1 MacBook Pro doesn't
00:01:12have the unified memory to run inference properly.
00:01:15So I'm going to be using Qwen 3.5 35B on OpenRouter connected to OpenCode and I'm
00:01:21going to be running Sonnet 4.5 in Claude Code on clean mode, so it's not using any of my
00:01:25skills, plugins or MCP tools.
00:01:27We'll start simple and ask the models to build a to-do list from scratch using React and Vite.
00:01:32So if we look at what Sonnet 4.5 produced, we can see it has this AI purple.
00:01:36I can add a to-do item and I can mark it as completed, I have the ability to clear and
00:01:40if I refresh the page, it all stays there because it's used local storage.
00:01:44If you look at Qwen 3.5, they both have similar styling and haven't overwritten the
00:01:48default styling that comes with Vite.
00:01:51But again, I can add a to-do item.
00:01:53And here we have a few other options.
00:01:54So we can choose the category that it goes into, we can choose, I think, the severity and
00:01:59maybe a to-do date or a date that it's due.
00:02:02So I can say something like do shopping and it shows the to-do date and the severity and
00:02:06the category that it's in, which is really cool.
00:02:08Let's take a look at the code.
00:02:09So this is from Sonnet and over here, it's using a useEffect, which I think is to do
00:02:13with the local storage down here.
00:02:15I guess it's fine, but I'd rather find it a different way.
00:02:17We have an add to-do being used here and we have some functions over here to perform actions.
00:02:22So toggle the to-do, here we have delete to-do.
00:02:25All of this looks good.
00:02:26And one thing that I'm a bit shocked about is the bit up here that mentions the JSON parsing.
00:02:32So it looks like it's saving it in local storage as JSON and then parsing it.
00:02:35And it would have been nice to have this code in a separate function so that if you want
00:02:38to add more things to it, it wouldn't clog up the top of the code over here.
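The refactor suggested here, moving the local storage JSON handling into its own functions, might look roughly like the sketch below. It is not the code either model produced: the `Todo` fields, the `"todos"` key and every function name are assumptions for illustration, and the small `KeyValueStore` interface only mirrors the `getItem` and `setItem` methods of the browser's storage object so the helpers can also be tried outside a browser.

```typescript
// Hypothetical shape of a to-do item; the video does not show the exact fields.
interface Todo {
  id: number;
  text: string;
  completed: boolean;
}

// The subset of the Web Storage API used here, so the helpers also run outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORAGE_KEY = "todos"; // assumed key name

// Read and parse the saved list, falling back to an empty list on missing or corrupt data.
function loadTodos(store: KeyValueStore, key: string = STORAGE_KEY): Todo[] {
  const raw = store.getItem(key);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as Todo[]) : [];
  } catch {
    return [];
  }
}

// Serialise the list and write it back under the same key.
function saveTodos(store: KeyValueStore, todos: Todo[], key: string = STORAGE_KEY): void {
  store.setItem(key, JSON.stringify(todos));
}

// In-memory stand-in for window.localStorage, for trying the helpers in Node.
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
  };
}
```

In a React component, the state could then be initialised with something like `useState(() => loadTodos(window.localStorage))`, which keeps the parsing out of the top of the component.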
00:02:42Now, if we look at Qwen, we have some categories, it doesn't look like any useEffect is being
00:02:46used, which is good.
00:02:48And if we scroll down, we have handle submit, which is a name I would prefer to use.
00:02:51And we also have handle updates, handle delete and handle toggle completed.
00:02:55And one thing I really like about this is it put the to-do items in a separate component.
00:02:59So instead of clogging up the main component, so the main to-do app component, it created
00:03:03a new component over here, which is used down here in the app section since there are multiple
00:03:07to-do items.
00:03:08So the win goes to Qwen because it produced a to-do list with many more features.
00:03:13But after I ran these tests, I realised that Qwen had the superpower skill enabled in
00:03:18OpenCode.
00:03:19So I ran it again without the skill and this is the result we got.
00:03:23So I guess the win goes to Sonnet.
00:03:25Let's move on to the second test, which is to build an interactive solar system using
00:03:29React, Vite and Three.js.
00:03:31Claude did a much better job in one shot.
00:03:33Okay, it is missing a few planets, but I can click on the ones that exist.
00:03:37I click on the sun and get some information about it.
00:03:39I click on Uranus down here and also get some information about it.
00:03:44The movement on the site is also flawless, so I can pan, rotate, zoom in and out and so
00:03:48on.
00:03:49And here is what Qwen produced.
00:03:50Yes, a blank page.
00:03:51If we take a look at the console, we can see there's an error here that I did pass to Qwen
00:03:56multiple times, but it wasn't able to solve.
00:03:58In fact, the whole process of creating this was quite cumbersome.
00:04:01Qwen did go to sleep a few times and I had to wake it up and it also struggled to fix
00:04:05errors over and over again.
00:04:06Not to mention, if we take a look at the files produced by Qwen, we have a package.json here,
00:04:10a package-lock.json and a node_modules directory, which were not used at all because the main
00:04:15project is inside the solar system directory, which has its own proper package.json as well as a proper
00:04:20node_modules directory.
00:04:21So for test number two, Claude also wins.
00:04:23For the final test, I got these models to modify an existing code base to take a screenshot
00:04:28of a tweet when the user posts the URL inside the app.
00:04:32We'll start off with Claude, which produced the screen page over here.
00:04:35It gives me the option to change the background and padding.
00:04:38Now, the first time I ran this, I did get an error, which I asked Claude to fix.
00:04:42I'm going to copy the URL for this tweet by JSON, paste it in here and click capture.
00:04:47And after a few seconds, we get the image down here with the option to download it.
00:04:51And here is the result from Qwen with a screen page over here.
00:04:54Again, I'm going to copy this tweet, paste it here.
00:04:56It says extract video instead of extract screenshot and it starts to capture it, which looks promising.
00:05:01But after a while, we hit a 60 second timeout, which is similar to the error we experienced
00:05:06with Sonnet.
00:05:07But I did ask Qwen to fix it and it did extend the timeout, but it didn't fix the issue
00:05:11that caused it in the first place.
00:05:13So it looks like Sonnet 4.5 wins all three tests.
00:05:17So even though on paper, Qwen 3.5 35B should outperform Sonnet 4.5, in real world testing
00:05:24that doesn't seem to be the case.
00:05:26And don't get me wrong, it's really impressive that you can run a 35 billion or even 27 billion
00:05:31parameter model locally on a modern MacBook.
00:05:34But regardless of what people on Twitter are saying about it, there's no way it can outperform
00:05:38Sonnet 4.5 on coding tasks, as you can see from the tests I ran earlier.
00:05:42So why do the benchmarks make it look so good?
00:05:45Well, there is a huge chance that Qwen 3.5 was post-trained on specific benchmark questions
00:05:51like SWE-bench Verified so that it performs well on those questions.
00:05:55But a model like Sonnet 4.5 would have been post-trained on a much broader and more robust dataset,
00:06:01making it handle more nuanced tasks.
00:06:03Not to mention, the Qwen model I tested has 35 billion parameters but only uses 3 billion
00:06:08during inference.
00:06:09Whereas even though Anthropic don't publish their numbers, looking at estimations, Sonnet
00:06:143 could have had 70 billion parameters, and there's no doubt Sonnet 4.5 would have
00:06:18much more.
00:06:19So it's not really fair to compare these models on benchmarks alone.
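To put the 35 billion versus 3 billion figures in perspective, here is a rough back-of-the-envelope sketch. The parameter counts come from the video; the bit widths are assumed quantisation levels, the function names are made up for illustration, and the memory figure covers the weights only, ignoring the KV cache and runtime overhead. It also assumes, as is typical for this kind of model, that all of the weights need to be in memory even though only a fraction is used per token.

```typescript
const BYTES_PER_GIB = 1024 * 1024 * 1024;

// Share of the parameters that are active for each token.
function activeFraction(totalParams: number, activeParams: number): number {
  return activeParams / totalParams;
}

// Memory needed just to hold the weights, ignoring KV cache and runtime overhead.
function weightMemoryGiB(totalParams: number, bitsPerWeight: number): number {
  return (totalParams * bitsPerWeight) / 8 / BYTES_PER_GIB;
}

const TOTAL_PARAMS = 35e9; // total parameters, as stated in the video
const ACTIVE_PARAMS = 3e9; // parameters used during inference, as stated in the video

const share = activeFraction(TOTAL_PARAMS, ACTIVE_PARAMS) * 100;
console.log(`active share: ${share.toFixed(1)}%`);

// Assumed precisions: 16-bit weights plus two common quantisation levels.
for (const bits of [16, 8, 4]) {
  console.log(`${bits}-bit weights: ~${weightMemoryGiB(TOTAL_PARAMS, bits).toFixed(1)} GiB`);
}
```

Under these assumptions, under 9% of the parameters are active per token, while the weights alone take roughly 65 GiB at 16-bit and roughly 16 GiB at 4-bit, which is why unified memory is the limiting factor on a laptop.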
00:06:23It's always important to do your own research and run your own evals.
00:06:26I mean, there is a reason why Qwen 3.5 wasn't included on the model list for OpenCode Go.
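If you do run your own evals, even a tiny scorecard helps keep the results honest. The sketch below records the three outcomes reported in this video and tallies the wins; the type and function names are hypothetical, and the rows are meant to be replaced with your own tasks and results.

```typescript
type Model = "Sonnet 4.5" | "Qwen 3.5 35B";

interface TaskResult {
  task: string;
  winner: Model;
  note: string;
}

// Outcomes as reported in this video; replace these rows with your own runs.
const results: TaskResult[] = [
  { task: "to-do list (easy)", winner: "Sonnet 4.5", note: "after rerunning Qwen without the skill" },
  { task: "solar system (medium)", winner: "Sonnet 4.5", note: "Qwen produced a blank page" },
  { task: "tweet screenshot (hard)", winner: "Sonnet 4.5", note: "Qwen hit a 60 second timeout" },
];

// Count how many tasks each model won.
function tallyWins(rows: TaskResult[]): Map<Model, number> {
  const wins = new Map<Model, number>();
  for (const row of rows) {
    wins.set(row.winner, (wins.get(row.winner) ?? 0) + 1);
  }
  return wins;
}

tallyWins(results).forEach((count, model) => {
  console.log(`${model}: ${count}/${results.length} tasks`);
});
```

Keeping a note per task matters as much as the tally, since details like an enabled skill can flip a result.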
00:06:31While we're on the topic of Qwen, their TTS model was recently released and Joss has
00:06:35a great video covering it for voice cloning, emotions in voice and so much more, which you
00:06:39can check out here.