Last week, Google did something unexpected: they released a truly open-source model under the Apache 2.0 license. It's called Gemma 4, and it features specialized edge versions as small as 2.3 billion parameters that are designed to run entirely offline on devices like your iPhone, an Android flagship phone, or even a Raspberry Pi.
The race to build the ultimate small model is really heating up. Just a few weeks ago I ran some tests on Qwen 3.5 to see how it was pushing the limits of local AI, but now Google is promising even higher intelligence density. So in this video, we're going to run similar tests on Gemma 4 and see if this model really is the best small model out there. It's going to be a lot of fun, so let's dive in.
So what's so unique about these new Gemma 4 models? The real technical shift here is something Google calls per-layer embeddings. In a traditional transformer, a token gets one embedding at the start that has to carry all of its meaning through every layer. In Gemma 4, each layer has its own set of embeddings, allowing the model to introduce new information exactly where it's needed. This is why you see the E in the E2B and E4B model names: it stands for effective parameters. While the model acts with the reasoning depth of a 5 billion parameter model, it only uses about 2.3 billion active parameters during inference. The result is much higher intelligence density: the model can handle complex logic while using less than 1.5 gigabytes of RAM.
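To make the idea concrete, here's a toy sketch of per-layer embeddings in plain Python. This is purely illustrative and not Gemma's actual implementation: the table sizes, names, and the lookup function are all invented for the example. The point is the shape of the thing — one shared input table as in a classic transformer, plus a small extra table per layer whose rows get injected at that layer.

```python
# Conceptual sketch of per-layer embeddings (illustrative only; not
# Gemma's real architecture). A standard transformer looks up one
# embedding per token; a per-layer-embedding model additionally looks
# up a small extra embedding for each layer and injects it there.

import random

VOCAB_SIZE = 8       # toy vocabulary
D_MODEL = 4          # main hidden size
D_PER_LAYER = 2      # size of the small per-layer embedding
NUM_LAYERS = 3

random.seed(0)

def make_table(rows, cols):
    return [[random.random() for _ in range(cols)] for _ in range(rows)]

# One shared input table, as in a classic transformer.
input_table = make_table(VOCAB_SIZE, D_MODEL)

# One extra, much smaller table per layer: this is the per-layer part.
per_layer_tables = [make_table(VOCAB_SIZE, D_PER_LAYER) for _ in range(NUM_LAYERS)]

def embed(token_id):
    """Return the initial hidden state plus the per-layer additions."""
    hidden = list(input_table[token_id])
    per_layer = [per_layer_tables[layer][token_id] for layer in range(NUM_LAYERS)]
    return hidden, per_layer

hidden, extras = embed(3)
print(len(hidden), [len(e) for e in extras])  # 4 [2, 2, 2]
```

The memory win comes from the fact that only the small shared state has to sit in fast memory at once; the per-layer rows are tiny and can be fetched as each layer runs, which is presumably how such models keep their resident footprint low.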
Beyond text performance, Gemma 4 is natively multimodal: vision, text, and even audio are processed within the same unified architecture rather than being bolted on as separate modules. This architecture enables a new thinking mode that uses an internal reasoning chain to verify its own logic before giving you an answer, which is specifically designed to prevent the infinite loops and logic errors that often plague small models. It also ships with a 128K context window and support for over 140 languages, which should make it significantly more capable at tasks like complex OCR or localized language identification.
To showcase these abilities, Google released some eye-opening benchmarks. In their internal tests, the E4B model scored 42.5% on the AIME 2026 mathematics benchmark, more than double the score of much larger previous-generation models. They also demonstrated the model's agentic potential on the T2 bench, where it showed a massive jump in tool-use accuracy, and through a feature called agent skills: instead of just generating static text, the model was shown using native function calling to handle multi-step workflows like querying Wikipedia for live data or building an end-to-end animal calls widget.
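The multi-step workflow described above boils down to a simple loop: the model emits a structured tool call, the host executes the tool, and the result is fed back until the model produces a final answer. Here's a minimal sketch of that loop in Python. Everything in it is hypothetical — `fake_model`, `wiki_lookup`, and the message format are stand-ins for illustration, not Gemma's or any real framework's API.

```python
# Minimal sketch of a native function-calling loop. All names here are
# hypothetical illustrations, not Gemma's real API: the "model" is a
# hard-coded stand-in, and the single tool is a fake Wikipedia lookup.

import json

def fake_model(messages):
    """Stand-in for the LLM: requests a tool on turn 1, answers on turn 2."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "wiki_lookup", "arguments": {"topic": "Corgi"}}}
    return {"content": "The Corgi is a small herding dog breed from Wales."}

def wiki_lookup(topic):
    # Hypothetical tool; a real agent would query Wikipedia here.
    return f"{topic}: a herding dog breed originating in Wales."

TOOLS = {"wiki_lookup": wiki_lookup}

def run_agent(user_prompt):
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = fake_model(messages)
        call = reply.get("tool_call")
        if call is None:           # no tool requested: this is the final answer
            return reply["content"]
        result = TOOLS[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "content": json.dumps(result)})

print(run_agent("Tell me about Corgis."))
# The Corgi is a small herding dog breed from Wales.
```

Swapping `fake_model` for a real local endpoint and `wiki_lookup` for an actual HTTP call gives you the shape of the agentic demo Google showed.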
Now, all of that sounds impressive, but let's try it ourselves and see how it works. In my previous Qwen 3.5 video, I tested the small models by running them locally, without an internet connection, using LM Studio and Cline. I will use the same setup for testing Gemma 4. First we have to download the models in LM Studio, then increase the available context window and start the server. We can then jump into Cline, hook up our local LM Studio server, choose the E2B model, turn off our internet connection, and begin our tests.
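If you'd rather script against the local server than go through Cline, LM Studio exposes an OpenAI-compatible endpoint on localhost. Here's a sketch using only Python's standard library; the port is LM Studio's default, and the model identifier `gemma-e2b` is a placeholder you'd replace with whatever name LM Studio shows for your downloaded build.

```python
# Sketch of talking to a local LM Studio server over its OpenAI-compatible
# API using only the standard library. The URL is LM Studio's default;
# the model name is a placeholder, not Gemma's official identifier.

import json
import urllib.request

def build_request(prompt, model="gemma-e2b",
                  url="http://localhost:1234/v1/chat/completions"):
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})

req = build_request("Write a haiku about local AI.")
print(req.full_url)  # http://localhost:1234/v1/chat/completions

# To actually send it (requires the LM Studio server to be running):
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

Because the API mirrors OpenAI's chat-completions shape, the same request works unchanged against any other OpenAI-compatible local runtime.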
Last time we saw that Qwen 3.5 was quite decent at generating a simple café website using HTML, CSS, and JavaScript with two of its smallest models. Let's reuse the same prompt and see if Gemma 4 is just as good at this coding task. It took the E2B model roughly 1.5 minutes to complete the task, and for a model with 2.3 billion active parameters, the results were honestly a bit underwhelming compared to Qwen's output, which came from a model with only 0.8 billion parameters. The most annoying thing was that Gemma appended the task list to the end of both the HTML file and the CSS file, so I had to manually delete it from each before opening the page. It also claimed it had written a JavaScript file when, in fact, no JS file was produced in the final output. So the E2B test results were a bit disappointing.
But the situation improved quite a lot when switching to the E4B model. This version took roughly 3.5 minutes to finish the task, but the end result was notably better.
Maybe not in terms of design, it still looks very bland, but this version actually had working cart functionality, which none of the previous tests, for either Qwen or Gemma, were able to produce successfully.
So the E4B version is already a big step up from the E2B version, but obviously no one would seriously consider using such small models for complex or serious coding. I conducted these tests out of curiosity, to see whether such a small parameter count can still produce a meaningful result for a given coding task.
Alright, now let's see how Gemma 4 performs on edge devices like an iPhone. In my Qwen 3.5 video, I built a custom iOS app that could run the model on the native Metal GPU using Apple's MLX framework for Swift. Although Gemma 4 is open source, there are unfortunately no MLX bindings available for it yet that could run the model on iOS with multimodal capabilities. Google themselves run Gemma 4 in their AI Edge Gallery app using their own inference framework, LiteRT-LM, which sadly also doesn't offer iOS bindings at the moment. So to try it out on an iPhone, our best option right now is their Edge Gallery app, and that's where we'll conduct our tests.
Let's go to the AI chat section. Here we're prompted to download the E2B version of Gemma 4. You also have the option to download the E4B version, but for some reason the app says I don't have sufficient space for it, which I'm sure isn't true, so maybe that's a bug in the app. Anyway, now that I've downloaded the model, we can finally start using it. Let's start by typing a simple hello. Wow, did you see how fast that response was? A lot faster than Qwen 3.5. Maybe this is the magic of the LiteRT-LM framework they're using.
Now let's try the famous car wash test and see if Gemma gets it right. Wow, it gives me a really long response. At the end of it, we see that the final recommendation is to drive, which is correct, but I do have to take into account that it's reasoning from convenience and comfort rather than the actual logic of the question. So I don't know, it kind of passes the test and kind of doesn't at the same time.
Alright, now let's hop over to the ask-image section and see if Gemma can identify the dog in this picture. It did identify that it is indeed a dog, and it gives some other details about the image, so that's pretty cool. But if I ask what breed the dog is, it replies that it's a Border Collie, which is not true: it's actually a Corgi. Still, for just over 2 billion active parameters, this response is pretty good nonetheless.
Lastly, let's try the OCR test. If you watched my previous video with Qwen 3.5, you'll recall that I tested it with an image containing text in Latvian, which is also my native language. Now, Gemma is touted as understanding over 140 languages, so I assume it should pass this test easily. And yes, indeed, it does identify that the language is Latvian, and I'm surprised that most of the text is actually pretty spot on. With some minor exceptions: I see that some words are nonexistent and some of the grammatical structures are just very bizarre. But it's still very impressive, so I'll give this test a pass.
Now, this raises the question: can I chat with this model in Latvian? Let me try that next. I see that the response is actually in Latvian, but once again the grammatical structures are very bizarre, and nobody talks like that. Still, Latvian is a very small language, so it's already impressive that all that knowledge fits in such a small model. And while I'm at it, I'm going to ask it who the current US president is, to probe the knowledge cutoff of Gemma 4. It replies that it is Joe Biden. And if I then ask what its knowledge cutoff is, it tells me January 2025, which checks out.
So there you have it: Gemma 4, the newest open-source model by Google. I've got to be honest, this model does seem pretty good. It does what it advertises, though it lacks some creativity in web design. Other than that, these small models, as we just saw, were more than capable of successfully completing all the tasks I gave them. It's a shame we still don't have MLX bindings for this model, because I would really love to use Gemma 4 locally in a custom iOS app. But I'm sure it won't take long for Google to get that support out to the public, and in the meantime, I'm keeping a close eye on community projects like SwiftLM, which are already working on unofficial native bindings for these models.
So those are my two cents on the model. What do you think about Gemma 4? Have you tried it? Will you use it? Let us know in the comment section down below. And folks, if you like these types of technical breakdowns, please let me know by smashing that like button underneath the video. And don't forget to subscribe to our channel. This has been Andres from BetterStack, and I will see you in the next video.