Gemini 3.5 Flash is just... fine



Transcript

00:00:00Google just released Gemini 3.5 Flash and they're making some pretty bold claims. Frontier

00:00:04performance at four times the speed, often at less than half the cost. Which all sounds

00:00:09incredible, but the reality is a lot worse than Google is advertising.

00:00:12And that was only half of what they released. They also released Antigravity 2, which is

00:00:16their new standalone agent app, basically their answer to Codex, as well as the Antigravity

00:00:20CLI, which actually replaces the Gemini CLI, so that's another one for Killed by Google.

00:00:30Let's start with the headline stats. This has a million token context window, 64,000

00:00:34output tokens and it takes in text, images, video, audio and PDFs as input. Google has

00:00:39always been pretty good at these multimodal models.

00:00:42As for actual performance, Google's own benchmarks have this model being in line with GPT 5.5

00:00:46when it comes to coding, being only a few percent behind on SWE-Bench Pro and Terminal

00:00:50Bench and in fact it's actually beating Opus 4.7 on Terminal Bench by around 10%, but Claude

00:00:56Opus does get its own back on SWE-Bench Pro by beating Gemini by around 10% as well.

00:01:01For agentic workflows, this model is actually winning on both the MCP and Toolathlon benchmarks

00:01:06and overall these benchmarks are not bad results, but all of this is according to Google.

00:01:11If instead we take a look at third-party benchmarks, like Artificial Analysis, it's not doing

00:01:15too great. Their coding index has Gemini 3.5 Flash scoring 45, which is actually below models

00:01:21like Kimi K2.6, and it's not even beating Gemini 3.1 Pro, even though on Google's own benchmarks

00:01:27it was ahead in everything. It's actually only a few points ahead of Gemini 3 Flash as

00:01:31well.

00:01:32The story does get a little bit better when you look at agentic performance. It's made

00:01:35a nice jump over Gemini 3.1 Pro and yes, technically it is up there competing with the frontier

00:01:41models.

00:01:42Looking at our benchmarks, it appears that 75% of you watching this aren't subscribed

00:01:45so I'm going to ask you nicely to do so. Please subscribe.

00:01:48The one key highlight of this model is definitely its speed. They actually got 278 tokens per

00:01:53second out of this model, which massively outperforms Opus 4.7 and GPT 5.5 and even models

00:01:59like Haiku and the open-source OpenAI ones. So when it comes to intelligence vs speed,

00:02:04this model definitely is the best.

00:02:06Overall it's just a mixed bag of results. It's not the best model and it's not the worst,

00:02:10but it is really, really fast, and I wouldn't mind these results if it was actually half

00:02:14the cost of the other models, but this is where things start to fall apart.

00:02:18The price of this model is $1.50 for a million input tokens and $9 for a million output tokens,

00:02:23which is actually 3 times more than Gemini 3 Flash was, but it is still way cheaper than

00:02:27the likes of Opus 4.7 and GPT 5.5, at least on paper that is.

00:02:32When actually running their benchmarks though, Artificial Analysis found that Gemini 3.5 Flash

00:02:36cost $1,552 to run the Intelligence Index, which is actually 5.5 times more expensive

00:02:42than Gemini 3 Flash and 75% more expensive than Gemini 3.1 Pro. What's even worse though

00:02:48is this is more expensive than GPT 5.5 when on high reasoning, which massively beats Flash

00:02:54when it comes to coding performance, and in fact I'll just highlight every model on this

00:02:57chart that is cheaper and outperforms Flash when it comes to coding. It just does not look

00:03:02good at all and it's certainly not at half the cost like their marketing claimed.

00:03:06Digging a bit deeper into this, it seems like the problem with this model is that, while fast,

00:03:10it is token hungry. On agentic evaluations it averaged 49 turns per task, which is one

00:03:15of the highest of any model they've tested. It just really likes to burn through your

00:03:19input tokens. So overall I'm just not really sure where this actually leaves us. This model

00:03:23just feels meh. The speed is super cool, so if you value that over everything else, perhaps

00:03:28this is the model to use. The same if you want great multimodal capabilities, but the

00:03:33coding performance is just not enough for me to even consider testing this for a longer

00:03:37period of time than I have in this video. So let's just move on to talk about the other

00:03:41big announcement, which was Antigravity 2 and the new CLI.

00:03:44This is Antigravity 2? Wait, no, sorry, that's T3 Code. Maybe this one then? Wait, nope, that's

00:03:50Codex. What about this one? Nope, that's Cursor. This one is actually Antigravity 2 and I think

00:03:55you can see my point. Basically all of these apps have started to look the same. A funny

00:03:59part of one of their demos is when the developer tries to create a new project and you can just

00:04:03see the Codex folder right there. So to be honest I won't spend much time going through

00:04:07this app. It's exactly the same as all of the other ones. We have our conversations on the

00:04:11left, we have our projects, we have scheduled tasks and in here you can click into any of

00:04:15these files if you want to see the diff view. The only thing to note is that this is not

00:04:18the Antigravity IDE anymore. This is just a completely standalone app. What you're seeing

00:04:22is what you get. Now I did actually try out a couple of test prompts in here. One of them

00:04:26was to create a full stack personal finance dashboard and the other one was much simpler

00:04:30just testing out the UI of how it would build me out a cafe website in a single index.html.

00:04:35This is the result of the very simple cafe prompt and I've got to say I do really like

00:04:39the website that it's built here, so it does seem like 3.5 Flash is pretty good at UI design.

00:04:44I'd say this is overall just a very nice site. It does still have a little bit of an AI feel

00:04:48to it. I think it's mostly that card and gradient style that AI seems to like at the moment but

00:04:53the site is pretty functional and does look how I would expect it to. For context this

00:04:58is what Opus 4.7 gave me when I gave it the exact same prompt and I do think Gemini 3.5

00:05:03Flash wins on this one, but obviously this is just a one-off test. As for the more complicated

00:05:07finance dashboard prompt, which is a full-stack application, it's done well to actually make

00:05:11the application work, but I definitely don't like the UI design. It's not bad, but it just

00:05:16has that "I've been designed by AI" look and feel, and also minus points for calling this

00:05:20Aura Wealth. When you compare that to what Opus 4.7 gave me, it's just a world of difference.

00:05:25Opus 4.7 here looks really nice and to be honest I don't have that many notes on how

00:05:29I would change this UI. Opus actually spent 20 minutes on that prompt whereas Gemini took

00:05:33five minutes, so yes, it's definitely quicker, but it also could have used the extra 15 to

00:05:38make it look better. Moving on from that though, we also got the Antigravity CLI, and this one's

00:05:42probably gonna anger some people because they're actually shutting down the Gemini CLI. You won't

00:05:46be able to use it after June 18th this year, and the new CLI is basically the same at the

00:05:51moment, except it's been rewritten in Go and it's also closed source now, which does suck.

00:05:56I didn't actually install this one as, again, it's just Claude Code but for Gemini, so

00:06:00there is nothing new to show you. To summarise all of my thoughts on this then: right now 3.5

00:06:05Flash is good for agents, but it's expensive and too weak on coding to be the whole package,

00:06:10so I do hope we see a bit more from Gemini 3.5 Pro, which is apparently coming next month.

00:06:15But for now it just seems like Google is not going to be the leader for coding, and to be

00:06:19honest with you I don't really think they need to be. It seems that Google's market is more

00:06:23the everyday person, building this into all of your experiences like Gmail, Search, Workspace,

00:06:28Android and everything else, so maybe developers just aren't going to be their focus. Let me

00:06:33know what you think in the comments down below, while you're there subscribe, and as always

00:06:36see you in the next one.

Key Takeaway

While Gemini 3.5 Flash offers industry-leading speed, strong multimodal input support, and competent UI design, its high token consumption and lackluster coding performance compared to rivals make it a niche tool rather than a comprehensive development solution.

Highlights

Gemini 3.5 Flash achieves a throughput of 278 tokens per second, exceeding the speed of GPT 5.5 and Opus 4.7.
Artificial Analysis found that running its Intelligence Index on Gemini 3.5 Flash cost $1,552, which is 5.5 times the cost of its predecessor, Gemini 3 Flash.
Gemini 3.5 Flash averages 49 turns per task in agentic evaluations, reflecting a tendency to consume more tokens than other models.
In the reviewer's one-off tests, the model built an appealing single-page cafe site, but its full-stack finance dashboard, while functional, looked noticeably less polished than the one Opus 4.7 produced.
Google is replacing the Gemini CLI with the closed-source Antigravity CLI, rewritten in Go; the Gemini CLI will stop working after June 18th.
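
The cost ratios quoted in the video imply rough benchmark costs for the two older models. The sketch below back-calculates them. It assumes "5.5 times more expensive" means 5.5 times the baseline cost and "75% more expensive" means 1.75 times the baseline cost; the resulting figures are estimates derived from those ratios, not numbers reported in the video, and the function name is illustrative.

```python
BENCHMARK_COST_USD = 1552.0  # Gemini 3.5 Flash, Intelligence Index run (from the video)

# Assumptions about how the quoted ratios should be read:
FLASH_3_MULTIPLIER = 5.5    # "5.5 times more expensive than Gemini 3 Flash"
PRO_3_1_MULTIPLIER = 1.75   # "75% more expensive than Gemini 3.1 Pro"


def implied_baseline(cost: float, multiplier: float) -> float:
    """Back out the baseline cost from a total cost and a cost multiplier."""
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    return cost / multiplier


implied_flash_3 = implied_baseline(BENCHMARK_COST_USD, FLASH_3_MULTIPLIER)
implied_pro_3_1 = implied_baseline(BENCHMARK_COST_USD, PRO_3_1_MULTIPLIER)

print(f"Implied Gemini 3 Flash cost:   ${implied_flash_3:,.0f}")
print(f"Implied Gemini 3.1 Pro cost:   ${implied_pro_3_1:,.0f}")
```

Under those assumptions the older models would have cost roughly $282 and $887 respectively to run the same index.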

Timeline

Gemini 3.5 Flash Capabilities and Benchmarks

The model supports a one-million token context window and 64,000 output tokens.
Google's own benchmarks show the model roughly in line with GPT 5.5 on coding and ahead of Opus 4.7 on Terminal Bench, though behind Opus 4.7 on SWE-Bench Pro.
The Artificial Analysis coding index scores the model at 45, below Kimi K2.6 and Gemini 3.1 Pro.
The model was measured at 278 tokens per second.

Google claims frontier performance for the new model, highlighting multimodal inputs including video and audio. While internal data shows strong results, third-party testing suggests lower coding proficiency. Speed remains the standout feature, significantly outpacing established frontier models.
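
To put the 278 tokens per second figure in perspective, the sketch below estimates how long a maximum-length 64,000-token response would take at that rate. It assumes a constant output rate and ignores time to first token, neither of which the video addresses; the function name is illustrative.

```python
MAX_OUTPUT_TOKENS = 64_000  # output limit quoted in the video
MEASURED_TPS = 278          # output speed quoted in the video


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Estimate generation time, assuming a constant output rate (a simplification)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


full_response_s = generation_seconds(MAX_OUTPUT_TOKENS, MEASURED_TPS)
print(f"{full_response_s:.0f} s (~{full_response_s / 60:.1f} min)")
```

At that rate a full-length response takes a little under four minutes, which is consistent with the short turnaround seen in the reviewer's tests.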

Cost and Agentic Efficiency

The model is priced at $1.50 per million input tokens and $9 per million output tokens.
Running the Artificial Analysis Intelligence Index cost $1,552, which is 5.5 times the cost of Gemini 3 Flash and 75% more than Gemini 3.1 Pro.
High token consumption stems from an average of 49 turns per task in agentic workflows.

Marketing claims suggest cost-effectiveness, but real-world testing reveals that the model's high turn count per agentic task inflates overall expenses. It currently ranks as more expensive to run than some higher-reasoning models that deliver superior coding results.
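
A rough way to see why turn count matters: in a typical agent loop, each turn re-sends the conversation so far as input, so input tokens grow much faster than the number of turns. The sketch below uses the list prices from the video, but the per-turn token counts and the 20-turn comparison are hypothetical values chosen for illustration, and it assumes no prompt caching or discounts.

```python
INPUT_PRICE_PER_M = 1.50   # USD per million input tokens (from the video)
OUTPUT_PRICE_PER_M = 9.00  # USD per million output tokens (from the video)


def agentic_task_cost(
    turns: int,
    base_context_tokens: int,
    tool_result_tokens_per_turn: int,
    output_tokens_per_turn: int,
) -> float:
    """Cost in USD of one task when every turn re-sends the full history.

    Assumes no prompt caching; all token counts are hypothetical inputs.
    """
    input_tokens = 0
    context = base_context_tokens
    for _ in range(turns):
        input_tokens += context  # the whole history is billed as input again
        context += output_tokens_per_turn + tool_result_tokens_per_turn
    output_tokens = turns * output_tokens_per_turn
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


# Hypothetical workload: 10k-token starting context, 1.5k tokens of tool
# results and 500 output tokens per turn.
cost_49_turns = agentic_task_cost(49, 10_000, 1_500, 500)
cost_20_turns = agentic_task_cost(20, 10_000, 1_500, 500)
print(f"49 turns: ${cost_49_turns:.2f}, 20 turns: ${cost_20_turns:.2f}")
```

With these made-up numbers, 2.45 times as many turns costs more than 4 times as much, because the input side grows roughly quadratically with the number of turns.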

Antigravity 2 App and CLI Changes

Antigravity 2 functions as a standalone agent app similar to existing tools like Codex and Cursor.
The model generated an appealing single-page cafe site but a less polished UI for the full-stack finance dashboard, even though the application itself worked.
Google is discontinuing the Gemini CLI in favor of the closed-source, Go-based Antigravity CLI.

The new application interface closely resembles other AI-assisted coding tools, with conversations, projects, scheduled tasks, and a diff view. In the reviewer's tests, Gemini 3.5 Flash finished the finance dashboard in about five minutes versus roughly 20 minutes for Opus 4.7, but Opus produced the more polished result. The shift to a closed-source CLI signals a change in the developer experience strategy for Google.
