Transcript
00:00:00Google just released Gemini 3.5 Flash and they're making some pretty bold claims. Frontier
00:00:04performance at four times the speed, often at less than half the cost. Which all sounds
00:00:09incredible, but the reality is a lot worse than Google is advertising.
00:00:12And that was only half of what they released. They also released Antigravity 2, which is
00:00:16their new standalone agent app, basically their answer to Codex, as well as the Antigravity
00:00:20CLI, which actually replaces the Gemini CLI, so that's another one for Killed by Google.
00:00:30Let's start with the headline stats. This has a million token context window, 64,000
00:00:34output tokens and it takes in text, images, video, audio and PDFs as input. Google has
00:00:39always been pretty good at these multimodal models.
00:00:42As for actual performance, Google's own benchmarks have this model in line with GPT 5.5
00:00:46when it comes to coding, being only a few percent behind on SWE-Bench Pro and Terminal
00:00:50Bench, and in fact it's actually beating Opus 4.7 on Terminal Bench by around 10%, but Claude
00:00:56Opus does get its own back on SWE-Bench Pro by beating Gemini by around 10% as well.
00:01:01For agentic workflows, this model is actually winning on both the MCP and Toolathlon benchmarks,
00:01:06and overall these benchmarks are not bad results, but all of this is according to Google.
00:01:11If instead we take a look at third-party benchmarks, like Artificial Analysis, it's not doing
00:01:15too great. Their coding index has Gemini 3.5 Flash scoring 45, which is actually below models
00:01:21like Kimi K2.6, and it's not even beating Gemini 3.1 Pro, even though on Google's own benchmarks
00:01:27it was ahead in everything. It's actually only a few points ahead of Gemini 3 Flash as
00:01:31well.
00:01:32The story does get a little bit better when you look at agentic performance. It's made
00:01:35a nice jump over Gemini 3.1 Pro and yes, technically it is up there competing with the frontier
00:01:41models.
00:01:42Looking at our benchmarks, it appears that 75% of you watching this aren't subscribed
00:01:45so I'm going to ask you nicely to do so. Please subscribe.
00:01:48The one key highlight of this model is definitely its speed. They actually got 278 tokens per
00:01:53second out of this model, which massively outperforms Opus 4.7 and GPT 5.5 and even models
00:01:59like Haiku and the open-source OpenAI ones. So when it comes to intelligence vs speed,
00:02:04this model definitely is the best.
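To put that speed figure in context, here is a rough Python sketch of how long a response would take at the quoted 278 tokens per second. It assumes a constant generation rate and ignores time to first token and any reasoning tokens, so treat it as a lower bound. The function name is made up for illustration.

```python
def seconds_to_generate(output_tokens: int, tokens_per_second: float) -> float:
    """Estimate wall-clock generation time, assuming a constant output rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Figures quoted in the video.
MEASURED_TOKENS_PER_SECOND = 278.0
MAX_OUTPUT_TOKENS = 64_000

# Time to fill the entire 64,000-token output budget at the measured rate.
full_output_seconds = seconds_to_generate(MAX_OUTPUT_TOKENS, MEASURED_TOKENS_PER_SECOND)
print(f"Full output budget: {full_output_seconds:.0f} s "
      f"(about {full_output_seconds / 60:.1f} min)")
```

At that rate even a maximum-length response finishes in under four minutes, which is the practical meaning of the speed lead.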
00:02:06Overall it's just a mixed bag of results. It's not the best model and it's not the worst,
00:02:10but it is really, really fast and I wouldn't mind these results if it was actually half
00:02:14the cost of the other models, but this is where things start to fall apart.
00:02:18The price of this model is $1.50 per million input tokens and $9 per million output tokens,
00:02:23which is actually 3 times more than Gemini 3 Flash was, but it is still way cheaper than
00:02:27the likes of Opus 4.7 and GPT 5.5, at least on paper that is.
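As a quick illustration of what those list prices mean per request, here is a small Python sketch. The Gemini 3.5 Flash prices are the ones quoted above. The Gemini 3 Flash prices are not stated in the video; they are derived by reading "3 times more" as "three times the price", which is an assumption. The token counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List price in US dollars per one million tokens."""
    input_per_million: float
    output_per_million: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Return the list-price cost in dollars for one request."""
        return (input_tokens * self.input_per_million
                + output_tokens * self.output_per_million) / 1_000_000


# Prices quoted in the video.
GEMINI_35_FLASH = Pricing(input_per_million=1.50, output_per_million=9.00)

# Assumption: derived from the "3 times more" claim, not quoted directly.
GEMINI_3_FLASH_IMPLIED = Pricing(input_per_million=1.50 / 3, output_per_million=9.00 / 3)

# Hypothetical request: 200,000 input tokens and 10,000 output tokens.
new_cost = GEMINI_35_FLASH.cost(200_000, 10_000)
old_cost = GEMINI_3_FLASH_IMPLIED.cost(200_000, 10_000)
print(f"3.5 Flash: ${new_cost:.2f}, 3 Flash (implied): ${old_cost:.2f}")
```

List price only tells you the cost per token. It says nothing about how many tokens a model chooses to use on a task.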
00:02:32When actually running their benchmarks though, Artificial Analysis found that Gemini 3.5 Flash
00:02:36cost $1,552 to run the intelligence index, which is actually 5.5 times more expensive
00:02:42than Gemini 3 Flash and 75% more expensive than Gemini 3.1 Pro. What's even worse though
00:02:48is that this is more expensive than GPT 5.5 on high reasoning, which massively beats Flash
00:02:54when it comes to coding performance, and in fact I'll just highlight every model on this
00:02:57chart that is cheaper and outperforms Flash when it comes to coding. It just does not look
00:03:02good at all and it's certainly not at half the cost like their marketing claimed.
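The ratios quoted above can be turned back into rough dollar figures for the other two models. The Python sketch below does that arithmetic. It reads "5.5 times more expensive" as a 5.5x multiple and "75% more expensive" as a 1.75x multiple. The resulting numbers are back-of-the-envelope estimates derived from those ratios, not figures published in the video.

```python
def implied_baseline_cost(observed_cost: float, ratio: float) -> float:
    """Recover a baseline cost from an observed cost and an 'X times as expensive' ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return observed_cost / ratio


# Quoted cost of running the intelligence index on Gemini 3.5 Flash.
GEMINI_35_FLASH_RUN = 1552.0

# Assumed readings of the quoted comparisons (see the note above).
gemini_3_flash_run = implied_baseline_cost(GEMINI_35_FLASH_RUN, 5.5)
gemini_31_pro_run = implied_baseline_cost(GEMINI_35_FLASH_RUN, 1.75)

print(f"Implied Gemini 3 Flash run:   ${gemini_3_flash_run:,.0f}")
print(f"Implied Gemini 3.1 Pro run:   ${gemini_31_pro_run:,.0f}")
```

Under those readings, the same benchmark run would have cost roughly $282 on Gemini 3 Flash and roughly $887 on Gemini 3.1 Pro.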
00:03:06Digging a bit deeper into this, it seems like the problem with this model is that, while fast,
00:03:10it is token-hungry. On agentic evaluations it averaged 49 turns per task, which is one
00:03:15of the highest of any model they've tested. It just really likes to burn through your
00:03:19input tokens. So overall I'm just not really sure where this actually leaves us. This model
00:03:23just feels meh. The speed is super cool, so if you value that over everything else, perhaps
00:03:28this is the model to use. The same goes if you want great multimodal capabilities, but the
00:03:33coding performance is just not enough for me to even consider testing this for a longer
00:03:37period of time than I have in this video. So let's just move on to talk about the other
00:03:41big announcement, which was Antigravity 2 and the new CLI.
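Going back to the 49 turns per task figure for a moment, here is a rough Python sketch of why a high turn count burns input tokens so quickly. It assumes a simple agent loop that resends the whole conversation on every turn, with no prompt caching. The starting context size, the growth per turn, and the 20-turn comparison are hypothetical numbers chosen for illustration. Only the 49 turns and the $1.50 per million input tokens come from the video.

```python
def total_input_tokens(turns: int, base_context: int, growth_per_turn: int) -> int:
    """Sum billed input tokens when every turn resends the full history.

    Turn 0 sends base_context tokens; each later turn sends everything so far,
    which is growth_per_turn tokens more than the turn before it.
    """
    return sum(base_context + turn * growth_per_turn for turn in range(turns))


INPUT_PRICE_PER_MILLION = 1.50   # quoted Gemini 3.5 Flash input price
BASE_CONTEXT = 5_000             # hypothetical starting prompt size
GROWTH_PER_TURN = 1_500          # hypothetical tokens added per turn

for turns in (20, 49):           # 49 is the quoted average; 20 is a made-up comparison
    tokens = total_input_tokens(turns, BASE_CONTEXT, GROWTH_PER_TURN)
    dollars = tokens * INPUT_PRICE_PER_MILLION / 1_000_000
    print(f"{turns} turns: {tokens:,} input tokens, ${dollars:.2f}")
```

Under these assumptions, going from 20 to 49 turns is about 2.5 times as many turns but roughly 5 times as many input tokens, because the resent history grows with every turn.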
00:03:44This is Antigravity 2? Wait, no, sorry, that's T3 Code. Maybe this one then? Wait, nope, that's
00:03:50Codex. What about this one? Nope, that's Cursor. This one is actually Antigravity 2 and I think
00:03:55you can see my point. Basically all of these apps have started to look the same. A funny
00:03:59part of one of their demos is when the developer tries to create a new project and you can just
00:04:03see the Codex folder right there. So to be honest I won't spend much time going through
00:04:07this app. It's exactly the same as all of the other ones. We have our conversations on the
00:04:11left, we have our projects, we have scheduled tasks, and in here you can click into any of
00:04:15these files if you want to see the diff view. The only thing to note is that this is not
00:04:18the Antigravity IDE anymore. This is just a completely standalone app. What you're seeing
00:04:22is what you get. Now I did actually try out a couple of test prompts in here. One of them
00:04:26was to create a full-stack personal finance dashboard and the other one was much simpler,
00:04:30just testing out the UI of how it would build me out a cafe website in a single index.html.
00:04:35This is the result of the very simple cafe prompt and I've got to say I do really like
00:04:39the website that it's built here, so it does seem like 3.5 Flash is pretty good at UI design.
00:04:44I'd say this is overall just a very nice site. It does still have a little bit of an AI feel
00:04:48to it. I think it's mostly that card and gradient style that AI seems to like at the moment, but
00:04:53the site is pretty functional and does look how I would expect it to. For context, this
00:04:58is what Opus 4.7 gave me when I gave it the exact same prompt, and I do think Gemini 3.5
00:05:03Flash wins on this one, but obviously this is just a one-off test. As for the more complicated
00:05:07finance dashboard prompt, that's a full-stack application. It's done well to actually make
00:05:11the application work, but I definitely don't like the UI design. It's not bad, but it just
00:05:16has that "I've been designed by AI" look and feel, and also minus points for calling this
00:05:20Aura Wealth. When you compare that to what Opus 4.7 gave me, it's just a world of difference.
00:05:25Opus 4.7 here looks really nice and to be honest I don't have that many notes on how
00:05:29I would change this UI. Opus actually spent 20 minutes on that prompt whereas Gemini took
00:05:33five minutes, so yes, it's definitely quicker, but it also could have used the extra 15 to
00:05:38make it look better. Moving on from that though, we also got the Antigravity CLI and this one's
00:05:42probably gonna anger some people because they're actually shutting down the Gemini CLI. You won't
00:05:46be able to use it after June 18th this year, and the new CLI is basically the same at the
00:05:51moment, except it's been rewritten in Go and it's also closed source now, which does suck.
00:05:56I didn't actually install this one as, again, it's just Claude Code but for Gemini, so
00:06:00there is nothing new to show you. To summarise all of my thoughts on this then: right now 3.5
00:06:05Flash is good for agents, but it's expensive and too weak on coding to be the whole package,
00:06:10so I do hope we see a bit more from Gemini 3.5 Pro, which is apparently coming next month.
00:06:15But for now it just seems like Google is not going to be the leader for coding, and to be
00:06:19honest with you I don't really think they need to be. It seems that Google's market is more
00:06:23the everyday person, building this into all of your experiences like Gmail, Search, Workspace,
00:06:28Android and everything else, so maybe developers just aren't going to be the focus. Let me
00:06:33know what you think in the comments down below, while you're there subscribe, and as always
00:06:36see you in the next one.