00:00:00 You can reset the days-since counter, because there is another new best model.
00:00:03 This time it's GPT 5.4, and I've been testing it, so here's what you need to know, as well
00:00:07 as the pros and cons, in 5 minutes and 40 seconds.
00:00:11 So here are the bullet points.
00:00:17 GPT 5.4 is better at knowledge work and web search, it has native computer use capabilities,
00:00:22 there's a new tool search feature which I'll explain in a bit, it can be steered mid-response,
00:00:26 there's a new fast mode, and it also has a 1 million token context window.
00:00:30 Seemingly the goal with 5.4 was to combine Codex 5.3's coding capabilities with the knowledge,
00:00:34 web search and professional work skills of GPT 5.2, to make 5.4 the all-rounder, do-everything
00:00:40 model.
00:00:41 And according to Artificial Analysis's third-party benchmarks, they've actually achieved
00:00:45 that goal.
00:00:46 It's ranked as the best coding model and the best agentic model, and it also ties with
00:00:49 Gemini for the best intelligence model.
00:00:51 The bullet point I found most interesting, though, was the
00:00:55 native computer use.
00:00:56 OpenAI have apparently designed this as their first general-purpose model with built-in computer
00:01:00 use capabilities, so it should excel at writing code to operate computers via libraries like
00:01:04 Playwright, as well as issuing mouse and keyboard commands in response to screenshots.
00:01:08 They released an experimental Playwright skill, so I gave it a go.
00:01:12 In Codex, using 5.4 on high reasoning, I gave it a prompt to create an interactive 3D experience
00:01:16 of Tower Bridge in London.
00:01:18 I also used the new skill, as well as an image generation skill, so it could generate its own
00:01:22 assets to use as textures.
00:01:24 Now, the experience itself was pretty similar to Codex 5.3, which until now was my favourite
00:01:29 model.
00:01:30 After around 20 minutes or so of working through the task, that's when it started to use the
00:01:33 new Playwright skill, and that experience was pretty good.
00:01:37 It would open up the browser, click around, navigate the scene, identify any visual issues
00:01:41 it needed to fix, like this background not blending into the scene, then it would jump back
00:01:45 into the code, fix it, and rinse and repeat, and it all felt very smooth and natural.
00:01:50 The first iteration of this project actually took around 30 minutes to complete, all from
00:01:54 that single prompt. After that, I sent some follow-ups asking for a few more details and
00:01:58 a few fixes, like boats being sideways and the road clipping through other textures, and it
00:02:03 again just went off and worked on those tasks for around 30 minutes per prompt, opening
00:02:07 up Chrome, verifying and making changes, and giving me this final version in about an hour
00:02:11 and a half of work and 3 prompts. It's not perfect by any means, but for completely
00:02:16 hands-off development I don't think it's too bad, and to me this model is just a no-brainer
00:02:20 upgrade for those of you that already like Codex 5.3.
00:02:24 I did find it quite funny, though, that after about 2 hours of using this, it prompted me
00:02:27 that I could have saved an hour of my time if I switched over to that new fast mode.
00:02:31 This is actually the exact same model, same intelligence and same experience; it just delivers
00:02:35 up to 1.5x faster token speed, and it's billed at 2x your usage, so it's essentially
00:02:41 just a priority tier, not a different model at all.
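To put some rough numbers on that trade-off, here's a quick back-of-the-envelope calculation. The 1.5x and 2x figures come from the claims above; the token count and base speed are made-up illustrative values, not measurements:

```python
# Fast mode trade-off: up to 1.5x token speed, billed at 2x usage.
# Token count and base speed below are illustrative assumptions only.
tokens = 300_000      # total tokens generated for a long agentic task
base_speed = 50       # tokens/sec in normal mode (made-up figure)

normal_minutes = tokens / base_speed / 60
fast_minutes = tokens / (base_speed * 1.5) / 60
usage_multiplier = 2  # fast mode consumes your usage quota twice as fast

print(f"normal: {normal_minutes:.0f} min, "
      f"fast: {fast_minutes:.0f} min, "
      f"at {usage_multiplier}x the usage")
```

So under these assumptions you'd trade roughly a third of the wall-clock time for double the quota burn, which matches the "saved an hour over two hours" nudge I got.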
00:02:44 Now, the other bullet point that I found particularly interesting in this release was the tool search.
00:02:48 This solves the problem of having all of your tool definitions loaded into the system prompt
00:02:52 up front: if you have too many tools and too many MCP servers, you end up wasting tokens
00:02:56 and causing context bloat, which can impact the quality of your output.
00:03:00 With GPT 5.4, the prompt instead carries a lightweight list of available tools, and the model
00:03:05 has a tool search capability, so when it does need a tool, it can simply look up that
00:03:09 tool definition and append it to the conversation right when it's needed.
00:03:13 OpenAI actually says this reduces token usage by up to 47%, and they showed that in a benchmark
00:03:18 with 36 MCP servers, where it maintained the same accuracy.
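The idea behind tool search can be sketched roughly like this. Everything here, including `TOOL_REGISTRY` and the two helper functions, is my own made-up illustration of the lazy-loading pattern, not OpenAI's actual API:

```python
# Sketch of on-demand tool loading (hypothetical, not OpenAI's real API).
# Instead of pasting every full tool schema into the system prompt up
# front, the prompt carries only names and one-line summaries; the full
# definition is fetched and appended only when the model asks for it.

TOOL_REGISTRY = {
    "get_weather": {
        "summary": "Look up the current weather for a city.",
        "definition": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    },
    "search_files": {
        "summary": "Search the workspace for files matching a pattern.",
        "definition": {
            "name": "search_files",
            "parameters": {
                "type": "object",
                "properties": {"pattern": {"type": "string"}},
            },
        },
    },
}

def lightweight_tool_list():
    """What goes into the prompt up front: names and summaries only."""
    return [f"{name}: {tool['summary']}" for name, tool in TOOL_REGISTRY.items()]

def tool_search(name):
    """Called when the model decides it needs a tool: fetch the full schema."""
    return TOOL_REGISTRY[name]["definition"]

# The conversation starts with just the cheap list...
conversation = list(lightweight_tool_list())
# ...and the full schema is appended only once the model needs it.
conversation.append(tool_search("get_weather"))
```

With dozens of MCP servers, the savings come from never paying for the schemas of tools the model doesn't end up touching, which is presumably where that "up to 47%" figure comes from.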
00:03:22 Besides those new features that we just looked at, this model is really focused on improving
00:03:26 tools, both improving how the model uses them and when it chooses to use them, and
00:03:30 this has paid off in the benchmarks. But to be honest with you, there's not too much to
00:03:34 report here besides: yes, the new model is better than the last model.
00:03:38 I think you can summarise the pros of this model as: it's smarter, it runs for longer, and
00:03:42 it uses tools better, meaning it can complete harder tasks than the last model could.
00:03:47 Yes, newsflash everyone, this model is better than the last version. But now let's talk about
00:03:51 some of the cons.
00:03:52 The most noticeable one to me was the speed.
00:03:54 While I do like my models to think for a little bit longer, sometimes it feels like GPT 5.4
00:03:59 does this a little too much, or maybe it's just slow at the actual thinking, and I'm definitely
00:04:04 not the only one who's noticed.
00:04:05 Artificial Analysis results actually showed that GPT 5.4 takes the longest to return its first
00:04:09 token, by a pretty large margin, and the same applies to the time to return 500 tokens
00:04:14 as well.
00:04:15 I'm not sure if this is a model issue or a provider issue at the moment, so maybe this
00:04:19 will improve over time, but a more pessimistic view is that this one is slower so that you'll use
00:04:24 the new fast mode.
00:04:26 Another con is the price bump for those of you that use the API.
00:04:29 The base model is actually $2.50 per million input tokens and $15 per million output tokens,
00:04:34 but the Pro model, that one is pricey.
00:04:37 It's charged at $30 per million input tokens and $180 per million output tokens, and even
00:04:43 worse, if you do want to take advantage of that new 1 million token context window, any input
00:04:47 beyond 272,000 tokens will be billed at double the normal rate.
00:04:52 So I'd maybe consider compacting your context for now.
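Assuming the surcharge works the way it's described, with only the input tokens past the 272,000 threshold billed at double, a quick cost check on the base model looks like this (my own sketch, not an official pricing formula):

```python
# Sketch of base-model API cost with the long-context surcharge.
# Assumption: only input tokens beyond 272,000 are billed at 2x the
# input rate. Illustrative only, not an official pricing formula.

INPUT_RATE = 2.50 / 1_000_000    # $ per input token (base model)
OUTPUT_RATE = 15.00 / 1_000_000  # $ per output token (base model)
THRESHOLD = 272_000              # input tokens billed at the normal rate

def request_cost(input_tokens, output_tokens):
    normal = min(input_tokens, THRESHOLD)
    surcharged = max(input_tokens - THRESHOLD, 0)
    return (normal * INPUT_RATE
            + surcharged * INPUT_RATE * 2
            + output_tokens * OUTPUT_RATE)

# A 900k-token input plus 10k output under these assumptions:
print(f"${request_cost(900_000, 10_000):.2f}")
```

Under those assumptions, a 900k-token input costs well over half as much again as it would at a flat rate, which is why compacting your context before crossing the threshold is worth thinking about.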
00:04:55 The final con, though, is UI design, and while this one is a little bit subjective, I asked
00:04:59 Opus 4.6 and GPT 5.4 for a café website, and I think I prefer Opus here, although neither
00:05:05 of these blew me away.
00:05:07 I think the main thing that I struggle with in GPT 5.4, and some of the other GPT models,
00:05:11 is that they all seem to have a very similar UI.
00:05:14 It seems to love this sort of frosted, card-like UI, and it of course loves a gradient.
00:05:19 And obviously this was just one test that I did, but on Design Arena this model isn't ranking
00:05:23 too highly either, so it's just something that OpenAI aren't that strong at, at the moment.
00:05:27 Overall, though, I will say I'll be daily-driving this model, as I am a Codex fan, but I'm curious
00:05:32 what your thoughts are.
00:05:33 What is your model of choice?
00:05:34 Let me know in the comments down below, and while you're there, subscribe, and as always, see you
00:05:37 in the next one.