The New Best Model Is Here (GPT-5.4)

Better Stack

Transcript

00:00:00 You can reset the days-since counter because there is another new best model.
00:00:03 This time it's GPT 5.4 and I've been testing it, so here's what you need to know, as well
00:00:07 as the pros and cons, in 5 minutes and 40 seconds.
00:00:11 So here are the bullet points.
00:00:17 GPT 5.4 is better at knowledge work and web search, it has native computer use capabilities,
00:00:22 there's a new tool search feature which I'll explain in a bit, it can be steered mid-response,
00:00:26 there's a new fast mode and it also has a 1 million token context window.
00:00:30 Seemingly the goal with 5.4 was to combine Codex 5.3's coding capabilities with the knowledge,
00:00:34 web search and professional work skills of GPT 5.2 to make 5.4 the all-rounder, do-everything
00:00:40 model.
00:00:41 And according to Artificial Analysis's third-party benchmarks, they've actually achieved
00:00:45 that goal.
00:00:46 It's ranked as the best coding model and the best agentic model, and it also draws with
00:00:49 Gemini for the best intelligence model.
00:00:51 If we focus in on what I found to be the most interesting bullet point, though, it was their
00:00:55 native computer use.
00:00:56 OpenAI have apparently designed this as their first general-purpose model with built-in computer
00:01:00 use capabilities, so it should excel at writing code to operate computers via libraries like
00:01:04 Playwright, as well as issuing mouse and keyboard commands in response to screenshots.
00:01:08 They released an experimental Playwright skill, so I gave it a go.
00:01:12 In Codex, using 5.4 with high reasoning, I gave it a prompt to create an interactive 3D experience
00:01:16 of Tower Bridge in London.
00:01:18 I also used the new skill as well as an image generation skill so it could generate its own
00:01:22 assets to use as textures.
00:01:24 Now, the experience itself was pretty similar to Codex 5.3, which until now was my favourite
00:01:29 model.
00:01:30 After around 20 minutes or so of working through the task, that's when it started to use the
00:01:33 new Playwright skill, and that experience was pretty good.
00:01:37 It would open up the browser, click around, navigate the scene, identify any visual issues
00:01:41 it needed to fix, like this background not blending into the scene, and then it would jump back
00:01:45 into the code, fix it, and rinse and repeat, and it all felt very smooth and natural.
00:01:50 The first iteration of this project actually took around 30 minutes to complete, all from
00:01:54 that single prompt, and after that I sent some follow-ups asking for a few more details and
00:01:58 a few fixes, like boats being sideways and the road clipping with other textures, and it
00:02:03 again just went off and worked on those tasks for around 30 minutes per prompt, opening
00:02:07 up Chrome, verifying and making changes, and giving me this final version in about an hour
00:02:11 and a half of work and 3 prompts. It's not perfect by any means, but for completely
00:02:16 hands-off development I don't think it's too bad, and to me this model is just a no-brainer
00:02:20 upgrade for those of you that already like Codex 5.3.
00:02:24 I did find it quite funny, though, that after about 2 hours of using this it prompted me
00:02:27 that I could have saved an hour of my time if I switched over to the new fast mode.
00:02:31 This is actually the exact same model, same intelligence and same experience; it just delivers
00:02:35 up to 1.5x faster token speed because it's billed at 2x your usage, so it's essentially
00:02:41 just a priority tier, not a different model at all.
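The fast-mode trade-off described above is just arithmetic: same model, up to 1.5x token throughput, billed at 2x usage. A minimal sketch, using hypothetical throughput and price numbers purely for illustration:

```python
def fast_mode_tradeoff(tokens: int, base_tokens_per_sec: float,
                       base_cost_per_token: float) -> dict:
    """Compare standard vs fast mode on one response.

    Fast mode, as described in the video: up to 1.5x token speed, 2x billing.
    """
    standard_time = tokens / base_tokens_per_sec
    fast_time = tokens / (base_tokens_per_sec * 1.5)   # up to 1.5x faster
    return {
        "standard_seconds": standard_time,
        "fast_seconds": fast_time,
        "seconds_saved": standard_time - fast_time,
        "standard_cost": tokens * base_cost_per_token,
        "fast_cost": tokens * base_cost_per_token * 2,  # billed at 2x usage
    }

# Hypothetical example: a 30,000-token agentic run at 50 tokens/sec,
# priced at $15 per 1M output tokens.
result = fast_mode_tradeoff(30_000, 50.0, 15 / 1_000_000)
print(result)
```

So whether fast mode is worth it depends entirely on how much an hour of waiting costs you versus the doubled token bill.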
00:02:44 Now, the other bullet point that I found particularly interesting in this release was the tool search.
00:02:48 This solves the problem of having all of your tool definitions loaded into the system prompt
00:02:52 up front: if you have too many tools and too many MCP servers, you end up wasting tokens
00:02:56 and causing context bloat, which can impact the quality of your output.
00:03:00 With GPT 5.4, the prompt instead has a lightweight list of available tools, and the model
00:03:05 has a tool search capability, so when the model does need a tool it can simply look up that
00:03:09 tool definition and append it to the conversation right when it's needed.
00:03:13 OpenAI actually says this reduces token usage by up to 47%, and they showed that in a benchmark
00:03:18 with 36 MCP servers where it maintained the same accuracy.
00:03:22 Besides those new features that we just looked at, this model is really focused on improving
00:03:26 tool use, both how the model uses tools as well as when it chooses to use them, and
00:03:30 this has paid off in the benchmarks. But to be honest with you, there's not too much to
00:03:34 report here besides: yes, the new model is better than the last model.
00:03:38 I think you can summarise the pros of this model as: it's smarter, it runs for longer and
00:03:42 it uses tools better, meaning it can complete harder tasks than the last model could.
00:03:47 Yes, newsflash everyone, this model is better than the last version, but now let's talk about
00:03:51 some of the cons.
00:03:52 The most noticeable one to me was the speed.
00:03:54 While I do like my models to think for a little bit longer, sometimes it feels like GPT 5.4
00:03:59 does this a little too much, or maybe it's just slow at the actual thinking, and I'm definitely
00:04:04 not the only one.
00:04:05 Artificial Analysis's results actually showed that GPT 5.4 takes the longest to return a
00:04:09 first token by a pretty large margin, and the same applies for the time to return the first 500 tokens
00:04:14 as well.
00:04:15 I'm not sure if this is a model issue or a provider issue at the moment, so maybe this
00:04:19 will improve over time, but a more pessimistic view is that this one is slower so that you use
00:04:24 the new fast mode.
00:04:26 Another con is the price bump for those of you that use the API.
00:04:29 The base model is actually $2.50 per million input tokens and $15 per million output tokens,
00:04:34 but the pro model, that is a pricey one.
00:04:37 It is charged at $30 per million input tokens and $180 per million output tokens, and even
00:04:43 worse, if you do want to take advantage of that new 1 million token context window, any input
00:04:47 beyond 272,000 tokens will be billed at double the normal rate.
00:04:52 So I'd maybe consider compacting your context for now.
00:04:55 The final con, though, is UI design, and while this one is a little bit subjective, I asked
00:04:59 Opus 4.6 and GPT 5.4 for a cafe website, and I think I prefer Opus here, although neither
00:05:05 of these blew me away.
00:05:07 I think the main thing that I struggle with with GPT 5.4, and some of the other GPT models,
00:05:11 is that they all seem to have a very similar UI.
00:05:14 It seems to love this sort of frosted-card-like UI, and it of course loves a gradient.
00:05:19 And obviously this was just one test that I did, but on Design Arena this model isn't ranking
00:05:23 too highly either, so it's just something that OpenAI aren't that strong at right now.
00:05:27 Overall, though, I will say I will be daily-driving this model, as I am a Codex fan, but I'm curious
00:05:32 what your thoughts are.
00:05:33 What is your model of choice?
00:05:34 Let me know in the comments down below, and while you're there, subscribe, and as always, see you
00:05:37 in the next one.

Key Takeaway

GPT 5.4 is a powerful multi-modal successor designed for high-level agentic tasks and coding, though it faces challenges regarding high API costs and slow generation speeds.

Highlights

GPT 5.4 introduces native computer use, allowing it to control mouse and keyboard inputs and write Playwright code.

The model features a 1 million token context window and a new 'Tool Search' capability to reduce token bloat.

A new 'Fast Mode' is available that increases token generation speed by 1.5x at double the cost of standard usage.

GPT 5.4 is positioned as an 'all-rounder' combining the coding strengths of Codex 5.3 with the reasoning of GPT 5.2.

Significant price increases have been introduced for API users, especially for the 'Pro' tier and long-context inputs.

Third-party benchmarks from Artificial Analysis rank it as the top model for coding and agentic tasks.

Timeline

Introduction and Core Features of GPT 5.4

The speaker introduces GPT 5.4 as the new industry leader, replacing previous models in performance rankings. Key features mentioned include enhanced knowledge work, native computer use, a tool search feature, and a massive 1 million token context window. The model aims to bridge the gap between the coding-centric Codex 5.3 and the general-purpose intelligence of GPT 5.2. According to Artificial Analysis, it successfully claims the title of the best model for coding and agentic tasks. This section establishes the model's identity as a versatile 'do-everything' AI tool for professionals.

Testing Native Computer Use and Playwright Skills

This section dives into the most innovative feature: native computer use, which allows the AI to navigate interfaces using screenshots. The speaker demonstrates this by prompting the model to build an interactive 3D experience of London's Tower Bridge. Using the 'Playwright skill,' the model independently opened a browser to identify visual bugs and then returned to the code to fix them. While the total development time was about 90 minutes across three prompts, the process was almost entirely hands-off. The speaker notes that while not perfect, the agentic capabilities are a 'no-brainer' upgrade for developers.
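The screenshot → inspect → patch cycle described above can be sketched as a small loop. This is not OpenAI's actual skill implementation, just a minimal sketch of the pattern, with the browser and model steps abstracted into callables; in real use, `render_screenshot` would wrap something like Playwright's `page.screenshot()`:

```python
from typing import Callable, Optional


def verify_and_fix_loop(render_screenshot: Callable[[], bytes],
                        find_issue: Callable[[bytes], Optional[str]],
                        apply_fix: Callable[[str], None],
                        max_iters: int = 5) -> int:
    """Screenshot -> inspect -> patch loop, as the model appeared to do
    with its Playwright skill. Returns the number of fix iterations used."""
    for i in range(max_iters):
        shot = render_screenshot()   # e.g. page.screenshot() via Playwright
        issue = find_issue(shot)     # vision step: spot e.g. a background
                                     # that doesn't blend into the scene
        if issue is None:
            return i                 # nothing left to fix
        apply_fix(issue)             # jump back into the code and patch it
    return max_iters
```

The point of structuring it this way is that each iteration is grounded in what the page actually looks like, rather than in what the code is supposed to produce, which is why the process felt "smooth and natural" in the video.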

Tool Search Efficiency and Fast Mode

The speaker explains the 'Tool Search' feature, which solves the problem of 'context bloat' caused by loading too many tool definitions into a system prompt. Instead of loading everything at once, GPT 5.4 uses a lightweight list and searches for specific tool definitions only when they are actually needed. This optimization reportedly reduces token usage by up to 47% while maintaining high accuracy across dozens of servers. Additionally, the video covers 'Fast Mode,' a priority tier that delivers 1.5x faster speeds for users willing to pay a premium. These technical refinements focus on making the model more efficient and capable of handling complex, long-running tasks.
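The lazy-loading idea behind tool search can be sketched as follows. This is a hypothetical illustration, not OpenAI's API: the names, registry layout, and `search_tool` helper are all made up for the sketch. Only the lightweight index ships in the prompt up front; a full definition is appended to the conversation on demand:

```python
# Hypothetical tool registry; in practice these might come from many MCP servers.
TOOL_REGISTRY = {
    "get_weather": {
        "summary": "Look up current weather",
        "definition": {  # the full (token-heavy) JSON-schema definition
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    },
}


def lightweight_tool_index() -> list:
    """What gets loaded up front: just names and one-line summaries."""
    return [f"{name}: {t['summary']}" for name, t in TOOL_REGISTRY.items()]


def search_tool(query: str) -> dict:
    """What the model calls mid-conversation to pull in one full definition."""
    for name, t in TOOL_REGISTRY.items():
        if query.lower() in name or query.lower() in t["summary"].lower():
            return t["definition"]
    return None
```

With dozens of servers, the savings come from the gap between a one-line summary per tool and a full schema per tool, which is consistent with the "up to 47%" figure the video cites.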

Analysis of Cons: Latency and Pricing

Despite the impressive features, the speaker highlights significant drawbacks, primarily centered around speed and cost. Third-party data shows that GPT 5.4 is currently the slowest model to return tokens, which might be a strategic move to encourage 'Fast Mode' subscriptions. The API pricing has also seen a substantial jump, with the 'Pro' model reaching $30 per million input tokens and $180 per million output tokens. Furthermore, using the extended 1 million token context window triggers double billing for any input exceeding 272,000 tokens. This section serves as a warning for budget-conscious developers and high-volume enterprise users.
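The billing rules above are easy to get wrong at the 272,000-token boundary. A minimal cost sketch using the prices quoted in the video, assuming the surcharge doubles the input rate only for tokens beyond the threshold (the video's phrasing, "any input beyond 272,000 tokens will be billed at double the normal rate", reads that way, but check the official pricing page):

```python
# Prices quoted in the video, in $ per 1M tokens.
PRICES = {
    "base": {"input": 2.50, "output": 15.00},
    "pro": {"input": 30.00, "output": 180.00},
}
LONG_CONTEXT_THRESHOLD = 272_000  # input tokens beyond this are billed at 2x


def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request, with the long-context input surcharge."""
    p = PRICES[tier]
    normal = min(input_tokens, LONG_CONTEXT_THRESHOLD)
    surcharged = max(0, input_tokens - LONG_CONTEXT_THRESHOLD)
    input_cost = (normal * p["input"] + surcharged * p["input"] * 2) / 1_000_000
    output_cost = output_tokens * p["output"] / 1_000_000
    return input_cost + output_cost


# e.g. a 500k-token input on the base tier: 272k tokens at $2.50/M,
# then the remaining 228k at the doubled $5.00/M rate.
print(round(request_cost("base", 500_000, 10_000), 2))
```

Running the numbers like this makes the "compact your context" advice concrete: on the base tier, the second half of a 500k-token input costs nearly as much as the first 272k did.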

UI Design Comparison and Final Verdict

The final section critiques the model's aesthetic output, specifically regarding UI and website design. When compared to Opus 4.6, GPT 5.4 produced a cafe website that the speaker found somewhat generic, featuring a 'frosted card' and gradient style. Design benchmarks suggest that OpenAI still lags behind competitors in creative and visual layout tasks. However, the speaker concludes that GPT 5.4 will become their 'daily driver' due to its superior coding and reasoning capabilities. The video ends with an invitation for viewers to share their preferred models in the comments.
