Google Just Solved The Greatest Limitation of AI Agents

AI LABS

Transcript

00:00:00AI agents have started to integrate into every part of our lives.
00:00:03And one of the biggest areas where that's happened is the browser.
00:00:06Every major AI company has realized that the browser is the one tool everyone uses every
00:00:11single day.
00:00:12So why not put AI into that?
00:00:14But the truth is they all suck.
00:00:15And it's not a matter of optimization.
00:00:17There's a fundamental problem that no amount of tuning is going to fix.
00:00:20But Google, in collaboration with Microsoft, just released something called WebMCP.
00:00:24And instead of trying to make agents better at using websites, it makes websites better
00:00:29at talking to agents.
00:00:30That's a completely different approach.
00:00:32And what it enables is something we haven't seen before.
00:00:35So this is a simple HTML page running on a local server.
00:00:38Opening the extensions tab, we have the WebMCP extension.
00:00:41Opening it, below the name of this site, we have one tool, BookTable.
00:00:45We connected this WebMCP bridge to Claude Code and told it that we had a restaurant booking
00:00:49form open with WebMCP tools available.
00:00:52We gave it the task of booking a table for two with a date, a name and a special request.
00:00:57All of those fields are there in the form.
00:00:59It confirmed the date, used the WebMCP tool that the site provided, filled out the fields
00:01:03and successfully made the reservation.
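To make that flow concrete, here is a sketch of the kind of structured call the agent sends once it has read the page's BookTable tool. The field names and the handler are illustrative, not taken from the WebMCP spec:

```javascript
// Hypothetical shape of the call the agent sends once it has read the
// BookTable tool from the page; field names are illustrative, not taken
// from the WebMCP spec.
const toolCall = {
  tool: "BookTable",
  arguments: {
    name: "Alex",
    partySize: 2,
    date: "2025-06-14",
    specialRequest: "Window seat, please",
  },
};

// The site-side handler receives structured arguments directly -- no
// screenshot parsing, no guessing which input maps to which field.
function bookTable({ name, partySize, date, specialRequest }) {
  return `Reserved a table for ${partySize} under ${name} on ${date} (${specialRequest}).`;
}

console.log(bookTable(toolCall.arguments));
```

The point is that every field arrives typed and named, so the site code never has to infer intent.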
00:01:06Right now, an agent has two ways to figure out what's on screen.
00:01:09The first way is vision-based.
00:01:11The agent takes a screenshot of the entire page, annotates every element it can see and
00:01:15feeds that image to a model that tries to figure out what to click.
00:01:19The second way is DOM parsing.
00:01:21The agent pulls the raw HTML of the page.
00:01:24And if you've ever opened Inspect Element on any website, you know what that looks like.
00:01:28Thousands of lines of code.
00:01:29The agent reads through all of that and tries to identify the right button.
00:01:33Both of these approaches have the same fundamental problem.
00:01:35They're non-deterministic.
00:01:36The agent is making its best guess every single time.
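As a toy illustration of that guessing (not how any production agent is actually implemented), here is a naive DOM-parsing heuristic that scans raw HTML for a booking control and can just as easily grab the wrong element:

```javascript
// Toy illustration only: a naive heuristic that scans raw HTML for a
// booking control. Real pages bury this in thousands of lines, and the
// guess silently breaks (or grabs the wrong element, like the policy
// link below) whenever the markup changes.
const html = `
  <div class="btn-primary js-x91" onclick="submitForm()">Reserve now</div>
  <a class="btn-primary" href="/terms">Reservation policy</a>
`;

// First element whose text mentions "reserv" -- a best guess, nothing more.
function guessBookingButton(markup) {
  const matches = markup.match(/<[^>]+>([^<]*reserv[^<]*)<\/[^>]+>/gi) || [];
  return matches[0] ?? null;
}

console.log(guessBookingButton(html));
```

Here the heuristic happens to land on the right element, but both elements match the pattern, which is exactly the ambiguity the transcript is describing.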
00:01:39The reason none of this works consistently is because the entire internet was built for
00:01:43human eyes.
00:01:45Every website assumes a person is looking at it.
00:01:47There's no structure for machines.
00:01:48So every agent, no matter how good the model is, is stuck trying to interpret something
00:01:53that was never designed to be interpreted by a machine.
00:01:55With WebMCP, instead of the agent trying to figure out your website, your website registers
00:02:00its available actions as tools.
00:02:01When an agent lands on a page, it doesn't guess.
00:02:04It just reads the available tools and calls them directly.
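Concretely, what the agent reads is a machine-readable tool descriptor. A sketch of the shape, mirroring MCP conventions (the finalized WebMCP field names may differ):

```javascript
// Sketch of the descriptor an agent reads from a WebMCP-enabled page.
// The name/description/inputSchema trio mirrors MCP conventions; the
// finalized WebMCP field names may differ.
const bookTableTool = {
  name: "BookTable",
  description: "Book a table at this restaurant for a given date and party size.",
  inputSchema: {
    type: "object",
    properties: {
      date: { type: "string", description: "Reservation date, YYYY-MM-DD" },
      partySize: { type: "integer", minimum: 1 },
      name: { type: "string", description: "Name for the reservation" },
      specialRequest: { type: "string" },
    },
    required: ["date", "partySize", "name"],
  },
};
```

Because the schema spells out every argument and its type, the agent can validate its own call before making it, which is what makes the interaction deterministic.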
00:02:07Right now, WebMCP is available for early preview only.
00:02:10As the agentic web evolves, websites also need to evolve with it.
00:02:13And as you already saw, by defining those tools, we give these agents better access to interact
00:02:18with our sites.
00:02:19The demo worked because it was a simple HTML form.
00:02:21But most real websites aren't that simple.
00:02:23So WebMCP actually has two different approaches depending on what you're working with:
00:02:28two ways that let an agent drive the browser.
00:02:31The declarative API is for simple workflows like the HTML forms you just saw.
00:02:35The imperative API is for full scale web apps with multiple pages and those require some
00:02:40extra implementation that we'll get into further on.
00:02:43As of right now, there's no official documentation, but they have a repository of WebMCP
00:02:48tools in Google Chrome Labs with two demos, only one of which is actually hosted.
00:02:52There's a simple flight search demo and an official Model Context Tool Inspector extension.
00:02:56After you install that, on any website that has WebMCP implemented, you'll be able to detect
00:03:01those tools via the extension, and you'll be able to do some other cool stuff as well.
00:03:05The input schema for the tools shows up right there.
00:03:07Right now, there's only one tool on this page, the search flights tool.
00:03:10They've given two options to use this.
00:03:12You can either give custom input arguments that the AI model has to fill out or you can
00:03:16set your Gemini API key, give a user prompt in simple English and the page will be controlled
00:03:21according to that.
00:03:22So right now it has these default inputs.
00:03:24We swapped them out and it actually searched for flights and got a bunch of results.
00:03:28I went back and this time the WebMCP travel site had four tools available where three of
00:03:32them are now filters that can be applied.
00:03:35The input arguments for the page had also changed.
00:03:37I added another argument and it gave us a notification that the filter settings were updated.
00:03:41No flights matched those filter settings, but all of them were applied.
00:03:44We switched between Zen browser and Chrome throughout this and that's because while they've
00:03:48released WebMCP as an open protocol that any browser could use, right now it only works
00:03:54on Chrome's Canary version.
00:03:55That's until the standard is finalized so that every browser can adopt it.
00:03:58So that's as far as the official tooling goes right now.
00:04:01No documentation, only two demos, it only works on Chrome Canary, and you can't use it
00:04:05with Claude Code directly because it's actually intended to be used by browser agents.
00:04:09So we found this custom WebMCP bridge that you can install on your system, and it gives
00:04:14you an MCP server and an extension as well.
00:04:16This is what allows Claude code to use WebMCP and navigate and use the tools that any website
00:04:22offers.
00:04:23To show how sites actually implement this, we'll start with the simpler approach.
00:04:27In the declarative API, which you saw with the HTML form, all you really have to do is
00:04:31declare three things inside the HTML form: the tool name, the tool description and the
00:04:36tool parameter descriptions.
00:04:37You don't need to dive deep into them.
00:04:39You just need to make sure your agent adds them in.
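As a rough sketch (the attribute names below are illustrative guesses; check the demos in the WebMCP repo for the exact spelling the current trial expects), a declaratively annotated form might look like this:

```html
<!-- Sketch only: a plain form annotated with a tool name, a tool
     description, and per-field descriptions. Attribute names are
     illustrative, not the official spelling. -->
<form toolname="BookTable"
      tooldescription="Book a table at this restaurant.">
  <input name="date" type="date"
         tooldescription="Reservation date">
  <input name="partySize" type="number" min="1"
         tooldescription="Number of guests">
  <input name="name" type="text"
         tooldescription="Name for the reservation">
  <button type="submit">Book</button>
</form>
```

Everything else, including exposing the tool to the agent, is handled by the browser reading these annotations.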
00:04:41We had two guides made, reverse-engineered from the demos in the WebMCP repo, and we gave
00:04:46Claude Code access to those.
00:04:47Now during that process, we actually ran into some common problems and had to fix these
00:04:51guides along the way.
00:04:53Both of them are available in AI Labs Pro, which is our community where you get ready-to-use
00:04:58templates you can plug directly into your projects, for this video and all previous ones.
00:05:01The main teaching is all here in the video, but if you want the actual files, the link's
00:05:05in the description.
00:05:06If your agent adds in these declarations, the rest is up to the browser, which reads
00:05:10them from the HTML.
00:05:12The second way was the imperative API for cases where you need more complex interactions and
00:05:17JavaScript execution.
00:05:18We had a Next.js app initialized, gave Claude code the Next.js guide and that was all it
00:05:23needed to implement it.
00:05:24In React apps, it creates a new file in the library folder where it declares all the tools
00:05:29the site needs.
00:05:30These are all the functions and these are their definitions.
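A sketch of what one such imperative tool definition might look like. In the browser, registration would go through the page's model-context API (`navigator.modelContext.registerTool` in the current explainer); the in-memory stand-in below just lets the logic run anywhere, and the `addToCart` tool itself is hypothetical:

```javascript
// Sketch only: a hypothetical "addToCart" tool and an in-memory stand-in
// for the page's model-context API. In the browser, registration would go
// through navigator.modelContext.registerTool per the current explainer.
const modelContext = {
  tools: new Map(),
  registerTool(tool) { this.tools.set(tool.name, tool); },
  unregisterTool(name) { this.tools.delete(name); },
};

modelContext.registerTool({
  name: "addToCart",
  description: "Add a product to the shopping cart by product id.",
  inputSchema: {
    type: "object",
    properties: {
      productId: { type: "string" },
      quantity: { type: "integer", minimum: 1 },
    },
    required: ["productId"],
  },
  // The handler would call the site's own cart logic; here it just
  // echoes the structured arguments back.
  async execute({ productId, quantity = 1 }) {
    return { added: productId, quantity };
  },
});
```

The key difference from the declarative form is the `execute` handler: the site runs its own JavaScript when the tool is called, so it works for multi-page apps, not just form submissions.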
00:05:33But since these web apps can become so big, potentially with more than 100 tools, we get
00:05:38the same problem we get in Claude Code, where the context just overloads everything
00:05:41and breaks the whole thing.
00:05:43So instead of loading all the tools a website has, it's better to load only the tools a single
00:05:47page has.
00:05:48This concept is called contextual loading.
00:05:50So this is the Next.js app we had Claude code make.
00:05:53It's a fully functional small demo app with the backend implemented.
00:05:57Right now we're on the main homepage and this site only has 3 tools available.
00:06:01I went into the cart page and this time we had 4 tools and the names had also changed.
00:06:05The availability of tools changes based on the page you're on.
00:06:09This is where the registration functions come in.
00:06:11Whenever you land on a page, like the homepage, it runs the register home tools function and
00:06:15when you leave it runs unregister home tools.
00:06:18Based on which tools belong to that page, it just registers and then unregisters them.
00:06:23This is why it doesn't depend on the browser alone in this case; the site's own code
00:06:27also handles the integration.
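The register/unregister pattern can be sketched like this, again with a hypothetical in-memory registry standing in for the browser's model-context API:

```javascript
// Minimal sketch of contextual loading: each page registers only its own
// tools when it mounts and removes them when it unmounts, so the agent's
// tool list never carries the whole site. The Map is an in-memory
// stand-in for the browser's model-context API.
const registry = new Map();
const register = (tool) => registry.set(tool.name, tool);
const unregister = (name) => registry.delete(name);

// Hypothetical homepage tools.
const homeTools = [
  { name: "searchProducts", description: "Search the product catalog." },
  { name: "openProduct", description: "Open a product page by its id." },
];

function registerHomeTools() { homeTools.forEach(register); }
function unregisterHomeTools() { homeTools.forEach((t) => unregister(t.name)); }

// In a React app these calls would sit in a useEffect: register on
// mount, unregister in the cleanup function when the route changes.
registerHomeTools();   // agent now sees only the homepage tools
unregisterHomeTools(); // navigating away clears them again
```

The same pair of functions exists for each page, so at any moment the registry holds only the tools that make sense on the current screen.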
00:06:28We're not actually using WebMCP with a browser agent, which is what Google wants and what
00:06:32each browser would implement themselves.
00:06:34We're actually using a bridge that connects Claude code to WebMCP and this is how we control
00:06:39websites.
00:06:40If you want to get more out of Claude Code itself, we actually have a video on the 10
00:06:44most up-to-date ways to gain an advantage with it.
00:06:46This bridge is a community project and with the imperative API, it has a problem where
00:06:51tool switching doesn't really work with this MCP server.
00:06:54When I opened the site, we were on the checkout page and initialized the Claude code session
00:06:58there.
00:06:59When we asked it to navigate back to the homepage, it couldn't see the tools available on the
00:07:03homepage.
00:07:04We were on the homepage, and when I went into the product page, there was an add-to-cart button.
00:07:08But when the agent was on the product page, it couldn't see that button.
00:07:11So we had to manually add an item to the cart to demo this.
00:07:14But when we asked it to complete the checkout, it automatically filled in the details, placed
00:07:18the order and completed the whole shopping flow.
00:07:21So that's one limitation of this MCP, which brings us to another point.
00:07:25WebMCP is open source with major browser vendors and tech companies listed as participants.
00:07:30But right now, the only browser that supports it is Chrome Canary and the intended agent
00:07:34is Gemini, Google's own AI built directly into the browser.
00:07:38If you're a website owner and you implement WebMCP today, the only agent that can use
00:07:42your tools natively is Gemini.
00:07:44Claude Code needs a community-built bridge that breaks when contextual loading kicks in.
00:07:49Every non-Google agent is at a disadvantage.
00:07:51Now could Claude catch up?
00:07:52Sure, they have their own browser extension.
00:07:55And since that's also a browser agent, it could potentially discover these tools the same way
00:07:59Gemini does.
00:08:00But the question is how many people are going to deliberately install a Claude browser extension
00:08:04versus just using the Gemini that's already built into Chrome.
00:08:08Chrome has billions of users, they don't need to install anything.
00:08:11In our opinion, Google isn't locking anyone out.
00:08:13They're just taking advantage of the architecture and the audience they already have.
00:08:17An open standard that works best inside the browser they already own with the agent they
00:08:21already ship.
00:08:22That doesn't mean you shouldn't implement it.
00:08:23The standard itself is genuinely useful and making your site agent accessible is smart
00:08:28regardless of which agent benefits first.
00:08:30There are a few things worth knowing if you implement this.
00:08:33The spec recommends no more than 50 tools per page.
00:08:36This isn't meant to expose your entire application.
00:08:38It's meant for focused, specific actions, the things someone would actually want to do on
00:08:42that page.
00:08:43Tool descriptions also matter more than you'd think.
00:08:46Agents read those descriptions to decide which tool to call.
00:08:49Vague descriptions mean the agent picks the wrong tool or skips it entirely.
00:08:53Write them like you're explaining the action to someone who's never seen your site.
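For example (both descriptors are made up, and the `setFlightFilters` tool mentioned in the second is hypothetical):

```javascript
// Two descriptions for the same capability. The agent matches user
// requests against this text, so the second gives it something concrete
// to work with while the first forces a guess.
const vague = { name: "doSearch", description: "Searches stuff." };
const specific = {
  name: "searchFlights",
  description:
    "Search one-way or round-trip flights by origin and destination " +
    "airport code and departure date (YYYY-MM-DD). Returns a list of " +
    "fares; narrow results with the setFlightFilters tool.",
};
```

A description like the second one also tells the agent how this tool relates to the page's other tools, which helps it plan multi-step tasks.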
00:08:57And this is still experimental.
00:08:58The API surface will change.
00:09:00Chrome 146 ships in March with broader support.
00:09:03But until then, this is a dev trial.
00:09:05Don't ship it to production yet.
00:09:06If you follow this channel, you know that keeping up with AI requires a strong technical foundation.
00:09:11That is why I love Brilliant.
00:09:13It's an interactive platform with hands-on lessons crafted by world-class teachers from
00:09:17MIT, Harvard, and Stanford.
00:09:19I highly recommend their Clustering and Classification and How AI Works courses.
00:09:23They teach you to uncover hidden patterns and understand the logic behind large language
00:09:27models interactively.
00:09:28As you can see in the catalog on screen, they offer a massive variety of courses covering
00:09:33everything from foundational math to advanced data science and computer science.
00:09:37Brilliant is also giving our viewers 20% off an annual premium subscription, providing unlimited
00:09:42daily access to everything on the platform.
00:09:44To learn for free on Brilliant for a full 30 days, go to brilliant.org/ailabs, scan the
00:09:50QR code on screen, or click the link in the description.
00:09:53Build a real learning habit today and take your skills to the next level by heading over
00:09:56to Brilliant.
00:09:57That brings us to the end of this video.
00:09:59If you'd like to support the channel and help us keep making videos like this, you can do
00:10:03so by using the super thanks button below.
00:10:06As always, thank you for watching and I'll see you in the next one.

Key Takeaway

WebMCP transforms the internet from a human-visual medium into a structured toolset for AI agents, replacing unreliable screen parsing with deterministic, machine-readable actions.

Highlights

Google and Microsoft's WebMCP (Web Model Context Protocol) shifts the AI agent paradigm from interpreting human-centric websites to websites providing machine-readable tools.

Current AI agents rely on non-deterministic methods like vision-based screenshots or complex DOM parsing, which often lead to errors and inconsistencies.

WebMCP offers two integration paths: a Declarative API for simple HTML forms and an Imperative API for complex, multi-page web applications.

The protocol introduces 'contextual loading,' allowing websites to register or unregister specific tools based on the active page to prevent LLM context window overload.

While WebMCP is open-source, it is currently optimized for Google Chrome Canary and the Gemini AI agent, potentially creating a first-mover advantage for Google.

Best practices for implementation include limiting tools to 50 per page and writing precise tool descriptions for better AI interpretation.

Timeline

The Fundamental Flaw of Current AI Browser Agents

The speaker explains that while AI agents are increasingly integrated into browsers, their performance is currently hindered by a foundational design problem. Existing agents attempt to navigate the web using vision-based screenshots or by parsing thousands of lines of raw HTML code, both of which are non-deterministic and prone to error. This occurs because the internet was built specifically for human eyes rather than machine interpretation. Google and Microsoft have addressed this by introducing WebMCP, a bridge that allows websites to talk directly to agents. The section concludes with a demonstration of an AI successfully booking a restaurant table using these new machine-readable tools.

Understanding WebMCP vs. Traditional Methods

This segment details why traditional AI agent methods like vision-based annotation and DOM parsing fail to provide consistent results. The speaker emphasizes that these methods force models to make 'best guesses' on every interaction because websites lack a machine-focused structure. WebMCP changes this by allowing a website to register its actions as specific tools that an agent can call directly without guessing. The protocol distinguishes between the Declarative API for simple workflows and the Imperative API for full-scale web applications. This shift represents a move toward an 'agentic web' where sites evolve to accommodate AI visitors.

Early Preview and Technical Limitations

The speaker highlights the current experimental state of WebMCP, noting a lack of official documentation and a limited number of live demos. Currently, the protocol is primarily supported on Chrome Canary and is intended for use with Google's Gemini agent rather than third-party tools like Claude. To bypass these limitations, the presenters utilize a custom community-built bridge to connect Claude Code to the WebMCP extension. This allows them to demonstrate how the tool inspector can reveal input schemas and flight search tools on a travel site. The section underscores that while the potential is vast, the technology is still in a developer trial phase.

Implementing Declarative and Imperative APIs

In this technical breakdown, the speaker demonstrates how to implement WebMCP using both the Declarative and Imperative APIs. The Declarative API is simple, requiring only three specific tags within an HTML form: name, description, and parameter description. For complex React or Next.js applications, the Imperative API is used to define functions that agents can execute. A critical concept introduced here is 'contextual loading,' which prevents context window overload by only registering tools relevant to the current page. The speaker shows how tools for a homepage are unregistered when a user navigates to a checkout page to maintain efficiency.

Strategic Implications and Market Advantage

The analysis shifts to the competitive landscape of AI, noting that while WebMCP is an open standard, Google holds a significant advantage. Because Google owns the Chrome browser and the Gemini agent, they can provide a seamless, built-in experience that billions of users can access without extra installations. Non-Google agents like Claude currently require community bridges that may break during complex tasks like tool switching or contextual loading. The speaker argues that Google isn't necessarily locking others out, but is strategically leveraging its massive user base and architecture. Ultimately, making a site agent-accessible is presented as a smart move for web owners regardless of the dominant agent.

Best Practices and Future Outlook

The video concludes with practical advice for developers looking to experiment with WebMCP before its broader release. It is recommended to limit tool counts to 50 per page and prioritize clear, descriptive text so AI models can accurately identify tool functions. The speaker warns that the API surface will likely change and suggests waiting until Chrome version 146 in March for more stable support. A brief sponsorship message for Brilliant is included, emphasizing the importance of a strong technical foundation in math and computer science for understanding AI logic. The segment ends by reminding viewers that WebMCP is a work in progress and should not yet be deployed to production environments.
