Google & Microsoft Want To Fix AI Browsing (With WebMCP)

Transcript

00:00:00There's a new proposal backed by Google and Microsoft that could be shaping the future

00:00:03of how we use the web, and I kinda like it. It's called WebMCP, but don't get that confused

00:00:08with a normal MCP server. Instead, WebMCP is actually a browser API and it will let front

00:00:13end developers expose features of their sites as tools to AI agents, essentially letting

00:00:18every site become a mini MCP server. And while you may have already seen some sites start

00:00:23off with their own MCP servers, this is a little bit different. Its goal is to actually

00:00:27let your agents use the website for you instead of just accessing your APIs and showing that

00:00:32in a chat. It will be entirely front-end based. Now, if that distinction does sound a little

00:00:37confusing, let's just jump in and see a demo and talk about why I like it.

00:00:46Now the first thing I want to admit is this demo isn't going to look too exciting, but

00:00:49that's kinda the point of WebMCP. It's taking something that's already possible, but just

00:00:54making it way better. So stick with me on this. What I have here is I have the Canary version

00:00:58of Chrome that they're testing this proposal in, and also a site that's been set up with

00:01:02some WebMCP tools. You can see on the right I have an extension which is able to interact

00:01:06with these WebMCP tools, but imagine in the future this would just be your normal browser

00:01:10built-in AI, whether that's Gemini, ChatGPT Atlas or whatever Arc has now turned into.

00:01:15You can see if I want to send a user prompt while I'm on this site here, saying I want

00:01:19to book a round trip flight for two people from London to New York on specific dates and

00:01:23I hit send, it is going to take me to the search result page, so it's used the website

00:01:28for me. Wow, crazy stuff right? Yeah, as I said this demo was going to look very basic,

00:01:33but the key thing about WebMCP is how it used that site for me. The current approach to AI

00:01:38using websites tends to be using tools like Playwright, HTML parsing or even taking screenshots

00:01:42of your site and trying to use it as a human. But all of that is pretty inefficient, especially

00:01:48token wise, and it's still prone to a lot of errors. So this is what WebMCP is here to

00:01:53fix. WebMCP instead lets the developer of the website expose certain MCP tools that then

00:01:58interact with the client side JavaScript. So that's all that's happening when an AI chooses

00:02:03to use one of these WebMCP tools. It's simply running a JavaScript function on your site

00:02:07that you, the developer, have set to run. So you can see in the example of this demo flight

00:02:12page I have one WebMCP tool available called search flights and you can see this takes in

00:02:16some input arguments like origin, destination and trip type that match one-to-one with

00:02:20the form that we have over here. The crucial bit is the AI now knows that it can use this

00:02:25MCP tool. So when we hit send on a prompt like this, it's not going to fill in the form

00:02:29by doing anything like Playwright or HTML parsing. In fact, it doesn't need to know what the website

00:02:34looks like at all or what the HTML looks like either. It simply knows it has that WebMCP

00:02:38tool and it calls it with those input arguments, and I, the developer, have set what happens:

00:02:43the page takes in those input arguments and runs a JavaScript function, which in this case

00:02:47simply updates the React state, and that causes a navigation to the search results page. If

00:02:52we take a look at the front-end code for this, it is incredibly simple, and hopefully

00:02:55it will start to make a lot more sense. You can see the first thing we need to do is register

00:02:59the WebMCP tools that are available for a given page, and we can do that by using window.navigator.modelContext.

00:03:04So this is the API that's going to need to be built into the browsers if this proposal

00:03:09passes and it's currently in Chrome Canary so they can test this one out. We can see once

00:03:13we do have our modelContext API, we can register our tools by simply using the registerTool

00:03:18function and in this case I'm registering the search flights tool that we saw being used

00:03:22earlier. If we check out what an actual tool is, you can see it's a very simple object definition.

00:03:26We have a name, we have a description, so this is passed to the AI so it knows when to use

00:03:30this tool and we also have an input schema if we want to take in any arguments. In my

00:03:34case I had things like origin and destination to match that form. You can see we also have some

00:03:38more context that we can give to the AI to understand what those arguments should actually

00:03:42be. The important part about a tool definition is the execute function. This is the client

00:03:47side JavaScript that is going to run on your site when this MCP tool is used. So it can

00:03:51basically be anything that you want. In my case I'm using the search flights function

00:03:55and we don't have to worry about this implementation too much but essentially all I'm doing is taking

00:03:59in the parameters the AI has filled in for those input arguments and I'm dispatching an

00:04:03event called search flights with those parameters. Then in my React code all I'm doing is simply

00:04:08adding an event listener for that search flights event and when we have that I'm simply running

00:04:12the function handle search flights and this is where we can essentially do anything that

00:04:15we can in React, and in my case I'm taking in the parameters and just setting them as

00:04:19the search parameters, which causes the navigation. It really is that simple and that's why I really

00:04:24like this approach as not only is it incredibly token efficient but it also allows me as the

00:04:29developer to define the interactions of the site and the AI can follow my guardrails. It's

00:04:34just a really neat solution to building sites with both a human and an AI assistant in mind

00:04:39instead of the current approach which is to build a site for a human and then an MCP server

00:04:43for the AI, and if the AI then needs to use the website, well, you'd better hope it just figures

00:04:48it out somehow. It's also worth noting that these WebMCP tools aren't just useful for

00:04:51causing some event on your page like a navigation or filling in a form but they're also really

00:04:55useful when you need to parse information that's on the page. Say I as the human came in here

00:05:00now and started adjusting some of these filters like I want a price less than $500 and a departure

00:05:05time before midday. There are still quite a lot of flights on this page, so I want the AI

00:05:11to help me choose the best one. So I can ask, "What flight would you recommend on this page?"

00:05:15Now, current approaches would simply use Playwright or HTML parsing to actually take in the entire

00:05:20page and try and understand the information here and turn it into some form of structured

00:05:24data, but we don't need to do that with WebMCP. Instead, I as the developer have simply

00:05:29set up a WebMCP tool called list flights, and this has access to the current React state

00:05:33so it has access to all of the information that's displayed to the user here but in nice

00:05:38JSON format. So this way if I do actually ask the AI for this prompt you can see it calls

00:05:42that tool, lists out all of the flights that are currently showing on this page and it gives

00:05:46us a recommendation here for flight 56. And I can find that flight showing on the page

00:05:51here. That process has used far fewer tokens and is going to be way more accurate. Now the

00:05:56final thing I want to showcase is how you can actually take advantage of WebMCP with no

00:06:00JavaScript. Up until now we've actually been using the imperative API, which is where I, the

00:06:05developer have written the JavaScript to handle the tool calls and also register specific tools.

00:06:10There's also a second approach called the declarative API. This approach is much simpler as it's

00:06:14meant for the simple use case of filling in HTML forms. So you can see I have a very simple

00:06:19reservation booking form, and I can simply ask my AI to book me a table with some of the information

00:06:23that's needed to fill in the form and it will go ahead and actually fill that form in for

00:06:27me. That's because it has access to a WebMCP tool called book table. But the important

00:06:32part here is I wrote no JavaScript to actually have access to this WebMCP tool. And that's

00:06:36because the way that the declarative API of WebMCP works is you simply need to add in

00:06:40a tool name and a tool description attribute onto your HTML form, and the browser will then

00:06:44try to convert that form into a WebMCP tool for you, working out what each of the inputs

00:06:49should be as arguments for the MCP tool. And we can see here that we have a tool name of

00:06:53book table on that booking form that we saw and a tool description. So the AI knows when

00:06:57to call it, and we simply have a normal HTML form. The only other difference is in some of

00:07:02the inputs here. We also use the attribute tool param description to give the AI a bit

00:07:06more context on how it should fill in that information. But for the rest of it, the browser

00:07:10is going to pick up the input, the input type, the input name, and use that to create the

00:07:14MCP tool. And we can see that back on our inspector here, where it has correctly picked up the input arguments

00:07:18as name, phone, date, time, guests, seating, and requests. And it's done all of

00:07:23that just using simple HTML form logic with me writing zero JavaScript. That's pretty much

00:07:27all there is to the WebMCP proposal at the moment. And as I said, I'm pretty positive

00:07:31on this one. I like the way that it bridges the gap between web apps and AI agents, and

00:07:34it removes any of the guesswork when agents are trying to use a site and it makes sure

00:07:38that any interactions are defined explicitly by the website's developers. Plus, I'm also not

00:07:43fully AI-pilled yet. I like it when there's a tool that helps an AI agent work alongside

00:07:47me instead of replacing me. I don't like the idea of booking my flights or restaurants in

00:07:51ChatGPT's interface, and I much prefer going to the actual website myself in a browser.

00:07:56And if I want to, I can have the AI help me out on that page. It's a much better system

00:08:00at keeping a human in the loop and also allowing the website developers to define how that experience

00:08:05goes. But it's also worth remembering that this is just a proposal at the moment. So it

00:08:08might take some time to appear in the browsers. And there are also still some limitations that

00:08:12you need to deal with, like the classic one of security: there could be poisoned tools and

00:08:16descriptions on certain websites. So how much access is it given to user information, and

00:08:21how much control will the browser AI have over the entire browser? So if one of these poisoned

00:08:25tools does go out of control, how much damage can it do? Hopefully they find an answer for

00:08:29that as I'm pretty positive on this proposal. Let me know what you think in the comments

00:08:33down below, and while you're there, subscribe. And as always, see you in the next one.

Key Takeaway

WebMCP is a proposed browser standard from Google and Microsoft that lets developers explicitly expose website functionality as efficient, developer-defined tools that AI agents can call, using either JavaScript or standard HTML attributes.

Highlights

WebMCP is a proposed browser API backed by Google and Microsoft to bridge the gap between AI agents and websites.
The technology moves beyond inefficient methods like HTML parsing or screenshots by letting developers expose specific site features as tools.
It features both an Imperative API for JavaScript-based control and a Declarative API for zero-config HTML form integration.
WebMCP is highly token-efficient because the AI interacts with structured JSON data and predefined functions rather than raw DOM elements.
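The proposal emphasizes a "human-in-the-loop" approach, letting users work alongside an AI agent on the actual website rather than handing tasks off to a separate chat interface.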

Timeline

Introduction to WebMCP and AI Browsing

The speaker introduces WebMCP, a new browser API proposal supported by industry giants Google and Microsoft. This technology aims to transform how AI agents interact with the web by allowing every website to function as a mini Model Context Protocol (MCP) server. Unlike traditional MCP servers that focus on back-end APIs, WebMCP is entirely front-end based, enabling agents to use the website as a user would. This section establishes the distinction between simple API access and true front-end interaction. The goal is to provide a more integrated experience where AI understands the site's capabilities directly through the browser.

Demonstrating the Imperative API for Flight Booking

A live demo in the Canary version of Chrome shows an AI agent searching for a round-trip flight on a demo site that exposes WebMCP tools. The speaker explains that current AI browsing relies on tools like Playwright, HTML parsing, or screenshots, which are token-hungry and error-prone, whereas WebMCP simply runs JavaScript functions the developer has defined. When the user asks for a flight, the AI identifies a 'search flights' tool exposed by the developer and calls it with structured arguments such as origin, destination, and trip type. This is significantly more token-efficient because the AI doesn't need to 'see' the website or parse its HTML. Because the tool runs client-side code directly, the developer keeps control over how the AI interacts with the site's internal state.
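
The efficiency argument rests on what the agent actually exchanges with the page. The TypeScript below is a minimal sketch under stated assumptions: `ToolRegistry`, `describeTools`, and `callTool` are hypothetical names standing in for the browser's side of WebMCP, not the proposal's API. The agent is handed only each tool's name, description, and input schema (no HTML or screenshots) and replies with a tool name plus JSON arguments, which the page resolves to its own `execute` function.

```typescript
// Hypothetical shapes modeled on the demo; not the WebMCP specification.
type JsonSchema = {
  type: "object";
  properties: Record<string, { type: string; description?: string; enum?: string[] }>;
  required?: string[];
};

interface Tool {
  name: string;
  description: string;
  inputSchema: JsonSchema;
  execute: (args: Record<string, unknown>) => Promise<string>;
}

// Stand-in for the browser side: stores tools and routes agent calls to them.
class ToolRegistry {
  private tools = new Map<string, Tool>();

  register(tool: Tool): void {
    this.tools.set(tool.name, tool);
  }

  // What the agent is shown: metadata only, no HTML and no screenshot.
  describeTools(): Omit<Tool, "execute">[] {
    return Array.from(this.tools.values()).map(({ name, description, inputSchema }) => ({
      name,
      description,
      inputSchema,
    }));
  }

  // What happens when the agent picks a tool: the page's own code runs.
  async callTool(name: string, args: Record<string, unknown>): Promise<string> {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    for (const key of tool.inputSchema.required ?? []) {
      if (!(key in args)) throw new Error(`Missing argument: ${key}`);
    }
    return tool.execute(args);
  }
}

const registry = new ToolRegistry();

registry.register({
  name: "searchFlights",
  description: "Search for flights between two cities.",
  inputSchema: {
    type: "object",
    properties: {
      origin: { type: "string", description: "Departure city" },
      destination: { type: "string", description: "Arrival city" },
      tripType: { type: "string", enum: ["one-way", "round-trip"] },
    },
    required: ["origin", "destination", "tripType"],
  },
  // In a real page this would update app state; here it just reports the call.
  execute: async (args) =>
    `Searching ${String(args.origin)} -> ${String(args.destination)} (${String(args.tripType)})`,
});

// Simulated agent call: a tool name and JSON arguments in, a short text result out.
registry
  .callTool("searchFlights", { origin: "London", destination: "New York", tripType: "round-trip" })
  .then((result) => console.log(result)); // Searching London -> New York (round-trip)
```

The required-argument check is one simple place where a page could enforce the developer-defined guardrails the video mentions before any site code runs.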

Technical Implementation and Code Walkthrough

The video dives into the technical details of the Imperative API using the `window.navigator.modelContext` object. Developers register tools with its `registerTool` function, defining a name, a description for the AI's understanding, and an input schema for required arguments. The core of the implementation is the `execute` function, which runs custom JavaScript on the site to trigger actions like React state updates or event dispatching. This section highlights how easy it is for developers to set up guardrails for AI interactions. By defining exactly what a tool does, developers ensure the AI follows the intended logic of the application. This approach bridges the gap between building for humans and building for automated assistants.
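
A rough sketch of the pattern described above, in which `execute` dispatches an event and an application listener updates state. It is written to run outside a browser, so a plain `EventTarget` stands in for the page and a small object stands in for React state; `bus`, `appState`, and `SearchFlightsEvent` are hypothetical names, and the tool object's shape is inferred from the video rather than taken from the specification.

```typescript
// Hypothetical stand-ins for the demo's pieces: `bus` plays the page (window),
// `appState` plays React state, and the tool object's shape is assumed.

interface SearchParams {
  origin: string;
  destination: string;
  tripType: "one-way" | "round-trip";
}

// Event carrying the AI-filled parameters, similar to a CustomEvent's `detail`.
class SearchFlightsEvent extends Event {
  constructor(readonly detail: SearchParams) {
    super("searchflights");
  }
}

const bus = new EventTarget();

// Minimal app state; in React this would be a state setter that triggers navigation.
const appState: { route: string; search?: SearchParams } = { route: "/" };

// Equivalent of the demo's handleSearchFlights listener.
bus.addEventListener("searchflights", (event) => {
  const { detail } = event as SearchFlightsEvent;
  appState.search = detail;
  appState.route = "/results"; // the state change is what "navigates"
});

// The definition a page would register as a WebMCP tool (shape assumed).
const searchFlightsTool = {
  name: "searchFlights",
  description: "Search for flights. Use when the user wants to find or book a flight.",
  inputSchema: {
    type: "object",
    properties: {
      origin: { type: "string", description: "Departure city or airport code" },
      destination: { type: "string", description: "Arrival city or airport code" },
      tripType: { type: "string", enum: ["one-way", "round-trip"] },
    },
    required: ["origin", "destination", "tripType"],
  },
  // Client-side code that runs when the agent calls the tool.
  execute: async (params: SearchParams): Promise<string> => {
    bus.dispatchEvent(new SearchFlightsEvent(params));
    return `Navigated to results for ${params.origin} -> ${params.destination}`;
  },
};
```

In a browser that ships the proposal, this object would be passed to something like `navigator.modelContext.registerTool(searchFlightsTool)` (the exact method and signature may change as the proposal evolves). Here, calling `searchFlightsTool.execute({ origin: "London", destination: "New York", tripType: "round-trip" })` directly plays the agent's role and leaves `appState.route` set to `"/results"`. Routing through an event keeps the tool definition decoupled from the UI framework, which mirrors how the demo keeps its React code unchanged.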

Data Parsing and Information Retrieval with WebMCP

WebMCP is shown to be equally useful for extracting information from a page without the need for complex scraping. The speaker demonstrates a 'list flights' tool that allows the AI to access the current React state of the page in a clean JSON format. When a user asks for a flight recommendation, the AI uses this tool to see the filtered results and provides an accurate suggestion based on real-time data. This method avoids the high token costs associated with sending entire HTML documents to a Large Language Model. It ensures that the AI's recommendations are based on the exact same data the human user sees on the screen. The accuracy of the AI improves because it is working with structured data provided directly by the application.
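
A read-only tool like 'list flights' can reuse the exact filtering logic that drives the rendered list, which is what keeps the AI's view and the user's view in sync. The sketch below assumes hypothetical flight data and filter values mirroring the demo (under $500, departing before midday); `visibleFlights` and `listFlightsTool` are illustrative names, and the tool object's shape is an assumption based on the video.

```typescript
// Hypothetical flight data and filter state; in the demo these live in React state.
interface Flight {
  id: number;
  airline: string;
  price: number; // USD
  departure: string; // "HH:MM", 24-hour clock
}

interface Filters {
  maxPrice?: number; // show flights cheaper than this
  departBefore?: string; // show flights departing before this "HH:MM" time
}

const allFlights: Flight[] = [
  { id: 101, airline: "Airline A", price: 620, departure: "08:15" },
  { id: 102, airline: "Airline B", price: 455, departure: "09:40" },
  { id: 103, airline: "Airline C", price: 480, departure: "14:05" },
  { id: 104, airline: "Airline D", price: 390, departure: "11:30" },
];

// Filters the human set in the UI: under $500, departing before midday.
const filters: Filters = { maxPrice: 500, departBefore: "12:00" };

// The same selection the UI renders, so the agent and the user see identical results.
function visibleFlights(flights: Flight[], f: Filters): Flight[] {
  return flights.filter(
    (fl) =>
      (f.maxPrice === undefined || fl.price < f.maxPrice) &&
      // Zero-padded "HH:MM" strings sort chronologically, so string comparison works.
      (f.departBefore === undefined || fl.departure < f.departBefore)
  );
}

// Read-only tool: returns compact JSON instead of making the agent parse the page's HTML.
const listFlightsTool = {
  name: "listFlights",
  description: "List the flights currently shown to the user, after their filters are applied.",
  inputSchema: { type: "object", properties: {} },
  execute: async (): Promise<string> => JSON.stringify(visibleFlights(allFlights, filters)),
};

listFlightsTool.execute().then((json) => console.log(json));
```

With these sample values only flights 102 and 104 pass both filters, so the agent receives two short JSON records rather than the whole results page.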

The Declarative API and Zero-JavaScript Integration

The second major part of the proposal is the Declarative API, which requires no JavaScript to implement. Developers add a tool name attribute and a tool description attribute directly to a standard HTML form, and can optionally add a tool parameter description attribute to individual inputs for extra context. The browser then converts the form into an MCP tool that the AI can understand and fill out, deriving the arguments from each input's name and type. This section shows a restaurant booking example where the AI identifies form fields such as name, phone, date, time, guests, seating, and requests. It is a 'low-code' solution that lets even simple websites become AI-ready with minimal effort, with the browser mapping the AI's arguments to the correct input fields.
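
The video does not detail how the browser turns a form into a tool beyond reading each input's name and type. As a rough illustration only, the TypeScript below mimics that idea with plain data: it maps a hypothetical list of the booking form's seven fields to a JSON-Schema-style input schema. `FormField`, `schemaFromForm`, the seating options, and the type mapping are all assumptions; the real browser algorithm and attribute names may differ.

```typescript
// Hypothetical model of a form input as the browser might see it (not a real DOM API).
interface FormField {
  name: string;
  type: "text" | "tel" | "email" | "date" | "time" | "number" | "select" | "textarea";
  required?: boolean;
  options?: string[]; // choices of a <select>
  paramDescription?: string; // value of the per-input description attribute, if present
}

interface PropertySchema {
  type: "string" | "number";
  format?: string;
  enum?: string[];
  description?: string;
}

// Illustrative mapping from form fields to a JSON-Schema-style input schema.
function schemaFromForm(fields: FormField[]) {
  const properties: Record<string, PropertySchema> = {};
  const required: string[] = [];
  for (const field of fields) {
    const prop: PropertySchema = { type: field.type === "number" ? "number" : "string" };
    // "date" (YYYY-MM-DD) and "email" line up with standard JSON Schema formats.
    if (field.type === "date" || field.type === "email") prop.format = field.type;
    if (field.type === "select" && field.options) prop.enum = field.options;
    if (field.paramDescription) prop.description = field.paramDescription;
    properties[field.name] = prop;
    if (field.required) required.push(field.name);
  }
  return { type: "object" as const, properties, required };
}

// The booking form's inputs from the demo; types, options, and flags are assumed.
const bookTableFields: FormField[] = [
  { name: "name", type: "text", required: true },
  { name: "phone", type: "tel", required: true },
  { name: "date", type: "date", required: true },
  { name: "time", type: "time", required: true },
  { name: "guests", type: "number", required: true },
  { name: "seating", type: "select", options: ["indoor", "outdoor"] },
  { name: "requests", type: "textarea", paramDescription: "Dietary needs or occasions, if any" },
];

console.log(JSON.stringify(schemaFromForm(bookTableFields), null, 2));
```

The point of the sketch is that everything the schema needs (argument names, types, allowed values, extra hints) is already present in ordinary form markup, which is why the declarative path can work without page JavaScript.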

Strategic Outlook and Security Concerns

In the concluding segment, the speaker expresses a positive outlook on WebMCP's potential to keep a 'human in the loop' as AI agents become more capable. He notes that he would rather use a website directly, with AI assisting on the page, than book flights or restaurants inside a chat interface like ChatGPT's, and that WebMCP lets site developers define that experience. However, the proposal faces challenges, particularly around security: sites could expose 'poisoned' tools or descriptions, and it is still unclear how much access to user information and how much control over the browser the built-in AI should have, which determines how much damage a malicious tool could do. He cautions that WebMCP is still only a proposal, currently testable in Chrome Canary, so it may take time to reach browsers, but he remains optimistic that it is a step toward a web built for both humans and AI assistants.
