00:00:00There's a new proposal backed by Google and Microsoft that could be shaping the future
00:00:03of how we use the web, and I kinda like it. It's called WebMCP, but don't get that confused
00:00:08with a normal MCP server. Instead, WebMCP is actually a browser API and it will let front
00:00:13end developers expose features of their sites as tools to AI agents, essentially letting
00:00:18every site become a mini MCP server. And while you may have already seen some sites start
00:00:23off with their own MCP servers, this is a little bit different. Its goal is to actually
00:00:27let your agents use the website for you instead of just accessing your APIs and showing that
00:00:32in a chat. It will be entirely front-end based. Now, if that distinction does sound a little
00:00:37confusing, let's just jump in and see a demo and talk about why I like it.
00:00:46Now the first thing I want to admit is this demo isn't going to look too exciting, but
00:00:49that's kinda the point of WebMCP. It's taking something that's already possible, but just
00:00:54making it way better. So stick with me on this. What I have here is the Canary version
00:00:58of Chrome that they're testing this proposal in, and also a site that's been set up with
00:01:02some WebMCP tools. You can see on the right I have an extension which is able to interact
00:01:06with these WebMCP tools, but imagine in the future this would just be your normal browser
00:01:10built-in AI, whether that's Gemini, ChatGPT Atlas or whatever Arc has now turned into.
00:01:15You can see if I want to send a user prompt while I'm on this site here, saying I want
00:01:19to book a round trip flight for two people from London to New York on specific dates and
00:01:23I hit send, it is going to take me to the search result page, so it's used the website
00:01:28for me. Wow, crazy stuff right? Yeah, as I said this demo was going to look very basic,
00:01:33but the key thing about WebMCP is how it used that site for me. The current approach to AI
00:01:38using websites tends to be using tools like Playwright, HTML parsing or even taking screenshots
00:01:42of your site and trying to use it as a human. But all of that is pretty inefficient, especially
00:01:48token-wise, and it's still prone to a lot of errors. So this is what WebMCP is here to
00:01:53fix. WebMCP instead lets the developer of the website expose certain MCP tools that then
00:01:58interact with the client-side JavaScript. So that's all that's happening when an AI chooses
00:02:03to use one of these WebMCP tools. It's simply running a JavaScript function on your site
00:02:07that you, the developer, have set to run. So you can see in the example of this demo flight
00:02:12page I have one WebMCP tool available called search flights and you can see this takes in
00:02:16some input arguments like origin, destination and trip type that matches one to one with
00:02:20the form that we have over here. The crucial bit is the AI now knows that it can use this
00:02:25MCP tool. So when we hit send on a prompt like this, it's not going to fill in the form
00:02:29by doing anything like Playwright or HTML parsing. In fact, it doesn't need to know what the website
00:02:34looks like at all or what the HTML looks like either. It simply knows it has that WebMCP
00:02:38tool and it calls it with those input arguments, and I, the developer, have set what happens
00:02:43when it takes in those input arguments: it runs a JavaScript function, which in this case
00:02:47simply updates the React state, and that causes a navigation to the search flights page. If
00:02:52we take a look at the front-end code for this, it is incredibly simple and hopefully
00:02:55it will start to make a lot more sense. You can see the first thing we need to do is register
00:02:59the WebMCP tools that are available for a given page and we can do that by using window.navigator.modelContext.
00:03:04So this is the API that's going to need to be built into the browsers if this proposal
00:03:09passes and it's currently in Chrome Canary so they can test this one out. We can see once
00:03:13we do have our model context API, we can register our tools by simply using the registerTool
00:03:18function and in this case I'm registering the search flights tool that we saw being used
00:03:22earlier. If we check out what an actual tool is, you can see it's a very simple object definition.
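As a rough sketch of the kind of object definition and registration being shown here: the tool name spelling, the schema fields and the shape of the returned result are assumptions for illustration, not the demo's actual code, and a small stand-in replaces window.navigator.modelContext so the snippet runs outside a WebMCP-enabled browser.

```javascript
// Stand-in for window.navigator.modelContext so the sketch runs anywhere;
// on a real page you would call the browser-provided object instead.
const registeredTools = new Map();
const modelContext = {
  registerTool(tool) {
    registeredTools.set(tool.name, tool);
  },
};

// Hypothetical tool definition: the name, schema fields and return shape
// are assumptions, loosely following the form fields mentioned in the demo.
const searchFlightsTool = {
  name: "searchFlights",
  description: "Search for flights between an origin and a destination.",
  inputSchema: {
    type: "object",
    properties: {
      origin: { type: "string", description: "Departure city, e.g. London" },
      destination: { type: "string", description: "Arrival city, e.g. New York" },
      tripType: { type: "string", enum: ["one-way", "round-trip"] },
    },
    required: ["origin", "destination", "tripType"],
  },
  // Client-side code the agent triggers; here it only echoes the request
  // instead of updating any page state.
  execute(params) {
    const text = `Searching ${params.tripType} flights from ${params.origin} to ${params.destination}`;
    return { content: [{ type: "text", text }] };
  },
};

modelContext.registerTool(searchFlightsTool);

// Simulate an agent calling the registered tool with filled-in arguments.
const demoResult = registeredTools.get("searchFlights").execute({
  origin: "London",
  destination: "New York",
  tripType: "round-trip",
});
console.log(demoResult.content[0].text);
```

The point of the sketch is that the agent only ever sees the name, description and input schema; everything inside execute stays under the developer's control.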
00:03:26We have a name, we have a description, so this is passed to the AI so it knows when to use
00:03:30this tool and we also have an input schema if we want to take in any arguments. In my
00:03:34case I had things like origin and destination to match that form. You can see we also have some
00:03:38more context that we can give to the AI to understand what those arguments should actually
00:03:42be. The important part about a tool definition is the execute function. This is the client-side
00:03:47JavaScript that is going to run on your site when this MCP tool is used. So it can
00:03:51basically be anything that you want. In my case I'm using the search flights function
00:03:55and we don't have to worry about this implementation too much but essentially all I'm doing is taking
00:03:59in the parameters the AI has filled in for those input arguments and I'm dispatching an
00:04:03event called search flights with those parameters. Then in my React code all I'm doing is simply
00:04:08adding an event listener for that search flights event and when we have that I'm simply running
00:04:12the function handle search flights and this is where we can essentially do anything that
00:04:15we can in React and in my case I'm taking in the parameters and just setting them as
00:04:19the search parameters, which causes the navigation. It really is that simple and that's why I really
00:04:24like this approach, as not only is it incredibly token-efficient but it also allows me as the
00:04:29developer to define the interactions of the site and the AI can follow my guardrails. It's
00:04:34just a really neat solution to building sites with both a human and an AI assistant in mind
00:04:39instead of the current approach which is to build a site for a human and then an MCP server
00:04:43for the AI and if the AI then needs to use the website well you better hope it just figures
00:04:48it out somehow. It's also worth noting that these WebMCP tools aren't just useful for
00:04:51causing some event on your page like a navigation or filling in a form but they're also really
00:04:55useful when you need to parse information that's on the page. Say I as the human came in here
00:05:00now and started adjusting some of these filters like I want a price less than $500 and a departure
00:05:05time before midday. There are still quite a lot of flights on this page so I want AI
00:05:11to help me choose the best one. So I can say what flight would you recommend on this page.
00:05:15Now current approaches would simply use Playwright or HTML parsing to actually take in the entire
00:05:20page and try and understand the information here and turn it into some form of structured
00:05:24data but we don't need to do that with WebMCP. Instead I, as the developer, have simply
00:05:29set up a WebMCP tool called list flights and this has access to the current React state
00:05:33so it has access to all of the information that's displayed to the user here but in nice
00:05:38JSON format. So this way if I do actually ask the AI for this prompt you can see it calls
00:05:42that tool, lists out all of the flights that are currently showing on this page and it gives
00:05:46us a recommendation here for flight 56. And I can find that flight showing on the page
00:05:51here. That process has used far fewer tokens and is going to be way more accurate. Now the
00:05:56final thing I want to showcase is how you can actually take advantage of WebMCP with no
00:06:00JavaScript. Up until now we've actually been using the imperative API which is where I the
00:06:05developer have written the JavaScript to handle the tool calls and also register specific tools.
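To make that imperative style a bit more concrete, here is a minimal sketch of a read-only tool in the spirit of the list flights one described above. The state object, the flight data, the filter names and the return shape are all hypothetical stand-ins for the demo's React state, not the actual implementation.

```javascript
// Hypothetical page state standing in for the React state in the demo.
const state = {
  flights: [
    { id: 12, price: 620, departs: "09:15" },
    { id: 56, price: 410, departs: "10:30" },
    { id: 78, price: 450, departs: "15:45" },
  ],
  filters: { maxPrice: 500, departBefore: "12:00" },
};

// Returns only the flights the user currently sees, applying the active filters.
// Zero-padded "HH:MM" strings compare correctly as plain strings.
function visibleFlights({ flights, filters }) {
  return flights.filter(
    (f) => f.price < filters.maxPrice && f.departs < filters.departBefore
  );
}

// Hypothetical tool definition: takes no arguments and hands the agent
// structured JSON instead of making it read the rendered page.
const listFlightsTool = {
  name: "listFlights",
  description: "List the flights currently shown on the page as JSON.",
  inputSchema: { type: "object", properties: {} },
  execute() {
    const text = JSON.stringify(visibleFlights(state));
    return { content: [{ type: "text", text }] };
  },
};

// Simulate an agent calling the tool.
console.log(listFlightsTool.execute().content[0].text);
```

With the sample data above, only flight 56 passes both filters, so that is all the agent would receive, rather than the markup of the whole results page.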
00:06:10There's also a second approach called the declarative API. This approach is much simpler as it's
00:06:14meant for the simple use case of filling in HTML forms. So you can see I have a very simple
00:06:19booking reservation one and I can simply ask my AI to book me a table with some of the information
00:06:23that's needed to fill in the form and it will go ahead and actually fill that form in for
00:06:27me. That's because it has access to a WebMCP tool called book table. But the important
00:06:32part here is I wrote no JavaScript to actually have access to this WebMCP tool. And that's
00:06:36because the way that the declarative API of WebMCP works is you simply need to add
00:06:40a tool name and a tool description attribute onto your HTML form and the browser will then
00:06:44try to convert that form into a WebMCP tool for you, trying to understand what each of the inputs
00:06:49should be for the argument of the MCP tool. And we see that here we have a tool name of
00:06:53book table on that booking form that we saw and a tool description. So the AI knows when
00:06:57to call it and we simply have a normal HTML form. The only other difference is in some of
00:07:02the inputs here, where we also use the attribute tool param description to give the AI a bit
00:07:06more context on how it should fill in that information. But for the rest of it, the browser
00:07:10is going to pick up the input, the input type, the input name, and use that to create the
00:07:14MCP tool. And we can see that back on our inspector here where it's correctly picked up the input arguments
00:07:18as name, phone, date, time, guests, seating, and requests. And it's done all of
00:07:23that just using simple HTML form logic with me writing zero JavaScript. That's pretty much
00:07:27all there is to the WebMCP proposal at the moment. And as I said, I'm pretty positive
00:07:31on this one. I like the way that it bridges the gap between web apps and AI agents, and
00:07:34it removes any of the guesswork when agents are trying to use a site and it makes sure
00:07:38that any interactions are defined explicitly by the website's developers. Plus I'm also not
00:07:43fully AI-pilled yet. I like it when there's a tool that helps an AI agent work alongside
00:07:47me instead of replacing me. I don't like the idea of booking my flights or restaurants in
00:07:51ChatGPT's interface. And I much prefer going to the actual website myself in a browser.
00:07:56And if I want to, I can have the AI help me out on that page. It's a much better system
00:08:00for keeping a human in the loop and also allowing the website developers to define how that experience
00:08:05goes. But it's also worth remembering that this is just a proposal at the moment. So it
00:08:08might take some time to appear in the browsers. And there are also still some limitations that
00:08:12you need to deal with. Like the classic one of security: there could be poisoned tools and
00:08:16descriptions on certain websites. So how much access is it given to user information, and
00:08:21how much control will the browser AI have over the entire browser? If one of these poisoned
00:08:25tools does go out of control, how much damage can it do? Hopefully they find an answer for
00:08:29that as I'm pretty positive on this proposal. Let me know what you think in the comments
00:08:33down below, and while you're there, subscribe. And as always, see you in the next one.