Sites are now expected to duplicate effort by manually defining schemas for the same actions, like re-describing a button's purpose in JSON when it's already semantically marked up?
For example, web accessibility seems like a promising starting point for making actions automatable, with the advantage that the automatable things are visible to humans, so they are less likely to drift or break over time.
Any work happening in that space?
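One way to picture it: rather than hand-writing a parallel JSON schema, a page could derive tool descriptors from the accessibility markup it already has. A rough TypeScript sketch, where the tool shape (and the idea of registering it with the browser) is my assumption, not any shipped API:

    // Derive agent-facing tools from existing ARIA markup instead of
    // re-describing the same actions in a hand-written JSON schema.
    function toolsFromAria(root: Document = document) {
      const tools = [];
      for (const el of root.querySelectorAll<HTMLElement>('button[aria-label]')) {
        const label = el.getAttribute('aria-label')!; // present per the selector
        tools.push({
          name: el.id || label.toLowerCase().replace(/\s+/g, '_'),
          description: label,                              // reuse the human-facing label
          inputSchema: { type: 'object', properties: {} }, // a click takes no input
          execute: async () => { el.click(); return { status: 'clicked' }; },
        });
      }
      return tools;
    }

Because the same label drives both the screen reader and the tool description, the two can't silently diverge the way a separate schema file can.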
I think the GitHub repo's README may be more useful: https://github.com/webmachinelearning/webmcp?tab=readme-ov-f...
Also, the prior implementations may be useful to look at: https://github.com/MiguelsPizza/WebMCP and https://github.com/jasonjmcghee/WebMCP
But no MCP server today has tools that appear on page load, change with every SPA route, and die when you close the tab. Client support for this would have to be tightly coupled to whatever is controlling the browser.
What they really built is a browser-native tool API borrowing MCP's shape. If calling it "MCP" is what gets web developers to start exposing structured tools for agents, I'll take it.
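To make that lifecycle concrete, here's a hedged sketch; the registration surface ((navigator as any).modelContext.provideContext) and the tool shape are assumptions about the proposal's direction, not a shipped browser API:

    type Tool = {
      name: string;
      description: string;
      execute: (input?: unknown) => Promise<unknown>;
    };

    // Tools are a function of the current SPA route, not a static manifest.
    function toolsForRoute(path: string): Tool[] {
      if (!path.startsWith('/cart')) return []; // most routes expose nothing
      return [{
        name: 'checkout',
        description: 'Begin checkout for the current cart',
        execute: async () => {
          document.querySelector<HTMLButtonElement>('#checkout')?.click(); // assumed button id
          return { status: 'checkout_started' };
        },
      }];
    }

    const publish = (tools: Tool[]) =>
      (navigator as any).modelContext?.provideContext({ tools });

    // Re-publish on every client-side navigation...
    window.addEventListener('popstate', () =>
      publish(toolsForRoute(location.pathname)));
    // ...and it all dies with the tab; there is no server to keep alive.
    window.addEventListener('pagehide', () => publish([]));

A client that wants to call these tools has to live wherever the page lives, which is exactly the coupling described above.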
Then came mobile phones, with their small screens and touch controls, which forced the web to adapt: responsive design.
Now it’s the turn of agents that need to see and interact with websites.
Sure, you could keep feeding them HTML/JS and have them write logic to interact with the page, just like you can open a website in desktop mode on a phone and still navigate it, but it's clunky.
Don’t get hung up on the name “MCP”, which has been debased: it’s much bigger than that.
The next step would be to also decouple the visual part of a website from the data and interactions: let users tell their in-browser agent how to render, or even offer different views of the same data. (And possibly also what to render, so your LLM could act as an in-website ad blocker, similar to browser extensions like LinkedIn/Facebook feed blockers.)
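As a speculative sketch of that split: the site exposes the data as a tool, and the agent, not the site, decides what gets rendered. The /api/feed endpoint, the types, and the tool shape are illustrative assumptions:

    type FeedItem = { id: string; author: string; text: string; sponsored: boolean };

    // The site exposes raw feed data; presentation is left to the caller.
    const getFeed = {
      name: 'get_feed',
      description: 'Return the current feed as structured items',
      inputSchema: { type: 'object', properties: {} },
      execute: async (): Promise<FeedItem[]> => {
        const res = await fetch('/api/feed', { credentials: 'same-origin' }); // assumed endpoint
        return await res.json();
      },
    };

    // The in-browser agent picks the view, e.g. the feed-blocker case:
    //   (await getFeed.execute()).filter(item => !item.sponsored)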
I really like the way you can expose your schema by adding fields to a web form; that feels like a really nice extension and a great way to piggyback on your existing logic (see the sketch after this comment).
To me this seems much more promising than either needing an MCP server or the MCP Apps proposal.
The browser has tons of functionality baked in, everything from web workers to persistence.
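A minimal sketch of that piggybacking, assuming a simplified field-to-schema mapping and a hand-rolled tool shape rather than any official API:

    // Build a tool whose input schema mirrors the fields an existing <form>
    // already declares, and whose execution reuses the form's own submit logic.
    function toolFromForm(form: HTMLFormElement) {
      const properties: Record<string, { type: string }> = {};
      const required: string[] = [];
      for (const el of form.querySelectorAll<HTMLInputElement>('input[name]')) {
        properties[el.name] = { type: el.type === 'number' ? 'number' : 'string' };
        if (el.required) required.push(el.name);
      }
      return {
        name: form.id || 'submit_form',
        description: form.getAttribute('aria-label') ?? 'Submit this form',
        inputSchema: { type: 'object', properties, required },
        execute: async (input: Record<string, string>) => {
          for (const [key, value] of Object.entries(input)) {
            const field = form.elements.namedItem(key);
            if (field instanceof HTMLInputElement) field.value = value;
          }
          form.requestSubmit(); // runs the site's existing validation and handlers
          return { status: 'submitted' };
        },
      };
    }

Because the schema is read off the live form, adding a field to the form updates the tool for free.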
This would also allow for interesting ways of authenticating and manipulating data on existing sites. Say I'm logged into image-website-x. I could then use WebMCP to let agents interact with the images I've stored there. WebMCP becomes a much more intuitive interface than having the agent interpret DOM elements.
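Concretely, because a page-defined tool runs in the site's own origin, a plain fetch() rides the user's existing session cookie; the agent never touches credentials. The /api/images endpoint and the tool shape are assumptions for illustration:

    const listMyImages = {
      name: 'list_my_images',
      description: "List images in the signed-in user's library",
      inputSchema: { type: 'object', properties: {} },
      execute: async () => {
        // Same-origin request: the browser attaches the session cookie itself.
        const res = await fetch('/api/images', { credentials: 'same-origin' }); // assumed endpoint
        if (!res.ok) throw new Error(`request failed: ${res.status}`);
        return await res.json(); // structured data, no DOM scraping
      },
    };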
People should be mindful of relying on magic that offers no protection for their data, only to discover it too late.
That's not a gap in the technology, it's just early.
This is true excitement. I am not being ironic.
HN Thread Link: https://news.ycombinator.com/item?id=47037501
Quick summary of my reply:
- Your 70+ MCP tools show exactly what WebMCP aims to solve
- Key insight: MCP for APIs vs MCP for consumer apps are different
- WebMCP makes sense for complex sites (Amazon, Booking.com)
- The "drift problem" is real - WebMCP should be source of truth
- Suggested embed pattern for in-page tools