How well does it work for agents?
The score maps to a single letter grade you can share, embed, or watch over time. Bands are deliberately calibrated to match Lighthouse and SSL Labs — A is meant to be hard to earn:
- A (85–100): Agent-ready. Valid manifest, strong discovery, broad capability coverage.
- B (70–84): Solid. Minor gaps or one weak category; agents can still transact.
- C (50–69): Partial. The manifest works but is missing capabilities or surface signals.
- D (30–49): Weak. The manifest is reachable but invalid or near-empty.
- F (0–29): Failing. Blocked, unreachable, or no manifest detected.
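The band table reduces to a threshold lookup. Below is a minimal TypeScript sketch. It assumes scores arrive as plain numbers and are compared without rounding. The post doesn't say whether a score like 84.6 rounds up to an A. `gradeFor` is a hypothetical name.

```typescript
// Letter grades, best to worst.
type Grade = "A" | "B" | "C" | "D" | "F";

// Lower bound of each published band, checked from highest to lowest.
const GRADE_BANDS: ReadonlyArray<readonly [number, Grade]> = [
  [85, "A"],
  [70, "B"],
  [50, "C"],
  [30, "D"],
  [0, "F"],
];

// Map a 0–100 composite score to its letter grade (no rounding applied).
function gradeFor(score: number): Grade {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  for (const [min, grade] of GRADE_BANDS) {
    if (score >= min) return grade;
  }
  return "F"; // unreachable: the last band starts at 0
}

console.log(gradeFor(84), gradeFor(85)); // B A
```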
Every score breaks down into three weighted categories so you can see exactly where the points come from:
- Agent Discovery (30%): Can agents find and reach you? HTTPS, reachability, agent-friendly robots.txt, plus the surface signals that keep you in the conversation: /llms.txt, sitemap.xml, Open Graph tags, Organization JSON-LD, and the mobile viewport meta tag.
- UCP Conformance (40%): Does the manifest validate against the spec? Validity carries extra weight in this category, so an invalid manifest cannot score above ~50 here, regardless of how good the surface polish is.
- Capability Coverage (30%): What can an agent actually do at your store? Declared transports (REST/MCP/A2A), checkout, payment handlers, and breadth of capabilities. When functional probes run, declared transport endpoints that don't actually respond drag this score down.
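Why would validity cap the conformance category at roughly 50? The sketch below assumes the sub-score is a weighted pass rate in which validity counts as some multiple of an ordinary check. The multiplier (`validityWeight`) is a hypothetical parameter, not UCP Checker's actual value. The check IDs are illustrative.

```typescript
// One non-validity check in the conformance category.
interface CheckResult {
  id: string;
  passed: boolean;
}

// Hypothetical sub-score: manifest validity counts as `validityWeight` ordinary checks.
// The real multiplier is not published in this post; treat it as a parameter.
function conformanceScore(valid: boolean, checks: CheckResult[], validityWeight: number): number {
  const total = validityWeight + checks.length;
  if (total <= 0) return 0;
  const earned = (valid ? validityWeight : 0) + checks.filter((c) => c.passed).length;
  return (earned / total) * 100;
}

// Illustrative check IDs. If validity weighs as much as all other checks combined,
// an invalid manifest tops out at 50 even when every other check passes.
const others: CheckResult[] = [
  { id: "version-format", passed: true },
  { id: "signing-keys", passed: true },
  { id: "payment-handlers", passed: true },
  { id: "spec-urls", passed: true },
];
console.log(conformanceScore(false, others, others.length)); // 50
```

A heavier validity weight pushes the ceiling for invalid manifests below 50. A lighter one lets surface polish leak through.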
The composite is a straight weighted average: Discovery × 0.30 + Conformance × 0.40 + Capabilities × 0.30. No tricks, no hidden weights. The full ruleset is documented in our methodology.
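The composite formula translates directly into code. Below is a minimal sketch. The interface and function names are hypothetical, and the post doesn't say whether the composite is rounded before it is graded.

```typescript
// Category sub-scores, each on a 0–100 scale.
interface CategoryScores {
  discovery: number;
  conformance: number;
  capabilities: number;
}

// Published weights: Discovery 30%, Conformance 40%, Capabilities 30%.
const CATEGORY_WEIGHTS: CategoryScores = { discovery: 0.3, conformance: 0.4, capabilities: 0.3 };

// Straight weighted average; the weights already sum to 1, so no normalisation is needed.
function compositeScore(s: CategoryScores): number {
  return (
    s.discovery * CATEGORY_WEIGHTS.discovery +
    s.conformance * CATEGORY_WEIGHTS.conformance +
    s.capabilities * CATEGORY_WEIGHTS.capabilities
  );
}

console.log(compositeScore({ discovery: 90, conformance: 50, capabilities: 60 }));
```

Take a store scoring 90 / 50 / 60. It gets 27 + 20 + 18 = 65, which lands in the C band.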
What you actually get
Every score URL is a live page at /score/{your-domain}, indexed and shareable. Open one and you don't just see a number:
- Top priorities: The three highest-impact issues we found, ranked by impact × effort. Start here.
- Impact vs Effort matrix: Quick Wins / Strategic / Incremental / Consider Later quadrants, so you can plan a sprint instead of staring at a wall of warnings.
- Recommendations with copy-paste fixes: Every flagged issue surfaces a snippet you can drop straight into your manifest, robots.txt, sitemap, or HTML <head>. Hit "Show fix", copy, paste, redeploy, re-check.
- Platform-aware percentile: "You're at p72 latency vs the median Shopify store." Comparing your latency against the whole directory is meaningless when half of it runs on a fundamentally different infrastructure profile.
- Full check breakdown: Every signal we evaluate, grouped by category, with a "why it matters" paragraph alongside each check. No black boxes.
- Save this report: We re-run the full check weekly and email you only when something material changes, such as a score drop, a capability regression, or a status flip. Free, no marketing, unsubscribe anytime.
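The priorities list and the matrix can both be derived from per-issue impact and effort ratings. Below is a sketch with several assumptions:

- Both ratings sit on a hypothetical 1–5 scale, and 3 or above counts as "high".
- The quadrant mapping follows the usual impact/effort convention.
- Effort is treated as a cost, so the ranking multiplies impact by an inverted "ease" score. The post doesn't say how the effort axis is oriented.

The error codes are made up for illustration.

```typescript
// Hypothetical 1–5 ratings; the report's real scales are not published here.
interface Issue {
  code: string;
  impact: number; // 1 = minor, 5 = severe
  effort: number; // 1 = trivial fix, 5 = major work
}

type Quadrant = "Quick Wins" | "Strategic" | "Incremental" | "Consider Later";

// Assumption: 3 and above counts as "high" on either axis.
const isHigh = (value: number): boolean => value >= 3;

// Place an issue in one of the four matrix quadrants.
function quadrantFor(issue: Issue): Quadrant {
  if (isHigh(issue.impact)) return isHigh(issue.effort) ? "Strategic" : "Quick Wins";
  return isHigh(issue.effort) ? "Consider Later" : "Incremental";
}

// Assumption: effort is inverted ("ease" = 6 - effort) before multiplying,
// so cheap, high-impact fixes rank first.
function priority(issue: Issue): number {
  return issue.impact * (6 - issue.effort);
}

// Top-n issues by priority; ties broken by impact, then by code for a stable order.
function topPriorities(issues: Issue[], n = 3): Issue[] {
  return [...issues]
    .sort((a, b) => priority(b) - priority(a) || b.impact - a.impact || a.code.localeCompare(b.code))
    .slice(0, n);
}

// Hypothetical error codes for illustration.
const issues: Issue[] = [
  { code: "INVALID_MANIFEST", impact: 5, effort: 2 },
  { code: "NO_CHECKOUT", impact: 5, effort: 5 },
  { code: "MISSING_VIEWPORT", impact: 1, effort: 1 },
  { code: "REST_ENDPOINT_DOWN", impact: 4, effort: 4 },
  { code: "MISSING_OG_TAGS", impact: 2, effort: 4 },
];
console.log(topPriorities(issues).map((i) => i.code).join(", "));
// INVALID_MANIFEST, REST_ENDPOINT_DOWN, NO_CHECKOUT
```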
The page is ungated. No signup, no paywall, no "create an account to see the breakdown." We're indexing every score — just like SSL Labs grades and PageSpeed scores. Public scores create a baseline and pressure for the ecosystem to improve, in the same way SSL grades did for HTTPS adoption.
Why we built it
The honest answer: verified-or-not is the wrong question now.
When the UCP spec first landed in January (v2026-01-11), finding a verified store at all was novel. The bar was "did anyone publish a manifest." The status page was the right product for that moment, and it still is for the discovery layer.
The directory has 4,500+ verified domains today. Verified isn't novel. The interesting question shifted to "how well does this thing actually work for agents," and nobody had a good answer to that — including us.
When we ran a deeper analysis for our April State of Agentic Commerce report, the gap was stark: out of 4,014 verified UCP stores, only 9 delivered a flawless end-to-end agent experience. A 0.2% flawless rate. The other 99.8% had a manifest published — they just didn't actually work as well as that manifest suggested. That gap between "verified" and "actually works" is the central infrastructure problem in agentic commerce today. The UCP Score makes that gap visible, measurable, and addressable.
There's a clear analogue: PageSpeed before Lighthouse. Pre-Lighthouse, web performance optimisation was vibes. People knew slow sites were bad and fast sites were good but couldn't quantify "how slow" or "compared to what." Lighthouse gave them three things — a graded score, a category breakdown, and copy-paste optimisations — and the field changed overnight. Nobody ships a serious site today without checking their Lighthouse score first.
The agentic commerce ecosystem is at exactly that pre-Lighthouse moment. There's no shared yardstick for agent-readiness. Stores have no way to tell whether the integration they shipped last month is competitive. Platform teams have no way to back up "our merchants are more agent-ready" with a number. AI agent builders have no way to filter "show me the stores most likely to actually complete a transaction."
The UCP Score is meant to be that yardstick. Lighthouse for agentic commerce.
How we built it (the short version)
Three signal sources, one composite:
- Static analysis: The same manifest validator that powers /check and /ucp-validator. Every spec rule (validity, version format, signing keys, payment handlers) becomes a check row.
- Surface signals: Five public files and meta tags, fetched in parallel (/llms.txt, /sitemap.xml, Open Graph, Organization JSON-LD, viewport). We capture presence and content, plus a content hash of llms.txt for change detection, so we can spot when a brand updates its LLM brief.
- Functional probes (opt-in): Two probe families. Transport probes hit each declared transport endpoint with a benign request (MCP gets a tools/list call; REST and A2A get a GET). URL resolution probes fetch every spec and schema URL declared in the manifest. Probes only run on user-triggered checks, not on the 24h cron sweep, because hammering 4,500 merchants daily with a dozen extra HTTP requests each isn't neighbourly.
Each signal feeds one category sub-score (0–100), and the composite is the weighted average. Recommendations join error codes against a fix library so every flagged issue surfaces a copy-paste snippet — the same pattern Lighthouse uses for its audit list. The whole pipeline runs on the same 24h cycle as the rest of the directory; checks you trigger manually run the full probe stack.
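The recommendations join can be sketched as a keyed lookup from error code to fix. Only two things come from the post: findings carry error codes, and fixes live in a library kept in config. The codes, snippet text, and type names below are hypothetical.

```typescript
type FixTarget = "manifest" | "robots.txt" | "sitemap.xml" | "llms.txt" | "head";

interface Finding {
  code: string; // error code emitted by a check
  detail: string;
}

interface Fix {
  title: string;
  target: FixTarget;
  snippet: string;
}

interface Recommendation extends Finding {
  fix: Fix | null; // null when the library has no snippet for this code yet
}

// Hypothetical fix library: supporting a new error code is one new map entry.
const FIX_LIBRARY = new Map<string, Fix>([
  [
    "MISSING_VIEWPORT",
    {
      title: "Add a mobile viewport meta tag",
      target: "head",
      snippet: '<meta name="viewport" content="width=device-width, initial-scale=1">',
    },
  ],
  [
    "MISSING_SITEMAP",
    {
      title: "Point crawlers at your sitemap",
      target: "robots.txt",
      snippet: "Sitemap: https://example.com/sitemap.xml",
    },
  ],
]);

// Left join: every finding survives, with its fix attached when one exists.
function joinRecommendations(findings: Finding[]): Recommendation[] {
  return findings.map((f) => ({ ...f, fix: FIX_LIBRARY.get(f.code) ?? null }));
}
```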
If you want the deep version, the methodology page walks through every category, every check, every grade band, and the "what we don't score" list.
What you can do with it
A few workflows the score unlocks immediately:
- Pre-merge gate: Add a check to your CI that fails the build if your /score/{domain} drops below B. Same pattern as Lighthouse CI. The score URL is stable, and a JSON breakdown is coming to the API soon.
- Platform comparison: The /platforms page now shows average UCP Score by platform, so you can compare Shopify, WooCommerce, BigCommerce, and Magento at a glance. Useful both for picking a stack and for benchmarking the one you're on.
- Leaderboard: The leaderboard is now ranked by UCP Score, with sortable columns for each sub-score. Filter by platform to see the top stores on your stack.
- Monitoring: Save any report against your email. We re-run it weekly and alert you on regressions. A score drop, a vanished capability, or a status flip triggers one email. Free, no marketing.
- Competitive benchmarking: Run Allbirds vs Casper and see grades side by side. The compare page picks up score data automatically.
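A pre-merge gate only needs the published grade floors. Below is a minimal sketch. It assumes your CI step already has the numeric score in hand. `meetsGate` and `ciExitCode` are hypothetical helper names.

```typescript
type Grade = "A" | "B" | "C" | "D" | "F";

// Lower bound of each published grade band.
const GRADE_FLOOR: Record<Grade, number> = { A: 85, B: 70, C: 50, D: 30, F: 0 };

// True when the score is at or above the floor of the minimum acceptable grade.
function meetsGate(score: number, minimum: Grade): boolean {
  return score >= GRADE_FLOOR[minimum];
}

// Exit code for a CI step: 0 passes the build, 1 fails it. Defaults to "B or better".
function ciExitCode(score: number, minimum: Grade = "B"): number {
  return meetsGate(score, minimum) ? 0 : 1;
}
```

In a CI script, you would first fetch the current score. One option is the planned `GET /api/v1/score/{domain}` endpoint once it ships, though its response shape isn't published yet. Then hand `ciExitCode(score)` to `process.exit`.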
What's next
This is v1. A few things already on the roadmap:
- Score history & sparkline: Save a report and you'll see your score trend over time. We've recorded every check in our history table from day one, so the data exists; the visual lands shortly.
- Score API: GET /api/v1/score/{domain}, returning the full breakdown as JSON. The data feed is already public; the score endpoint is the same data behind a stable contract.
- Spec-version-aware scoring weights: Scoring rules for each UCP spec version will live in config, so new versions that shift emphasis absorb cleanly. Validation is already version-aware; we're extending that to scoring weights.
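Version-aware weights can live in a lookup table keyed by spec version. Below is a sketch with three assumptions:

- Versions are compared as zero-padded `YYYY-MM-DD` strings without the `v` prefix.
- A manifest declaring a newer version than any configured entry falls back to the latest one.
- The table layout and `weightsFor` are hypothetical.

Only the 2026-01-11 weights come from this post.

```typescript
interface CategoryWeights {
  discovery: number;
  conformance: number;
  capabilities: number;
}

// Keyed by UCP spec version. Later spec versions would be added as new entries.
const WEIGHTS_BY_SPEC: Record<string, CategoryWeights> = {
  "2026-01-11": { discovery: 0.3, conformance: 0.4, capabilities: 0.3 },
};

// Pick the newest configured version that is not newer than the manifest's version.
// Zero-padded dates sort correctly as plain strings.
function weightsFor(
  specVersion: string,
  table: Record<string, CategoryWeights> = WEIGHTS_BY_SPEC,
): CategoryWeights {
  const known = Object.keys(table)
    .filter((v) => v <= specVersion)
    .sort();
  const chosen = known[known.length - 1];
  const weights = chosen === undefined ? undefined : table[chosen];
  if (weights === undefined) {
    throw new Error(`no scoring weights configured for spec ${specVersion}`);
  }
  // Guard against config typos: the composite assumes weights sum to 1.
  const sum = weights.discovery + weights.conformance + weights.capabilities;
  if (Math.abs(sum - 1) > 1e-9) {
    throw new Error(`weights for ${chosen} sum to ${sum}, expected 1`);
  }
  return weights;
}
```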
We've also taken pains to make the system absorb future spec releases without a rewrite. Static check copy lives in config, not hardcoded; new error codes plug into the recommendations engine via a single config entry. The next spec drop should land as a configuration change, not a refactor.
About UCP Checker
UCP Checker is the independent validation and monitoring layer for the Universal Commerce Protocol. We crawl, validate, and grade every public UCP manifest on the open web, run the public merchant directory, publish the leaderboard and adoption stats, and ship developer tools: the validator, the bulk checker, the browser extension, and now the UCP Score. Everything is free, indexed, and ungated; the dataset is published openly under CC-BY 4.0. Think of us as the SSL Labs of agentic commerce: the third-party scoreboard the ecosystem can build trust on.
Try it
Pick any domain. Type it into ucpchecker.com/score and you'll have a graded report in under a second. If you find a score that surprised you — yours or a competitor's — let us know. The interesting score gaps are the ones nobody's looked at yet.