The formula

EVI is a single 0–100 score composed from three weighted axes. The formula is fixed; the weights do not change between scans.

EVI v0.9: EVI = 0.40 × Coverage + 0.30 × Prominence + 0.30 × Consistency

Each axis is scored on a deterministic 0–5 rubric with named anchors. Final EVI is the weighted sum, rescaled to 0–100. There is no LLM-in-the-loop scoring at any stage; two independent scanners running the same rubric on the same surface set produce the same axis scores.
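
The weighting and rescaling step fits in a few lines of Python. This is a minimal sketch, not the evi-python code. The function name `evi` and the constant `RESCALE` are ours. It assumes each axis arrives as an integer 0–5 rubric score, and that "rescaled to 0–100" means multiplying the 0–5 weighted sum by 20, a factor this page does not state.

```python
from fractions import Fraction

# Weights from the EVI v0.9 formula, held as exact fractions so that any two
# scanners get an identical result regardless of the order they sum the axes in.
WEIGHTS = {
    "coverage": Fraction(40, 100),
    "prominence": Fraction(30, 100),
    "consistency": Fraction(30, 100),
}

# Assumption: "rescaled to 0-100" means multiplying the 0-5 weighted sum by 20.
RESCALE = 20


def evi(coverage: int, prominence: int, consistency: int) -> Fraction:
    """Combine three 0-5 axis scores into a 0-100 EVI score."""
    axes = {"coverage": coverage, "prominence": prominence, "consistency": consistency}
    for name, score in axes.items():
        # Reject bools (a subclass of int) and anything outside the rubric range.
        if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 5:
            raise ValueError(f"{name} must be an integer from 0 to 5, got {score!r}")
    weighted = sum(WEIGHTS[name] * score for name, score in axes.items())  # 0..5
    return weighted * RESCALE  # 0..100


print(evi(4, 1, 2))  # 50
```

With Coverage 4, Prominence 1, and Consistency 2, the weighted sum is 1.6 + 0.3 + 0.6 = 2.5, which rescales to 50. Exact fractions stand in for floats because floating-point addition is not associative: two implementations that add the same terms in a different order could disagree in the last digit.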

Why deterministic. A standard that grades vendors must be reproducible. LLM-graded scores drift between runs and between models, which makes them unsuitable as the basis for a public certification. EVI v0.9 is reproducible by construction.

The three axes

  • Coverage (weight 0.40)
  • Prominence (weight 0.30)
  • Consistency (weight 0.30)

Coverage (weight 0.40)

Measures the presence of the vendor across machine-discoverable surfaces. The buyer's agent has to be able to find the vendor before anything else matters.

Surfaces measured:

  • llms.txt at site root
  • Schema.org Organization + Product blocks on canonical pages
  • .well-known/agent.json
  • MCP server published at a stable URL
  • Structured pricing (machine-readable; not marketing copy)
  • A2A agent card
  • UCP merchant metadata
  • EUC merchant metadata
  • Procurement directory listings
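
Several of these surfaces are files at fixed locations (llms.txt at the site root, .well-known/agent.json), so checking them mostly means fetching a URL. Schema.org blocks need parsing. The sketch below uses only the Python standard library to list the Schema.org types a page declares in JSON-LD. It is illustrative, not the evi-python scanner. The names `JsonLdCollector` and `schema_types` are ours. It ignores Microdata and RDFa, Schema.org's other encodings. It also only detects that an Organization or Product block exists, not whether the block is complete or stale, which the rubric also requires.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._buffer: list[str] | None = None  # None when outside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def schema_types(html: str) -> set[str]:
    """Return the top-level Schema.org @type values declared in a page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    types: set[str] = set()
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is treated as absent
        nodes = data if isinstance(data, list) else [data]
        for node in nodes:
            if not isinstance(node, dict):
                continue
            # Follow an @graph array if present; otherwise inspect the node itself.
            graph = node.get("@graph")
            for item in graph if isinstance(graph, list) else [node]:
                if not isinstance(item, dict):
                    continue
                declared = item.get("@type")
                if isinstance(declared, str):
                    types.add(declared)
                elif isinstance(declared, list):
                    types.update(t for t in declared if isinstance(t, str))
    return types


page = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@graph": ['
    '{"@type": "Organization", "name": "Acme"}, {"@type": "Product", "name": "Widget"}]}'
    "</script></head></html>"
)
print(sorted(schema_types(page)))  # ['Organization', 'Product']
```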

Prominence (weight 0.30)

Measures ranking and citation density within agent-mediated buying conversations across the LLM panel. Being present is necessary but not sufficient — the vendor also has to surface when an agent is helping a buyer narrow a shortlist.

How it is measured: a fixed panel of buying-intent prompts is run weekly across ChatGPT, Claude, Perplexity, Gemini, and Grok. The vendor's rank within each answer and its citation density across the panel are scored against the rubric. The prompt panel is published, and results are published on the transparency page.
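
The models' answers can change from week to week, but scoring a recorded set of answers can still be deterministic. A minimal sketch of the two raw quantities named above, assuming each answer has already been reduced to an ordered list of vendor names. The `PanelAnswer` record and `prominence_stats` function are hypothetical, vendors are matched by exact string (a real scanner would need entity normalization), and the sketch stops short of the 0–5 mapping because the Prominence rubric's anchors are not reproduced on this page.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class PanelAnswer:
    """One model's answer to one panel prompt (hypothetical record shape)."""
    prompt_id: str
    model: str
    ranked_vendors: Tuple[str, ...]  # vendors in the order the answer presents them


def prominence_stats(
    answers: Sequence[PanelAnswer], vendor: str
) -> Tuple[Fraction, Optional[Fraction]]:
    """Return (citation density, mean 1-based rank when cited) for one vendor."""
    ranks = [a.ranked_vendors.index(vendor) + 1 for a in answers if vendor in a.ranked_vendors]
    density = Fraction(len(ranks), len(answers)) if answers else Fraction(0)
    mean_rank = Fraction(sum(ranks), len(ranks)) if ranks else None
    return density, mean_rank


panel = [
    PanelAnswer("p1", "claude", ("Acme", "Globex")),
    PanelAnswer("p1", "gemini", ("Globex",)),
    PanelAnswer("p2", "claude", ("Initech", "Globex", "Acme")),
]
# Acme is cited in 2 of 3 answers, at ranks 1 and 3.
print(prominence_stats(panel, "Acme"))
```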

Consistency (weight 0.30)

Measures whether the canonical entity name, tagline, pricing, contact details, and methodology references agree across surfaces. An agent that gets one answer from llms.txt and a different answer from the MCP server cannot trust either.

What is checked:

  • Legal entity name matches across surfaces
  • Tagline / one-line description matches
  • Pricing surface points to a single canonical location
  • Contact email and methodology URL match
  • Methodology version reference matches
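
These checks reduce to grouping each field's values by surface and flagging any field that takes more than one value. A minimal sketch, assuming each surface has already been parsed into a flat field-to-string map. The field names (`legal_name`, `methodology_version`) and the whitespace-and-case normalization are our assumptions, not the published rubric, which may compare more strictly.

```python
from collections import defaultdict
from typing import Dict, Set


def normalize(value: str) -> str:
    """Hypothetical normalization: collapse whitespace and ignore case."""
    return " ".join(value.split()).casefold()


def consistency_mismatches(
    surfaces: Dict[str, Dict[str, str]],
) -> Dict[str, Dict[str, Set[str]]]:
    """For each field with more than one distinct value, map each value to the surfaces publishing it.

    A field missing from a surface is skipped rather than counted as a mismatch.
    """
    by_field: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for surface, fields in surfaces.items():
        for field, value in fields.items():
            by_field[field][normalize(value)].add(surface)
    return {field: dict(values) for field, values in by_field.items() if len(values) > 1}


surfaces = {
    "llms.txt": {"legal_name": "Acme Inc.", "methodology_version": "0.9"},
    "mcp_server": {"legal_name": "Acme  Inc.", "methodology_version": "0.8"},
    "schema_org": {"legal_name": "ACME INC."},
}
# Only methodology_version differs once names are normalized.
print(consistency_mismatches(surfaces))
```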

How we score

Each axis is scored 0–5 against named anchors. Two scanners running the same rubric on the same surface set must produce the same axis score. Below is the full Coverage rubric as an example; the Prominence and Consistency rubrics follow the same structure and are published in evi-python.

Coverage rubric (0–5)

Score | Anchor | What is present
0 | None | No machine-discoverable surfaces present.
1 | Minimal | llms.txt only.
2 | Partial | llms.txt + Schema.org Org block, but Product block missing or stale.
3 | Bronze tier complete | llms.txt + Schema.org Org + Product blocks, complete and consistent.
4 | Silver tier complete | Bronze + MCP server + .well-known/agent.json + structured pricing.
5 | Gold tier complete | Silver + A2A agent card + UCP merchant + agent checkout + procurement directories.
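
Each anchor from Bronze upward is defined as the previous anchor plus more surfaces, so the rubric can be read as a ladder: the score is the highest rung whose cumulative surface set is fully present. A minimal sketch of that reading, which is our interpretation rather than the evi-python code. The surface identifiers are hypothetical. It assumes "llms.txt only" names the first rung rather than literally nothing else, and that a surface which skips a lower rung earns no credit (a site with an MCP server but no llms.txt scores 0), a case the published rubric may handle differently. Because scores 3, 4, and 5 correspond to complete Bronze, Silver, and Gold surface sets, the same ladder also yields the Agent Commerce Cert tier.

```python
from typing import List, Set, Tuple

# Rungs of the Coverage rubric. Each rung lists the surfaces it adds on top of
# the rungs below it. Identifiers are hypothetical, not evi-python's names.
LADDER: List[Tuple[int, str, Set[str]]] = [
    (1, "Minimal", {"llms_txt"}),
    (2, "Partial", {"schema_org_organization"}),
    (3, "Bronze tier complete", {"schema_org_product"}),
    (4, "Silver tier complete", {"mcp_server", "well_known_agent_json", "structured_pricing"}),
    (5, "Gold tier complete", {"a2a_agent_card", "ucp_merchant", "agent_checkout", "procurement_directories"}),
]


def coverage_score(present: Set[str]) -> Tuple[int, str]:
    """Return (score, anchor): the highest rung whose cumulative requirements are all met.

    `present` should contain only surfaces that are complete and current, so a
    stale Product block is simply left out (which yields Partial).
    """
    score, anchor = 0, "None"
    required: Set[str] = set()
    for level, name, adds in LADDER:
        required |= adds
        if not required <= present:
            break  # a missing lower rung caps the score, whatever sits above it
        score, anchor = level, name
    return score, anchor


silver = {
    "llms_txt", "schema_org_organization", "schema_org_product",
    "mcp_server", "well_known_agent_json", "structured_pricing",
}
print(coverage_score(silver))  # (4, 'Silver tier complete')
```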

Prominence and Consistency rubrics are documented in the reference implementation. See evi-python.

Two-axis customer framing

The bureau reports two numbers to a buyer:

  • AI Discoverability Index (0–100): the headline metric. This is the full EVI score, presented to vendors as a single number. Most B2B SaaS vendors today score between 30 and 70.
  • Agent Commerce Cert (Bronze / Silver / Gold): a forward-readiness tier indicating which surface set the vendor has fully shipped. A vendor can hold a tier without yet reaching a particular index score; the tier reflects Coverage alone, while the index reflects all three axes.

Agent Commerce Cert tiers

Tier | Surfaces required
Bronze | llms.txt + Schema.org Org + Schema.org Product
Silver | Bronze + MCP server + .well-known/agent.json + structured pricing
Gold | Silver + A2A agent card + UCP merchant + agent checkout + procurement directory listings

Trustmark Certified v0.9 — separate standard

Trustmark Certified v0.9 is a distinct standard the bureau publishes for AI agents themselves, not for vendors. It scores agents on two axes, Security and Capability, mapped to OWASP, NIST AI RMF, ISO 42001, SOC 2, and MITRE ATLAS controls. The reference implementation is trustmark-python, MIT-licensed.

Trustmark and EVI are distinct standards. Trustmark scores agents; EVI scores vendor discoverability. Trustmark's Bronze/Silver/Gold designations apply to agents and are not the same as EVI score thresholds. Earlier versions of this page conflated the two; that conflation has been removed.

Reference implementations

evi-python

Reference scanner that produces a deterministic EVI v0.9 score from a domain. Implements the full 0–5 rubric for each axis. MIT-licensed.

github.com/elephant-accountability/evi-python

trustmark-python

Reference scanner for Trustmark Certified v0.9. Maps agent capabilities and security posture against OWASP, NIST AI RMF, ISO 42001, SOC 2, and MITRE ATLAS. MIT-licensed.

github.com/elephant-accountability/trustmark-python

Versioning

EVI v0.9 is the canonical version. The constant METHODOLOGY_VERSION = '0.9' in the orchestrator codebase is the source of truth. Any score, scanner output, or report that references a different version is out of sync with this page and should be regenerated.
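
Keeping one constant as the source of truth makes the sync rule mechanical: compare each artifact's declared version against it. A minimal sketch, assuming scores, scanner outputs, and reports are available as dictionaries with a `methodology_version` field (a hypothetical name). The functions `is_in_sync` and `needs_regeneration` are illustrative, and a real tool would import the constant from the orchestrator codebase rather than redefine it.

```python
from typing import Any, Iterable, List, Mapping

# Local copy of the canonical constant for this sketch only; a real tool would
# import it from the orchestrator codebase instead of redefining it.
METHODOLOGY_VERSION = "0.9"


def is_in_sync(artifact: Mapping[str, Any]) -> bool:
    """True if the artifact declares the canonical methodology version.

    An artifact with no version field is treated as out of sync.
    """
    return str(artifact.get("methodology_version", "")) == METHODOLOGY_VERSION


def needs_regeneration(artifacts: Iterable[Mapping[str, Any]]) -> List[Mapping[str, Any]]:
    """Return the artifacts that reference a different (or no) version."""
    return [a for a in artifacts if not is_in_sync(a)]


reports = [
    {"id": "scan-1", "methodology_version": "0.9"},
    {"id": "scan-2", "methodology_version": "0.8"},
    {"id": "scan-3"},
]
print([r["id"] for r in needs_regeneration(reports)])  # ['scan-2', 'scan-3']
```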

Version | Status | Notes
v0.9 | Canonical | Three-axis, deterministic 0–5 rubric, no LLM scoring. Published under CC-BY-4.0.
