Methodology — Elephant Accountability

The formula

EVI (the Elephant Visibility Index) is a single 0–100 score composed of three weighted axes. The formula is fixed; the weights do not change between scans.

EVI v0.9: EVI = 0.40 × Coverage + 0.30 × Prominence + 0.30 × Consistency

Each axis is scored on a deterministic 0–5 rubric with named anchors. Final EVI is the weighted sum, rescaled to 0–100. There is no LLM-in-the-loop scoring at any stage; two independent scanners running the same rubric on the same surface set produce the same axis scores.
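
As a worked illustration of the formula, the sketch below computes an EVI score from three axis scores. The function name `evi_score` is hypothetical, and the rescaling is assumed to be linear (the weighted 0–5 sum multiplied by 20); this page states only that the weighted sum is rescaled to 0–100.

```python
# Weights from the EVI v0.9 formula.
WEIGHTS = {"coverage": 0.40, "prominence": 0.30, "consistency": 0.30}


def evi_score(coverage: int, prominence: int, consistency: int) -> float:
    """Combine three 0-5 axis scores into a 0-100 EVI score.

    Assumption: the rescale is linear, i.e. the weighted 0-5 sum times 20.
    """
    axes = {"coverage": coverage, "prominence": prominence, "consistency": consistency}
    for name, value in axes.items():
        if not 0 <= value <= 5:
            raise ValueError(f"{name} must be between 0 and 5, got {value}")
    weighted = sum(WEIGHTS[name] * value for name, value in axes.items())
    return round(weighted * 20, 2)


print(evi_score(3, 2, 4))  # 0.40*3 + 0.30*2 + 0.30*4 = 3.0, rescaled to 60.0
```

Because the inputs are integer rubric scores and the weights are constants, the same axis scores always yield the same index.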

Why deterministic. A standard that grades vendors must be reproducible. LLM-graded scores drift between runs and between models, which makes them unsuitable as the basis for a public certification. EVI v0.9 is reproducible by construction.

The three axes

Coverage 0.40

Prominence 0.30

Consistency 0.30

Coverage 0.40 weight

Measures the presence of the vendor across machine-discoverable surfaces. The buyer's agent has to be able to find the vendor before anything else matters.

Surfaces measured:

llms.txt at site root
Schema.org Organization + Product blocks on canonical pages
.well-known/agent.json
MCP server published at a stable URL
Structured pricing (machine-readable; not marketing copy)
A2A agent card
UCP merchant metadata
EUC merchant metadata
Procurement directory listings
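
For the two surfaces whose paths are given in the list above, a scanner can derive the URLs to probe from the vendor's domain. The following is a minimal sketch: `candidate_urls` is a hypothetical helper, HTTPS is assumed, and the remaining surfaces are left out because this page does not fix their locations.

```python
from urllib.parse import urljoin

# Root-relative paths named in the Coverage surface list.
WELL_KNOWN_PATHS = {
    "llms.txt": "/llms.txt",
    "agent.json": "/.well-known/agent.json",
}


def candidate_urls(domain: str) -> dict[str, str]:
    """Return the URL to probe for each fixed-path surface (assumes HTTPS)."""
    base = f"https://{domain.strip().rstrip('/')}"
    return {name: urljoin(base, path) for name, path in WELL_KNOWN_PATHS.items()}


print(candidate_urls("example.com"))
```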

Prominence 0.30 weight

Measures ranking and citation density within agent-mediated buying conversations across the LLM panel. Being present is necessary but not sufficient — the vendor also has to surface when an agent is helping a buyer narrow a shortlist.

How it is measured: a fixed panel of buying-intent prompts is run weekly across ChatGPT, Claude, Perplexity, Gemini, and Grok. The vendor's rank within answers and the citation density across the panel are scored against the rubric. The prompt panel is published; results are published on the transparency page.
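
To make the two Prominence inputs concrete, the sketch below aggregates panel results for one vendor. It is illustrative only: the `PanelResult` record, the use of mean rank, and the definition of citation density as the share of panel answers that cite the vendor are all assumptions, not the published rubric, which is in evi-python.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class PanelResult:
    """One prompt run on one model (hypothetical record shape)."""
    model: str
    prompt_id: str
    rank: Optional[int]  # 1-based position in the answer; None if absent
    cited: bool          # True if the answer cites the vendor


def summarize(results: list[PanelResult]) -> dict[str, Optional[float]]:
    """Return mean rank (over answers where the vendor appears) and citation density."""
    if not results:
        raise ValueError("results must not be empty")
    ranks = [r.rank for r in results if r.rank is not None]
    return {
        "mean_rank": mean(ranks) if ranks else None,
        # Assumed definition: fraction of panel answers that cite the vendor.
        "citation_density": sum(r.cited for r in results) / len(results),
    }


panel = [
    PanelResult("ChatGPT", "p1", rank=2, cited=True),
    PanelResult("Claude", "p1", rank=None, cited=False),
    PanelResult("Perplexity", "p1", rank=4, cited=True),
    PanelResult("Gemini", "p1", rank=None, cited=False),
]
print(summarize(panel))
```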

Consistency 0.30 weight

Measures whether the canonical entity name, tagline, pricing, contact, and methodology references agree across surfaces. An agent that gets a different answer from llms.txt than from the MCP server cannot trust either.

What is checked:

Legal entity name matches across surfaces
Tagline / one-line description matches
Pricing surface points to a single canonical location
Contact email and methodology URL match
Methodology version reference matches
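
The checks above amount to comparing the same field across every surface that declares it. A minimal sketch of that comparison follows; the input shape, the field names, and the choice of exact matching after trimming whitespace are assumptions made for illustration.

```python
def find_mismatches(surfaces: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return, per field, the differing values keyed by surface.

    Fields on which every declaring surface agrees are omitted.
    """
    by_field: dict[str, dict[str, str]] = {}
    for surface, fields in surfaces.items():
        for field, value in fields.items():
            by_field.setdefault(field, {})[surface] = value.strip()
    return {
        field: values
        for field, values in by_field.items()
        if len(set(values.values())) > 1
    }


observed = {
    "llms.txt": {"entity_name": "Acme Inc.", "contact_email": "hello@acme.example"},
    "mcp": {"entity_name": "Acme, Inc.", "contact_email": "hello@acme.example"},
}
print(find_mismatches(observed))
# {'entity_name': {'llms.txt': 'Acme Inc.', 'mcp': 'Acme, Inc.'}}
```

Returning the conflicting values per surface, rather than a bare pass/fail, shows which surface needs correcting.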

How we score

Each axis is scored 0–5 against named anchors. Two scanners running the same rubric on the same surface set must produce the same axis score. Below is the full Coverage rubric as an example; the Prominence and Consistency rubrics follow the same structure and are published in evi-python.

Coverage rubric (0–5)

Score	Anchor	What is present
0	None	No machine-discoverable surfaces present.
1	Minimal	`llms.txt` only.
2	Partial	`llms.txt` + Schema.org Org block, but Product block missing or stale.
3	Bronze tier complete	`llms.txt` + Schema.org Org+Product complete and consistent.
4	Silver tier complete	Bronze + MCP server + `.well-known/agent.json` + structured pricing.
5	Gold tier complete	Silver + A2A agent card + UCP merchant + agent checkout + procurement directories.
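
The rubric table can be read as a chain of subset checks. The sketch below encodes it that way; the surface identifiers are hypothetical names, only presence is modeled (staleness and consistency are not), and a surface set without `llms.txt` is assumed to score 0 because the table defines no anchor for that case.

```python
BRONZE = {"llms_txt", "schema_org", "schema_product"}
SILVER = BRONZE | {"mcp_server", "agent_json", "structured_pricing"}
GOLD = SILVER | {"a2a_agent_card", "ucp_merchant", "agent_checkout", "procurement_directories"}


def coverage_score(present: set[str]) -> int:
    """Map a set of detected surfaces to a 0-5 Coverage score.

    Assumptions: presence only, and any set lacking llms.txt scores 0.
    """
    if GOLD <= present:
        return 5
    if SILVER <= present:
        return 4
    if BRONZE <= present:
        return 3
    if {"llms_txt", "schema_org"} <= present:
        return 2
    if "llms_txt" in present:
        return 1
    return 0


print(coverage_score({"llms_txt", "schema_org"}))  # 2
print(coverage_score(BRONZE | {"mcp_server"}))     # 3: Silver is incomplete
```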

Prominence and Consistency rubrics are documented in the reference implementation. See evi-python.

Two-axis customer framing

The bureau reports two results to each customer:

AI Discoverability Index (0–100): the headline metric. This is the full EVI score, presented to vendors as a single number. Most B2B SaaS vendors today score between 30 and 70.
Agent Commerce Cert (Bronze / Silver / Gold): a forward-readiness tier indicating which surface set the vendor has fully shipped. A vendor can hold a tier without reaching any particular score threshold, because the tier reflects coverage alone while the index reflects all three axes.

Agent Commerce Cert tiers

Tier	Surfaces required
Bronze	`llms.txt` + Schema.org Org + Schema.org Product
Silver	Bronze + MCP server + `.well-known/agent.json` + structured pricing
Gold	Silver + A2A agent card + UCP merchant + agent checkout + procurement directory listings
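
Since each tier builds on the one before it, a vendor's tier and the gap to the next one can be derived in a single pass. This is a sketch under assumptions: `cert_status` and the surface identifiers are hypothetical, and only presence is checked.

```python
from typing import Optional

# Each entry lists only the surfaces a tier adds on top of the previous one.
TIERS = [
    ("Bronze", {"llms_txt", "schema_org", "schema_product"}),
    ("Silver", {"mcp_server", "agent_json", "structured_pricing"}),
    ("Gold", {"a2a_agent_card", "ucp_merchant", "agent_checkout", "procurement_directories"}),
]


def cert_status(present: set[str]) -> tuple[Optional[str], set[str]]:
    """Return (highest tier held, surfaces still missing for the next tier)."""
    held = None
    for name, added in TIERS:
        missing = added - present
        if missing:
            return held, missing
        held = name
    return held, set()


tier, gap = cert_status({"llms_txt", "schema_org", "schema_product", "mcp_server"})
print(tier, sorted(gap))  # Bronze ['agent_json', 'structured_pricing']
```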

Trustmark Certified v0.9 — separate standard

Trustmark Certified v0.9 is a distinct standard the bureau publishes for AI agents themselves, not for vendors. It scores agents on two axes, Security and Capability, mapped to OWASP, NIST AI RMF, ISO 42001, SOC 2, and MITRE ATLAS controls. The reference implementation is trustmark-python, MIT-licensed.

Trustmark and EVI are distinct standards. Trustmark scores agents; EVI scores vendor discoverability. Trustmark's Bronze/Silver/Gold designations apply to agents and are not the same as EVI score thresholds. Earlier versions of this page conflated the two; that conflation has been removed.

Reference implementations

`evi-python`

Reference scanner that produces a deterministic EVI v0.9 score from a domain. Implements the full 0–5 rubric for each axis. MIT-licensed.

github.com/elephant-accountability/evi-python

`trustmark-python`

Reference scanner for Trustmark Certified v0.9. Maps agent capabilities and security posture against OWASP, NIST AI RMF, ISO 42001, SOC 2, and MITRE ATLAS. MIT-licensed.

github.com/elephant-accountability/trustmark-python

Versioning

EVI v0.9 is the canonical version. The constant METHODOLOGY_VERSION = '0.9' in the orchestrator codebase is the source of truth. Any score, scanner output, or report that references a different version is out of sync with this page and should be regenerated.
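
A consumer of scores can apply this rule mechanically. The sketch below assumes, for illustration only, that a report is a dictionary carrying its version under a `methodology_version` key; that key name and the `is_current` helper are hypothetical.

```python
METHODOLOGY_VERSION = "0.9"  # canonical value stated on this page


def is_current(report: dict) -> bool:
    """True if the report's declared version matches the canonical one.

    Assumption: reports carry the version under a "methodology_version" key.
    """
    return report.get("methodology_version") == METHODOLOGY_VERSION


print(is_current({"domain": "example.com", "methodology_version": "0.9"}))  # True
print(is_current({"domain": "example.com", "methodology_version": "0.8"}))  # False
```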

Version	Status	Notes
v0.9	Canonical	Three-axis, deterministic 0–5 rubric, no LLM scoring. Published under CC-BY-4.0.

EVI v0.9 — The Elephant Visibility Index