Open Standard · v0.9 RFC · April 2026
Trustmark Certified.
An open standard for scoring AI agent trust.
Trustmark is a Rotten-Tomatoes-style aggregate score for AI agents. It rolls up how an agent scores against existing frameworks — OWASP, NIST, ISO 42001, SOC 2, EU AI Act, MITRE ATLAS — rather than inventing new criteria. Two axes are always visible: Security (is it safe?) and Capability (is it good?).
Creative Commons BY 4.0. Public comment period open until 2026-06-22. v1.0 locks June 22, 2026.
The two axes
Security. Capability. Both always visible.
Security Score
B
Is it safe?
Procurement asks this question first.
Capability Score
A−
Is it good?
Operations asks this question second.
The logic
Why two axes, not one
A single aggregate trust score is seductive but misleading. An agent can be architecturally secure and still be unable to complete the task it was purchased to do. An agent can perform its task extremely well and still leak sensitive data via prompt injection on every deployment. Collapsing these into one number hides both problems.
Procurement and legal ask the security question first: does this agent meet our infosec requirements, can we deploy it under our compliance posture, have we seen its test results. Operations and the business unit ask the capability question second: does it work, how fast, how often does it fail, can our team use it without a PhD in prompting.
Trustmark keeps both axes visible, never combined. The public leaderboard shows Security grade and Capability grade side by side. Certified vendors publish both. A vendor can earn an A on one axis and a C on the other — that information is the point.
The data
The market for this standard
65%
of enterprises had at least one AI agent security incident in the past 12 months.
97%
of security leaders expect a material AI-agent incident within the next 12 months.
21%
Only 21% of enterprises have runtime visibility into what their agents are doing.
Every number above represents a procurement decision made without a trust score.
The source frameworks
We aggregate. We don’t invent.
We don’t invent a framework. We aggregate how agents score against the frameworks that already exist.
| Framework | Publisher | What it covers | Trustmark axis |
|---|---|---|---|
| OWASP LLM Top 10 | OWASP | Ten ranked LLM application risks including prompt injection, supply chain, and excessive agency | Security |
| OWASP Agentic AI Top 10 | OWASP | Agent-specific risks: goal hijack, tool misuse, memory poisoning, rogue agents | Security |
| NIST AI RMF 1.0 | NIST | Risk management lifecycle: Govern, Map, Measure, Manage | Security · Capability |
| ISO/IEC 42001:2023 | ISO | AI management system governance: risk assessment, lifecycle management, supplier oversight | Security · Capability |
| SOC 2 Type II | AICPA | Security, availability, confidentiality, processing integrity, and privacy trust service criteria | Security |
| MITRE ATLAS v5.4 | MITRE | 16 tactics, 84 techniques for AI/ML threat modeling including agent-specific attacks | Security |
| EU AI Act | European Parliament | Risk-tiered compliance obligations: logging, human oversight, transparency, conformity assessment | Security |
| Anthropic MCP Security Spec | Anthropic | OAuth 2.1, PKCE, resource indicators, token scoping for agent tool connections | Security · Capability |
| CSA MAESTRO | Cloud Security Alliance | Multi-agent system threat taxonomy: trust hierarchies, agent supply chain, runtime governance | Security |
| NYDFS Cybersecurity Regulation | NYDFS | Third-party service provider security requirements, penetration testing, audit trails | Security |
| FINRA 2026 AI Oversight | FINRA | AI agent governance for financial services: access monitoring, HITL procedures, action logging | Security · Capability |
The scoring rubric
The two axes in detail
Each axis is a weighted composite of sub-dimensions. The weights reflect where the evidence shows the highest incident frequency and the most consequential failures.
Security Axis
- Authentication architecture, prompt injection defenses, spending controls, sandbox isolation. The primary failure mode in 65% of enterprise incidents.
- Data handling, PII exposure controls, data residency, retention limits. What can the agent see, and who else can see what it processed?
- Tamper-evident logging, full action trace to a human identity, SOC 2 Trust Services Criteria coverage, OpenTelemetry compliance.
- ISO 42001 certification status, EU AI Act obligations met, NYDFS / FINRA sector requirements, published model card or AIBOM.
Capability Axis
- Task completion rate across the agent’s stated use cases. Benchmark performance on representative production scenarios.
- Latency, token cost per task completion, resource consumption. Capability at scale, not just in demonstration conditions.
- Uptime, error rate, consistency of outputs across equivalent inputs. What operations asks after procurement signs off.
- MCP/A2A discoverability, API documentation quality, integration friction for the agents and systems that will orchestrate this one.
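As a minimal sketch of the composite, an axis score can be computed as a weighted sum of sub-dimension scores. The sub-dimension keys and weights below are illustrative assumptions, not the published rubric values:

```python
# Illustrative sketch: a 0-100 axis score as a weighted composite of
# sub-dimension scores. Keys and weights here are assumptions for
# illustration; the published rubric defines the authoritative values.

def axis_score(subscores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted composite of 0-100 sub-dimension scores."""
    if set(subscores) != set(weights):
        raise ValueError("subscores and weights must cover the same sub-dimensions")
    total_weight = sum(weights.values())
    return sum(subscores[k] * weights[k] for k in weights) / total_weight

# Hypothetical Security-axis sub-dimensions and weights:
security = axis_score(
    {"architecture": 88, "data_handling": 75, "auditability": 90, "compliance": 70},
    {"architecture": 0.35, "data_handling": 0.25, "auditability": 0.20, "compliance": 0.20},
)
```

Dividing by the total weight keeps the composite on the same 0–100 scale as the inputs even if the published weights do not sum to exactly 1.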
Letter grades
Scores map to letter grades on each axis
Each axis produces an independent letter grade. Both grades are always published. The underlying 0–100 score is the authoritative number for tracking and comparison.
| Grade | Score range | What it means |
|---|---|---|
| A+ | 95 – 100 | Category-leading. Meets or exceeds every applicable framework control. |
| A | 85 – 94 | Strong posture. Minor gaps in one or two controls. |
| A− | 80 – 84 | Above average. Framework coverage is solid; operational evidence is thinner. |
| B | 70 – 79 | Adequate. Notable gaps in 2–3 controls. Procurement can proceed with documented exceptions. |
| C | 55 – 69 | Below average. Significant gaps. Requires remediation plan before deployment in sensitive contexts. |
| D | 40 – 54 | Poor posture. Structural deficiencies that cannot be resolved by configuration alone. |
| F | 0 – 39 | Critical failures present. Active risk to deploying organization. |
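The mapping above can be sketched as a simple threshold function. The function name is illustrative; the thresholds transcribe the v0.9 table:

```python
# Map a 0-100 axis score to a Trustmark letter grade.
# Thresholds transcribe the published v0.9 grade table.

def letter_grade(score: float) -> str:
    if not 0 <= score <= 100:
        raise ValueError("score must be in [0, 100]")
    if score >= 95:
        return "A+"
    if score >= 85:
        return "A"
    if score >= 80:
        return "A-"
    if score >= 70:
        return "B"
    if score >= 55:
        return "C"
    if score >= 40:
        return "D"
    return "F"
```

Because the underlying 0–100 score is the authoritative number, the letter grade is a pure, lossy view of it: two agents with the same grade can still be ranked by score.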
Grading scope
Public leaderboard. Paid certification.
Public Leaderboard — Free
Top 10 public agents, graded quarterly
The ten most widely deployed AI agents in the market receive Trustmark grades every quarter at no cost and without their cooperation. Grades are published regardless of outcome. Vendors cannot suppress a public grade.
- ChatGPT
- Claude
- Gemini
- Copilot
- Perplexity
- SAP Joule
- Coupa Navi
- Jaggaer JAI
- Workday AI
- Oracle AI
Certified — Paid
Any B2B SaaS agent can request grading
Paid certification is available to any B2B SaaS agent that wants a published Trustmark grade. Grades are published only with the vendor’s permission. Certified vendors receive a detailed rubric report, a grade badge, and a dispute mechanism via the public GitHub RFC process.
Pricing published at the time of v1.0 launch, June 2026.
Free for the agents everyone uses. Paid for proving yours is as safe as the best.
Reproducibility
Open corpus. Frozen versions. Auditable by anyone.
The Trustmark scoring rubric, the test corpus, and the reference grading library are all published under CC BY 4.0. Any party with the same corpus version and the same rubric version should arrive at the same Trustmark grade independently. If they don’t, that is a public dispute filed against the methodology, not a private appeal.
Every Trustmark report carries its corpus version and rubric version. When either changes materially, the version number bumps. Prior-version grades are not retroactively modified.
The reference Python library `trustmark-score` is MIT-licensed and available on PyPI. Prompt injection test fixtures are versioned in the same repository as the rubric.
This is the same governance model as EVI (Elephant Visibility Index). The spec editor is Elephant Accountability LLC. The governance document is in the repository. Disputes go through the public GitHub RFC process.
How Elephant uses Trustmark
Integrated into everything we audit
- Vendor Audit — Trustmark Sub-Score (included). Every vendor we assess for a procurement audit receives a Trustmark Security sub-score as part of the deliverable. Procurement teams get a documented, reproducible grade alongside the audit narrative.
- Directory — Verified Tier (included). Vendors in the Elephant Agent-Ready Directory at the Verified tier display their current Trustmark status. Buyers can filter the directory by Trustmark Security grade.
- $499 llms.txt Review — Mini Security Check. The $499 llms.txt Review now includes a mini Trustmark Security check: prompt injection surface assessment, authentication model review, and a preliminary Security axis score.
Two standards, one governance model
Trustmark and EVI are sister standards
EVI — Elephant Visibility Index
Does the agent find your business?
EVI measures how often and how prominently AI assistants recommend a vendor when a buyer asks a category question. It is a discoverability score. The question it answers: do AI agents know you exist.
Trustmark Certified
Can you trust the agent you deploy?
Trustmark measures whether the agent a business deploys is safe and capable. It is a trust score. The question it answers: should you be deploying this agent in a production context with real data and real spend authority.
EVI and Trustmark share the same governance model, the same open methodology, and the same CC BY 4.0 license. Neither is proprietary. Both are editor-maintained by Elephant Accountability LLC with public dispute resolution. They do not overlap in what they measure; they are complementary instruments.
FAQ
Six questions about Trustmark
Who owns Trustmark?
Elephant Accountability LLC is the current editor and maintainer of the Trustmark specification. The content of the spec is published under Creative Commons BY 4.0. Anyone can implement, fork, or build services on top of Trustmark with attribution. The reference scoring library is MIT-licensed. Neither the spec nor the library is proprietary.
Can my agency use Trustmark in our own reports?
Yes, that is the point. Attribution requirement: “Scored per Trustmark v0.9 methodology, eaccountability.org/trustmark.” No license fee. No approval required. If you make improvements to the rubric, submit them as a pull request to the spec repository. Substantive methodology changes require the RFC process.
How is Trustmark different from SOC 2?
SOC 2 audits organizations — their controls, their policies, their operational processes. A company can hold a SOC 2 Type II report while its AI agents have no spending controls, no prompt injection defenses, and no per-agent identity. Trustmark scores the agent itself. The two are complementary: a SOC 2 report covers the organization; a Trustmark grade covers the specific agent being deployed.
What if a vendor disagrees with their grade?
Disputes are filed through the public GitHub RFC process. Evidence-based: the vendor must provide documentation showing that the rubric item in question was incorrectly assessed. Disputes and their resolutions are public. Grades can be revised if the evidence supports it, and the revision reason is published alongside the updated grade.
How often does the public leaderboard update?
Quarterly re-verification for all ten agents on the public leaderboard. If a significant new vulnerability, CVE, or framework update warrants an interim re-grade, we do so and note the reason. Certified vendors can request re-grading after material changes to their agent architecture.
What happens after the RFC period?
The public comment period runs through 2026-06-22. We incorporate substantive feedback, resolve open issues in the spec repository, and lock v1.0 on June 22, 2026. That version persists indefinitely. Future improvements require v1.1 or v2.0 and a new corpus run. Prior-version grades remain published and labeled with their version number.
Governance and license
Open. Versioned. Attributed.
The Trustmark specification is published under Creative Commons BY 4.0. The reference Python library is MIT-licensed. Elephant Accountability LLC acts as editor, not as sole authority. The governance model is documented in the spec repository.
Two numbers every agent deployment needs.
Security grade. Capability grade. Both public. Both reproducible. The public leaderboard is free. Certification for your own agent is paid.