Open Standard · v0.9 RFC · April 2026
Trustmark Certified.
An open standard for scoring AI agent trust.
Trustmark is a Rotten-Tomatoes-style aggregate score for AI agents. It rolls up how an agent scores against existing frameworks — OWASP, NIST, ISO 42001, SOC 2, EU AI Act, MITRE ATLAS — rather than inventing new criteria. Two axes are always visible: Security (is it safe?) and Capability (is it good?).
Creative Commons BY 4.0. Public comment period open until 2026-06-22. v1.0 locks June 22, 2026.
The two axes
Security. Capability. Both always visible.
Security Score
B
Is it safe?
Procurement asks this question first.
Capability Score
A−
Is it good?
Operations asks this question second.
The logic
Why two axes, not one
A single aggregate trust score is seductive but misleading. An agent can be architecturally secure and still be unable to complete the task it was purchased to do. An agent can perform its task extremely well and still leak sensitive data via prompt injection on every deployment. Collapsing these into one number hides both problems.
Procurement and legal ask the security question first: does this agent meet our infosec requirements, can we deploy it under our compliance posture, have we seen its test results. Operations and the business unit ask the capability question second: does it work, how fast, how often does it fail, can our team use it without a PhD in prompting.
Trustmark keeps both axes visible, never combined. The public leaderboard shows Security grade and Capability grade side by side. Certified vendors publish both. A vendor can earn an A on one axis and a C on the other — that information is the point.
The data
The market for this standard
65%
of enterprises had at least one AI agent security incident in the past 12 months.
97%
of security leaders expect a material AI-agent incident within the next 12 months.
21%
Only 21% of enterprises have runtime visibility into what their agents are doing.
Every number above represents a procurement decision made without a trust score.
The source frameworks
We aggregate. We don’t invent.
We don’t invent a framework. We aggregate how agents score against the frameworks that already exist.
| Framework | Publisher | What it covers | Trustmark axis |
|---|---|---|---|
| OWASP LLM Top 10 | OWASP | Ten ranked LLM application risks including prompt injection, supply chain, and excessive agency | Security |
| OWASP Agentic AI Top 10 | OWASP | Agent-specific risks: goal hijack, tool misuse, memory poisoning, rogue agents | Security |
| NIST AI RMF 1.0 | NIST | Risk management lifecycle: Govern, Map, Measure, Manage | Security · Capability |
| ISO/IEC 42001:2023 | ISO | AI management system governance: risk assessment, lifecycle management, supplier oversight | Security · Capability |
| SOC 2 Type II | AICPA | Security, availability, confidentiality, processing integrity, and privacy trust service criteria | Security |
| MITRE ATLAS v5.4 | MITRE | 16 tactics, 84 techniques for AI/ML threat modeling including agent-specific attacks | Security |
| EU AI Act | European Parliament | Risk-tiered compliance obligations: logging, human oversight, transparency, conformity assessment | Security |
| Anthropic MCP Security Spec | Anthropic | OAuth 2.1, PKCE, resource indicators, token scoping for agent tool connections | Security · Capability |
| CSA MAESTRO | Cloud Security Alliance | Multi-agent system threat taxonomy: trust hierarchies, agent supply chain, runtime governance | Security |
| NYDFS Cybersecurity Regulation | NYDFS | Third-party service provider security requirements, penetration testing, audit trails | Security |
| FINRA 2026 AI Oversight | FINRA | AI agent governance for financial services: access monitoring, HITL procedures, action logging | Security · Capability |
The scoring rubric
The two axes in detail
Each axis is a weighted composite of sub-dimensions. The weights reflect where the evidence shows the highest incident frequency and the most consequential failures.
Security Axis
- Authentication architecture, prompt injection defenses, spending controls, sandbox isolation. The primary failure mode in 65% of enterprise incidents.
- Data handling, PII exposure controls, data residency, retention limits. What can the agent see, and who else can see what it processed?
- Tamper-evident logging, full action trace to a human identity, SOC 2 Trust Services Criteria coverage, OpenTelemetry compliance.
- ISO 42001 certification status, EU AI Act obligations met, NYDFS / FINRA sector requirements, published model card or AIBOM.
Capability Axis
- Task completion rate across the agent’s stated use cases. Benchmark performance on representative production scenarios.
- Latency, token cost per task completion, resource consumption. Capability at scale, not just in demonstration conditions.
- Uptime, error rate, consistency of outputs across equivalent inputs. What operations asks after procurement signs off.
- MCP/A2A discoverability, API documentation quality, integration friction for the agents and systems that will orchestrate this one.
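As a minimal sketch of the composite, an axis score can be computed as a weighted sum of sub-dimension scores. The sub-dimension keys and weights below are illustrative assumptions, not the published rubric values:

```python
# Illustrative sketch: a 0-100 axis score as a weighted composite of
# sub-dimension scores. Keys and weights here are assumptions for
# illustration; the published rubric defines the authoritative values.

def axis_score(subscores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted composite of 0-100 sub-dimension scores."""
    if set(subscores) != set(weights):
        raise ValueError("subscores and weights must cover the same sub-dimensions")
    total_weight = sum(weights.values())
    return sum(subscores[k] * weights[k] for k in weights) / total_weight

# Hypothetical Security-axis sub-dimensions and weights:
security = axis_score(
    {"architecture": 88, "data_handling": 75, "auditability": 90, "compliance": 70},
    {"architecture": 0.35, "data_handling": 0.25, "auditability": 0.20, "compliance": 0.20},
)
```

Dividing by the total weight keeps the composite on the same 0–100 scale as the inputs even if the published weights do not sum to exactly 1.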
Letter grades
Scores map to letter grades on each axis
Each axis produces an independent letter grade. Both grades are always published. The underlying 0–100 score is the authoritative number for tracking and comparison.
| Grade | Score range | What it means |
|---|---|---|
| A+ | 95 – 100 | Category-leading. Meets or exceeds every applicable framework control. |
| A | 85 – 94 | Strong posture. Minor gaps in one or two controls. |
| A− | 80 – 84 | Above average. Framework coverage is solid; operational evidence is thinner. |
| B | 70 – 79 | Adequate. Notable gaps in 2–3 controls. Procurement can proceed with documented exceptions. |
| C | 55 – 69 | Below average. Significant gaps. Requires remediation plan before deployment in sensitive contexts. |
| D | 40 – 54 | Poor posture. Structural deficiencies that cannot be resolved by configuration alone. |
| F | 0 – 39 | Critical failures present. Active risk to deploying organization. |
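The mapping above can be sketched as a simple threshold function. The function name is illustrative; the thresholds transcribe the v0.9 table:

```python
# Map a 0-100 axis score to a Trustmark letter grade.
# Thresholds transcribe the published v0.9 grade table.

def letter_grade(score: float) -> str:
    if not 0 <= score <= 100:
        raise ValueError("score must be in [0, 100]")
    if score >= 95:
        return "A+"
    if score >= 85:
        return "A"
    if score >= 80:
        return "A-"
    if score >= 70:
        return "B"
    if score >= 55:
        return "C"
    if score >= 40:
        return "D"
    return "F"
```

Because the underlying 0–100 score is the authoritative number, the letter grade is a pure, lossy view of it: two agents with the same grade can still be ranked by score.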
Grading scope
Public leaderboard. Paid certification.
Public Leaderboard — Free
Top 10 public agents, graded quarterly
The ten most widely deployed AI agents in the market receive Trustmark grades every quarter at no cost and without their cooperation. Grades are published regardless of outcome. Vendors cannot suppress a public grade.
- ChatGPT
- Claude
- Gemini
- Copilot
- Perplexity
- SAP Joule
- Coupa Navi
- Jaggaer JAI
- Workday AI
- Oracle AI
Certified — Paid
Any B2B SaaS agent can request grading
Paid certification is available to any B2B SaaS agent that wants a published Trustmark grade. Grades are published only with the vendor’s permission. Certified vendors receive a detailed rubric report, a grade badge, and a dispute mechanism via the public GitHub RFC process.
Pricing published at the time of v1.0 launch, June 2026.
Free for the agents everyone uses. Paid for proving yours is as safe as the best.
Reproducibility
Open corpus. Frozen versions. Auditable by anyone.
The Trustmark scoring rubric, the test corpus, and the reference grading library are all published under CC BY 4.0. Any party with the same corpus version and the same rubric version should arrive at the same Trustmark grade independently. If they don’t, that is a public dispute filed against the methodology, not a private appeal.
Every Trustmark report carries its corpus version and rubric version. When either changes materially, the version number bumps. Prior-version grades are not retroactively modified.
The reference Python library `trustmark-score` is MIT-licensed and available on PyPI. Prompt injection test fixtures are versioned in the same repository as the rubric.
This is the same governance model as EVI (Elephant Visibility Index). The spec editor is Elephant Accountability LLC. The governance document is in the repository. Disputes go through the public GitHub RFC process.
How Elephant uses Trustmark
Integrated into everything we audit
- Vendor Audit — Trustmark Sub-Score (included). Every vendor we assess for a procurement audit receives a Trustmark Security sub-score as part of the deliverable. Procurement teams get a documented, reproducible grade alongside the audit narrative.
- Directory — Verified Tier (included). Vendors in the Elephant Agent-Ready Directory at the Verified tier display their current Trustmark status. Buyers can filter the directory by Trustmark Security grade.
- $499 llms.txt Review — Mini Security Check. The $499 llms.txt Review now includes a mini Trustmark Security check: prompt injection surface assessment, authentication model review, and a preliminary Security axis score.
Two standards, one governance model
Trustmark and EVI are sister standards
EVI — Elephant Visibility Index
Does the agent find your business?
EVI measures how often and how prominently AI assistants recommend a vendor when a buyer asks a category question. It is a discoverability score. The question it answers: do AI agents know you exist.
Trustmark Certified
Can you trust the agent you deploy?
Trustmark measures whether the agent a business deploys is safe and capable. It is a trust score. The question it answers: should you be deploying this agent in a production context with real data and real spend authority.
EVI and Trustmark share the same governance model, the same open methodology, and the same CC BY 4.0 license. Neither is proprietary. Both are editor-maintained by Elephant Accountability LLC with public dispute resolution. They do not overlap in what they measure; they are complementary instruments.
FAQ
Six questions about Trustmark
Who owns Trustmark?
Elephant Accountability LLC is the current editor and maintainer of the Trustmark specification. The content of the spec is published under Creative Commons BY 4.0. Anyone can implement, fork, or build services on top of Trustmark with attribution. The reference scoring library is MIT-licensed. Neither the spec nor the library is proprietary.
Can my agency use Trustmark in our own reports?
Yes, that is the point. Attribution requirement: “Scored per Trustmark v0.9 methodology, eaccountability.org/trustmark.” No license fee. No approval required. If you make improvements to the rubric, submit them as a pull request to the spec repository. Substantive methodology changes require the RFC process.
How is Trustmark different from SOC 2?
SOC 2 audits organizations — their controls, their policies, their operational processes. A company can hold a SOC 2 Type II report while its AI agents have no spending controls, no prompt injection defenses, and no per-agent identity. Trustmark scores the agent itself. The two are complementary: a SOC 2 report covers the organization; a Trustmark grade covers the specific agent being deployed.
What if a vendor disagrees with their grade?
Disputes are filed through the public GitHub RFC process. Evidence-based: the vendor must provide documentation showing that the rubric item in question was incorrectly assessed. Disputes and their resolutions are public. Grades can be revised if the evidence supports it, and the revision reason is published alongside the updated grade.
How often does the public leaderboard update?
Quarterly re-verification for all ten agents on the public leaderboard. If a significant new vulnerability, CVE, or framework update warrants an interim re-grade, we do so and note the reason. Certified vendors can request re-grading after material changes to their agent architecture.
What happens after the RFC period?
The public comment period runs through 2026-06-22. We incorporate substantive feedback, resolve open issues in the spec repository, and lock v1.0 on June 22, 2026. That version persists indefinitely. Future improvements require v1.1 or v2.0 and a new corpus run. Prior-version grades remain published and labeled with their version number.
Governance and license
Open. Versioned. Attributed.
The Trustmark specification is published under Creative Commons BY 4.0. The reference Python library is MIT-licensed. Elephant Accountability LLC acts as editor, not as sole authority. The governance model is documented in the spec repository.
Two numbers every agent deployment needs.
Security grade. Capability grade. Both public. Both reproducible. The public leaderboard is free. Certification for your own agent is paid.