# Understanding Your Results
After every scan, TrustTrace produces an OWASP-scored vulnerability report. This guide explains how to read your results and prioritize remediation.
## OWASP Score and Letter Grade
Every scan produces an overall score from 0 to 100, translated to a letter grade:
| Score | Grade | What It Means |
|---|---|---|
| 90–100 | A | Strong security posture. Few or no significant findings. |
| 80–89 | B | Good. Minor issues that should be addressed but don't represent immediate risk. |
| 70–79 | C | Adequate. Notable gaps that need attention, particularly for regulated industries. |
| 50–69 | D | Concerning. Multiple significant vulnerabilities. Remediation should be prioritized. |
| 0–49 | F | Critical risk. Immediate action required. Not suitable for production without remediation. |
The score is weighted by industry context. Healthcare organizations are weighted more heavily on data protection categories (LLM02, LLM07) because PHI exposure carries regulatory penalties. Financial services organizations are weighted more heavily on excessive agency (LLM06) because unauthorized transactions have direct financial impact.
A first-time assessment for an organization that hasn't prioritized AI security typically scores between 35 and 55.
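As a rough sketch, the grade bands and industry weighting can be modeled as below. The per-category weights, industry multipliers, and function names are illustrative assumptions, not TrustTrace's actual scoring formula:

```python
# Hypothetical sketch of weighted scoring and letter-grade mapping.
# Weights and the scoring formula are illustrative, not the real algorithm.

GRADE_BANDS = [(90, "A"), (80, "B"), (70, "C"), (50, "D"), (0, "F")]

# Assumed extra weight on certain OWASP LLM categories per industry.
INDUSTRY_WEIGHTS = {
    "healthcare": {"LLM02": 2.0, "LLM07": 2.0},  # data protection emphasis
    "financial": {"LLM06": 2.0},                 # excessive agency emphasis
}

def weighted_score(category_scores, industry):
    """category_scores: dict like {"LLM02": 40, ...}, each 0-100."""
    weights = INDUSTRY_WEIGHTS.get(industry, {})
    total = weight_sum = 0.0
    for category, score in category_scores.items():
        w = weights.get(category, 1.0)
        total += score * w
        weight_sum += w
    return total / weight_sum

def letter_grade(score):
    for floor, grade in GRADE_BANDS:
        if score >= floor:
            return grade
    return "F"
```

Under this sketch, a healthcare organization scoring poorly on LLM02 is pulled down more than the unweighted average would suggest, which matches the regulatory emphasis described above.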
## Severity Levels
Each finding is assigned a severity level based on its potential impact and exploitability:
### Critical
Represents an immediate, exploitable risk that could result in data breach, unauthorized access, or system compromise. Examples:
- PHI exposed in plaintext log files (HIPAA violation risk)
- Unauthenticated MCP server on a public network
- Raw SQL execution from LLM-generated input
- Hardcoded API keys in public repositories
- Tool poisoning detected in MCP server descriptions
**Action:** Address within 24–48 hours. These are actively exploitable.
### High
Represents a significant risk that demands prompt attention but may depend on specific conditions to exploit. Examples:
- LLM provider without BAA handling PHI
- Write operations on sensitive data without human approval
- Prompt injection sinks in agent code
- MCP tool definitions not version-pinned (rug pull risk)
- Known CVEs in agent dependencies
**Action:** Address within 1–2 weeks. Plan remediation immediately.
### Medium
Represents a moderate risk or a best-practice gap. Not immediately exploitable but increases your attack surface. Examples:
- CORS configured with wildcard origins
- Missing parameter constraints on tool inputs
- Unpinned general dependencies
- AI endpoints without WAF/CDN protection
- Agent exposes more tools than its stated purpose requires
**Action:** Address within 30–60 days as part of scheduled security improvements.
### Low
Represents a minor observation or informational finding. Low risk on its own but may contribute to a larger attack chain. Examples:
- Debug logging enabled in non-test environments
- Architecture details discoverable in public documentation
- Abandoned dependencies with no known vulnerabilities
**Action:** Address during normal development cycles. Track for awareness.
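The action windows above translate naturally into remediation deadlines. The sketch below assumes a simple severity-to-days mapping; the day counts follow the guidance in this section, but the function and field names are hypothetical:

```python
# Hypothetical mapping from severity to a remediation deadline, using the
# upper bound of each action window described above.
from datetime import date, timedelta

REMEDIATION_SLA_DAYS = {
    "critical": 2,   # 24-48 hours
    "high": 14,      # 1-2 weeks
    "medium": 60,    # 30-60 days
    "low": None,     # normal development cycles; no fixed deadline
}

def remediation_deadline(severity, found_on):
    """Return the latest acceptable fix date, or None for Low findings."""
    days = REMEDIATION_SLA_DAYS[severity.lower()]
    return found_on + timedelta(days=days) if days is not None else None
```

A table like this is easy to adapt if your organization's internal SLAs are stricter than the defaults described here.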
## OWASP LLM Top 10 Categories
Every finding maps to one or more categories in the OWASP LLM Top 10. This is the industry-standard framework for classifying AI/LLM security risks. For a complete reference, see the OWASP LLM Top 10 Guide.
| Category | Short Name | Common Findings |
|---|---|---|
| LLM01 | Prompt Injection | User input in prompts, RAG poisoning, MCP tool poisoning |
| LLM02 | Sensitive Information Disclosure | PHI in logs, excessive data in responses, missing redaction |
| LLM03 | Supply Chain | Unvetted dependencies, missing BAAs, MCP rug pull risk |
| LLM04 | Data and Model Poisoning | Uncontrolled RAG ingestion, no content screening |
| LLM05 | Improper Output Handling | Raw SQL/code execution from LLM output |
| LLM06 | Excessive Agency | Over-permissioned tools, writes without approval, unauthenticated MCP |
| LLM07 | System Prompt Leakage | Secrets in prompts, prompt extraction, public code exposure |
| LLM08 | Vector and Embedding Weaknesses | Missing retrieval scoping, cross-tenant data |
| LLM09 | Misinformation | High-stakes decisions without disclaimers, missing citations |
| LLM10 | Unbounded Consumption | No input limits, no cost controls, missing rate limiting |
## Compliance Mapping (Enterprise Tier)
Enterprise tier scans include compliance mapping on each finding, linking vulnerabilities to specific regulatory controls:
### HIPAA Controls
Findings that affect Protected Health Information (PHI) are mapped to relevant HIPAA Security Rule sections:
- §164.312(a) — Access Control
- §164.312(b) — Audit Controls
- §164.312(c) — Integrity Controls
- §164.312(d) — Authentication
- §164.312(e) — Transmission Security
- §164.314(a) — Business Associate Contracts
### SOC 2 Trust Services Criteria
Findings are mapped to SOC 2 criteria covering security, availability, and confidentiality controls relevant to AI systems.
This mapping helps compliance teams connect technical findings to audit requirements without manual interpretation.
## Confidence Levels
Findings include a confidence indicator reflecting how the vulnerability was detected:
| Confidence | Meaning | Source |
|---|---|---|
| Confirmed | Vulnerability directly observed through scanning or live testing | Log analysis, live MCP connection, CVE database match |
| Probable | Strong indicator based on code or configuration analysis | Code pattern matching, tool schema analysis |
| Theoretical | Risk identified from architecture review, not confirmed through scanning | Interview responses, offline analysis |
Confirmed findings should be treated as verified vulnerabilities. Probable findings should be validated by your team. Theoretical findings indicate areas to investigate further.
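One way to act on this guidance is to route findings into different workflows by confidence. The queue names and finding shape below are hypothetical, not part of the TrustTrace report schema:

```python
# Illustrative triage by confidence level, following the guidance above.
# Queue names are assumptions for the sake of the example.
def triage_by_confidence(finding):
    routes = {
        "confirmed": "remediation-backlog",   # treat as a verified vulnerability
        "probable": "validation-queue",       # have your team reproduce it first
        "theoretical": "investigation-list",  # follow up in architecture review
    }
    return routes[finding["confidence"].lower()]
```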
## Prioritizing Remediation
Not all findings need immediate action. Here's a practical prioritization framework:
**Fix now (this week):**
- All Critical findings
- High findings with Confirmed confidence
- Any finding involving PHI exposure (regulatory risk)
**Fix soon (this month):**
- Remaining High findings
- Medium findings with Confirmed confidence
- Supply chain vulnerabilities with available patches
**Fix eventually (this quarter):**
- Medium findings with Probable confidence
- Low findings
- Best-practice improvements
**Monitor ongoing:**
- MCP baseline changes (set up regular rescans)
- New CVEs in your dependencies (rescan after dependency updates)
- Findings flagged as "monitor candidates" (ongoing risks, not one-time fixes)
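The framework above can be sketched as a single bucketing function. The finding fields (`severity`, `confidence`, `involves_phi`, `monitor_candidate`) are assumed shapes for illustration, not the actual report schema:

```python
# Minimal sketch of the prioritization framework described above.
# Field names on `finding` are illustrative assumptions.
def remediation_bucket(finding):
    sev = finding["severity"].lower()
    conf = finding["confidence"].lower()
    if finding.get("monitor_candidate"):
        return "monitor"          # ongoing risk, not a one-time fix
    if (sev == "critical"
            or finding.get("involves_phi")
            or (sev == "high" and conf == "confirmed")):
        return "fix-now"          # this week
    if sev == "high" or (sev == "medium" and conf == "confirmed"):
        return "fix-soon"         # this month
    return "fix-eventually"       # this quarter
```

For example, a High finding with Confirmed confidence lands in the same bucket as a Critical finding, reflecting the rule that verified high-severity issues should not wait.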
## Baseline Comparisons (Pro+)
When you scan the same MCP server URL more than once, TrustTrace compares the current scan against the most recent baseline. The comparison shows:
- **New tools** — Tools added since the last scan. Could indicate legitimate feature additions or a rug pull attack.
- **Modified descriptions** — Tool descriptions that changed. Watch for injected instructions.
- **Changed parameters** — Parameter schemas that expanded (new fields, removed constraints).
- **Removed tools** — Tools that disappeared. May indicate cleanup or an attempt to remove evidence.
Any change is worth investigating. Legitimate changes should be documented by the MCP server maintainer. Unexpected changes — especially to tool descriptions — may indicate a supply chain compromise.
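The comparison above amounts to diffing two tool snapshots. A minimal sketch, assuming each snapshot is a simple name-to-description mapping (the real baseline format also tracks parameter schemas):

```python
# Illustrative diff of two MCP tool snapshots (tool name -> description).
# The snapshot shape is an assumption for this example.
def diff_tool_baseline(baseline, current):
    base_names, cur_names = set(baseline), set(current)
    return {
        "new_tools": sorted(cur_names - base_names),
        "removed_tools": sorted(base_names - cur_names),
        "modified_descriptions": sorted(
            name for name in base_names & cur_names
            if baseline[name] != current[name]
        ),
    }
```

A changed description is the signal to scrutinize most closely, since injected instructions in a tool description are the classic rug pull pattern.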
## Next Steps
- **Managed Assessments** — If your scan reveals significant issues, a managed assessment provides expert-led analysis, live injection testing, and a comprehensive remediation roadmap
- **CI/CD Integration** — Automate scans in your deployment pipeline
- **API Reference** — Build custom scanning workflows