# Understanding Your Results
After every scan, TrustTrace produces an OWASP-scored vulnerability report. This guide explains how to read your results and prioritize remediation.
## OWASP Score and Letter Grade
Every scan produces an overall score from 0 to 100, translated to a letter grade:
| Score | Grade | What It Means |
|---|---|---|
| 90–100 | A | Strong security posture. Few or no significant findings. |
| 80–89 | B | Good. Minor issues that should be addressed but don't represent immediate risk. |
| 70–79 | C | Adequate. Notable gaps that need attention, particularly for regulated industries. |
| 50–69 | D | Concerning. Multiple significant vulnerabilities. Remediation should be prioritized. |
| 0–49 | F | Critical risk. Immediate action required. Not suitable for production without remediation. |
The score is weighted by industry context. Healthcare organizations are weighted more heavily on data protection categories (LLM02, LLM07) because PHI exposure carries regulatory penalties. Financial services organizations are weighted more heavily on excessive agency (LLM06) because unauthorized transactions have direct financial impact.
A first-time assessment for an organization that hasn't prioritized AI security typically scores between 35 and 55.
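As a rough sketch, the grade bands and industry weighting can be modeled as below. The per-category weights, industry multipliers, and function names are illustrative assumptions, not TrustTrace's actual scoring formula:

```python
# Hypothetical sketch of weighted scoring and letter-grade mapping.
# Weights and the scoring formula are illustrative, not the real algorithm.

GRADE_BANDS = [(90, "A"), (80, "B"), (70, "C"), (50, "D"), (0, "F")]

# Assumed extra weight on certain OWASP LLM categories per industry.
INDUSTRY_WEIGHTS = {
    "healthcare": {"LLM02": 2.0, "LLM07": 2.0},  # data protection emphasis
    "financial": {"LLM06": 2.0},                 # excessive agency emphasis
}

def weighted_score(category_scores, industry):
    """category_scores: dict like {"LLM02": 40, ...}, each 0-100."""
    weights = INDUSTRY_WEIGHTS.get(industry, {})
    total = weight_sum = 0.0
    for category, score in category_scores.items():
        w = weights.get(category, 1.0)
        total += score * w
        weight_sum += w
    return total / weight_sum

def letter_grade(score):
    for floor, grade in GRADE_BANDS:
        if score >= floor:
            return grade
    return "F"
```

Under this sketch, a healthcare organization scoring poorly on LLM02 is pulled down more than the unweighted average would suggest, which matches the regulatory emphasis described above.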
## Severity Levels
Each finding is assigned a severity level based on its potential impact and exploitability:
### Critical
Represents an immediate, exploitable risk that could result in data breach, unauthorized access, or system compromise. Examples:
- PHI exposed in plaintext log files (HIPAA violation risk)
- Unauthenticated MCP server on a public network
- Raw SQL execution from LLM-generated input
- Hardcoded API keys in public repositories
- Tool poisoning detected in MCP server descriptions
**Action:** Address within 24–48 hours. These are actively exploitable.
### High
Represents a significant risk that demands prompt attention but may depend on specific conditions to exploit. Examples:
- LLM provider without BAA handling PHI
- Write operations on sensitive data without human approval
- Prompt injection sinks in agent code
- MCP tool definitions not version-pinned (rug pull risk)
- Known CVEs in agent dependencies
**Action:** Address within 1–2 weeks. Plan remediation immediately.
### Medium
Represents a moderate risk or a best-practice gap. Not immediately exploitable but increases your attack surface. Examples:
- CORS configured with wildcard origins
- Missing parameter constraints on tool inputs
- Unpinned general dependencies
- AI endpoints without WAF/CDN protection
- Agent exposes more tools than its stated purpose requires
**Action:** Address within 30–60 days as part of scheduled security improvements.
### Low
Represents a minor observation or informational finding. Low risk on its own but may contribute to a larger attack chain. Examples:
- Debug logging enabled in non-test environments
- Architecture details discoverable in public documentation
- Abandoned dependencies with no known vulnerabilities
**Action:** Address during normal development cycles. Track for awareness.
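The action windows above translate naturally into remediation deadlines. The sketch below assumes a simple severity-to-days mapping; the day counts follow the guidance in this section, but the function and field names are hypothetical:

```python
# Hypothetical mapping from severity to a remediation deadline, using the
# upper bound of each action window described above.
from datetime import date, timedelta

REMEDIATION_SLA_DAYS = {
    "critical": 2,   # 24-48 hours
    "high": 14,      # 1-2 weeks
    "medium": 60,    # 30-60 days
    "low": None,     # normal development cycles; no fixed deadline
}

def remediation_deadline(severity, found_on):
    """Return the latest acceptable fix date, or None for Low findings."""
    days = REMEDIATION_SLA_DAYS[severity.lower()]
    return found_on + timedelta(days=days) if days is not None else None
```

A table like this is easy to adapt if your organization's internal SLAs are stricter than the defaults described here.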
## OWASP LLM Top 10 Categories
Every finding maps to one or more categories in the OWASP LLM Top 10. This is the industry-standard framework for classifying AI/LLM security risks. For a complete reference, see the OWASP LLM Top 10 Guide.
| Category | Short Name | Common Findings |
|---|---|---|
| LLM01 | Prompt Injection | User input in prompts, RAG poisoning, MCP tool poisoning |
| LLM02 | Sensitive Information Disclosure | PHI in logs, excessive data in responses, missing redaction |
| LLM03 | Supply Chain | Unvetted dependencies, missing BAAs, MCP rug pull risk |
| LLM04 | Data and Model Poisoning | Uncontrolled RAG ingestion, no content screening |
| LLM05 | Improper Output Handling | Raw SQL/code execution from LLM output |
| LLM06 | Excessive Agency | Over-permissioned tools, writes without approval, unauthenticated MCP |
| LLM07 | System Prompt Leakage | Secrets in prompts, prompt extraction, public code exposure |
| LLM08 | Vector and Embedding Weaknesses | Missing retrieval scoping, cross-tenant data |
| LLM09 | Misinformation | High-stakes decisions without disclaimers, missing citations |
| LLM10 | Unbounded Consumption | No input limits, no cost controls, missing rate limiting |
## Compliance Mapping (Enterprise Tier)
Enterprise tier scans include compliance mapping on each finding, linking vulnerabilities to specific regulatory controls:
### HIPAA Controls
Findings that affect Protected Health Information (PHI) are mapped to relevant HIPAA Security Rule sections:
- §164.312(a) — Access Control
- §164.312(b) — Audit Controls
- §164.312(c) — Integrity Controls
- §164.312(d) — Authentication
- §164.312(e) — Transmission Security
- §164.314(a) — Business Associate Contracts
### SOC 2 Trust Services Criteria
Findings are mapped to SOC 2 criteria covering security, availability, and confidentiality controls relevant to AI systems.
This mapping helps compliance teams connect technical findings to audit requirements without manual interpretation.
## Confidence Levels
Findings include a confidence indicator reflecting how the vulnerability was detected:
| Confidence | Meaning | Source |
|---|---|---|
| Confirmed | Vulnerability directly observed through scanning or live testing | Log analysis, live MCP connection, CVE database match |
| Probable | Strong indicator based on code or configuration analysis | Code pattern matching, tool schema analysis |
| Theoretical | Risk identified from architecture review, not confirmed through scanning | Interview responses, offline analysis |
Confirmed findings should be treated as verified vulnerabilities. Probable findings should be validated by your team. Theoretical findings indicate areas to investigate further.
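One way to act on this guidance is to route findings into different workflows by confidence. The queue names and finding shape below are hypothetical, not part of the TrustTrace report schema:

```python
# Illustrative triage by confidence level, following the guidance above.
# Queue names are assumptions for the sake of the example.
def triage_by_confidence(finding):
    routes = {
        "confirmed": "remediation-backlog",   # treat as a verified vulnerability
        "probable": "validation-queue",       # have your team reproduce it first
        "theoretical": "investigation-list",  # follow up in architecture review
    }
    return routes[finding["confidence"].lower()]
```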
## Prioritizing Remediation
Not all findings need immediate action. Here's a practical prioritization framework:
**Fix now (this week):**
- All Critical findings
- High findings with Confirmed confidence
- Any finding involving PHI exposure (regulatory risk)
**Fix soon (this month):**
- Remaining High findings
- Medium findings with Confirmed confidence
- Supply chain vulnerabilities with available patches
**Fix eventually (this quarter):**
- Medium findings with Probable confidence
- Low findings
- Best-practice improvements
**Monitor ongoing:**
- MCP baseline changes (set up regular rescans)
- New CVEs in your dependencies (rescan after dependency updates)
- Findings flagged as "monitor candidates" (ongoing risks, not one-time fixes)
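The framework above can be sketched as a single bucketing function. The finding fields (`severity`, `confidence`, `involves_phi`, `monitor_candidate`) are assumed shapes for illustration, not the actual report schema:

```python
# Minimal sketch of the prioritization framework described above.
# Field names on `finding` are illustrative assumptions.
def remediation_bucket(finding):
    sev = finding["severity"].lower()
    conf = finding["confidence"].lower()
    if finding.get("monitor_candidate"):
        return "monitor"          # ongoing risk, not a one-time fix
    if (sev == "critical"
            or finding.get("involves_phi")
            or (sev == "high" and conf == "confirmed")):
        return "fix-now"          # this week
    if sev == "high" or (sev == "medium" and conf == "confirmed"):
        return "fix-soon"         # this month
    return "fix-eventually"       # this quarter
```

For example, a High finding with Confirmed confidence lands in the same bucket as a Critical finding, reflecting the rule that verified high-severity issues should not wait.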
## Baseline Comparisons (Pro+)
When you scan the same MCP server URL more than once, TrustTrace compares the current scan against the most recent baseline. The comparison shows:
- **New tools** — Tools added since the last scan. Could indicate legitimate feature additions or a rug pull attack.
- **Modified descriptions** — Tool descriptions that changed. Watch for injected instructions.
- **Changed parameters** — Parameter schemas that expanded (new fields, removed constraints).
- **Removed tools** — Tools that disappeared. May indicate cleanup or an attempt to remove evidence.
Any change is worth investigating. Legitimate changes should be documented by the MCP server maintainer. Unexpected changes — especially to tool descriptions — may indicate a supply chain compromise.
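The comparison above amounts to diffing two tool snapshots. A minimal sketch, assuming each snapshot is a simple name-to-description mapping (the real baseline format also tracks parameter schemas):

```python
# Illustrative diff of two MCP tool snapshots (tool name -> description).
# The snapshot shape is an assumption for this example.
def diff_tool_baseline(baseline, current):
    base_names, cur_names = set(baseline), set(current)
    return {
        "new_tools": sorted(cur_names - base_names),
        "removed_tools": sorted(base_names - cur_names),
        "modified_descriptions": sorted(
            name for name in base_names & cur_names
            if baseline[name] != current[name]
        ),
    }
```

A changed description is the signal to scrutinize most closely, since injected instructions in a tool description are the classic rug pull pattern.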
## Next Steps
- **Managed Assessments** — If your scan reveals significant issues, a managed assessment provides expert-led analysis, live injection testing, and a comprehensive remediation roadmap
- **CI/CD Integration** — Automate scans in your deployment pipeline
- **API Reference** — Build custom scanning workflows