Red Team Findings → Remediation
How to map offensive findings to defensive recommendations, severity scoring for AI vulnerabilities, actionable remediation guidance, and the report-to-fix pipeline.
Finding vulnerabilities is only half the job. The other half is translating those findings into remediation that defenders can understand, prioritize, and implement. This page covers the entire pipeline from raw finding to verified fix.
The Finding-to-Fix Pipeline
Discovery → Classification → Severity Scoring → Remediation Mapping → Report → Fix → Verify
Each stage requires structured thinking:
Stage 1: Classification
Categorize findings using a consistent taxonomy. The OWASP Top 10 for LLM Applications provides a standard framework:
| OWASP LLM Category | Example Finding | Typical Remediation Layer |
|---|---|---|
| LLM01: Prompt Injection | System prompt override via delimiter escape | Input filtering + instruction hierarchy |
| LLM02: Insecure Output Handling | XSS via model-generated HTML | Output sanitization + CSP |
| LLM03: Training Data Poisoning | Backdoor in fine-tuning data | Data validation pipeline |
| LLM04: Model Denial of Service | Context window exhaustion via large inputs | Rate limiting + input size limits |
| LLM05: Supply Chain Vulnerabilities | Malicious model on HuggingFace Hub | Model provenance verification |
| LLM06: Sensitive Information Disclosure | System prompt extraction | Prompt design + output filtering |
| LLM07: Insecure Plugin Design | SQL injection via tool call arguments | Input validation on tool layer |
| LLM08: Excessive Agency | Model executes unintended tool calls | Least privilege + approval gates |
| LLM09: Overreliance | Users trust incorrect model output | Confidence indicators + disclaimers |
| LLM10: Model Theft | Model weights extracted via API | Rate limiting + watermarking |
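The taxonomy above can be encoded as a lookup table so that findings are tagged consistently across a report. A minimal sketch in Python; the identifiers and the `classify` helper are illustrative, not part of any OWASP tooling:

```python
# Map each OWASP LLM category ID to its name and typical remediation layer,
# mirroring the classification table above.
OWASP_LLM_TAXONOMY = {
    "LLM01": ("Prompt Injection", "Input filtering + instruction hierarchy"),
    "LLM02": ("Insecure Output Handling", "Output sanitization + CSP"),
    "LLM03": ("Training Data Poisoning", "Data validation pipeline"),
    "LLM04": ("Model Denial of Service", "Rate limiting + input size limits"),
    "LLM05": ("Supply Chain Vulnerabilities", "Model provenance verification"),
    "LLM06": ("Sensitive Information Disclosure", "Prompt design + output filtering"),
    "LLM07": ("Insecure Plugin Design", "Input validation on tool layer"),
    "LLM08": ("Excessive Agency", "Least privilege + approval gates"),
    "LLM09": ("Overreliance", "Confidence indicators + disclaimers"),
    "LLM10": ("Model Theft", "Rate limiting + watermarking"),
}

def classify(category_id: str) -> tuple[str, str]:
    """Return (category name, typical remediation layer) for an OWASP LLM ID."""
    return OWASP_LLM_TAXONOMY[category_id]
```

Tagging every finding with a category ID at intake makes the later remediation-mapping stage a lookup rather than a judgment call.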
Stage 2: Severity Scoring
AI vulnerabilities do not fit neatly into traditional CVSS scoring. Use an adapted framework:
| Factor | Weight | Low (1) | Medium (2) | High (3) | Critical (4) |
|---|---|---|---|---|---|
| Exploitability | 30% | Requires model access | Requires insider knowledge | Automated, low skill | Automated, zero skill |
| Impact | 30% | Minor policy violation | Data leakage (non-sensitive) | Sensitive data exposure | Safety harm, legal liability |
| Reproducibility | 20% | <10% success rate | 10-50% success rate | 50-90% success rate | >90% success rate |
| Scope | 20% | Single user affected | Multiple users affected | All users of one feature | Entire application affected |
Composite score = the weighted sum of the four factor scores (range 1.0-4.0), mapped to severity:
- 1.0-1.5: Low
- 1.6-2.5: Medium
- 2.6-3.5: High
- 3.6-4.0: Critical
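The scoring scheme above is simple enough to automate. A sketch of the weighted sum and the severity bands; the function names are illustrative:

```python
# Weights from the scoring table above; factor scores run 1 (Low) to 4 (Critical).
WEIGHTS = {
    "exploitability": 0.30,
    "impact": 0.30,
    "reproducibility": 0.20,
    "scope": 0.20,
}

def composite_score(factors: dict) -> float:
    """Weighted sum of the four factor scores (each must be 1-4)."""
    for name, score in factors.items():
        if name not in WEIGHTS or not 1 <= score <= 4:
            raise ValueError(f"invalid factor: {name}={score}")
    return sum(WEIGHTS[name] * score for name, score in factors.items())

def severity(score: float) -> str:
    """Map a composite score onto the bands above."""
    if score <= 1.5:
        return "Low"
    if score <= 2.5:
        return "Medium"
    if score <= 3.5:
        return "High"
    return "Critical"

# Example: automated low-skill exploit (3), sensitive data exposure (3),
# 80% success rate (3), all users of one feature (3)
s = composite_score({"exploitability": 3, "impact": 3,
                     "reproducibility": 3, "scope": 3})
```

This example scores 3.0, landing in the High band; bumping impact to 4 (safety harm or legal liability) would push it to 3.3, still High but close to the Critical threshold.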
Remediation Mapping
For each finding category, map to specific defensive controls:
Prompt Injection Findings
| Finding | Recommended Remediations |
|---|---|
| Direct instruction override | (1) Deploy prompt shield (Azure/Lakera) on input; (2) implement instruction hierarchy in model config; (3) add output filter for system prompt content |
| Delimiter escape | (1) Use unique, randomized delimiters per session; (2) add delimiter integrity validation; (3) sanitize user input for delimiter characters |
| System prompt extraction | (1) Minimize sensitive content in system prompt; (2) add output filter for system prompt fragments; (3) deploy canary tokens to detect extraction |
Data Exposure Findings
| Finding | Recommended Remediations |
|---|---|
| PII in model output | (1) Deploy PII detection on output (regex + NER); (2) redact before delivery; (3) audit training data for PII |
| Training data extraction | (1) Implement output diversity requirements; (2) add memorization detection; (3) rate limit repeated queries on same topic |
| RAG context leakage | (1) Filter retrieved documents before inclusion; (2) add access control to retrieval layer; (3) validate that responses stay within authorized context |
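The first row's "regex + NER" recommendation can be illustrated with the regex half alone. A deliberately minimal sketch, assuming US-style SSN and phone formats; a production pipeline would add an NER model and locale-aware patterns:

```python
import re

# Regex-only PII detection on model output. Each match is replaced with a
# typed placeholder before the response is delivered to the user.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each detected PII span with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{label.upper()}]", text)
    return text
```

Typed placeholders (rather than a bare `[REDACTED]`) preserve enough context for defenders to audit what categories of PII the model attempted to emit.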
Agentic/Tool Findings
| Finding | Recommended Remediations |
|---|---|
| Unauthorized tool execution | (1) Implement tool call approval gates; (2) apply least privilege to tool access; (3) log and monitor all tool invocations |
| Tool argument injection | (1) Validate tool arguments against schema; (2) parameterize tool inputs (no string concatenation); (3) sandbox tool execution environments |
| Privilege escalation via tools | (1) Enforce per-tool permission boundaries; (2) validate tool chain sequences; (3) implement session-scoped tool permissions |
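The schema-validation remediation for tool argument injection can be sketched as a gate that runs before any tool executes. The tool names and schemas here are hypothetical stand-ins for an application's real tool layer:

```python
# Declared argument schemas: one entry per exposed tool. Any call whose
# arguments have extra fields, missing fields, or wrong types is rejected
# before it reaches the tool implementation.
TOOL_SCHEMAS = {
    "get_order": {"order_id": int},
    "send_email": {"to": str, "subject": str, "body": str},
}

def validate_tool_call(tool: str, args: dict) -> None:
    """Raise if a model-proposed tool call does not match its declared schema."""
    schema = TOOL_SCHEMAS.get(tool)
    if schema is None:
        raise PermissionError(f"unknown tool: {tool}")
    unexpected = set(args) - set(schema)
    if unexpected:
        raise ValueError(f"unexpected arguments: {sorted(unexpected)}")
    for name, expected_type in schema.items():
        if name not in args:
            raise ValueError(f"missing argument: {name}")
        if not isinstance(args[name], expected_type):
            raise ValueError(f"argument {name!r} must be {expected_type.__name__}")
```

Because the model's output is treated as untrusted input, the same parameterization rule applies downstream: validated arguments should be bound as query parameters, never concatenated into SQL or shell strings.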
Writing Effective Remediation Guidance
Be specific, not generic
Bad: "Improve input filtering." Good: "Deploy Azure Prompt Shield on the /api/chat endpoint with threshold set to medium sensitivity. This blocks the delimiter escape technique demonstrated in Finding #3."
Include implementation effort estimates
Tag each recommendation with estimated effort: Quick Win (hours), Moderate (days), Significant (weeks). This helps defenders prioritize.
Provide verification criteria
For each fix, specify how to verify it works: "After deploying the prompt shield, re-run test cases F3-1 through F3-7. All should return the blocked response."
Layer recommendations
Provide short-term mitigations (deploy a filter this week) and long-term fixes (redesign the prompt architecture next quarter).
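Verification criteria like "re-run test cases F3-1 through F3-7" are most useful when they come with a replayable harness. A hypothetical sketch; `send_chat`, the payload list, and the blocked-response marker are assumptions standing in for your application client and finding evidence:

```python
# Assumed marker text that the deployed guardrail returns for blocked inputs.
BLOCKED_MARKER = "I can't help with that"

def verify_fix(payloads, send_chat) -> dict:
    """Replay exploit payloads from a finding against the patched system.

    The fix is verified only if every payload now receives the blocked
    response; partial blocking means the finding stays open.
    """
    results = {p: BLOCKED_MARKER in send_chat(p) for p in payloads}
    return {"verified": all(results.values()), "results": results}
```

Handing this harness to defenders alongside the report turns "verify the fix" from a judgment call into a pass/fail check they can run themselves.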
The Remediation Priority Matrix
When defenders face 20+ findings, they need a prioritization framework:
| | Easy to Fix | Hard to Fix |
|---|---|---|
| High Severity | Fix immediately (Quick Wins) | Plan for next sprint |
| Low Severity | Fix when convenient | Accept risk or defer |
Tracking Findings Through Resolution
The report-to-fix pipeline does not end at delivery. Best practices for tracking:
- Finding registry -- maintain a database of findings with status (open, in-progress, fixed, verified, accepted-risk)
- Retest cycles -- schedule retesting 2-4 weeks after reported fixes to verify they work and have not introduced regressions
- Regression monitoring -- add successful exploit payloads to continuous testing suites so regressions are caught automatically
- Metrics -- track mean time to remediate (MTTR) by severity level; this shows whether the organization's defense posture is improving
Further Reading
- Defense-in-Depth for LLM Apps -- layered defense strategy for comprehensive remediation
- Thinking Like a Defender -- understanding defender priorities for better report reception
- Report Writing -- structuring the overall red team report
Related Topics
- Guardrails & Safety Layer Architecture - Architecture knowledge needed for specific remediation recommendations
- AI Threat Models - Threat modeling context for severity assessment
References
- "OWASP Top 10 for LLM Applications" - OWASP (2025) - Standard vulnerability taxonomy for classifying AI-specific findings
- "CVSS v4.0 Specification" - FIRST (2023) - Common Vulnerability Scoring System adapted for AI vulnerability severity assessment
- "NIST AI Risk Management Framework (AI RMF 1.0)" - NIST (2023) - Risk framework for prioritizing AI security findings and remediation efforts
- "Penetration Testing Report Writing Guide" - SANS Institute (2023) - Best practices for structuring security findings and remediation guidance
Check Your Understanding
You find a prompt injection bypass with 80% success rate that causes the model to leak PII from its RAG context. Using the severity framework, what factors push this toward Critical severity?