Professional Practice Study Guide
Study guide covering AI red teaming methodology, engagement management, report writing, governance frameworks, and professional ethics.
This guide covers the professional side of AI red teaming: how to plan and execute engagements, write effective reports, navigate governance frameworks, and maintain ethical standards. It supports preparation for the Governance and Professional Skills assessments.
Engagement Methodology
A professional AI red team engagement follows a structured methodology that ensures thoroughness, legal protection, and actionable results.
Engagement Lifecycle
| Phase | Key Activities | Deliverables |
|---|---|---|
| Pre-engagement | Scoping, rules of engagement, legal authorization, team assembly | Signed scope document, RoE, authorization letter |
| Reconnaissance | System prompt extraction, model fingerprinting, capability mapping, trust boundary identification | Architecture diagram, threat model, attack surface inventory |
| Threat Modeling | Identifying threats by component, prioritizing by impact and exploitability | Prioritized threat matrix, testing plan |
| Active Testing | Executing attacks by category, logging all test cases, iterating on findings | Testing log, raw findings |
| Analysis | Classifying findings by severity, determining root causes, developing remediations | Draft findings with severity ratings |
| Reporting | Writing the report, peer review, executive summary, technical details | Final report |
| Debrief | Presenting findings to stakeholders, answering questions, planning remediation | Debrief meeting notes, remediation tracking |
| Retest | Verifying remediations, testing for regressions | Retest report |
Scoping Best Practices
The scope document is the foundation of the engagement. It should be specific enough to prevent scope disputes but flexible enough to allow following interesting leads.
Must include:
- Target system architecture (model provider, version, orchestration framework, connected tools, data sources)
- Attack categories in scope (prompt injection, jailbreaking, data extraction, agent exploitation, etc.)
- Explicit exclusions (production systems with real users, specific techniques, out-of-scope infrastructure)
- Rules of engagement (testing windows, escalation contacts, data handling, emergency procedures)
- Success criteria (what qualifies as a finding, severity thresholds for reporting)
- Deliverables and timeline
- Written authorization with signatures from individuals with legal authority
Common scoping mistakes:
- Scoping too broadly ("test everything") without prioritization
- Failing to specify which model version and configuration are in scope
- Not addressing how model version changes during the engagement will be handled
- Omitting data handling procedures for accidentally discovered PII
- Not establishing communication cadence and escalation paths
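The required scope elements above lend themselves to a machine-checkable template. As a hypothetical sketch (field names and the `missing_scope_fields` helper are illustrative, not a standard), a scope document could be represented as structured data and validated for completeness before the engagement starts:

```python
# Hypothetical sketch: a scope document as structured data, checked for
# the must-include elements listed above. Field names are illustrative.
REQUIRED_FIELDS = {
    "architecture", "in_scope_categories", "exclusions",
    "rules_of_engagement", "success_criteria", "deliverables",
    "timeline", "authorization",
}

def missing_scope_fields(scope: dict) -> set:
    """Return required scope-document fields that are absent or empty."""
    return {f for f in REQUIRED_FIELDS if not scope.get(f)}

scope = {
    "architecture": "hosted LLM behind a RAG orchestration layer",
    "in_scope_categories": ["prompt injection", "jailbreaking"],
    "exclusions": ["production endpoints serving real users"],
    "rules_of_engagement": "see RoE appendix",
    "success_criteria": "reproducible policy bypass or data exposure",
    "deliverables": ["final report", "retest report"],
    "timeline": "three weeks, retest in week six",
    "authorization": None,  # unsigned -- engagement must not begin
}

print(missing_scope_fields(scope))  # -> {'authorization'}
```

A check like this makes the most common scoping mistake, starting work before written authorization exists, impossible to overlook.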
Rules of Engagement
Rules of engagement define the operational boundaries. They protect both the red team and the client.
| RoE Element | Purpose | Example |
|---|---|---|
| Authorized targets | Legal protection, prevents scope creep | "Testing is limited to the staging instance of the customer-facing chatbot at chat-staging.example.com" |
| Prohibited techniques | Prevents unacceptable risk | "No testing against production endpoints serving real users. No social engineering of employees." |
| Testing windows | Minimizes operational disruption | "Testing permitted 09:00-18:00 EST, Monday-Friday. No testing during quarterly board presentations." |
| Data handling | Protects discovered sensitive data | "Any PII encountered will be immediately reported to the security contact and not stored in testing logs." |
| Escalation contacts | Ensures rapid response to critical findings | "Critical findings: call [name] at [phone]. Non-critical: email [address] within 24 hours." |
| Communication | Keeps stakeholders informed | "Daily status updates via Slack. Weekly summary calls. Final readout presentation." |
Threat Modeling for AI Systems
Threat modeling should be performed after scoping and before active testing. It guides where to focus testing effort.
STRIDE Applied to AI Systems
| STRIDE Category | AI-Specific Examples |
|---|---|
| Spoofing | Impersonating authorized users or systems, forging tool call responses, spoofing MCP server identity |
| Tampering | Poisoning training data, modifying RAG documents, altering model artifacts in transit |
| Repudiation | Insufficient logging of prompt/response pairs, lack of audit trail for model decisions |
| Information Disclosure | System prompt leaking, training data extraction, PII in model outputs, embedding inversion |
| Denial of Service | Resource-exhaustion prompts, context window flooding, infinite tool call loops |
| Elevation of Privilege | Prompt injection gaining tool access, cross-agent escalation, bypassing role-based guardrails |
Trust Boundary Analysis
Identifying trust boundaries is critical. Every point where untrusted data enters the system is a potential injection surface:
- User input to model: The primary direct injection surface
- Retrieved documents to model: Indirect injection via RAG
- Tool outputs to model: Indirect injection via tool responses
- Inter-agent messages: Lateral injection in multi-agent systems
- Model output to downstream systems: Insecure output handling
- External APIs to orchestration layer: Supply chain and integration risks
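The boundary list above can be made explicit as data, so that every untrusted channel generates a concrete testing task. This is a hypothetical sketch (the tuple layout and `injection_surfaces` helper are illustrative), not a standard tooling format:

```python
# Hypothetical sketch: trust boundaries as (source, destination, risk)
# tuples. Every channel feeding the model untrusted data is an
# injection surface that needs dedicated test cases.
TRUST_BOUNDARIES = [
    ("user_input", "model", "direct prompt injection"),
    ("retrieved_documents", "model", "indirect injection via RAG"),
    ("tool_outputs", "model", "indirect injection via tool responses"),
    ("agent_messages", "agent", "lateral injection between agents"),
    ("model_output", "downstream_systems", "insecure output handling"),
    ("external_apis", "orchestrator", "supply chain / integration risk"),
]

def injection_surfaces(boundaries):
    """Channels whose data reaches the model directly."""
    return [src for src, dst, _ in boundaries if dst == "model"]

print(injection_surfaces(TRUST_BOUNDARIES))
# -> ['user_input', 'retrieved_documents', 'tool_outputs']
```

Enumerating boundaries this way also gives the report a natural coverage argument: each tuple either maps to executed test cases or to a documented exclusion.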
Report Writing
The report is the primary deliverable. A well-written report translates technical findings into actionable intelligence for multiple audiences.
Report Structure
| Section | Audience | Content |
|---|---|---|
| Executive Summary | Executives, legal, compliance | Business impact, risk level, key findings (no technical jargon), strategic recommendations |
| Methodology | Technical leads, auditors | Approach, scope, tools used, frameworks referenced (ATLAS, OWASP), limitations |
| Findings | Engineers, security team | Detailed findings with severity, reproduction steps, root cause, remediation |
| Negative Results | Security team, auditors | Attack categories tested that were successfully defended, establishing coverage baseline |
| Recommendations | Engineering, management | Prioritized remediation roadmap, quick wins vs. structural improvements |
| Appendices | Engineers | Raw test logs, full prompt/response sequences, tool configurations |
Finding Format
Each finding should follow a consistent structure:
Title: [Descriptive name]
ID: [Unique identifier]
Severity: [Critical/High/Medium/Low/Informational]
OWASP LLM: [LLM01-LLM10 mapping]
MITRE ATLAS: [Technique ID]
Description: What the vulnerability is and why it matters.
Business Impact: What an attacker could achieve and the consequence to the organization.
Reproduction Steps:
1. [Detailed step with exact inputs]
2. [Expected model response]
3. [Next step...]
Evidence: [Screenshots, full prompt/response logs]
Root Cause: Why the vulnerability exists at a technical level.
Remediation: Specific, implementable recommendations.
Verification: How to confirm the fix works.
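The finding format above maps naturally onto a structured record, which keeps reports consistent and allows tooling to validate completeness. A minimal sketch, assuming Python dataclasses; the class name, field names, and the example finding are all illustrative:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: the finding format as a structured record so
# every report entry carries the same required components.
@dataclass
class Finding:
    id: str
    title: str
    severity: str              # Critical/High/Medium/Low/Informational
    owasp_llm: str             # OWASP LLM Top 10 mapping, e.g. "LLM01"
    atlas_id: str              # MITRE ATLAS technique ID
    description: str
    business_impact: str
    reproduction_steps: list
    evidence: list = field(default_factory=list)
    root_cause: str = ""
    remediation: str = ""
    verification: str = ""

# Illustrative finding (not from a real engagement).
f = Finding(
    id="FIND-001",
    title="Indirect prompt injection via retrieved support article",
    severity="High",
    owasp_llm="LLM01",
    atlas_id="AML.T0051",
    description="Instructions embedded in a RAG document override the system prompt.",
    business_impact="An attacker who can edit indexed content steers responses to all users.",
    reproduction_steps=[
        "Plant the crafted instruction in an indexed support article",
        "Ask a question that retrieves that article",
        "Observe the model following the injected instruction",
    ],
)
print(f.id, f.severity)
```

Rendering the report from records like this also makes it trivial to confirm that no finding ships without remediation and verification text.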
Severity Classification
Use a consistent rubric that considers multiple factors:
| Factor | Critical | High | Medium | Low |
|---|---|---|---|---|
| Exploitability | Single prompt, no prerequisites | Few turns, minimal prerequisites | Requires specific conditions | Complex, unreliable |
| Impact | RCE, full data access, credential theft | PII leakage, significant policy bypass | Moderate policy bypass, off-topic responses | Minor behavioral deviation |
| Scope | All users, all contexts | Many users or broad contexts | Specific user scenarios | Edge cases |
| Compensating controls | None effective | Partially effective | Significant mitigation exists | Strong mitigation exists |
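One common way to reduce a multi-factor rubric like the one above to a single rating is a worst-case rule with a step-down for strong compensating controls. This is a hedged sketch of that approach, not a prescribed scoring algorithm; real programs often weight factors differently:

```python
# Hypothetical sketch: combine rubric factors by taking the worst
# rating, then step down one level if strong compensating controls
# exist. The combination rule is illustrative, not a standard.
LEVELS = ["Low", "Medium", "High", "Critical"]

def overall_severity(exploitability, impact, scope_, controls):
    """Worst factor wins; strong controls reduce severity one level."""
    idx = max(LEVELS.index(f) for f in (exploitability, impact, scope_))
    if controls == "strong" and idx > 0:
        idx -= 1
    return LEVELS[idx]

print(overall_severity("High", "Critical", "Medium", "none"))  # -> Critical
print(overall_severity("High", "High", "Low", "strong"))       # -> High steps down to Medium
```

Whatever rule is chosen, the key professional requirement is that it is written down and applied consistently, so two assessors rate the same finding the same way.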
Governance Frameworks
AI red teamers must understand the governance landscape to contextualize findings and communicate with compliance stakeholders.
Framework Comparison
| Framework | Type | Scope | Key Features | Red Team Relevance |
|---|---|---|---|---|
| NIST AI RMF | Voluntary framework | All AI systems | Four functions: Govern, Map, Measure, Manage | Red teaming is a key Measure function activity |
| EU AI Act | Regulation (mandatory) | AI systems deployed/affecting EU | Risk-based classification, mandatory requirements for high-risk | Red teaming supports conformity assessment |
| ISO/IEC 42001 | Certifiable standard | Organizations deploying AI | AI Management System requirements | Provides governance structure for actioning findings |
| OWASP LLM Top 10 | Industry guidance | LLM-based applications | Ranked risk list with mitigations | Primary technical taxonomy for findings |
| MITRE ATLAS | Knowledge base | ML/AI systems | Adversary TTP matrix | Attack classification and coverage tracking |
Risk Management Process
Understanding how red teaming findings flow through governance:
1. Finding identified during red team engagement
2. Classified by severity using a consistent rubric
3. Mapped to frameworks (OWASP, ATLAS, regulatory requirements)
4. Communicated to stakeholders via report and debrief
5. Entered into risk register with owner, timeline, and acceptance criteria
6. Remediated by engineering team within SLA, or accepted (if residual risk remains) by an authorized individual with documented rationale
7. Verified through retesting by the red team
8. Tracked through governance review cycles
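The governance flow above can be modeled as an explicit state machine, which prevents a finding from skipping steps (for example, being marked remediated before it was ever entered into the risk register). A hypothetical sketch; the state names and transition table are illustrative:

```python
# Hypothetical sketch: the finding lifecycle as a state machine.
# A finding may only move along the allowed transitions below.
TRANSITIONS = {
    "identified": {"classified"},
    "classified": {"mapped"},
    "mapped": {"communicated"},
    "communicated": {"registered"},
    "registered": {"remediated", "accepted"},  # fix, or accept residual risk
    "remediated": {"verified"},
    "verified": {"tracked"},
    "accepted": {"tracked"},
    "tracked": set(),
}

def advance(state, next_state):
    """Move a finding forward, rejecting any skipped or illegal step."""
    if next_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {next_state}")
    return next_state

state = "identified"
for step in ["classified", "mapped", "communicated", "registered",
             "remediated", "verified", "tracked"]:
    state = advance(state, step)
print(state)  # -> tracked
```

The branch at `registered` captures the governance choice the text describes: every finding ends in remediation and verification, or in a documented risk acceptance, never in silence.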
Ethics and Professional Conduct
Ethical Boundaries
AI red teaming presents unique ethical challenges that go beyond traditional penetration testing:
| Challenge | Traditional Pentesting | AI Red Teaming |
|---|---|---|
| Harmful content | Rarely generated directly | Successful jailbreaks produce genuinely harmful text that must be handled responsibly |
| Technique portability | Exploits require technical skill to replicate | Prompt-based attacks are trivially copy-pasted by anyone |
| Bias exposure | Not typically relevant | Testing may reveal biases affecting real demographic groups |
| Dual-use risk | Moderate -- exploits have specific targets | High -- attack prompts work across model families |
| Researcher wellbeing | Physical safety concerns in some contexts | Psychological impact of repeatedly eliciting harmful content |
Professional Standards
- Authorization: Never test without written authorization. Scope documents protect you legally.
- Data handling: Follow agreed procedures for PII and sensitive data. When in doubt, do not store it.
- Responsible disclosure: Balance transparency with harm prevention when publishing research.
- Confidentiality: Client findings are confidential unless explicitly agreed otherwise.
- Honest reporting: Report what you found, including limitations. Do not overstate or understate findings.
- Continuous learning: The field evolves rapidly. Stay current through research, community, and practice.
- Colleague wellbeing: Monitor for signs of secondary stress from exposure to harmful content. Rotate team members on particularly distressing testing.
Study Checklist
Before taking the Governance and Professional Skills assessments, confirm you can:
- Describe the complete engagement lifecycle from scoping through retest
- Write a finding in the standard format with all required components
- Classify a finding's severity using a multi-factor rubric
- Explain the purpose of each section in a red team report
- Name three governance frameworks and their relationship to red teaming
- Describe the NIST AI RMF's four core functions
- Explain how the EU AI Act's risk classification affects testing requirements
- Articulate the ethical challenges unique to AI red teaming
- Describe how findings flow through governance from discovery to remediation
- Explain why negative results are valuable in a red team report