Professional Practice Study Guide
Study guide covering AI red teaming methodology, engagement management, report writing, governance frameworks, and professional ethics.
This guide covers the professional side of AI red teaming: how to plan and execute engagements, write effective reports, navigate governance frameworks, and maintain ethical standards. It supports preparation for the Governance and Professional Skills assessments.
Engagement Methodology
A professional AI red team engagement follows a structured methodology that ensures thoroughness, legal protection, and actionable results.
Engagement Lifecycle
| Phase | Key Activities | Deliverables |
|---|---|---|
| Pre-engagement | Scoping, rules of engagement, legal authorization, team assembly | Signed scope document, RoE, authorization letter |
| Reconnaissance | System prompt extraction, model fingerprinting, capability mapping, trust boundary identification | Architecture diagram, threat model, attack surface inventory |
| Threat Modeling | Identifying threats by component, prioritizing by impact and exploitability | Prioritized threat matrix, testing plan |
| Active Testing | Executing attacks by category, logging all test cases, iterating on findings | Testing log, raw findings |
| Analysis | Classifying findings by severity, determining root causes, developing remediations | Draft findings with severity ratings |
| Reporting | Writing the report, peer review, executive summary, technical details | Final report |
| Debrief | Presenting findings to stakeholders, answering questions, planning remediation | Debrief meeting notes, remediation tracking |
| Retest | Verifying remediations, testing for regressions | Retest report |
Scoping Best Practices
The scope document is the foundation of the engagement. It should be specific enough to prevent scope disputes but flexible enough to allow following interesting leads.
Must include:
- Target system architecture (model provider, version, orchestration framework, connected tools, data sources)
- Attack categories in scope (prompt injection, jailbreaking, data extraction, agent exploitation, etc.)
- Explicit exclusions (production systems with real users, specific techniques, out-of-scope infrastructure)
- Rules of engagement (testing windows, escalation contacts, data handling, emergency procedures)
- Success criteria (what qualifies as a finding, severity thresholds for reporting)
- Deliverables and timeline
- Written authorization with signatures from individuals with legal authority
Common scoping mistakes:
- Scoping too broadly ("test everything") without prioritization
- Failing to specify which model version and configuration are in scope
- Not addressing how model version changes during the engagement will be handled
- Omitting data handling procedures for accidentally discovered PII
- Not establishing communication cadence and escalation paths
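The required scope elements above lend themselves to a machine-checkable template. As a hypothetical sketch (field names and the `missing_scope_fields` helper are illustrative, not a standard), a scope document could be represented as structured data and validated for completeness before the engagement starts:

```python
# Hypothetical sketch: a scope document as structured data, checked for
# the must-include elements listed above. Field names are illustrative.
REQUIRED_FIELDS = {
    "architecture", "in_scope_categories", "exclusions",
    "rules_of_engagement", "success_criteria", "deliverables",
    "timeline", "authorization",
}

def missing_scope_fields(scope: dict) -> set:
    """Return required scope-document fields that are absent or empty."""
    return {f for f in REQUIRED_FIELDS if not scope.get(f)}

scope = {
    "architecture": "hosted LLM behind a RAG orchestration layer",
    "in_scope_categories": ["prompt injection", "jailbreaking"],
    "exclusions": ["production endpoints serving real users"],
    "rules_of_engagement": "see RoE appendix",
    "success_criteria": "reproducible policy bypass or data exposure",
    "deliverables": ["final report", "retest report"],
    "timeline": "three weeks, retest in week six",
    "authorization": None,  # unsigned -- engagement must not begin
}

print(missing_scope_fields(scope))  # -> {'authorization'}
```

A check like this makes the most common scoping mistake, starting work before written authorization exists, impossible to overlook.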
Rules of Engagement
Rules of engagement define the operational boundaries. They protect both the red team and the client.
| RoE Element | Purpose | Example |
|---|---|---|
| Authorized targets | Legal protection, prevents scope creep | "Testing is limited to the staging instance of the customer-facing chatbot at chat-staging.example.com" |
| Prohibited techniques | Prevents unacceptable risk | "No testing against production endpoints serving real users. No social engineering of employees." |
| Testing windows | Minimizes operational disruption | "Testing permitted 09:00-18:00 EST, Monday-Friday. No testing during quarterly board presentations." |
| Data handling | Protects discovered sensitive data | "Any PII encountered will be immediately reported to the security contact and not stored in testing logs." |
| Escalation contacts | Ensures rapid response to critical findings | "Critical findings: call [name] at [phone]. Non-critical: email [address] within 24 hours." |
| Communication | Keeps stakeholders informed | "Daily status updates via Slack. Weekly summary calls. Final readout presentation." |
Threat Modeling for AI Systems
Threat modeling should be performed after scoping and before active testing. It guides where to focus testing effort.
STRIDE Applied to AI Systems
| STRIDE Category | AI-Specific Examples |
|---|---|
| Spoofing | Impersonating authorized users or systems, forging tool call responses, spoofing MCP server identity |
| Tampering | Poisoning training data, modifying RAG documents, altering model artifacts in transit |
| Repudiation | Insufficient logging of prompt/response pairs, lack of audit trail for model decisions |
| Information Disclosure | System prompt leaking, training data extraction, PII in model outputs, embedding inversion |
| Denial of Service | Resource-exhaustion prompts, context window flooding, infinite tool call loops |
| Elevation of Privilege | Prompt injection gaining tool access, cross-agent escalation, bypassing role-based guardrails |
Trust Boundary Analysis
Identifying trust boundaries is critical. Every point where untrusted data enters the system is a potential injection surface:
- User input to model: The primary direct injection surface
- Retrieved documents to model: Indirect injection via RAG
- Tool outputs to model: Indirect injection via tool responses
- Inter-agent messages: Lateral injection in multi-agent systems
- Model output to downstream systems: Insecure output handling
- External APIs to orchestration layer: Supply chain and integration risks
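The boundary list above can be made explicit as data, so that every untrusted channel generates a concrete testing task. This is a hypothetical sketch (the tuple layout and `injection_surfaces` helper are illustrative), not a standard tooling format:

```python
# Hypothetical sketch: trust boundaries as (source, destination, risk)
# tuples. Every channel feeding the model untrusted data is an
# injection surface that needs dedicated test cases.
TRUST_BOUNDARIES = [
    ("user_input", "model", "direct prompt injection"),
    ("retrieved_documents", "model", "indirect injection via RAG"),
    ("tool_outputs", "model", "indirect injection via tool responses"),
    ("agent_messages", "agent", "lateral injection between agents"),
    ("model_output", "downstream_systems", "insecure output handling"),
    ("external_apis", "orchestrator", "supply chain / integration risk"),
]

def injection_surfaces(boundaries):
    """Channels whose data reaches the model directly."""
    return [src for src, dst, _ in boundaries if dst == "model"]

print(injection_surfaces(TRUST_BOUNDARIES))
# -> ['user_input', 'retrieved_documents', 'tool_outputs']
```

Enumerating boundaries this way also gives the report a natural coverage argument: each tuple either maps to executed test cases or to a documented exclusion.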
Report Writing
The report is the primary deliverable. A well-written report translates technical findings into actionable intelligence for multiple audiences.
Report Structure
| Section | Audience | Content |
|---|---|---|
| Executive Summary | Executives, legal, compliance | Business impact, risk level, key findings (no technical jargon), strategic recommendations |
| Methodology | Technical leads, auditors | Approach, scope, tools used, frameworks referenced (ATLAS, OWASP), limitations |
| Findings | Engineers, security team | Detailed findings with severity, reproduction steps, root cause, remediation |
| Negative Results | Security team, auditors | Attack categories tested that were successfully defended, establishing coverage baseline |
| Recommendations | Engineering, management | Prioritized remediation roadmap, quick wins vs. structural improvements |
| Appendices | Engineers | Raw test logs, full prompt/response sequences, tool configurations |
Finding Format
Each finding should follow a consistent structure:
Title: [Descriptive name]
ID: [Unique identifier]
Severity: [Critical/High/Medium/Low/Informational]
OWASP LLM: [LLM01-LLM10 mapping]
MITRE ATLAS: [Technique ID]
Description: What the vulnerability is and why it matters.
Business Impact: What an attacker could achieve and the consequence to the organization.
Reproduction Steps:
1. [Detailed step with exact inputs]
2. [Expected model response]
3. [Next step...]
Evidence: [Screenshots, full prompt/response logs]
Root Cause: Why the vulnerability exists at a technical level.
Remediation: Specific, implementable recommendations.
Verification: How to confirm the fix works.
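The finding format above maps naturally onto a structured record, which keeps reports consistent and allows tooling to validate completeness. A minimal sketch, assuming Python dataclasses; the class name, field names, and the example finding are all illustrative:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: the finding format as a structured record so
# every report entry carries the same required components.
@dataclass
class Finding:
    id: str
    title: str
    severity: str              # Critical/High/Medium/Low/Informational
    owasp_llm: str             # OWASP LLM Top 10 mapping, e.g. "LLM01"
    atlas_id: str              # MITRE ATLAS technique ID
    description: str
    business_impact: str
    reproduction_steps: list
    evidence: list = field(default_factory=list)
    root_cause: str = ""
    remediation: str = ""
    verification: str = ""

# Illustrative finding (not from a real engagement).
f = Finding(
    id="FIND-001",
    title="Indirect prompt injection via retrieved support article",
    severity="High",
    owasp_llm="LLM01",
    atlas_id="AML.T0051",
    description="Instructions embedded in a RAG document override the system prompt.",
    business_impact="An attacker who can edit indexed content steers responses to all users.",
    reproduction_steps=[
        "Plant the crafted instruction in an indexed support article",
        "Ask a question that retrieves that article",
        "Observe the model following the injected instruction",
    ],
)
print(f.id, f.severity)
```

Rendering the report from records like this also makes it trivial to confirm that no finding ships without remediation and verification text.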
Severity Classification
Use a consistent rubric that considers multiple factors:
| Factor | Critical | High | Medium | Low |
|---|---|---|---|---|
| Exploitability | Single prompt, no prerequisites | Few turns, minimal prerequisites | Requires specific conditions | Complex, unreliable |
| Impact | RCE, full data access, credential theft | PII leakage, significant policy bypass | Moderate policy bypass, off-topic responses | Minor behavioral deviation |
| Scope | All users, all contexts | Many users or broad contexts | Specific user scenarios | Edge cases |
| Compensating controls | None effective | Partially effective | Significant mitigation exists | Strong mitigation exists |
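One common way to reduce a multi-factor rubric like the one above to a single rating is a worst-case rule with a step-down for strong compensating controls. This is a hedged sketch of that approach, not a prescribed scoring algorithm; real programs often weight factors differently:

```python
# Hypothetical sketch: combine rubric factors by taking the worst
# rating, then step down one level if strong compensating controls
# exist. The combination rule is illustrative, not a standard.
LEVELS = ["Low", "Medium", "High", "Critical"]

def overall_severity(exploitability, impact, scope_, controls):
    """Worst factor wins; strong controls reduce severity one level."""
    idx = max(LEVELS.index(f) for f in (exploitability, impact, scope_))
    if controls == "strong" and idx > 0:
        idx -= 1
    return LEVELS[idx]

print(overall_severity("High", "Critical", "Medium", "none"))  # -> Critical
print(overall_severity("High", "High", "Low", "strong"))       # -> High steps down to Medium
```

Whatever rule is chosen, the key professional requirement is that it is written down and applied consistently, so two assessors rate the same finding the same way.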
Governance Frameworks
AI red teamers must understand the governance landscape to contextualize findings and communicate with compliance stakeholders.
Framework Comparison
| Framework | Type | Scope | Key Features | Red Team Relevance |
|---|---|---|---|---|
| NIST AI RMF | Voluntary framework | All AI systems | Four functions: Govern, Map, Measure, Manage | Red teaming is a key Measure function activity |
| EU AI Act | Regulation (mandatory) | AI systems deployed/affecting EU | Risk-based classification, mandatory requirements for high-risk | Red teaming supports conformity assessment |
| ISO/IEC 42001 | Certifiable standard | Organizations deploying AI | AI Management System requirements | Provides governance structure for actioning findings |
| OWASP LLM Top 10 | Industry guidance | LLM-based applications | Ranked risk list with mitigations | Primary technical taxonomy for findings |
| MITRE ATLAS | Knowledge base | ML/AI systems | Adversary TTP matrix | Attack classification and coverage tracking |
Risk Management Process
Understanding how red teaming findings flow through governance:
1. Finding identified during red team engagement
2. Classified by severity using a consistent rubric
3. Mapped to frameworks (OWASP, ATLAS, regulatory requirements)
4. Communicated to stakeholders via report and debrief
5. Entered into risk register with owner, timeline, and acceptance criteria
6. Remediated by engineering team within SLA, or accepted (if residual risk remains) by an authorized individual with documented rationale
7. Verified through retesting by the red team
8. Tracked through governance review cycles
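The governance flow above can be modeled as an explicit state machine, which prevents a finding from skipping steps (for example, being marked remediated before it was ever entered into the risk register). A hypothetical sketch; the state names and transition table are illustrative:

```python
# Hypothetical sketch: the finding lifecycle as a state machine.
# A finding may only move along the allowed transitions below.
TRANSITIONS = {
    "identified": {"classified"},
    "classified": {"mapped"},
    "mapped": {"communicated"},
    "communicated": {"registered"},
    "registered": {"remediated", "accepted"},  # fix, or accept residual risk
    "remediated": {"verified"},
    "verified": {"tracked"},
    "accepted": {"tracked"},
    "tracked": set(),
}

def advance(state, next_state):
    """Move a finding forward, rejecting any skipped or illegal step."""
    if next_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {next_state}")
    return next_state

state = "identified"
for step in ["classified", "mapped", "communicated", "registered",
             "remediated", "verified", "tracked"]:
    state = advance(state, step)
print(state)  # -> tracked
```

The branch at `registered` captures the governance choice the text describes: every finding ends in remediation and verification, or in a documented risk acceptance, never in silence.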
Ethics and Professional Conduct
Ethical Boundaries
AI red teaming presents unique ethical challenges that go beyond traditional penetration testing:
| Challenge | Traditional Pentesting | AI Red Teaming |
|---|---|---|
| Harmful content | Rarely generated directly | Successful jailbreaks produce genuinely harmful text that must be handled responsibly |
| Technique portability | Exploits require technical skill to replicate | Prompt-based attacks are trivially copy-pasted by anyone |
| Bias exposure | Not typically relevant | Testing may reveal biases affecting real demographic groups |
| Dual-use risk | Moderate -- exploits have specific targets | High -- attack prompts work across model families |
| Researcher wellbeing | Physical safety concerns in some contexts | Psychological impact of repeatedly eliciting harmful content |
Professional Standards
- Authorization: Never test without written authorization. Scope documents protect you legally.
- Data handling: Follow agreed procedures for PII and sensitive data. When in doubt, do not store it.
- Responsible disclosure: Balance transparency with harm prevention when publishing research.
- Confidentiality: Client findings are confidential unless explicitly agreed otherwise.
- Honest reporting: Report what you found, including limitations. Do not overstate or understate findings.
- Continuous learning: The field evolves rapidly. Stay current through research, community, and practice.
- Colleague wellbeing: Monitor for signs of secondary stress from exposure to harmful content. Rotate team members on particularly distressing testing.
Study Checklist
Before taking the Governance and Professional Skills assessments, confirm you can:
- Describe the complete engagement lifecycle from scoping through retest
- Write a finding in the standard format with all required components
- Classify a finding's severity using a multi-factor rubric
- Explain the purpose of each section in a red team report
- Name three governance frameworks and their relationship to red teaming
- Describe the NIST AI RMF's four core functions
- Explain how the EU AI Act's risk classification affects testing requirements
- Articulate the ethical challenges unique to AI red teaming
- Describe how findings flow through governance from discovery to remediation
- Explain why negative results are valuable in a red team report