Capstone: Red Team Program Design
Design a complete AI red team program for a fictional enterprise, producing a comprehensive program charter document.
Overview
An effective AI red team is more than a collection of skilled testers — it is a program with clear objectives, defined processes, organizational support, and measurable outcomes. This capstone project challenges you to design a complete AI red team program from scratch for a fictional enterprise, producing the charter document that would be used to launch and operate the program.
Program design requires a different skill set than technical testing. You will need to think about organizational dynamics, resource allocation, stakeholder management, and how to demonstrate value to executives who may not understand the technical details of AI security.
Prerequisites
- Full Engagement Methodology — Understanding what a complete engagement looks like
- Planning & Scoping — Engagement-level planning concepts
- Governance & Compliance — Regulatory context for AI security programs
- AI Threat Landscape — Understanding the threats the program must address
- Experience with or exposure to security programs, penetration testing programs, or risk management frameworks
Project Brief
Scenario
You have been hired as the founding lead of the AI Red Team at Helios Financial Group, a large financial services company with 15,000 employees. Helios has been deploying AI aggressively over the past two years and now has:
- 12 production AI systems ranging from customer-facing chatbots to internal fraud detection models
- 6 AI systems in development including an autonomous trading advisor and a loan approval assistant
- 3 third-party AI integrations including an AI-powered CRM, an automated document processing vendor, and a code generation tool used by developers
- No formal AI security testing program — security assessments have been ad hoc, performed by the general penetration testing team with limited AI expertise
The CISO has given you a mandate: design an AI red team program, present it to the executive committee, and be ready to begin operations within 90 days. Your budget for the first year is $1.2 million (covering headcount, tools, and external engagements). You have authority to hire up to 4 full-time team members.
Organizational Context
- Helios is subject to financial regulatory requirements (SOX, PCI-DSS, OCC guidelines on AI in banking)
- The company has a mature traditional security program with an existing pen test team, SOC, and vulnerability management
- The AI development teams are distributed across 4 business units, each with its own engineering practices
- There is executive support for AI security but limited understanding of AI-specific threats
- The existing pen test team has attempted AI testing but acknowledges their techniques are not effective against LLM-based systems
Deliverables
Primary Deliverables
| Deliverable | Description | Weight |
|---|---|---|
| Program charter | Complete program charter document (15-25 pages) | 40% |
| Organizational structure | Org chart, role descriptions, hiring plan | 15% |
| Engagement framework | Engagement types, templates, and cadence plan | 20% |
| Metrics and reporting | KPIs, dashboards, and executive reporting templates | 15% |
| Budget and roadmap | 12-month budget breakdown and quarterly milestones | 10% |
Rubric Criteria
- Completeness (20%) — Charter covers all essential program elements (mission, scope, structure, process, metrics, governance)
- Practicality (25%) — Designs are realistic and implementable within the stated constraints (budget, headcount, timeline)
- Risk Prioritization (15%) — The program prioritizes high-risk systems and aligns testing cadence to risk levels
- Organizational Integration (20%) — Program design accounts for integration with existing security teams, development workflows, and compliance processes
- Executive Communication (20%) — Charter includes an executive summary and business case that would persuade a non-technical audience
Phased Approach
Phase 1: Program Foundation (2 hours)
Define the program mission and scope
Write a clear mission statement that articulates what the AI red team does, why it exists, and what it does not do. Define the scope: which AI systems are covered, what types of testing are performed, and how the program relates to the existing pen test team and SOC.
Design the organizational structure
Define the team structure: roles (lead, senior tester, junior tester, automation engineer), reporting relationships (reports to CISO? To VP of Security?), and collaboration model with other teams. Create a hiring plan that accounts for the 90-day launch timeline.
Define engagement types
Establish the types of engagements the team will perform: comprehensive assessments (full red team), focused assessments (single attack category), pre-deployment reviews (new system launch gates), continuous monitoring (CART), and incident support. Define when each type is appropriate.
Establish the risk-based prioritization framework
Create a framework for prioritizing which systems to test first and how often. Consider: regulatory risk (systems subject to compliance requirements), data sensitivity (systems processing PII, financial data, or health information), autonomy level (systems that take actions vs. systems that advise humans), and external exposure (customer-facing vs. internal).
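The four dimensions above can be combined into a simple weighted score to produce the P1/P2/P3 tiers used later in the example matrix. This is a minimal sketch; the dimension scales, weights, and tier thresholds are illustrative assumptions, not part of the Helios brief, and should be tuned to the actual portfolio.

```python
# Illustrative risk-prioritization scoring. Weights and thresholds are
# assumptions for demonstration, not prescribed by the charter.
from dataclasses import dataclass

@dataclass
class AISystem:
    name: str
    regulatory: int        # 0 = none, 1 = some, 2 = heavily regulated
    data_sensitivity: int  # 0 = public, 1 = internal, 2 = PII/financial
    autonomy: int          # 0 = advisory, 1 = flagging, 2 = decision-making
    exposure: int          # 0 = internal, 1 = vendor-mediated, 2 = customer-facing

# Regulatory and data risk weighted slightly above autonomy and exposure.
WEIGHTS = {"regulatory": 3, "data_sensitivity": 3, "autonomy": 2, "exposure": 2}

def priority_score(s: AISystem) -> int:
    return (WEIGHTS["regulatory"] * s.regulatory
            + WEIGHTS["data_sensitivity"] * s.data_sensitivity
            + WEIGHTS["autonomy"] * s.autonomy
            + WEIGHTS["exposure"] * s.exposure)

def priority_tier(score: int) -> str:
    # Tier cutoffs are an assumption; calibrate against the portfolio.
    return "P1" if score >= 16 else "P2" if score >= 10 else "P3"

loan_ai = AISystem("Loan Approval AI", regulatory=2, data_sensitivity=2,
                   autonomy=2, exposure=2)
print(priority_tier(priority_score(loan_ai)))  # P1
```

A scoring function like this also makes the prioritization auditable: the charter can publish the weights, and stakeholders can see exactly why one system outranks another.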
Phase 2: Process Design (2 hours)
Create the engagement lifecycle
Document the end-to-end engagement process: intake (how do teams request assessments?), scoping (who defines scope and rules of engagement?), execution (testing methodology and standards), reporting (finding format, severity criteria, SLAs for remediation), and follow-up (retesting, verification, closure).
Design the pre-deployment gate
Define how AI security review integrates into the development lifecycle. What criteria trigger a mandatory assessment? What is the minimum assessment scope for a high-risk system? How does the team handle urgent launches that cannot wait for a full assessment?
Establish rules of engagement templates
Create standardized RoE templates for different engagement types. Include: scope definition checklists, authorized testing techniques, data handling requirements, escalation procedures (what happens when testing causes a production issue?), and communication protocols.
Define the finding lifecycle
Document how findings move from discovery to closure: triage, severity assignment, remediation SLA (critical: 7 days, high: 30 days, medium: 90 days), remediation verification, exception process (for findings that cannot be fixed), and risk acceptance (who can accept residual risk).
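The severity-to-SLA mapping above can be captured directly in tooling so that remediation due dates are computed consistently at triage time. A minimal sketch, using the SLAs stated in the charter (critical: 7 days, high: 30 days, medium: 90 days):

```python
# Map finding severity to a remediation due date per the charter's SLAs.
from datetime import date, timedelta

REMEDIATION_SLA_DAYS = {"critical": 7, "high": 30, "medium": 90}

def remediation_due(severity: str, triaged_on: date) -> date:
    """Return the date by which a finding of this severity must be remediated."""
    return triaged_on + timedelta(days=REMEDIATION_SLA_DAYS[severity])

print(remediation_due("critical", date(2025, 1, 10)))  # 2025-01-17
```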
Phase 3: Metrics and Reporting (2 hours)
Define program KPIs
Establish measurable KPIs: coverage (percentage of AI systems assessed in the past 12 months), finding metrics (findings by severity, time to remediation, exception rate), efficiency (time per engagement, cost per finding), and maturity (types of assessments performed, depth of testing).
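The coverage KPI above has a precise definition worth pinning down in the charter: the percentage of in-scope AI systems with at least one assessment in the trailing 12 months. A sketch of that calculation (the portfolio data is hypothetical):

```python
# Coverage KPI: percent of in-scope systems assessed in the trailing 12 months.
from datetime import date, timedelta

def coverage_pct(last_assessed: dict, today: date) -> float:
    """last_assessed maps system name -> date of last assessment, or None."""
    cutoff = today - timedelta(days=365)
    covered = sum(1 for d in last_assessed.values()
                  if d is not None and d >= cutoff)
    return 100.0 * covered / len(last_assessed)

# Hypothetical portfolio snapshot.
portfolio = {
    "Loan Approval AI": date(2025, 3, 1),
    "Customer Chatbot": date(2024, 2, 1),   # stale: outside the window
    "Code Gen Tool": None,                   # never assessed
    "Fraud Detection": date(2025, 5, 15),
}
print(coverage_pct(portfolio, date(2025, 6, 1)))  # 50.0
```

Counting never-assessed systems in the denominator matters: it keeps the metric honest as new AI systems enter scope.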
Design the executive dashboard
Create a template for quarterly executive reporting. Include: program status (systems assessed, findings identified, remediation progress), risk posture trends (is the organization getting more or less secure?), coverage gaps (which systems have not been assessed?), and resource utilization.
Design the technical reporting standard
Define the report format and quality standard for all engagements. Include: required sections, finding template, severity classification criteria, evidence requirements, and statistical reporting standards for AI-specific findings.
Phase 4: Operationalization (2 hours)
Build the 12-month roadmap
Plan the first year quarter by quarter: Q1 (hiring, tooling, first assessments on highest-risk systems), Q2 (expand coverage, establish CART for critical systems, first executive report), Q3 (pre-deployment gate integration, vendor AI assessments, team skill development), Q4 (full coverage of high-risk systems, program maturity assessment, year 2 planning).
Create the budget breakdown
Allocate the $1.2M budget across: headcount (salaries and benefits for 4 FTEs), tooling (commercial and open-source tools, cloud infrastructure for testing), external engagements (specialist consultants for niche assessments), training (team skill development, conferences, certifications), and operational costs (reporting tools, collaboration platforms).
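One way to keep the allocation honest is to encode it and assert that the categories sum to the $1.2M total. The figures below mirror the example budget later in this brief; the category names are a modeling convenience.

```python
# Year 1 budget allocation sanity check against the $1.2M mandate.
BUDGET_TOTAL = 1_200_000

ALLOCATIONS = {
    "headcount": 816_000,             # 68%: 4 FTEs plus recruiting
    "tooling": 180_000,               # 15%: commercial tools, cloud, platforms
    "external_engagements": 120_000,  # 10%: specialist consultants, vendor reviews
    "training": 60_000,               #  5%: conferences, certifications, lab
    "operational_reserve": 24_000,    #  2%: contingency
}

assert sum(ALLOCATIONS.values()) == BUDGET_TOTAL
for category, amount in ALLOCATIONS.items():
    print(f"{category}: ${amount:,} ({100 * amount / BUDGET_TOTAL:.0f}%)")
```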
Write the executive summary and business case
Write the opening section of the charter: why the program exists, what risks it mitigates, what the cost of inaction is (regulatory penalties, reputational damage, financial loss from AI exploitation), and what success looks like at the 12-month mark.
Address common objections
Anticipate and address executive objections: "Why can't the existing pen test team handle this?" "AI is changing too fast to have a fixed program." "How do we measure ROI on security testing?" Prepare clear, concise responses with supporting evidence.
Example Output
Example Program Mission Statement
## Mission
The Helios AI Red Team exists to proactively identify and help remediate
security vulnerabilities in AI systems before they can be exploited by
adversaries or cause harm to customers, employees, or the organization.
The team accomplishes this through:
- Structured security assessments of AI systems across their lifecycle
- Pre-deployment security reviews that serve as quality gates for new AI launches
- Continuous automated testing of critical AI systems in production
- Incident response support for AI-specific security events
- Security guidance and threat intelligence for AI development teams
The AI Red Team is NOT responsible for:
- General application security testing (handled by the AppSec team)
- Infrastructure and network penetration testing (handled by the Pen Test team)
- AI model performance, fairness, or bias testing (handled by the AI Ethics team)
- Day-to-day security monitoring (handled by the SOC)
The team collaborates closely with all of the above groups and serves as
the AI security subject matter expert for the broader security organization.
Example Risk Prioritization Matrix
| System | Risk Level | Regulatory | Data Sensitivity | Autonomy | Exposure | Priority |
|--------|-----------|------------|-----------------|----------|----------|----------|
| Loan Approval AI | Critical | OCC, ECOA | PII, financial | Decision-making | Customer-facing | P1 |
| Trading Advisor | Critical | SEC, FINRA | Financial | Autonomous | Internal | P1 |
| Customer Chatbot | High | PCI-DSS | PII, payment | Advisory | Customer-facing | P2 |
| Fraud Detection | High | SOX | Transaction data | Flagging | Internal | P2 |
| Code Gen Tool | Medium | None | Source code | Advisory | Internal | P3 |
| Doc Processing | Medium | SOX | Financial docs | Processing | Vendor | P3 |
Example Budget Breakdown
## Year 1 Budget: $1,200,000
### Headcount (68% — $816,000)
- AI Red Team Lead: $210,000 (fully loaded)
- Senior AI Security Tester (x2): $380,000 ($190k each)
- Automation/Tooling Engineer: $180,000
- Recruiting costs: $46,000
### Tooling (15% — $180,000)
- Commercial AI security tools: $80,000
- Cloud infrastructure for testing: $50,000
- Open-source tool hosting and maintenance: $20,000
- Reporting and collaboration platforms: $30,000
### External Engagements (10% — $120,000)
- Specialist consultant engagements (2x): $100,000
- Third-party AI vendor assessments: $20,000
### Training and Development (5% — $60,000)
- Conference attendance (team): $25,000
- Training courses and certifications: $20,000
- Research time and lab materials: $15,000
### Operational Reserve (2% — $24,000)
- Unplanned expenses and contingency
Hints
Why should an AI red team program define a pre-deployment security gate rather than relying solely on periodic assessments of production systems?