Building an AI Red Team

intermediate11 min readUpdated 2026-03-15

Complete guide to building an AI red team from scratch: hiring strategy, skills matrix design, team structure, tooling stack selection, budget planning, and organizational integration.

team-building hiring skills tooling organization

Building an AI red team is not the same as expanding a traditional penetration testing team. The skill set is different, the tooling is different, and the organizational relationships are different. This page provides a comprehensive guide to standing up an AI red team, from the first hire to a fully operational program.

Organizational Models

Before hiring anyone, decide where the red team sits in the organization. This decision shapes everything else: reporting lines, budget, access to systems, and credibility with stakeholders.

Model 1: Security-Embedded

The AI red team is a subteam within the existing security organization (AppSec, offensive security, or security research).

Advantage	Disadvantage
Leverages existing security processes, tools, and culture	May lack ML expertise and credibility with ML teams
Natural integration with vulnerability management	Security leadership may not understand AI-specific risks
Access to existing offensive tooling and infrastructure	Risk of being treated as "just another pen test team"

Model 2: ML-Embedded

The AI red team sits within the ML engineering or AI research organization.

Advantage	Disadvantage
Deep access to models, training pipelines, and deployment systems	May lack offensive security mindset and methodology
Strong ML expertise and credibility with ML teams	Potential conflicts of interest (testing your own team's work)
Can influence ML architecture and design decisions early	Risk of findings being deprioritized internally

Model 3: Independent

The AI red team is a standalone function reporting to a CISO, CTO, or VP of Engineering.

Advantage	Disadvantage
Independence from both security and ML teams	Higher organizational overhead
Credibility as a neutral assessor	Requires building relationships from scratch
Can set its own priorities and methodology	May struggle to access systems and information

Hiring Strategy

Phase 1: The Foundation (1-3 People)

The first hires define the team's culture and capabilities. Prioritize versatility over deep specialization.

Hire 1: Technical Lead

The first hire should be someone who can operate independently, define methodology, and build credibility with both security and ML teams.

Requirement	Why
3+ years offensive security experience	Can conduct assessments independently
Working knowledge of ML/LLM architecture	Understands the attack surface
Strong communication skills	Will be the team's face to stakeholders
Self-directed and resourceful	No established playbook to follow

Hire 2: ML Security Specialist

The second hire should complement the first with deeper ML expertise.

Requirement	Why
ML engineering background	Deep understanding of model internals
Interest in adversarial ML	Can develop novel attack techniques
Comfort with ambiguity	The field is evolving rapidly

Hire 3: Tooling and Automation Engineer

The third hire should focus on building the infrastructure that makes the first two more productive.

Requirement	Why
DevOps/platform engineering experience	Can build and maintain lab infrastructure
Automation skills (Python, scripting)	Enables repeatable, scalable testing
Familiarity with ML infrastructure	Can configure model serving, training environments

Phase 2: Specialization (4-8 People)

Once the foundation is established, add specialists based on your organization's risk profile.

Specialist Role	When to Hire	Focus Areas
Prompt injection specialist	If deploying LLM-based products	Injection techniques, jailbreaking, safety bypass
Training pipeline security	If training custom models	Data poisoning, fine-tuning attacks, supply chain
Agent security specialist	If deploying agentic AI	Tool abuse, memory poisoning, multi-agent attacks
Multimodal specialist	If using vision/audio/video models	Adversarial media, cross-modal injection
Domain expert	For regulated industries	Healthcare, finance, legal-specific AI risks

Interview Process

Designing an interview process for AI red team candidates requires testing both security thinking and ML understanding.

Interview Structure:
 
1. Technical screen (60 min)
   - Security fundamentals: can they think like an attacker?
   - ML basics: do they understand how models work?
   - Problem solving: given a novel AI system, where would they look for vulnerabilities?
 
2. Practical exercise (take-home, 2-4 hours)
   - Provide access to a test LLM deployment
   - Task: find and document as many security issues as possible
   - Evaluate: methodology, creativity, documentation quality
 
3. Team interview (60 min)
   - Communication: can they explain findings to non-technical stakeholders?
   - Collaboration: how do they work with development teams?
   - Ethics: how do they handle sensitive findings?
 
4. Final discussion with leadership
   - Career goals alignment
   - Cultural fit
   - Expectations setting

Skills Matrix

Track team capabilities across the AI red team skills landscape. Use this matrix to identify gaps and plan development.

Core Skills

Skill Area	Level 0	Level 1	Level 2	Level 3
Prompt injection	Unfamiliar	Can execute known techniques	Can adapt techniques to new contexts	Can develop novel injection methods
Jailbreak/safety bypass	Unfamiliar	Can execute documented jailbreaks	Can combine techniques creatively	Can discover new bypass categories
Model architecture knowledge	Unfamiliar	Understands transformer basics	Can analyze architecture vulnerabilities	Can exploit architecture-specific weaknesses
Training pipeline attacks	Unfamiliar	Understands data poisoning concepts	Can execute fine-tuning attacks	Can design novel training attacks
Agent exploitation	Unfamiliar	Understands tool abuse concepts	Can chain tool exploits	Can discover novel agent attack patterns
Infrastructure security	Unfamiliar	Basic cloud security knowledge	Can assess ML infrastructure	Can find novel infrastructure-AI attack chains
Report writing	Unfamiliar	Can document findings clearly	Can write executive-ready reports	Can tailor reports to multiple audiences
Automation/tooling	Unfamiliar	Can use existing tools	Can extend and customize tools	Can build new assessment tools

Skills Development Plan

For each team member, create a development plan that addresses their lowest-rated skills:

Current Level	Development Approach	Timeline
Level 0 to 1	Self-study, online courses, shadowing	1-2 months
Level 1 to 2	Guided practice, internal projects, CTFs	3-6 months
Level 2 to 3	Research projects, conference talks, mentoring	6-12 months

Tooling Stack

A functional AI red team needs tooling across several categories.

Assessment Tools

Category	Tools	Purpose
Automated red teaming	Garak, PyRIT, Promptfoo	Systematic vulnerability scanning
Prompt testing	Custom scripts, Jupyter notebooks	Manual and semi-automated prompt testing
Model analysis	TransformerLens, nnsight	Interpretability and weight analysis (open models)
Traffic analysis	Burp Suite, mitmproxy	API traffic interception and manipulation

Infrastructure

Category	Tools	Purpose
Model hosting	Ollama, vLLM, text-generation-inference	Local model deployment for testing
Lab environment	Docker, Kubernetes	Isolated testing environments
GPU compute	Cloud GPU instances, local GPUs	Model inference and training attack execution
Version control	Git, with finding templates	Track assessments and findings

Reporting and Tracking

Category	Tools	Purpose
Finding management	Custom tracker, Jira, Linear	Track findings through remediation
Report generation	Templates, automated report builders	Consistent, professional deliverables
Knowledge base	Internal wiki, shared documentation	Capture techniques, past findings, methodology
Metrics dashboard	Grafana, custom dashboards	Track team performance and risk metrics

Minimum Viable Stack

For a team just starting, the minimum viable tooling stack is:

One automated scanning tool (Garak or PyRIT)
Local model hosting (Ollama for quick testing)
A structured finding template (Markdown or document template)
A shared knowledge base (wiki or shared folder)
GPU access (at least one machine with a capable GPU)

Budget Planning

Cost Categories

Category	Annual Cost (Typical)	Notes
Personnel (3-person team)	$450K-$750K	Varies significantly by market
Cloud compute (GPU instances)	$24K-$120K	Depends on assessment frequency and model sizes
API costs (model access)	$12K-$60K	Testing against commercial models
Tooling licenses	$5K-$20K	Most core tools are open source
Training and conferences	$10K-$30K	Essential for keeping skills current
Lab infrastructure	$10K-$50K	One-time hardware, ongoing cloud

ROI Justification

Frame the red team's value in terms of risk reduction:

Cost of a public AI safety incident (reputational, regulatory, financial)
Regulatory compliance requirements (EU AI Act, NIST AI RMF)
Customer trust and competitive differentiation
Early detection of vulnerabilities before deployment (shift-left savings)

Organizational Integration

Relationships to Build

Team	Relationship	Why
ML Engineering	Collaborative, access to systems	Need access to models, training pipelines, deployment configs
Product/Application teams	Client-like, assessment-based	Need to assess AI features before and after deployment
Security Operations	Peer, information-sharing	Need to integrate AI findings into security monitoring
Legal/Compliance	Advisory, risk communication	Need to translate findings into compliance language
Executive leadership	Reporting, risk communication	Need to communicate AI risk at the strategic level

Engagement Model

Define engagement types
Establish standard engagement types: new model assessment, pre-deployment review, periodic re-assessment, incident-triggered assessment. Define scope, duration, and deliverables for each type.
Create intake process
Build a lightweight intake process where teams can request AI security assessments. Include a questionnaire that captures the AI system's architecture, data sources, deployment model, and risk profile.
Establish SLAs
Set service-level agreements for assessment turnaround. Critical pre-deployment reviews may need 1-2 week turnaround. Periodic assessments may be scheduled quarterly.
Build feedback loops
After each engagement, collect feedback from the team being assessed. Use this to improve methodology, communication, and deliverables over time.
Report and track metrics
Track and report team metrics: assessments completed, findings by severity, time to remediation, stakeholder satisfaction. Use these metrics to demonstrate value and justify continued investment.

Common Pitfalls

Pitfall	Why It Happens	How to Avoid
Hiring only security people	Familiar to security leadership	Explicitly plan for ML expertise from the start
Tool-first approach	Tools feel like progress	Establish methodology first, tool second
Adversarial relationship with dev teams	Natural tension in offensive security	Frame findings as shared risk reduction, not blame
Scope creep	AI security touches everything	Define clear scope boundaries and engagement types
Ignoring domain context	Technical focus dominates	Include domain impact in every finding

Summary

Building an AI red team requires deliberate planning across organizational structure, hiring, skills development, tooling, and stakeholder relationships. Start small with versatile generalists, invest in both security and ML expertise, and build tooling incrementally based on actual assessment needs. The most common failure mode is building a team that excels at one discipline (security or ML) while neglecting the other. The intersection of these disciplines is where the most impactful AI vulnerabilities live, and that is where your team needs to be.

Edit this page on GitHub

Building an AI Red Team

intermediate11 min readUpdated 2026-03-15

Complete guide to building an AI red team from scratch: hiring strategy, skills matrix design, team structure, tooling stack selection, budget planning, and organizational integration.

team-building hiring skills tooling organization

Organizational Models

Before hiring anyone, decide where the red team sits in the organization. This decision shapes everything else: reporting lines, budget, access to systems, and credibility with stakeholders.

Model 1: Security-Embedded

The AI red team is a subteam within the existing security organization (AppSec, offensive security, or security research).

Advantage	Disadvantage
Leverages existing security processes, tools, and culture	May lack ML expertise and credibility with ML teams
Natural integration with vulnerability management	Security leadership may not understand AI-specific risks
Access to existing offensive tooling and infrastructure	Risk of being treated as "just another pen test team"

Model 2: ML-Embedded

The AI red team sits within the ML engineering or AI research organization.

Advantage	Disadvantage
Deep access to models, training pipelines, and deployment systems	May lack offensive security mindset and methodology
Strong ML expertise and credibility with ML teams	Potential conflicts of interest (testing your own team's work)
Can influence ML architecture and design decisions early	Risk of findings being deprioritized internally

Model 3: Independent

The AI red team is a standalone function reporting to a CISO, CTO, or VP of Engineering.

Advantage	Disadvantage
Independence from both security and ML teams	Higher organizational overhead
Credibility as a neutral assessor	Requires building relationships from scratch
Can set its own priorities and methodology	May struggle to access systems and information

Hiring Strategy

Phase 1: The Foundation (1-3 People)

The first hires define the team's culture and capabilities. Prioritize versatility over deep specialization.

Hire 1: Technical Lead

The first hire should be someone who can operate independently, define methodology, and build credibility with both security and ML teams.

Requirement	Why
3+ years offensive security experience	Can conduct assessments independently
Working knowledge of ML/LLM architecture	Understands the attack surface
Strong communication skills	Will be the team's face to stakeholders
Self-directed and resourceful	No established playbook to follow

Hire 2: ML Security Specialist

The second hire should complement the first with deeper ML expertise.

Requirement	Why
ML engineering background	Deep understanding of model internals
Interest in adversarial ML	Can develop novel attack techniques
Comfort with ambiguity	The field is evolving rapidly

Hire 3: Tooling and Automation Engineer

The third hire should focus on building the infrastructure that makes the first two more productive.

Requirement	Why
DevOps/platform engineering experience	Can build and maintain lab infrastructure
Automation skills (Python, scripting)	Enables repeatable, scalable testing
Familiarity with ML infrastructure	Can configure model serving, training environments

Phase 2: Specialization (4-8 People)

Once the foundation is established, add specialists based on your organization's risk profile.

Specialist Role	When to Hire	Focus Areas
Prompt injection specialist	If deploying LLM-based products	Injection techniques, jailbreaking, safety bypass
Training pipeline security	If training custom models	Data poisoning, fine-tuning attacks, supply chain
Agent security specialist	If deploying agentic AI	Tool abuse, memory poisoning, multi-agent attacks
Multimodal specialist	If using vision/audio/video models	Adversarial media, cross-modal injection
Domain expert	For regulated industries	Healthcare, finance, legal-specific AI risks

Interview Process

Designing an interview process for AI red team candidates requires testing both security thinking and ML understanding.

Interview Structure:
 
1. Technical screen (60 min)
   - Security fundamentals: can they think like an attacker?
   - ML basics: do they understand how models work?
   - Problem solving: given a novel AI system, where would they look for vulnerabilities?
 
2. Practical exercise (take-home, 2-4 hours)
   - Provide access to a test LLM deployment
   - Task: find and document as many security issues as possible
   - Evaluate: methodology, creativity, documentation quality
 
3. Team interview (60 min)
   - Communication: can they explain findings to non-technical stakeholders?
   - Collaboration: how do they work with development teams?
   - Ethics: how do they handle sensitive findings?
 
4. Final discussion with leadership
   - Career goals alignment
   - Cultural fit
   - Expectations setting

Skills Matrix

Track team capabilities across the AI red team skills landscape. Use this matrix to identify gaps and plan development.

Core Skills

Skill Area	Level 0	Level 1	Level 2	Level 3
Prompt injection	Unfamiliar	Can execute known techniques	Can adapt techniques to new contexts	Can develop novel injection methods
Jailbreak/safety bypass	Unfamiliar	Can execute documented jailbreaks	Can combine techniques creatively	Can discover new bypass categories
Model architecture knowledge	Unfamiliar	Understands transformer basics	Can analyze architecture vulnerabilities	Can exploit architecture-specific weaknesses
Training pipeline attacks	Unfamiliar	Understands data poisoning concepts	Can execute fine-tuning attacks	Can design novel training attacks
Agent exploitation	Unfamiliar	Understands tool abuse concepts	Can chain tool exploits	Can discover novel agent attack patterns
Infrastructure security	Unfamiliar	Basic cloud security knowledge	Can assess ML infrastructure	Can find novel infrastructure-AI attack chains
Report writing	Unfamiliar	Can document findings clearly	Can write executive-ready reports	Can tailor reports to multiple audiences
Automation/tooling	Unfamiliar	Can use existing tools	Can extend and customize tools	Can build new assessment tools

Skills Development Plan

For each team member, create a development plan that addresses their lowest-rated skills:

Current Level	Development Approach	Timeline
Level 0 to 1	Self-study, online courses, shadowing	1-2 months
Level 1 to 2	Guided practice, internal projects, CTFs	3-6 months
Level 2 to 3	Research projects, conference talks, mentoring	6-12 months

Tooling Stack

A functional AI red team needs tooling across several categories.

Assessment Tools

Category	Tools	Purpose
Automated red teaming	Garak, PyRIT, Promptfoo	Systematic vulnerability scanning
Prompt testing	Custom scripts, Jupyter notebooks	Manual and semi-automated prompt testing
Model analysis	TransformerLens, nnsight	Interpretability and weight analysis (open models)
Traffic analysis	Burp Suite, mitmproxy	API traffic interception and manipulation

Infrastructure

Category	Tools	Purpose
Model hosting	Ollama, vLLM, text-generation-inference	Local model deployment for testing
Lab environment	Docker, Kubernetes	Isolated testing environments
GPU compute	Cloud GPU instances, local GPUs	Model inference and training attack execution
Version control	Git, with finding templates	Track assessments and findings

Reporting and Tracking

Category	Tools	Purpose
Finding management	Custom tracker, Jira, Linear	Track findings through remediation
Report generation	Templates, automated report builders	Consistent, professional deliverables
Knowledge base	Internal wiki, shared documentation	Capture techniques, past findings, methodology
Metrics dashboard	Grafana, custom dashboards	Track team performance and risk metrics

Minimum Viable Stack

For a team just starting, the minimum viable tooling stack is:

One automated scanning tool (Garak or PyRIT)
Local model hosting (Ollama for quick testing)
A structured finding template (Markdown or document template)
A shared knowledge base (wiki or shared folder)
GPU access (at least one machine with a capable GPU)

Budget Planning

Cost Categories

Category	Annual Cost (Typical)	Notes
Personnel (3-person team)	$450K-$750K	Varies significantly by market
Cloud compute (GPU instances)	$24K-$120K	Depends on assessment frequency and model sizes
API costs (model access)	$12K-$60K	Testing against commercial models
Tooling licenses	$5K-$20K	Most core tools are open source
Training and conferences	$10K-$30K	Essential for keeping skills current
Lab infrastructure	$10K-$50K	One-time hardware, ongoing cloud

ROI Justification

Frame the red team's value in terms of risk reduction:

Cost of a public AI safety incident (reputational, regulatory, financial)
Regulatory compliance requirements (EU AI Act, NIST AI RMF)
Customer trust and competitive differentiation
Early detection of vulnerabilities before deployment (shift-left savings)

Organizational Integration

Relationships to Build

Team	Relationship	Why
ML Engineering	Collaborative, access to systems	Need access to models, training pipelines, deployment configs
Product/Application teams	Client-like, assessment-based	Need to assess AI features before and after deployment
Security Operations	Peer, information-sharing	Need to integrate AI findings into security monitoring
Legal/Compliance	Advisory, risk communication	Need to translate findings into compliance language
Executive leadership	Reporting, risk communication	Need to communicate AI risk at the strategic level

Engagement Model

Define engagement types
Establish standard engagement types: new model assessment, pre-deployment review, periodic re-assessment, incident-triggered assessment. Define scope, duration, and deliverables for each type.
Create intake process
Build a lightweight intake process where teams can request AI security assessments. Include a questionnaire that captures the AI system's architecture, data sources, deployment model, and risk profile.
Establish SLAs
Set service-level agreements for assessment turnaround. Critical pre-deployment reviews may need 1-2 week turnaround. Periodic assessments may be scheduled quarterly.
Build feedback loops
After each engagement, collect feedback from the team being assessed. Use this to improve methodology, communication, and deliverables over time.
Report and track metrics
Track and report team metrics: assessments completed, findings by severity, time to remediation, stakeholder satisfaction. Use these metrics to demonstrate value and justify continued investment.

Common Pitfalls

Pitfall	Why It Happens	How to Avoid
Hiring only security people	Familiar to security leadership	Explicitly plan for ML expertise from the start
Tool-first approach	Tools feel like progress	Establish methodology first, tool second
Adversarial relationship with dev teams	Natural tension in offensive security	Frame findings as shared risk reduction, not blame
Scope creep	AI security touches everything	Define clear scope boundaries and engagement types
Ignoring domain context	Technical focus dominates	Include domain impact in every finding

Summary

Edit this page on GitHub

Building an AI Red Team

Define engagement types

Create intake process

Establish SLAs

Build feedback loops

Report and track metrics

Related articles

Building an AI Red Team

Define engagement types

Create intake process

Establish SLAs

Build feedback loops

Report and track metrics

Related articles