Building an AI Red Team
Complete guide to building an AI red team from scratch: hiring strategy, skills matrix design, team structure, tooling stack selection, budget planning, and organizational integration.
Building an AI red team is not the same as expanding a traditional penetration testing team. The skill set is different, the tooling is different, and the organizational relationships are different. This page provides a comprehensive guide to standing up an AI red team, from the first hire to a fully operational program.
Organizational Models
Before hiring anyone, decide where the red team sits in the organization. This decision shapes everything else: reporting lines, budget, access to systems, and credibility with stakeholders.
Model 1: Security-Embedded
The AI red team is a subteam within the existing security organization (AppSec, offensive security, or security research).
| Advantage | Disadvantage |
|---|---|
| Leverages existing security processes, tools, and culture | May lack ML expertise and credibility with ML teams |
| Natural integration with vulnerability management | Security leadership may not understand AI-specific risks |
| Access to existing offensive tooling and infrastructure | Risk of being treated as "just another pen test team" |
Model 2: ML-Embedded
The AI red team sits within the ML engineering or AI research organization.
| Advantage | Disadvantage |
|---|---|
| Deep access to models, training pipelines, and deployment systems | May lack offensive security mindset and methodology |
| Strong ML expertise and credibility with ML teams | Potential conflicts of interest (testing your own team's work) |
| Can influence ML architecture and design decisions early | Risk of findings being deprioritized internally |
Model 3: Independent
The AI red team is a standalone function reporting to a CISO, CTO, or VP of Engineering.
| Advantage | Disadvantage |
|---|---|
| Independence from both security and ML teams | Higher organizational overhead |
| Credibility as a neutral assessor | Requires building relationships from scratch |
| Can set its own priorities and methodology | May struggle to access systems and information |
Hiring Strategy
Phase 1: The Foundation (1-3 People)
The first hires define the team's culture and capabilities. Prioritize versatility over deep specialization.
Hire 1: Technical Lead
The first hire should be someone who can operate independently, define methodology, and build credibility with both security and ML teams.
| Requirement | Why |
|---|---|
| 3+ years offensive security experience | Can conduct assessments independently |
| Working knowledge of ML/LLM architecture | Understands the attack surface |
| Strong communication skills | Will be the team's face to stakeholders |
| Self-directed and resourceful | No established playbook to follow |
Hire 2: ML Security Specialist
The second hire should complement the first with deeper ML expertise.
| Requirement | Why |
|---|---|
| ML engineering background | Deep understanding of model internals |
| Interest in adversarial ML | Can develop novel attack techniques |
| Comfort with ambiguity | The field is evolving rapidly |
Hire 3: Tooling and Automation Engineer
The third hire should focus on building the infrastructure that makes the first two more productive.
| Requirement | Why |
|---|---|
| DevOps/platform engineering experience | Can build and maintain lab infrastructure |
| Automation skills (Python, scripting) | Enables repeatable, scalable testing |
| Familiarity with ML infrastructure | Can configure model serving, training environments |
Phase 2: Specialization (4-8 People)
Once the foundation is established, add specialists based on your organization's risk profile.
| Specialist Role | When to Hire | Focus Areas |
|---|---|---|
| Prompt injection specialist | If deploying LLM-based products | Injection techniques, jailbreaking, safety bypass |
| Training pipeline security | If training custom models | Data poisoning, fine-tuning attacks, supply chain |
| Agent security specialist | If deploying agentic AI | Tool abuse, memory poisoning, multi-agent attacks |
| Multimodal specialist | If using vision/audio/video models | Adversarial media, cross-modal injection |
| Domain expert | For regulated industries | Healthcare, finance, legal-specific AI risks |
Interview Process
Designing an interview process for AI red team candidates requires testing both security thinking and ML understanding.
Interview Structure:
1. Technical screen (60 min)
- Security fundamentals: can they think like an attacker?
- ML basics: do they understand how models work?
- Problem solving: given a novel AI system, where would they look for vulnerabilities?
2. Practical exercise (take-home, 2-4 hours)
- Provide access to a test LLM deployment
- Task: find and document as many security issues as possible
- Evaluate: methodology, creativity, documentation quality
3. Team interview (60 min)
- Communication: can they explain findings to non-technical stakeholders?
- Collaboration: how do they work with development teams?
- Ethics: how do they handle sensitive findings?
4. Final discussion with leadership
- Career goals alignment
- Cultural fit
- Expectations settingSkills Matrix
Track team capabilities across the AI red team skills landscape. Use this matrix to identify gaps and plan development.
Core Skills
| Skill Area | Level 0 | Level 1 | Level 2 | Level 3 |
|---|---|---|---|---|
| Prompt injection | Unfamiliar | Can execute known techniques | Can adapt techniques to new contexts | Can develop novel injection methods |
| Jailbreak/safety bypass | Unfamiliar | Can execute documented jailbreaks | Can combine techniques creatively | Can discover new bypass categories |
| Model architecture knowledge | Unfamiliar | Understands transformer basics | Can analyze architecture vulnerabilities | Can exploit architecture-specific weaknesses |
| Training pipeline attacks | Unfamiliar | Understands data poisoning concepts | Can execute fine-tuning attacks | Can design novel training attacks |
| Agent exploitation | Unfamiliar | Understands tool abuse concepts | Can chain tool exploits | Can discover novel agent attack patterns |
| Infrastructure security | Unfamiliar | Basic cloud security knowledge | Can assess ML infrastructure | Can find novel infrastructure-AI attack chains |
| Report writing | Unfamiliar | Can document findings clearly | Can write executive-ready reports | Can tailor reports to multiple audiences |
| Automation/tooling | Unfamiliar | Can use existing tools | Can extend and customize tools | Can build new assessment tools |
Skills Development Plan
For each team member, create a development plan that addresses their lowest-rated skills:
| Current Level | Development Approach | Timeline |
|---|---|---|
| Level 0 to 1 | Self-study, online courses, shadowing | 1-2 months |
| Level 1 to 2 | Guided practice, internal projects, CTFs | 3-6 months |
| Level 2 to 3 | Research projects, conference talks, mentoring | 6-12 months |
Tooling Stack
A functional AI red team needs tooling across several categories.
Assessment Tools
| Category | Tools | Purpose |
|---|---|---|
| Automated red teaming | Garak, PyRIT, Promptfoo | Systematic vulnerability scanning |
| Prompt testing | Custom scripts, Jupyter notebooks | Manual and semi-automated prompt testing |
| Model analysis | TransformerLens, nnsight | Interpretability and weight analysis (open models) |
| Traffic analysis | Burp Suite, mitmproxy | API traffic interception and manipulation |
Infrastructure
| Category | Tools | Purpose |
|---|---|---|
| Model hosting | Ollama, vLLM, text-generation-inference | Local model deployment for testing |
| Lab environment | Docker, Kubernetes | Isolated testing environments |
| GPU compute | Cloud GPU instances, local GPUs | Model inference and training attack execution |
| Version control | Git, with finding templates | Track assessments and findings |
Reporting and Tracking
| Category | Tools | Purpose |
|---|---|---|
| Finding management | Custom tracker, Jira, Linear | Track findings through remediation |
| Report generation | Templates, automated report builders | Consistent, professional deliverables |
| Knowledge base | Internal wiki, shared documentation | Capture techniques, past findings, methodology |
| Metrics dashboard | Grafana, custom dashboards | Track team performance and risk metrics |
Minimum Viable Stack
For a team just starting, the minimum viable tooling stack is:
- One automated scanning tool (Garak or PyRIT)
- Local model hosting (Ollama for quick testing)
- A structured finding template (Markdown or document template)
- A shared knowledge base (wiki or shared folder)
- GPU access (at least one machine with a capable GPU)
Budget Planning
Cost Categories
| Category | Annual Cost (Typical) | Notes |
|---|---|---|
| Personnel (3-person team) | $450K-$750K | Varies significantly by market |
| Cloud compute (GPU instances) | $24K-$120K | Depends on assessment frequency and model sizes |
| API costs (model access) | $12K-$60K | Testing against commercial models |
| Tooling licenses | $5K-$20K | Most core tools are open source |
| Training and conferences | $10K-$30K | Essential for keeping skills current |
| Lab infrastructure | $10K-$50K | One-time hardware, ongoing cloud |
ROI Justification
Frame the red team's value in terms of risk reduction:
- Cost of a public AI safety incident (reputational, regulatory, financial)
- Regulatory compliance requirements (EU AI Act, NIST AI RMF)
- Customer trust and competitive differentiation
- Early detection of vulnerabilities before deployment (shift-left savings)
Organizational Integration
Relationships to Build
| Team | Relationship | Why |
|---|---|---|
| ML Engineering | Collaborative, access to systems | Need access to models, training pipelines, deployment configs |
| Product/Application teams | Client-like, assessment-based | Need to assess AI features before and after deployment |
| Security Operations | Peer, information-sharing | Need to integrate AI findings into security monitoring |
| Legal/Compliance | Advisory, risk communication | Need to translate findings into compliance language |
| Executive leadership | Reporting, risk communication | Need to communicate AI risk at the strategic level |
Engagement Model
Define engagement types
Establish standard engagement types: new model assessment, pre-deployment review, periodic re-assessment, incident-triggered assessment. Define scope, duration, and deliverables for each type.
Create intake process
Build a lightweight intake process where teams can request AI security assessments. Include a questionnaire that captures the AI system's architecture, data sources, deployment model, and risk profile.
Establish SLAs
Set service-level agreements for assessment turnaround. Critical pre-deployment reviews may need 1-2 week turnaround. Periodic assessments may be scheduled quarterly.
Build feedback loops
After each engagement, collect feedback from the team being assessed. Use this to improve methodology, communication, and deliverables over time.
Report and track metrics
Track and report team metrics: assessments completed, findings by severity, time to remediation, stakeholder satisfaction. Use these metrics to demonstrate value and justify continued investment.
Common Pitfalls
| Pitfall | Why It Happens | How to Avoid |
|---|---|---|
| Hiring only security people | Familiar to security leadership | Explicitly plan for ML expertise from the start |
| Tool-first approach | Tools feel like progress | Establish methodology first, tool second |
| Adversarial relationship with dev teams | Natural tension in offensive security | Frame findings as shared risk reduction, not blame |
| Scope creep | AI security touches everything | Define clear scope boundaries and engagement types |
| Ignoring domain context | Technical focus dominates | Include domain impact in every finding |
Summary
Building an AI red team requires deliberate planning across organizational structure, hiring, skills development, tooling, and stakeholder relationships. Start small with versatile generalists, invest in both security and ML expertise, and build tooling incrementally based on actual assessment needs. The most common failure mode is building a team that excels at one discipline (security or ML) while neglecting the other. The intersection of these disciplines is where the most impactful AI vulnerabilities live, and that is where your team needs to be.