Engagement Tracking & Project Management
Managing AI red team engagements with structured tracking tools, progress metrics, time management, and Kanban/Jira templates for professional red teaming.
An AI red team engagement is a project with a defined scope, timeline, and deliverables. Without structured tracking, work gets duplicated, attack surfaces are missed, and deadlines slip. This page covers the project management practices that keep engagements on track.
Engagement Phases and Time Allocation
Most engagements follow a standard phase structure. The time allocation below reflects typical proportions for a two-week engagement:
| Phase | Duration | Activities | Deliverables |
|---|---|---|---|
| Scoping & Planning | 10% (1 day) | Target analysis, rules of engagement, test plan | Scope document, test plan |
| Reconnaissance | 15% (1.5 days) | System prompt extraction, capability mapping, architecture analysis | Recon report, attack surface map |
| Active Testing | 50% (5 days) | Attack execution, evidence collection, finding documentation | Raw findings, evidence packages |
| Analysis & Reporting | 20% (2 days) | Severity assessment, report writing, reproduction verification | Draft report |
| Review & Delivery | 5% (0.5 days) | Internal QA, client presentation, knowledge transfer | Final report, presentation |
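The allocation above translates directly into a scoping calculator. The sketch below is illustrative: the phase names and percentages mirror the table, but they are typical proportions, not a fixed standard.

```python
# Sketch: derive per-phase day budgets from an engagement length.
# Percentages mirror the allocation table above (illustrative defaults).
PHASE_ALLOCATION = {
    "scoping_planning": 0.10,
    "reconnaissance": 0.15,
    "active_testing": 0.50,
    "analysis_reporting": 0.20,
    "review_delivery": 0.05,
}

def phase_budget(total_days: float) -> dict:
    """Return days allocated to each phase for a given engagement length."""
    return {phase: round(total_days * share, 1)
            for phase, share in PHASE_ALLOCATION.items()}

budget = phase_budget(10)  # a two-week (10 working day) engagement
```

For a 10-day engagement this reproduces the table: 5 days of active testing, 2 days of analysis and reporting, and half a day for review and delivery.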
Kanban Board Structure
A red team Kanban board needs columns that reflect the evidence-driven workflow:
Columns
| Column | Purpose | Entry Criteria | Exit Criteria |
|---|---|---|---|
| Backlog | Attack vectors to investigate | Identified during recon or planning | Prioritized and assigned |
| In Progress | Currently being tested | Assigned to analyst, lab environment ready | Initial result obtained |
| Evidence Collection | Attack worked, collecting full evidence | Successful initial test | Complete evidence package with reproduction steps |
| Blocked | Waiting on access, information, or tools | Identified blocker documented | Blocker resolved |
| Ready for Review | Findings documented, needs peer review | Evidence package complete | Peer-reviewed and validated |
| Done | Finding complete and report-ready | Reviewed and approved | Included in report draft |
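The entry and exit criteria above can be enforced mechanically by restricting which column-to-column moves are legal. This is a minimal sketch, not a Jira or Kanban-tool API; the transition map is one plausible encoding of the columns above.

```python
# Sketch: legal column transitions for the board described above.
# "Blocked" is reachable from active work; "Done" is terminal.
ALLOWED_TRANSITIONS = {
    "backlog": {"in_progress"},
    "in_progress": {"evidence_collection", "blocked", "backlog"},
    "evidence_collection": {"ready_for_review", "blocked"},
    "blocked": {"in_progress", "evidence_collection"},
    "ready_for_review": {"done", "in_progress"},
    "done": set(),
}

def move_card(current: str, target: str) -> str:
    """Validate a column transition and return the new column."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"Cannot move card from {current!r} to {target!r}")
    return target
```

Encoding the workflow this way prevents cards from skipping evidence collection on the way to review, which is the most common shortcut teams take under deadline pressure.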
Card Template
Each card on the board should include:
```yaml
title: "System prompt extraction via role-play"
attack_surface: "system_prompt"
category: "information_disclosure"
priority: "high"
assigned_to: "analyst_name"
estimated_hours: 2
actual_hours: null  # filled on completion
status: "in_progress"
finding_id: null    # assigned when confirmed
evidence_ids: []
notes: |
  Hypothesis: Role-play scenarios may bypass
  instruction-following guardrails.
blockers: []
```

Attack Surface Coverage Tracking
Coverage Matrix
| Attack Surface | Planned Tests | Completed | Findings | Coverage |
|---|---|---|---|---|
| Direct prompt injection | 12 | 10 | 2 | 83% |
| Indirect prompt injection | 8 | 8 | 1 | 100% |
| System prompt extraction | 6 | 4 | 1 | 67% |
| Tool/function abuse | 10 | 3 | 0 | 30% |
| Safety bypass (jailbreak) | 15 | 12 | 3 | 80% |
| Data exfiltration | 5 | 5 | 0 | 100% |
| Multi-turn manipulation | 8 | 2 | 0 | 25% |
| Total | 64 | 44 | 7 | 69% |
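The coverage column is just completed tests over planned tests, which is easy to automate so the matrix stays current. The sketch below uses the counts from the table above; the surface names are illustrative identifiers.

```python
# Sketch: computing the coverage column from (planned, completed) counts.
# Counts taken from the matrix above.
surfaces = {
    "direct_prompt_injection": (12, 10),
    "indirect_prompt_injection": (8, 8),
    "system_prompt_extraction": (6, 4),
    "tool_function_abuse": (10, 3),
    "safety_bypass": (15, 12),
    "data_exfiltration": (5, 5),
    "multi_turn_manipulation": (8, 2),
}

def coverage_report(surfaces: dict) -> dict:
    """Per-surface and total completion percentages (rounded)."""
    report = {name: round(100 * done / planned)
              for name, (planned, done) in surfaces.items()}
    planned_total = sum(p for p, _ in surfaces.values())
    done_total = sum(d for _, d in surfaces.values())
    report["total"] = round(100 * done_total / planned_total)
    return report
```

Note that coverage measures test execution, not risk: 100% coverage of data exfiltration with zero findings is a meaningful negative result, while 30% coverage of tool abuse means that surface is still largely unknown.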
Daily Stand-up Questions
For team engagements, daily stand-ups should address:
- What attack surfaces did you test yesterday?
- What did you find? (including negative results)
- What are you testing today?
- Are you blocked on anything?
- Any findings that need urgent escalation?
Time Tracking
Track time at the activity level, not just by day. This data informs future engagement scoping and pricing.
| Activity Category | Description | Example |
|---|---|---|
| `recon` | Target analysis and information gathering | System prompt extraction attempts |
| `testing` | Active attack execution | Running injection payloads |
| `evidence` | Evidence collection and organization | Capturing API logs, screenshots |
| `analysis` | Analyzing results and assessing severity | Determining exploitability and impact |
| `reporting` | Writing report content | Drafting findings, executive summary |
| `admin` | Project management, meetings, communication | Client check-ins, team coordination |
```python
import json
import datetime
from pathlib import Path


class TimeTracker:
    """Simple time tracking for red team engagements."""

    CATEGORIES = ["recon", "testing", "evidence", "analysis", "reporting", "admin"]

    def __init__(self, engagement_id: str, analyst: str):
        self.engagement_id = engagement_id
        self.analyst = analyst
        self.log_file = Path(f"./time-logs/{engagement_id}_{analyst}.jsonl")
        self.log_file.parent.mkdir(parents=True, exist_ok=True)
        self.current_entry = None

    def start(self, category: str, description: str):
        """Begin a timed activity, closing any entry still open."""
        if category not in self.CATEGORIES:
            raise ValueError(f"Category must be one of {self.CATEGORIES}")
        if self.current_entry:
            self.stop()
        self.current_entry = {
            "category": category,
            "description": description,
            "start": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        }

    def stop(self):
        """Close the open entry and append it to the JSONL log."""
        if not self.current_entry:
            return
        self.current_entry["end"] = datetime.datetime.now(
            datetime.timezone.utc
        ).isoformat()
        with open(self.log_file, "a") as f:
            f.write(json.dumps(self.current_entry) + "\n")
        self.current_entry = None

    def summary(self) -> dict:
        """Total logged hours per category."""
        totals = {cat: 0.0 for cat in self.CATEGORIES}
        if not self.log_file.exists():
            return totals
        for line in self.log_file.read_text().strip().split("\n"):
            entry = json.loads(line)
            start = datetime.datetime.fromisoformat(entry["start"])
            end = datetime.datetime.fromisoformat(entry["end"])
            hours = (end - start).total_seconds() / 3600
            totals[entry["category"]] += hours
        return totals
```

Status Reporting
Regular status updates maintain client confidence and surface blockers early.
Weekly Status Report Template
```markdown
# Red Team Engagement Status Report

**Engagement:** ENG-2026-042 | **Week:** 1 of 2 | **Date:** 2026-03-13

## Summary
- Testing is 45% complete across 7 attack surface categories
- 4 findings identified (1 Critical, 1 High, 2 Medium)
- On track for draft report delivery on 2026-03-20

## Findings Summary
| ID | Severity | Category | Status |
|----|----------|----------|--------|
| F001 | Critical | Prompt Injection | Evidence complete |
| F002 | High | Data Exfiltration | Under review |
| F003 | Medium | Safety Bypass | Evidence collection |
| F004 | Medium | Info Disclosure | Evidence collection |

## Coverage
- Completed: Direct injection, indirect injection, data exfiltration
- In progress: Safety bypass, system prompt extraction
- Upcoming: Tool abuse, multi-turn manipulation

## Blockers
- Awaiting API credentials for the production endpoint (requested 2026-03-10)

## Next Week Plan
- Complete remaining attack surface testing
- Begin report drafting (target: Wednesday)
- Schedule findings review meeting
```

Related Topics
- Red Team Lab & Operations -- operational foundations
- Evidence Collection & Chain of Custody -- evidence standards that integrate with tracking
- Metrics, KPIs & Demonstrating ROI -- using engagement data for program metrics
Implementation Considerations
Architecture Patterns
When implementing systems that interact with LLMs, several architectural patterns affect the security posture of the overall application:
Gateway pattern: A dedicated API gateway sits between users and the LLM, handling authentication, rate limiting, input validation, and output filtering. This centralizes security controls but creates a single point of failure.
```python
import uuid
from dataclasses import dataclass


@dataclass
class SecurityGateway:
    """Gateway pattern for securing LLM application access."""

    input_classifier: object  # ML-based input classifier
    output_filter: object     # Output content filter
    rate_limiter: object      # Rate limiting service
    audit_logger: object      # Audit trail logger

    def process_request(self, user_id: str, message: str, session_id: str) -> dict:
        """Process a request through all security layers."""
        request_id = self._generate_request_id()

        # Layer 1: Rate limiting
        if not self.rate_limiter.allow(user_id):
            self.audit_logger.log(request_id, "rate_limited", user_id)
            return {"error": "Rate limit exceeded", "retry_after": 60}

        # Layer 2: Input classification
        classification = self.input_classifier.classify(message)
        if classification.is_adversarial:
            self.audit_logger.log(
                request_id, "input_blocked",
                user_id, classification.category
            )
            return {"error": "Request could not be processed"}

        # Layer 3: LLM processing
        response = self._call_llm(message, session_id)

        # Layer 4: Output filtering
        filtered = self.output_filter.filter(response)
        if filtered.was_modified:
            self.audit_logger.log(
                request_id, "output_filtered",
                user_id, filtered.reason
            )

        # Layer 5: Audit logging
        self.audit_logger.log(
            request_id, "completed",
            user_id, len(message), len(filtered.content)
        )
        return {"response": filtered.content}

    def _generate_request_id(self) -> str:
        return str(uuid.uuid4())

    def _call_llm(self, message: str, session_id: str) -> str:
        # LLM API call implementation
        raise NotImplementedError
```

Sidecar pattern: Security components run alongside the LLM as independent services, each responsible for a specific aspect of security. This provides better isolation and independent scaling but increases system complexity.
Mesh pattern: For multi-agent systems, each agent has its own security perimeter with authentication, authorization, and auditing. Inter-agent communication follows zero-trust principles.
Performance Implications
Security measures inevitably add latency and computational overhead. Understanding these trade-offs is essential for production deployments:
| Security Layer | Typical Latency | Computational Cost | Impact on UX |
|---|---|---|---|
| Keyword filter | <1ms | Negligible | None |
| Regex filter | 1-5ms | Low | None |
| ML classifier (small) | 10-50ms | Moderate | Minimal |
| ML classifier (large) | 50-200ms | High | Noticeable |
| LLM-as-judge | 500-2000ms | Very High | Significant |
| Full pipeline | 100-500ms | High | Moderate |
The recommended approach is to use fast, lightweight checks first (keyword and regex filters) to catch obvious attacks, followed by more expensive ML-based analysis only for inputs that pass the initial filters. This cascading approach provides good security with acceptable performance.
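The cascade can be sketched as a short-circuiting pipeline: each stage either blocks or passes the input to the next, more expensive stage. The keyword list, regex, and `ml_classifier` callable below are illustrative placeholders, not production rules.

```python
import re

# Sketch of the cascading check described above: cheap filters run first
# and short-circuit, so the expensive classifier only sees survivors.
BLOCKED_KEYWORDS = {"ignore previous instructions", "system prompt"}
INJECTION_PATTERN = re.compile(r"(?i)\b(disregard|override)\b.{0,40}\binstructions\b")

def cascade_check(message: str, ml_classifier=None) -> str:
    """Return 'blocked' or 'allowed', cheapest checks first."""
    lowered = message.lower()
    # Stage 1: keyword filter (<1ms)
    if any(kw in lowered for kw in BLOCKED_KEYWORDS):
        return "blocked"
    # Stage 2: regex filter (1-5ms)
    if INJECTION_PATTERN.search(message):
        return "blocked"
    # Stage 3: ML classifier (10-200ms), only for inputs that survive
    if ml_classifier is not None and ml_classifier(message):
        return "blocked"
    return "allowed"
```

Because most traffic is benign, the classifier runs on only a small fraction of requests, keeping average latency close to the cheap-filter cost while preserving the classifier's coverage for evasive inputs.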
Monitoring and Observability
Effective security monitoring for LLM applications requires tracking metrics that capture adversarial behavior patterns:
```python
import time
from dataclasses import dataclass, field


@dataclass
class SecurityMetrics:
    """Track security-relevant metrics for LLM applications."""

    # Counters
    total_requests: int = 0
    blocked_requests: int = 0
    filtered_outputs: int = 0
    anomalous_sessions: int = 0

    # Rate tracking (timestamps of recent events)
    _request_times: list = field(default_factory=list)
    _block_times: list = field(default_factory=list)

    def record_request(self, was_blocked: bool = False, was_filtered: bool = False):
        """Record a request and its disposition."""
        now = time.time()
        self.total_requests += 1
        self._request_times.append(now)
        if was_blocked:
            self.blocked_requests += 1
            self._block_times.append(now)
        if was_filtered:
            self.filtered_outputs += 1

    def get_block_rate(self, window_seconds: int = 300) -> float:
        """Calculate the block rate over a time window."""
        cutoff = time.time() - window_seconds
        recent_requests = sum(1 for t in self._request_times if t > cutoff)
        recent_blocks = sum(1 for t in self._block_times if t > cutoff)
        if recent_requests == 0:
            return 0.0
        return recent_blocks / recent_requests

    def should_alert(self) -> bool:
        """Determine if current metrics warrant an alert."""
        # Alert if >30% of requests were blocked in the last 5 minutes
        return self.get_block_rate() > 0.3
```

Security Testing in CI/CD
Integrating AI security testing into the development pipeline catches regressions before they reach production:
- Unit-level tests: Test individual security components (classifiers, filters) against known payloads
- Integration tests: Test the full security pipeline end-to-end
- Regression tests: Maintain a suite of previously-discovered attack payloads and verify they remain blocked
- Adversarial tests: Periodically run automated red team tools (Garak, Promptfoo) as part of the deployment pipeline
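A regression suite of this kind reduces to replaying known-bad payloads and flagging any that are no longer caught. The sketch below assumes a JSONL payload corpus and a `classify` callable standing in for whatever input filter the pipeline uses; both are assumptions, not a specific tool's API.

```python
import json

# Sketch: regression test over previously-discovered attack payloads.
# Each corpus line: {"payload": "...", "expected": "blocked"}

def load_payloads(path: str) -> list:
    """Load a JSONL payload corpus."""
    with open(path) as f:
        return [json.loads(line) for line in f]

def run_regression(payloads: list, classify) -> list:
    """Return payloads that should be blocked but no longer are."""
    return [p["payload"] for p in payloads
            if p["expected"] == "blocked" and classify(p["payload"]) != "blocked"]
```

Wiring `run_regression` into CI and failing the build on a non-empty result turns every past finding into a permanent guardrail against filter regressions.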
Emerging Trends
Current Research Directions
The field of LLM security is evolving rapidly. Key research directions that are likely to shape the landscape include:
- Formal verification for LLM behavior: Researchers are exploring mathematical frameworks for proving properties about model behavior under adversarial conditions. While full formal verification of neural networks remains intractable, bounded verification of specific properties shows promise.
- Adversarial training for LLM robustness: Beyond standard RLHF, researchers are developing training procedures that explicitly expose models to adversarial inputs during safety training, improving robustness against known attack patterns.
- Interpretability-guided defense: Mechanistic interpretability research is enabling defenders to understand why specific attacks succeed at the neuron and circuit level, informing more targeted defensive measures.
- Multi-agent security: As LLM agents become more prevalent, securing inter-agent communication and maintaining trust boundaries across agent systems is an active area of research with significant practical implications.
- Automated red teaming at scale: Tools like NVIDIA's Garak, Microsoft's PyRIT, and the UK AISI's Inspect framework are enabling automated security testing at scales previously impossible, but the quality and coverage of automated testing remain an open challenge.
The integration of these research directions into production systems will define the next generation of AI security practices.