Engagement Tracking & Project Management
Managing AI red team engagements with structured tracking tools, progress metrics, time management, and Kanban/Jira templates for professional red teaming.
An AI red team engagement is a project with a defined scope, timeline, and deliverables. Without structured tracking, work gets duplicated, attack surfaces are missed, and deadlines slip. This page covers the project management practices that keep engagements on track.
Engagement Phases and Time Allocation
Most engagements follow a standard phase structure. The time allocation below reflects typical proportions for a two-week engagement:
| Phase | Duration | Activities | Deliverables |
|---|---|---|---|
| Scoping & Planning | 10% (1 day) | Target analysis, rules of engagement, test plan | Scope document, test plan |
| Reconnaissance | 15% (1.5 days) | System prompt extraction, capability mapping, architecture analysis | Recon report, attack surface map |
| Active Testing | 50% (5 days) | Attack execution, evidence collection, finding documentation | Raw findings, evidence packages |
| Analysis & Reporting | 20% (2 days) | Severity assessment, report writing, reproduction verification | Draft report |
| Review & Delivery | 5% (0.5 days) | Internal QA, client presentation, knowledge transfer | Final report, presentation |
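The percentage split above scales to engagements of other lengths. A minimal sketch in Python; the phase keys and the `phase_days` helper are illustrative, not part of any standard:

```python
# Convert the phase percentages from the table above into per-phase
# day budgets for an engagement of arbitrary length. Shares sum to 1.0.
PHASE_ALLOCATION = {
    "scoping_planning": 0.10,
    "reconnaissance": 0.15,
    "active_testing": 0.50,
    "analysis_reporting": 0.20,
    "review_delivery": 0.05,
}

def phase_days(total_days: float) -> dict:
    """Return per-phase day budgets for a given engagement length."""
    return {phase: round(total_days * share, 1)
            for phase, share in PHASE_ALLOCATION.items()}

# A two-week engagement is ten working days:
print(phase_days(10))
```

For ten working days this reproduces the durations in the table (1, 1.5, 5, 2, and 0.5 days).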
Kanban Board Structure
A red team Kanban board needs columns that reflect the evidence-driven workflow:
Columns
| Column | Purpose | Entry Criteria | Exit Criteria |
|---|---|---|---|
| Backlog | Attack vectors to investigate | Identified during recon or planning | Prioritized and assigned |
| In Progress | Currently being tested | Assigned to analyst, lab environment ready | Initial result obtained |
| Evidence Collection | Attack worked, collecting full evidence | Successful initial test | Complete evidence package with reproduction steps |
| Blocked | Waiting on access, information, or tools | Identified blocker documented | Blocker resolved |
| Ready for Review | Findings documented, needs peer review | Evidence package complete | Peer-reviewed and validated |
| Done | Finding complete and report-ready | Reviewed and approved | Included in report draft |
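The column flow can also be enforced in tooling. A minimal sketch of a transition check, assuming cards are plain dicts; the exact transition set (for example, which columns may enter Blocked) is an illustrative reading of the table, not a fixed rule:

```python
# Allowed card transitions for the board above. A card enters Blocked
# from an active column and returns to In Progress once the blocker
# is resolved; reviewers may bounce a card back to In Progress.
TRANSITIONS = {
    "backlog": {"in_progress"},
    "in_progress": {"evidence_collection", "ready_for_review", "blocked"},
    "evidence_collection": {"ready_for_review", "blocked"},
    "blocked": {"in_progress"},
    "ready_for_review": {"done", "in_progress"},
    "done": set(),
}

def move(card: dict, new_status: str) -> dict:
    """Validate and apply a status change, returning the updated card."""
    if new_status not in TRANSITIONS[card["status"]]:
        raise ValueError(f"Cannot move {card['status']} -> {new_status}")
    return {**card, "status": new_status}

card = {"title": "System prompt extraction via role-play", "status": "backlog"}
card = move(card, "in_progress")
```

Rejecting invalid moves at the tooling level keeps the board's entry and exit criteria honest.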
Card Template
Each card on the board should include:
```yaml
title: "System prompt extraction via role-play"
attack_surface: "system_prompt"
category: "information_disclosure"
priority: "high"
assigned_to: "analyst_name"
estimated_hours: 2
actual_hours: null   # filled on completion
status: "in_progress"
finding_id: null     # assigned when confirmed
evidence_ids: []
notes: |
  Hypothesis: Role-play scenarios may bypass
  instruction-following guardrails.
blockers: []
```
Attack Surface Coverage Tracking
Coverage Matrix
| Attack Surface | Planned Tests | Completed | Findings | Coverage |
|---|---|---|---|---|
| Direct prompt injection | 12 | 10 | 2 | 83% |
| Indirect prompt injection | 8 | 8 | 1 | 100% |
| System prompt extraction | 6 | 4 | 1 | 67% |
| Tool/function abuse | 10 | 3 | 0 | 30% |
| Safety bypass (jailbreak) | 15 | 12 | 3 | 80% |
| Data exfiltration | 5 | 5 | 0 | 100% |
| Multi-turn manipulation | 8 | 2 | 0 | 25% |
| Total | 64 | 44 | 7 | 69% |
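Coverage figures like these can be computed from board data rather than maintained by hand. A minimal sketch; the `surfaces` structure is illustrative:

```python
# Derive per-surface and overall coverage from planned/completed counts,
# as in the matrix above.
surfaces = {
    "direct_prompt_injection": {"planned": 12, "completed": 10},
    "tool_function_abuse": {"planned": 10, "completed": 3},
    "multi_turn_manipulation": {"planned": 8, "completed": 2},
}

def coverage(surfaces: dict) -> dict:
    """Return per-surface coverage plus an overall figure, in percent."""
    report = {
        name: round(100 * s["completed"] / s["planned"])
        for name, s in surfaces.items()
    }
    planned = sum(s["planned"] for s in surfaces.values())
    completed = sum(s["completed"] for s in surfaces.values())
    report["total"] = round(100 * completed / planned)
    return report

print(coverage(surfaces))
```

Generating the matrix from card data avoids the drift that creeps in when totals are updated manually mid-engagement.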
Daily Stand-up Questions
For team engagements, daily stand-ups should address:
- What attack surfaces did you test yesterday?
- What did you find? (including negative results)
- What are you testing today?
- Are you blocked on anything?
- Any findings that need urgent escalation?
Time Tracking
Track time at the activity level, not just by day. This data informs future engagement scoping and pricing.
| Activity Category | Description | Example |
|---|---|---|
| recon | Target analysis and information gathering | System prompt extraction attempts |
| testing | Active attack execution | Running injection payloads |
| evidence | Evidence collection and organization | Capturing API logs, screenshots |
| analysis | Analyzing results and assessing severity | Determining exploitability and impact |
| reporting | Writing report content | Drafting findings, executive summary |
| admin | Project management, meetings, communication | Client check-ins, team coordination |
```python
import datetime
import json
from pathlib import Path


class TimeTracker:
    """Simple time tracking for red team engagements."""

    CATEGORIES = ["recon", "testing", "evidence", "analysis", "reporting", "admin"]

    def __init__(self, engagement_id: str, analyst: str):
        self.engagement_id = engagement_id
        self.analyst = analyst
        self.log_file = Path(f"./time-logs/{engagement_id}_{analyst}.jsonl")
        self.log_file.parent.mkdir(parents=True, exist_ok=True)
        self.current_entry = None

    def start(self, category: str, description: str):
        """Begin a new timed activity, closing any open one first."""
        if category not in self.CATEGORIES:
            raise ValueError(f"Category must be one of {self.CATEGORIES}")
        if self.current_entry:
            self.stop()
        self.current_entry = {
            "category": category,
            "description": description,
            "start": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        }

    def stop(self):
        """Close the current activity and append it to the JSONL log."""
        if not self.current_entry:
            return
        self.current_entry["end"] = datetime.datetime.now(
            datetime.timezone.utc
        ).isoformat()
        with open(self.log_file, "a") as f:
            f.write(json.dumps(self.current_entry) + "\n")
        self.current_entry = None

    def summary(self) -> dict:
        """Total logged hours per category."""
        totals = {cat: 0.0 for cat in self.CATEGORIES}
        if not self.log_file.exists():
            return totals
        for line in self.log_file.read_text().strip().split("\n"):
            entry = json.loads(line)
            start = datetime.datetime.fromisoformat(entry["start"])
            end = datetime.datetime.fromisoformat(entry["end"])
            hours = (end - start).total_seconds() / 3600
            totals[entry["category"]] += hours
        return totals
```
Status Reporting
Regular status updates maintain client confidence and surface blockers early.
Weekly Status Report Template
```markdown
# Red Team Engagement Status Report
**Engagement:** ENG-2026-042 | **Week:** 1 of 2 | **Date:** 2026-03-13

## Summary
- Testing is 45% complete across 7 attack surface categories
- 4 findings identified (1 Critical, 1 High, 2 Medium)
- On track for draft report delivery on 2026-03-20

## Findings Summary
| ID | Severity | Category | Status |
|----|----------|----------|--------|
| F001 | Critical | Prompt Injection | Evidence complete |
| F002 | High | Data Exfiltration | Under review |
| F003 | Medium | Safety Bypass | Evidence collection |
| F004 | Medium | Info Disclosure | Evidence collection |

## Coverage
- Completed: Direct injection, indirect injection, data exfiltration
- In progress: Safety bypass, system prompt extraction
- Upcoming: Tool abuse, multi-turn manipulation

## Blockers
- Awaiting API credentials for the production endpoint (requested 2026-03-10)

## Next Week Plan
- Complete remaining attack surface testing
- Begin report drafting (target: Wednesday)
- Schedule findings review meeting
```
Related Topics
- Red Team Lab & Operations -- operational foundations
- Evidence Collection & Chain of Custody -- evidence standards that integrate with tracking
- Metrics, KPIs & Demonstrating ROI -- using engagement data for program metrics
Implementation Considerations
Architecture Patterns
When implementing systems that interact with LLMs, several architectural patterns affect the security posture of the overall application:
Gateway pattern: A dedicated API gateway sits between users and the LLM, handling authentication, rate limiting, input validation, and output filtering. This centralizes security controls but creates a single point of failure.
```python
import uuid
from dataclasses import dataclass


@dataclass
class SecurityGateway:
    """Gateway pattern for securing LLM application access."""

    input_classifier: object  # ML-based input classifier
    output_filter: object     # output content filter
    rate_limiter: object      # rate limiting service
    audit_logger: object      # audit trail logger

    def process_request(self, user_id: str, message: str, session_id: str) -> dict:
        """Process a request through all security layers."""
        request_id = self._generate_request_id()

        # Layer 1: rate limiting
        if not self.rate_limiter.allow(user_id):
            self.audit_logger.log(request_id, "rate_limited", user_id)
            return {"error": "Rate limit exceeded", "retry_after": 60}

        # Layer 2: input classification
        classification = self.input_classifier.classify(message)
        if classification.is_adversarial:
            self.audit_logger.log(
                request_id, "input_blocked",
                user_id, classification.category
            )
            return {"error": "Request could not be processed"}

        # Layer 3: LLM processing
        response = self._call_llm(message, session_id)

        # Layer 4: output filtering
        filtered = self.output_filter.filter(response)
        if filtered.was_modified:
            self.audit_logger.log(
                request_id, "output_filtered",
                user_id, filtered.reason
            )

        # Layer 5: audit logging
        self.audit_logger.log(
            request_id, "completed",
            user_id, len(message), len(filtered.content)
        )
        return {"response": filtered.content}

    def _generate_request_id(self) -> str:
        return str(uuid.uuid4())

    def _call_llm(self, message: str, session_id: str) -> str:
        # LLM API call implementation goes here
        pass
```
Sidecar pattern: Security components run alongside the LLM as independent services, each responsible for a specific aspect of security. This provides better isolation and independent scaling but increases system complexity.
Mesh pattern: For multi-agent systems, each agent has its own security perimeter with authentication, authorization, and auditing. Inter-agent communication follows zero-trust principles.
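As a rough illustration of the sidecar idea, independent checks can be consulted concurrently and their verdicts aggregated. The check functions and in-process threading here stand in for what would normally be separate network services; the names are illustrative:

```python
import concurrent.futures

# Each "sidecar" is an independent check with its own verdict; the
# caller only aggregates results. Real deployments would reach these
# over the network (e.g. gRPC/HTTP) rather than calling in-process.
def length_check(message: str) -> bool:
    return len(message) < 4000

def keyword_check(message: str) -> bool:
    return "ignore previous instructions" not in message.lower()

SIDECARS = [length_check, keyword_check]

def is_allowed(message: str) -> bool:
    """Run all sidecar checks concurrently; any rejection blocks."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        verdicts = list(pool.map(lambda check: check(message), SIDECARS))
    return all(verdicts)
```

Because each check is independent, one sidecar can be scaled or replaced without touching the others, which is the isolation benefit described above.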
Performance Implications
Security measures inevitably add latency and computational overhead. Understanding these trade-offs is essential for production deployments:
| Security Layer | Typical Latency | Computational Cost | Impact on UX |
|---|---|---|---|
| Keyword filter | <1ms | Negligible | None |
| Regex filter | 1-5ms | Low | None |
| ML classifier (small) | 10-50ms | Moderate | Minimal |
| ML classifier (large) | 50-200ms | High | Noticeable |
| LLM-as-judge | 500-2000ms | Very High | Significant |
| Full pipeline | 100-500ms | High | Moderate |
The recommended approach is to use fast, lightweight checks first (keyword and regex filters) to catch obvious attacks, followed by more expensive ML-based analysis only for inputs that pass the initial filters. This cascading approach provides good security with acceptable performance.
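A minimal sketch of such a cascade, with illustrative blocklist phrases, regex, and a placeholder classifier:

```python
import re

# Stages ordered from cheapest to most expensive. Each stage either
# blocks outright or passes the input to the next stage.
BLOCKLIST = {"do anything now", "developer mode"}
INJECTION_RE = re.compile(r"ignore (all|previous|prior) instructions", re.I)

def ml_classifier(text: str) -> float:
    """Stand-in for an ML classifier returning an adversarial score."""
    return 0.0  # placeholder: a real model call goes here

def cascade_check(text: str) -> str:
    lowered = text.lower()
    if any(phrase in lowered for phrase in BLOCKLIST):   # ~<1ms
        return "blocked:keyword"
    if INJECTION_RE.search(text):                        # ~1-5ms
        return "blocked:regex"
    if ml_classifier(text) > 0.8:                        # ~10-200ms
        return "blocked:classifier"
    return "allowed"
```

Most traffic never reaches the expensive stage, which is what keeps the average latency close to the cheap-filter numbers in the table above.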
Monitoring and Observability
Effective security monitoring for LLM applications requires tracking metrics that capture adversarial behavior patterns:
```python
import time
from dataclasses import dataclass, field


@dataclass
class SecurityMetrics:
    """Track security-relevant metrics for LLM applications."""

    # Counters
    total_requests: int = 0
    blocked_requests: int = 0
    filtered_outputs: int = 0
    anomalous_sessions: int = 0

    # Rate tracking (timestamps of recent events)
    _request_times: list = field(default_factory=list)
    _block_times: list = field(default_factory=list)

    def record_request(self, was_blocked: bool = False, was_filtered: bool = False):
        """Record a request and its disposition."""
        now = time.time()
        self.total_requests += 1
        self._request_times.append(now)
        if was_blocked:
            self.blocked_requests += 1
            self._block_times.append(now)
        if was_filtered:
            self.filtered_outputs += 1

    def get_block_rate(self, window_seconds: int = 300) -> float:
        """Calculate the block rate over a time window."""
        cutoff = time.time() - window_seconds
        recent_requests = sum(1 for t in self._request_times if t > cutoff)
        recent_blocks = sum(1 for t in self._block_times if t > cutoff)
        if recent_requests == 0:
            return 0.0
        return recent_blocks / recent_requests

    def should_alert(self) -> bool:
        """Determine if current metrics warrant an alert."""
        # Alert if >30% of requests were blocked in the last 5 minutes
        return self.get_block_rate() > 0.3
```
Security Testing in CI/CD
Integrating AI security testing into the development pipeline catches regressions before they reach production:
- Unit-level tests: Test individual security components (classifiers, filters) against known payloads
- Integration tests: Test the full security pipeline end-to-end
- Regression tests: Maintain a suite of previously discovered attack payloads and verify they remain blocked
- Adversarial tests: Periodically run automated red team tools (Garak, Promptfoo) as part of the deployment pipeline
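The regression-test idea can be sketched as a small harness. The JSONL payload file and the `check` callable are assumptions for illustration, not a specific tool's format:

```python
import json
import pathlib

# known_payloads.jsonl holds one previously discovered payload per line:
#   {"id": "F001", "payload": "...", "expected": "blocked"}
def load_payloads(path: str) -> list:
    return [json.loads(line) for line in
            pathlib.Path(path).read_text().splitlines() if line.strip()]

def run_regression(payloads: list, check) -> list:
    """Return the IDs of payloads that are no longer blocked."""
    return [p["id"] for p in payloads
            if p["expected"] == "blocked" and check(p["payload"]) != "blocked"]

# In CI, a non-empty result fails the build, e.g.:
#   regressions = run_regression(load_payloads("known_payloads.jsonl"), gateway_check)
#   assert not regressions, f"Regressed payloads: {regressions}"
```

Growing this suite with every confirmed finding turns each engagement into a permanent guard against the same attack recurring.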
Emerging Trends
Current Research Directions
The field of LLM security is evolving rapidly. Key research directions that are likely to shape the landscape include:
- Formal verification for LLM behavior: Researchers are exploring mathematical frameworks for proving properties about model behavior under adversarial conditions. While full formal verification of neural networks remains intractable, bounded verification of specific properties shows promise.
- Adversarial training for LLM robustness: Beyond standard RLHF, researchers are developing training procedures that explicitly expose models to adversarial inputs during safety training, improving robustness against known attack patterns.
- Interpretability-guided defense: Mechanistic interpretability research is enabling defenders to understand why specific attacks succeed at the neuron and circuit level, informing more targeted defensive measures.
- Multi-agent security: As LLM agents become more prevalent, securing inter-agent communication and maintaining trust boundaries across agent systems is an active area of research with significant practical implications.
- Automated red teaming at scale: Tools like NVIDIA's Garak, Microsoft's PyRIT, and the UK AISI's Inspect framework are enabling automated security testing at scales previously impossible, but the quality and coverage of automated testing remains an open challenge.
The integration of these research directions into production systems will define the next generation of AI security practices.