Engagement Tracking & Project Management
Managing AI red team engagements with structured tracking tools, progress metrics, time management, and Kanban/Jira templates for professional red teaming.
An AI red team engagement is a project with a defined scope, timeline, and deliverables. Without structured tracking, work gets duplicated, attack surfaces are missed, and deadlines slip. This page covers the project management practices that keep engagements on track.
Engagement Phases and Time Allocation
Most engagements follow a standard phase structure. The time allocation below reflects typical proportions for a two-week engagement:
| Phase | Duration | Activities | Deliverables |
|---|---|---|---|
| Scoping & Planning | 10% (1 day) | Target analysis, rules of engagement, test plan | Scope document, test plan |
| Reconnaissance | 15% (1.5 days) | System prompt extraction, capability mapping, architecture analysis | Recon report, attack surface map |
| Active Testing | 50% (5 days) | Attack execution, evidence collection, finding documentation | Raw findings, evidence packages |
| Analysis & Reporting | 20% (2 days) | Severity assessment, report writing, reproduction verification | Draft report |
| Review & Delivery | 5% (0.5 days) | Internal QA, client presentation, knowledge transfer | Final report, presentation |
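The allocation above translates directly into a scoping calculator. The sketch below is illustrative: the phase names and percentages mirror the table, but they are typical proportions, not a fixed standard.

```python
# Sketch: derive per-phase day budgets from an engagement length.
# Percentages mirror the allocation table above (illustrative defaults).
PHASE_ALLOCATION = {
    "scoping_planning": 0.10,
    "reconnaissance": 0.15,
    "active_testing": 0.50,
    "analysis_reporting": 0.20,
    "review_delivery": 0.05,
}

def phase_budget(total_days: float) -> dict:
    """Return days allocated to each phase for a given engagement length."""
    return {phase: round(total_days * share, 1)
            for phase, share in PHASE_ALLOCATION.items()}

budget = phase_budget(10)  # a two-week (10 working day) engagement
```

For a 10-day engagement this reproduces the table: 5 days of active testing, 2 days of analysis and reporting, and half a day for review and delivery.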
Kanban Board Structure
A red team Kanban board needs columns that reflect the evidence-driven workflow:
Columns
| Column | Purpose | Entry Criteria | Exit Criteria |
|---|---|---|---|
| Backlog | Attack vectors to investigate | Identified during recon or planning | Prioritized and assigned |
| In Progress | Currently being tested | Assigned to analyst, lab environment ready | Initial result obtained |
| Evidence Collection | Attack worked, collecting full evidence | Successful initial test | Complete evidence package with reproduction steps |
| Blocked | Waiting on access, information, or tools | Identified blocker documented | Blocker resolved |
| Ready for Review | Findings documented, needs peer review | Evidence package complete | Peer-reviewed and validated |
| Done | Finding complete and report-ready | Reviewed and approved | Included in report draft |
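The entry and exit criteria above can be enforced mechanically by restricting which column-to-column moves are legal. This is a minimal sketch, not a Jira or Kanban-tool API; the transition map is one plausible encoding of the columns above.

```python
# Sketch: legal column transitions for the board described above.
# "Blocked" is reachable from active work; "Done" is terminal.
ALLOWED_TRANSITIONS = {
    "backlog": {"in_progress"},
    "in_progress": {"evidence_collection", "blocked", "backlog"},
    "evidence_collection": {"ready_for_review", "blocked"},
    "blocked": {"in_progress", "evidence_collection"},
    "ready_for_review": {"done", "in_progress"},
    "done": set(),
}

def move_card(current: str, target: str) -> str:
    """Validate a column transition and return the new column."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"Cannot move card from {current!r} to {target!r}")
    return target
```

Encoding the workflow this way prevents cards from skipping evidence collection on the way to review, which is the most common shortcut teams take under deadline pressure.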
Card Template
Each card on the board should include:
```yaml
title: "System prompt extraction via role-play"
attack_surface: "system_prompt"
category: "information_disclosure"
priority: "high"
assigned_to: "analyst_name"
estimated_hours: 2
actual_hours: null  # filled on completion
status: "in_progress"
finding_id: null    # assigned when confirmed
evidence_ids: []
notes: |
  Hypothesis: Role-play scenarios may bypass
  instruction-following guardrails.
blockers: []
```

Attack Surface Coverage Tracking
Coverage Matrix
| Attack Surface | Planned Tests | Completed | Findings | Coverage |
|---|---|---|---|---|
| Direct prompt injection | 12 | 10 | 2 | 83% |
| Indirect prompt injection | 8 | 8 | 1 | 100% |
| System prompt extraction | 6 | 4 | 1 | 67% |
| Tool/function abuse | 10 | 3 | 0 | 30% |
| Safety bypass (jailbreak) | 15 | 12 | 3 | 80% |
| Data exfiltration | 5 | 5 | 0 | 100% |
| Multi-turn manipulation | 8 | 2 | 0 | 25% |
| Total | 64 | 44 | 7 | 69% |
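The coverage column is just completed tests over planned tests, which is easy to automate so the matrix stays current. The sketch below uses the counts from the table above; the surface names are illustrative identifiers.

```python
# Sketch: computing the coverage column from (planned, completed) counts.
# Counts taken from the matrix above.
surfaces = {
    "direct_prompt_injection": (12, 10),
    "indirect_prompt_injection": (8, 8),
    "system_prompt_extraction": (6, 4),
    "tool_function_abuse": (10, 3),
    "safety_bypass": (15, 12),
    "data_exfiltration": (5, 5),
    "multi_turn_manipulation": (8, 2),
}

def coverage_report(surfaces: dict) -> dict:
    """Per-surface and total completion percentages (rounded)."""
    report = {name: round(100 * done / planned)
              for name, (planned, done) in surfaces.items()}
    planned_total = sum(p for p, _ in surfaces.values())
    done_total = sum(d for _, d in surfaces.values())
    report["total"] = round(100 * done_total / planned_total)
    return report
```

Note that coverage measures test execution, not risk: 100% coverage of data exfiltration with zero findings is a meaningful negative result, while 30% coverage of tool abuse means that surface is still largely unknown.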
Daily Stand-up Questions
For team engagements, daily stand-ups should address:
- What attack surfaces did you test yesterday?
- What did you find? (including negative results)
- What are you testing today?
- Are you blocked on anything?
- Any findings that need urgent escalation?
Time Tracking
Track time at the activity level, not just by day. This data informs future engagement scoping and pricing.
| Activity Category | Description | Example |
|---|---|---|
| `recon` | Target analysis and information gathering | System prompt extraction attempts |
| `testing` | Active attack execution | Running injection payloads |
| `evidence` | Evidence collection and organization | Capturing API logs, screenshots |
| `analysis` | Analyzing results and assessing severity | Determining exploitability and impact |
| `reporting` | Writing report content | Drafting findings, executive summary |
| `admin` | Project management, meetings, communication | Client check-ins, team coordination |
```python
import json
import datetime
from pathlib import Path


class TimeTracker:
    """Simple time tracking for red team engagements."""

    CATEGORIES = ["recon", "testing", "evidence", "analysis", "reporting", "admin"]

    def __init__(self, engagement_id: str, analyst: str):
        self.engagement_id = engagement_id
        self.analyst = analyst
        self.log_file = Path(f"./time-logs/{engagement_id}_{analyst}.jsonl")
        self.log_file.parent.mkdir(parents=True, exist_ok=True)
        self.current_entry = None

    def start(self, category: str, description: str):
        """Begin a timed activity, closing any entry still open."""
        if category not in self.CATEGORIES:
            raise ValueError(f"Category must be one of {self.CATEGORIES}")
        if self.current_entry:
            self.stop()
        self.current_entry = {
            "category": category,
            "description": description,
            "start": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        }

    def stop(self):
        """Close the open entry and append it to the JSONL log."""
        if not self.current_entry:
            return
        self.current_entry["end"] = datetime.datetime.now(
            datetime.timezone.utc
        ).isoformat()
        with open(self.log_file, "a") as f:
            f.write(json.dumps(self.current_entry) + "\n")
        self.current_entry = None

    def summary(self) -> dict:
        """Total logged hours per category."""
        totals = {cat: 0.0 for cat in self.CATEGORIES}
        if not self.log_file.exists():
            return totals
        for line in self.log_file.read_text().strip().split("\n"):
            entry = json.loads(line)
            start = datetime.datetime.fromisoformat(entry["start"])
            end = datetime.datetime.fromisoformat(entry["end"])
            hours = (end - start).total_seconds() / 3600
            totals[entry["category"]] += hours
        return totals
```

Status Reporting
Regular status updates maintain client confidence and surface blockers early.
Weekly Status Report Template
```markdown
# Red Team Engagement Status Report

**Engagement:** ENG-2026-042 | **Week:** 1 of 2 | **Date:** 2026-03-13

## Summary
- Testing is 45% complete across 7 attack surface categories
- 4 findings identified (1 Critical, 1 High, 2 Medium)
- On track for draft report delivery on 2026-03-20

## Findings Summary
| ID | Severity | Category | Status |
|----|----------|----------|--------|
| F001 | Critical | Prompt Injection | Evidence complete |
| F002 | High | Data Exfiltration | Under review |
| F003 | Medium | Safety Bypass | Evidence collection |
| F004 | Medium | Info Disclosure | Evidence collection |

## Coverage
- Completed: Direct injection, indirect injection, data exfiltration
- In progress: Safety bypass, system prompt extraction
- Upcoming: Tool abuse, multi-turn manipulation

## Blockers
- Awaiting API credentials for the production endpoint (requested 2026-03-10)

## Next Week Plan
- Complete remaining attack surface testing
- Begin report drafting (target: Wednesday)
- Schedule findings review meeting
```

Related Topics
- Red Team Lab & Operations -- operational foundations
- Evidence Collection & Chain of Custody -- evidence standards that integrate with tracking
- Metrics, KPIs & Demonstrating ROI -- using engagement data for program metrics
Implementation Considerations
Architecture Patterns
When implementing systems that interact with LLMs, several architectural patterns affect the security posture of the overall application:
Gateway pattern: A dedicated API gateway sits between users and the LLM, handling authentication, rate limiting, input validation, and output filtering. This centralizes security controls but creates a single point of failure.
```python
import uuid
from dataclasses import dataclass


@dataclass
class SecurityGateway:
    """Gateway pattern for securing LLM application access."""

    input_classifier: object  # ML-based input classifier
    output_filter: object     # Output content filter
    rate_limiter: object      # Rate limiting service
    audit_logger: object      # Audit trail logger

    def process_request(self, user_id: str, message: str, session_id: str) -> dict:
        """Process a request through all security layers."""
        request_id = self._generate_request_id()

        # Layer 1: Rate limiting
        if not self.rate_limiter.allow(user_id):
            self.audit_logger.log(request_id, "rate_limited", user_id)
            return {"error": "Rate limit exceeded", "retry_after": 60}

        # Layer 2: Input classification
        classification = self.input_classifier.classify(message)
        if classification.is_adversarial:
            self.audit_logger.log(
                request_id, "input_blocked",
                user_id, classification.category
            )
            return {"error": "Request could not be processed"}

        # Layer 3: LLM processing
        response = self._call_llm(message, session_id)

        # Layer 4: Output filtering
        filtered = self.output_filter.filter(response)
        if filtered.was_modified:
            self.audit_logger.log(
                request_id, "output_filtered",
                user_id, filtered.reason
            )

        # Layer 5: Audit logging
        self.audit_logger.log(
            request_id, "completed",
            user_id, len(message), len(filtered.content)
        )
        return {"response": filtered.content}

    def _generate_request_id(self) -> str:
        return str(uuid.uuid4())

    def _call_llm(self, message: str, session_id: str) -> str:
        # LLM API call implementation
        raise NotImplementedError
```

Sidecar pattern: Security components run alongside the LLM as independent services, each responsible for a specific aspect of security. This provides better isolation and independent scaling but increases system complexity.
Mesh pattern: For multi-agent systems, each agent has its own security perimeter with authentication, authorization, and auditing. Inter-agent communication follows zero-trust principles.
Performance Implications
Security measures inevitably add latency and computational overhead. Understanding these trade-offs is essential for production deployments:
| Security Layer | Typical Latency | Computational Cost | Impact on UX |
|---|---|---|---|
| Keyword filter | <1ms | Negligible | None |
| Regex filter | 1-5ms | Low | None |
| ML classifier (small) | 10-50ms | Moderate | Minimal |
| ML classifier (large) | 50-200ms | High | Noticeable |
| LLM-as-judge | 500-2000ms | Very High | Significant |
| Full pipeline | 100-500ms | High | Moderate |
The recommended approach is to use fast, lightweight checks first (keyword and regex filters) to catch obvious attacks, followed by more expensive ML-based analysis only for inputs that pass the initial filters. This cascading approach provides good security with acceptable performance.
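The cascade can be sketched as a short-circuiting pipeline: each stage either blocks or passes the input to the next, more expensive stage. The keyword list, regex, and `ml_classifier` callable below are illustrative placeholders, not production rules.

```python
import re

# Sketch of the cascading check described above: cheap filters run first
# and short-circuit, so the expensive classifier only sees survivors.
BLOCKED_KEYWORDS = {"ignore previous instructions", "system prompt"}
INJECTION_PATTERN = re.compile(r"(?i)\b(disregard|override)\b.{0,40}\binstructions\b")

def cascade_check(message: str, ml_classifier=None) -> str:
    """Return 'blocked' or 'allowed', cheapest checks first."""
    lowered = message.lower()
    # Stage 1: keyword filter (<1ms)
    if any(kw in lowered for kw in BLOCKED_KEYWORDS):
        return "blocked"
    # Stage 2: regex filter (1-5ms)
    if INJECTION_PATTERN.search(message):
        return "blocked"
    # Stage 3: ML classifier (10-200ms), only for inputs that survive
    if ml_classifier is not None and ml_classifier(message):
        return "blocked"
    return "allowed"
```

Because most traffic is benign, the classifier runs on only a small fraction of requests, keeping average latency close to the cheap-filter cost while preserving the classifier's coverage for evasive inputs.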
Monitoring and Observability
Effective security monitoring for LLM applications requires tracking metrics that capture adversarial behavior patterns:
```python
import time
from dataclasses import dataclass, field


@dataclass
class SecurityMetrics:
    """Track security-relevant metrics for LLM applications."""

    # Counters
    total_requests: int = 0
    blocked_requests: int = 0
    filtered_outputs: int = 0
    anomalous_sessions: int = 0

    # Rate tracking (timestamps of recent events)
    _request_times: list = field(default_factory=list)
    _block_times: list = field(default_factory=list)

    def record_request(self, was_blocked: bool = False, was_filtered: bool = False):
        """Record a request and its disposition."""
        now = time.time()
        self.total_requests += 1
        self._request_times.append(now)
        if was_blocked:
            self.blocked_requests += 1
            self._block_times.append(now)
        if was_filtered:
            self.filtered_outputs += 1

    def get_block_rate(self, window_seconds: int = 300) -> float:
        """Calculate the block rate over a time window."""
        cutoff = time.time() - window_seconds
        recent_requests = sum(1 for t in self._request_times if t > cutoff)
        recent_blocks = sum(1 for t in self._block_times if t > cutoff)
        if recent_requests == 0:
            return 0.0
        return recent_blocks / recent_requests

    def should_alert(self) -> bool:
        """Determine if current metrics warrant an alert."""
        # Alert if >30% of requests were blocked in the last 5 minutes
        return self.get_block_rate() > 0.3
```

Security Testing in CI/CD
Integrating AI security testing into the development pipeline catches regressions before they reach production:
- Unit-level tests: Test individual security components (classifiers, filters) against known payloads
- Integration tests: Test the full security pipeline end-to-end
- Regression tests: Maintain a suite of previously-discovered attack payloads and verify they remain blocked
- Adversarial tests: Periodically run automated red team tools (Garak, Promptfoo) as part of the deployment pipeline
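A regression suite of this kind reduces to replaying known-bad payloads and flagging any that are no longer caught. The sketch below assumes a JSONL payload corpus and a `classify` callable standing in for whatever input filter the pipeline uses; both are assumptions, not a specific tool's API.

```python
import json

# Sketch: regression test over previously-discovered attack payloads.
# Each corpus line: {"payload": "...", "expected": "blocked"}

def load_payloads(path: str) -> list:
    """Load a JSONL payload corpus."""
    with open(path) as f:
        return [json.loads(line) for line in f]

def run_regression(payloads: list, classify) -> list:
    """Return payloads that should be blocked but no longer are."""
    return [p["payload"] for p in payloads
            if p["expected"] == "blocked" and classify(p["payload"]) != "blocked"]
```

Wiring `run_regression` into CI and failing the build on a non-empty result turns every past finding into a permanent guardrail against filter regressions.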
Emerging Trends
Current Research Directions
The field of LLM security is evolving rapidly. Key research directions that are likely to shape the landscape include:
- Formal verification for LLM behavior: Researchers are exploring mathematical frameworks for proving properties about model behavior under adversarial conditions. While full formal verification of neural networks remains intractable, bounded verification of specific properties shows promise.
- Adversarial training for LLM robustness: Beyond standard RLHF, researchers are developing training procedures that explicitly expose models to adversarial inputs during safety training, improving robustness against known attack patterns.
- Interpretability-guided defense: Mechanistic interpretability research is enabling defenders to understand why specific attacks succeed at the neuron and circuit level, informing more targeted defensive measures.
- Multi-agent security: As LLM agents become more prevalent, securing inter-agent communication and maintaining trust boundaries across agent systems is an active area of research with significant practical implications.
- Automated red teaming at scale: Tools like NVIDIA's Garak, Microsoft's PyRIT, and the UK AISI's Inspect framework are enabling automated security testing at scales previously impossible, but the quality and coverage of automated testing remain an open challenge.
The integration of these research directions into production systems will define the next generation of AI security practices.