# walkthroughs
272 articlestagged with “walkthroughs”
A2A Message Interception Walkthrough
Intercept and manipulate messages between agents in A2A protocol-based multi-agent systems.
A2A Protocol Injection Walkthrough
Walkthrough of exploiting Google's Agent-to-Agent protocol for inter-agent prompt injection.
Adversarial Suffix Crafting Walkthrough
Craft adversarial suffixes using gradient-based and gradient-free optimization methods for black-box models.
API Abuse Chain Attack Walkthrough
Chain multiple API calls to achieve unauthorized actions that no single call would permit.
API Chaining Exploitation Walkthrough
Walkthrough of chaining multiple API calls in agent systems to achieve multi-step unauthorized actions.
Attention Hijacking Attack Walkthrough
Hijack transformer attention mechanisms to redirect model focus toward adversarial instructions in the context.
AutoDAN Stealthy Jailbreak Walkthrough
Walkthrough of generating stealthy, human-readable jailbreak prompts using the AutoDAN methodology.
Batch Processing Injection Walkthrough
Inject payloads through batch processing pipelines where individual items are processed without isolation.
LLM Cache Poisoning Walkthrough
Poison LLM response caches to serve adversarial content to other users without direct injection.
Competition-Style Jailbreak Techniques
Walkthrough of jailbreak techniques used in AI security competitions and CTF events.
Computer Use Agent Injection Walkthrough
Walkthrough of injecting prompts through UI elements and screenshots processed by computer-use agents.
Advanced Context Manipulation Walkthrough
Manipulate conversation context through history injection, context overflow, and attention weight exploitation.
Context Manipulation Deep Dive
Advanced walkthrough of context window manipulation techniques including attention dilution and instruction priority shifting.
Crescendo Multi-Turn Jailbreak Walkthrough
Step-by-step walkthrough of the Crescendo technique for gradually escalating multi-turn jailbreaks.
Cross-Plugin Data Exfiltration Walkthrough
Walkthrough of chaining multiple plugins/tools to exfiltrate data from LLM agent systems.
Data Harvesting Through LLM Apps
Complete walkthrough of systematic data extraction from LLM applications using various exfiltration channels.
Document-Based Injection Walkthrough
Inject prompts through documents processed by LLM applications including PDFs, spreadsheets, and presentations.
Embedding Collision Attack Walkthrough
Craft documents that collide in embedding space with target queries to hijack RAG retrieval results.
Embedding Inversion Attack Walkthrough
Walkthrough of inverting text embeddings to recover original documents from vector databases.
Encoding Chain Attack Walkthrough
Chain multiple encoding transformations to bypass input filters that only decode one layer of encoding.
Encoding Chain Bypass Walkthrough
Walkthrough of chaining Base64, URL encoding, and Unicode tricks to bypass multi-layer input filters.
Few-Shot Example Poisoning Walkthrough
Poison few-shot examples in prompts to establish behavioral patterns that override system instructions.
Fine-Tuning Safety Bypass Walkthrough
Walkthrough of using fine-tuning API access to remove safety behaviors from aligned models.
Function Calling Exploitation Guide
Complete walkthrough of exploiting function calling in OpenAI, Anthropic, and Google AI APIs.
Function Schema Poisoning Walkthrough
Poison function schemas to inject hidden instructions that redirect model tool selection and parameter filling.
GCG Adversarial Suffix Attack Walkthrough
Complete walkthrough of implementing the Greedy Coordinate Gradient adversarial suffix attack.
Real-World Indirect Prompt Injection
Walkthrough of planting and triggering indirect prompt injection in web-browsing AI assistants.
JSON Injection Attack Walkthrough
Exploit JSON parsing and generation in LLM applications to inject payloads through structured data boundaries.
Logit Bias Exploitation Walkthrough
Exploit logit bias API parameters to force models toward specific token sequences and bypass safety filters.
Many-Shot Jailbreaking Walkthrough
Walkthrough implementing Anthropic's many-shot jailbreaking technique with analysis of scaling behavior.
Advanced Markdown Injection Walkthrough
Inject Markdown that triggers data exfiltration through image rendering, link generation, and code block escape.
MCP Tool Poisoning Attack Walkthrough
Walkthrough of exploiting MCP tool descriptions to redirect agent behavior via hidden instructions.
Memory Persistence Attack Walkthrough
Walkthrough of achieving persistent memory manipulation in agent systems for cross-session influence.
Model Context Window Overflow Walkthrough
Overflow the context window to push safety instructions outside the effective attention range.
Model Extraction Attack Walkthrough
Walkthrough of extracting model weights/behavior through systematic API querying.
Multi-Modal Document Attack Walkthrough
Combine visual and textual injection in documents processed by multimodal LLM applications.
Multi-Turn Persona Escalation Walkthrough
Gradually escalate through persona manipulation across multiple turns to achieve full safety bypass.
Multimodal Image Injection Walkthrough
Step-by-step walkthrough of embedding adversarial prompts in images for vision model exploitation.
PAIR Automated Jailbreak Walkthrough
End-to-end walkthrough of the PAIR attack methodology for automated black-box jailbreaking.
Agent Privilege Escalation Walkthrough
Walkthrough of escalating privileges in multi-agent systems through trust chain exploitation.
Advanced Prompt Leaking Walkthrough
Advanced techniques for extracting system prompts including iterative reconstruction and side-channel methods.
RAG Poisoning End-to-End Walkthrough
Complete walkthrough of poisoning a RAG system from document injection through information extraction.
Reasoning Model Exploitation Walkthrough
Exploit extended thinking in reasoning models to inject false premises and manipulate conclusion generation.
Regex Filter Bypass Walkthrough
Systematically bypass regex-based input filters using Unicode tricks, encoding, and pattern-specific evasion.
Role Confusion Attack Walkthrough
Exploit role confusion between system, user, and assistant messages to override safety instructions.
Role Confusion Exploitation Walkthrough
Exploit role boundaries in chat APIs by injecting assistant-role messages and system-level instructions.
Semantic Camouflage Walkthrough
Walkthrough of crafting semantically camouflaged injections that evade both classifiers and human review.
Semantic Deception Attack Walkthrough
Craft semantically deceptive prompts that appear benign to classifiers while achieving adversarial objectives.
Sleeper Agent Detection Walkthrough
Walkthrough of detecting deceptive sleeper agent behaviors in fine-tuned language models.
Supply Chain Prompt Injection Walkthrough
Plant injection payloads in upstream data sources consumed by LLM applications including packages and documentation.
Model Supply Chain Poisoning
Walkthrough of poisoning ML supply chains through dependency confusion, model weight manipulation, and hub attacks.
Synthetic Identity Injection Walkthrough
Create synthetic identities that exploit LLM trust mechanisms to achieve elevated instruction priority.
TAP Tree of Attacks Walkthrough
Detailed walkthrough of building and running Tree of Attacks for systematic jailbreak generation.
Thought Injection in Reasoning Models Walkthrough
Inject adversarial thoughts into the reasoning chain of thinking models to manipulate final outputs.
Timing Side-Channel Attack Walkthrough
Extract information from LLM applications through timing differences in response generation and safety filter processing.
Token-Level Attack Optimization Walkthrough
Walkthrough of optimizing adversarial token sequences for maximum jailbreak effectiveness.
Tool Shadowing Attack Walkthrough
Register shadow tools that override legitimate tool definitions to intercept and manipulate agent actions.
Vision Model Attack Walkthrough
Attack vision-language models through adversarial images with embedded text, perturbations, and visual trojans.
Vision Model Attack Walkthrough (Attack Walkthrough)
Step-by-step walkthrough of visual prompt injection, adversarial images, and OCR exploitation in vision-language models.
A2A Agent Impersonation Walkthrough
Walkthrough of impersonating a trusted agent in an A2A network to intercept and redirect delegated tasks.
API Parameter Exploitation Walkthrough
Walkthrough of exploiting undocumented and misconfigured API parameters in LLM service endpoints.
Batch API Exploitation Walkthrough
Walkthrough of exploiting batch processing APIs for large-scale prompt injection and output analysis.
Chain-of-Thought Hijacking Walkthrough
Walkthrough of hijacking visible reasoning traces in CoT models to redirect conclusions and bypass safety checks.
Cognitive Overload Jailbreak Walkthrough
Step-by-step walkthrough of the cognitive overload technique that overwhelms safety training with complex nested instructions.
Context Virtualization Attack Walkthrough
Walkthrough of creating virtualized contexts that make the model believe it is operating in a different environment.
Conversation Poisoning Attack Walkthrough
Step-by-step walkthrough of poisoning conversation history to alter model behavior in future turns.
Document Parsing Exploit Chain Walkthrough
End-to-end walkthrough of exploiting document parsers to inject instructions through PDFs, DOCX, and HTML files.
Few-Shot Attack Scaling Analysis
Detailed analysis of how few-shot examples scale to influence model behavior, from 2-shot to many-shot regime.
Image Steganography Injection Walkthrough
Walkthrough of hiding prompt injection payloads in images using steganographic techniques for vision model attacks.
Instruction Hierarchy Exploitation Walkthrough
Walkthrough of exploiting how models prioritize system, user, and tool instructions to override safety directives.
Knowledge Graph Injection Attack Walkthrough
Walkthrough of injecting adversarial facts into knowledge graphs consumed by LLM-based reasoning systems.
LangChain Exploit Chain Walkthrough
Walkthrough of chaining LangChain CVEs for remote code execution from prompt injection through to shell access.
MCP Sampling API Abuse Walkthrough
Walkthrough of exploiting the MCP sampling API to manipulate how servers request and process LLM completions.
Memory Injection and Persistence Walkthrough
Walkthrough of injecting persistent instructions into agent memory systems that survive across sessions.
Multi-Provider Attack Campaign Walkthrough
End-to-end walkthrough of conducting an attack campaign across OpenAI, Anthropic, Google, and open-source models.
Output Format Exfiltration Techniques Walkthrough
Walkthrough of using output format manipulation to exfiltrate data through code blocks, markdown, and structured outputs.
RAG Hybrid Search Poisoning Walkthrough
Walkthrough of poisoning both vector and keyword search in hybrid RAG architectures for maximum retrieval influence.
Recursive Prompt Injection Walkthrough
Walkthrough of creating self-replicating injection payloads that persist through model output-to-input loops.
Refusal Suppression Technique Walkthrough
Walkthrough of refusal suppression techniques that prime models to avoid safety refusal patterns.
Semantic Obfuscation Walkthrough
Walkthrough of semantically obfuscating adversarial payloads so they appear benign to both classifiers and humans.
Structured Output Data Leakage Walkthrough
Walkthrough of using structured output requirements to extract sensitive data embedded in model responses.
Tool Use Confusion Attack Walkthrough
Walkthrough of confusing model tool-use decisions to invoke unintended functions or skip safety-critical tools.
Unicode Normalization Bypass Walkthrough
Step-by-step guide to exploiting Unicode normalization differences between input filters and model tokenizers.
Voice AI Prompt Injection Walkthrough
Walkthrough of injecting prompts into voice-based AI assistants through adversarial audio and ultrasonic signals.
LLM Watermark Analysis Walkthrough
Walkthrough of detecting and analyzing watermarks in LLM-generated text using statistical methods.
XML Injection in LLM Systems Walkthrough
Exploit XML parsing in LLM application pipelines to inject instructions through entity expansion and CDATA sections.
XML and JSON Injection in LLM Apps
Walkthrough of exploiting XML and JSON parsing in LLM applications for injection and data manipulation.
Adversarial Robustness Testing Framework
Build a framework for continuously testing adversarial robustness of deployed LLM defense mechanisms.
Adversarial Training for LLM Defense (Defense Walkthrough)
Implement adversarial training techniques to improve LLM robustness against prompt injection and jailbreaking.
Agent Permission Boundary Enforcement
Implement fine-grained permission boundaries for LLM agents that limit tool access based on context and user role.
Automated Defense Regression Testing
Build automated regression tests for LLM security defenses using attack replay and continuous monitoring.
Automated Red Team Defense Loop
Build an automated continuous red team testing loop that discovers and patches vulnerabilities iteratively.
Behavioral Anomaly Detection for LLMs
Implement behavioral anomaly detection that identifies when model outputs deviate from expected safety profiles.
Building an Input Safety Classifier
Build a production-quality input classifier that detects prompt injection attempts.
Constitutional AI Implementation Guide
Implement constitutional AI principles in a custom fine-tuning and RLHF pipeline.
Context Isolation Pattern Implementation
Implement context isolation patterns that prevent instruction leakage between system prompts and user data.
Data Loss Prevention for LLM Apps
Build a data loss prevention layer for LLM applications that prevents sensitive data exposure in outputs.
Defense Benchmarking System
Build a benchmarking system to continuously evaluate defense effectiveness against known attack classes.
Defense-in-Depth Architecture for LLM Apps
Design and implement a complete defense-in-depth architecture for production LLM applications.
AI Incident Response System Setup
Set up comprehensive incident response capabilities for AI-specific security incidents.
Input Embedding Firewall Deployment
Deploy an embedding-based firewall that detects injection attempts through semantic similarity to known attack patterns.
Secure Input/Output Logging for Defense
Implement secure logging for LLM input/output pairs that supports forensic analysis without exposing sensitive data.
LLM Firewall Architecture Design (Defense Walkthrough)
Design and implement a comprehensive LLM firewall with input analysis, output filtering, and behavioral monitoring.
LLM Honeypot Deployment
Deploy LLM honeypots to detect and study attacker behavior patterns and techniques.
Model Behavior Monitoring Setup
Set up comprehensive model behavior monitoring to detect drift, anomalies, and potential compromise.
Monitoring LLM Applications for Abuse
Build a monitoring and alerting system to detect ongoing attacks against LLM applications.
Multi-Model Safety Consensus
Implement safety consensus mechanisms where multiple models must agree before executing sensitive actions.
Output Filtering Pipeline Design
Design and implement a multi-stage output filtering pipeline for LLM applications.
Output Grounding and Verification
Implement output grounding verification to ensure LLM responses are factually supported by provided context.
Prompt Armor Implementation Guide
Implement a comprehensive prompt armoring system with instruction isolation, delimiter hardening, and priority enforcement.
Prompt Injection Honeypot Setup
Deploy honeypot prompts and canary data that detect and characterize prompt injection attempts.
RAG Input Sanitization Walkthrough
Implement input sanitization for RAG systems to prevent document-based injection attacks.
Real-Time Attack Detection System
Build a real-time attack detection system that monitors LLM interactions for adversarial patterns.
Red Team-Defense Feedback Loop
Build a continuous red team-defense improvement loop with automated testing and metric tracking.
Response Watermarking Implementation
Implement response watermarking to enable traceability and detect unauthorized reproduction of LLM outputs.
Runtime Safety Monitor Implementation
Implement a runtime safety monitor that detects and blocks unsafe model outputs in real-time.
Training Custom Safety Classifiers
Train custom safety classifiers tuned to your application's specific threat model and content policy.
Secure Agent Architecture Design
Design a secure architecture for LLM agent systems with sandboxing, capability controls, and audit trails.
Secure Function Calling Patterns
Implement secure function calling with input validation, output sanitization, and capability restrictions.
Secure RAG Architecture Walkthrough
Design and implement a secure RAG architecture with document sanitization, access controls, and output validation.
Secure RAG Pipeline Architecture
Build a secure RAG pipeline with document sanitization, retrieval validation, and injection-resistant augmentation.
System Prompt Protection Techniques
Implement multiple layers of system prompt protection against extraction, override, and leakage attacks.
AI Security Threat Intelligence
Build a threat intelligence pipeline for staying current with AI security threats and attack techniques.
Token-Level Input Filtering
Implement token-level input filtering that detects injection patterns at the tokenization stage before model processing.
Tool Call Authorization Framework
Implement a tool call authorization framework that validates tool invocations against policy before execution.
Agent Tool Access Control Implementation
Implement fine-grained tool access control for LLM agents with capability-based security and approval workflows.
Anomaly Detection for LLM Traffic
Build anomaly detection for LLM API traffic to identify attack patterns, abuse, and prompt injection attempts.
Conversation Integrity Monitoring
Build a conversation integrity monitoring system that detects manipulation across multi-turn interactions.
Defense Effectiveness Metrics Dashboard
Build a dashboard to measure and visualize the effectiveness of defensive measures against attack categories.
Automated Defense Testing Pipeline
Build an automated pipeline that continuously tests defensive measures against evolving attack techniques.
Embedding Poisoning Detection System
Build a detection system for identifying poisoned documents in vector databases using statistical analysis.
Function Calling Guardrails Implementation
Implement guardrails for function calling that validate tool selection, parameters, and execution scope.
LLM Honeypot Deployment Guide
Deploy LLM-based honeypots to detect, analyze, and learn from attacker techniques in production environments.
Incident Response Playbook for LLM Applications
Design and implement an incident response playbook specific to LLM application security incidents.
LLM Guard Production Deployment Guide
Deploy LLM Guard in a production environment with custom scanners, performance optimization, and monitoring.
MCP Server Security Hardening Guide
Harden MCP server implementations against tool poisoning, transport attacks, and capability escalation.
Multi-Model Defense Ensemble
Build an ensemble defense system using multiple models to cross-validate inputs and outputs for safety.
NeMo Guardrails Advanced Configuration
Advanced NeMo Guardrails configuration including custom actions, knowledge base integration, and multi-model pipelines.
Building an Output Scanning Pipeline
Build a production output scanning pipeline that detects data leakage, harmful content, and injection artifacts.
PII Detection and Redaction for LLMs
Build a PII detection and redaction system for LLM inputs and outputs to prevent data exposure.
Training a Prompt Injection Classifier
Train a custom prompt injection detection classifier using labeled datasets and modern NLP techniques.
RAG Document Sandboxing Implementation
Implement document-level sandboxing for RAG systems to prevent cross-document injection and privilege escalation.
Adaptive Rate Limiting for LLM APIs
Implement adaptive rate limiting that adjusts thresholds based on detected attack patterns and user behavior.
Secure RAG Architecture Implementation
Implement a security-hardened RAG architecture with input sanitization, access control, and output validation.
System Prompt Protection Layers
Implement multi-layer system prompt protection including injection-resistant formatting and runtime validation.
Zero Trust Architecture for LLM Apps
Implement zero trust principles in LLM application architecture with continuous verification and least privilege.
Full Engagement: AI Code Assistant
End-to-end engagement for assessing an AI-powered code assistant with repository access.
Full Engagement: Content Generation Platform
Full engagement walkthrough for a content generation AI platform with brand safety and copyright concerns.
Full Engagement: Customer Analytics AI
Red team engagement for an AI-powered customer analytics system processing sensitive behavioral data.
Full Engagement: AI Developer Tools
End-to-end engagement for AI-powered developer tools with repository access and code execution capabilities.
Full Engagement: Educational AI Platform
End-to-end engagement walkthrough for an AI-powered educational platform with student safety requirements.
Full Engagement: AI Financial Advisor
Full engagement for assessing an AI financial advisor chatbot handling investment recommendations.
Full Engagement: Government AI System
Full red team engagement for a government-deployed AI system with classification and compliance requirements.
Full Engagement: Healthcare AI System
End-to-end engagement walkthrough for a healthcare AI system with HIPAA compliance requirements.
Full Engagement: Internal Knowledge Bot
Full engagement walkthrough for an internal enterprise knowledge bot with RAG and document access.
Full Engagement: Multi-Agent Platform
Full red team engagement of a multi-agent platform with MCP servers and A2A communication.
Full Engagement: SaaS Customer Chatbot
End-to-end engagement walkthrough for assessing a SaaS customer-facing chatbot with tool access.
Full Engagement: ML Supply Chain Audit
End-to-end engagement walkthrough for auditing an organization's ML model supply chain security.
Full Engagement: Autonomous Coding Agent
End-to-end engagement for an autonomous AI coding agent with repo access, CI/CD integration, and deployment.
Full Engagement: AI Content Generation Platform
Red team engagement of an AI content generation platform used for marketing, SEO, and social media.
Full Engagement: Multi-Tool Customer Support Agent
End-to-end engagement for a customer support agent with order management, refund processing, and CRM access.
Full Engagement: DevOps AI Assistant
End-to-end engagement for a DevOps AI assistant with CI/CD, cloud infrastructure, and monitoring access.
Full Engagement: Document Processing Pipeline
Full engagement of an AI document processing pipeline handling invoices, contracts, and compliance documents.
Full Engagement: AI Education Platform
End-to-end engagement walkthrough for assessing an AI-powered education platform with tutoring and grading.
Full Engagement: Government Citizen Services AI
Red team engagement of a government citizen services AI handling tax, benefits, and identity verification.
Full Engagement: HR Benefits Chatbot
Full engagement walkthrough for an HR benefits chatbot with access to employee records and benefits administration.
Full Engagement: AI Insurance Underwriting
Full engagement for an AI underwriting system with risk scoring, policy generation, and claims data access.
Full Engagement: Legal Research AI System
Full red team engagement of a legal research AI with case law access, privilege management, and citation generation.
Full Engagement: Enterprise Multi-Agent System
Full engagement of an enterprise multi-agent system with specialized agents for HR, IT, Finance, and Legal.
Full Engagement: Retail Personalization AI
Red team engagement of an AI personalization engine with access to customer profiles and purchase history.
Full Engagement: AI Security Copilot
Red team engagement of an AI security copilot with access to SIEM, vulnerability scanners, and threat intelligence.
Full Engagement: Supply Chain AI Optimizer
End-to-end engagement for a supply chain AI with access to logistics, inventory, and supplier management systems.
Full Engagement: Telehealth AI Assistant
End-to-end engagement for a telehealth AI assistant with appointment scheduling, symptom assessment, and EHR access.
Walkthroughs
Step-by-step guided walkthroughs covering red team tools, engagement methodology, defense implementation, platform-specific testing, and full engagement workflows.
Adversarial Simulation Design
Design realistic adversarial simulations that model real-world threat actors and attack scenarios for AI systems.
Agentic System Assessment Methodology
Comprehensive methodology for assessing agentic AI systems including tool use, memory, and multi-agent interactions.
AI Penetration Test Planning
Complete methodology for planning AI-specific penetration tests including scope definition, resource allocation, and timeline.
AI Red Team Maturity Model (Methodology Walkthrough)
Maturity model for assessing and improving an organization's AI red teaming capabilities.
AI Security Metrics Framework
Framework for measuring and reporting on AI security posture using quantitative metrics.
AI Vulnerability Classification System
Structured system for classifying AI-specific vulnerabilities by type, impact, and exploitability.
Attack Prioritization Framework
Prioritize attack techniques based on target architecture, time constraints, and likelihood of success.
Attack Tree Construction for LLM Systems
Build systematic attack trees for LLM system assessments using MITRE ATLAS and OWASP mappings.
Automated AI Reconnaissance Workflow
Build an automated reconnaissance workflow that maps AI application architecture, models, and defense configurations.
Collaborative AI Red Team Assessment
Coordinate multi-person red team assessments with role assignments, communication protocols, and finding deconfliction.
Competitive Analysis of AI Security Tools
Methodology for evaluating and comparing AI security tools for red team operations.
Compliance-Driven Testing Methodology
Map regulatory requirements to specific test cases for compliance-driven AI red team assessments.
Continuous Monitoring Integration Methodology
Integrate red team findings into continuous monitoring systems for ongoing threat detection and defense validation.
Finding Deduplication and Triage
Deduplicate and triage findings from automated and manual testing into actionable, prioritized vulnerability reports.
Multi-Model Testing Methodology
Structured methodology for testing applications that use multiple LLM models in their processing pipeline.
Post-Engagement Analysis Methodology
Conduct thorough post-engagement analysis including lessons learned, technique effectiveness, and methodology refinement.
Purple Team AI Assessment Methodology
Conduct collaborative purple team AI assessments with real-time feedback between red and blue team operations.
AI Security Regression Testing Methodology
Design regression testing suites that verify security fixes remain effective across model updates and deployments.
Risk-Based AI Testing Approach
Apply risk-based testing approaches to focus assessment effort on the highest-impact vulnerability categories.
Stakeholder Management in AI Red Teaming
Managing stakeholder expectations and communication throughout AI red team engagements.
Stakeholder-Specific Reporting Methodology
Tailor red team reports for different stakeholders including executives, developers, security teams, and compliance officers.
AI Security Tabletop Exercises
Designing and facilitating tabletop exercises focused on AI security incident scenarios.
Time-Boxed AI Assessment Methodology
Maximize testing coverage within strict time constraints using prioritized attack trees and parallel testing.
AI Attack Surface Enumeration Methodology
Systematic methodology for enumerating the complete attack surface of an AI-powered application.
AI Compliance Testing Methodology
Methodology for testing AI systems against regulatory compliance requirements including EU AI Act and NIST.
Mapping Findings to MITRE ATLAS
Methodology for mapping AI red team findings to MITRE ATLAS tactics, techniques, and procedures.
AI Penetration Test Report Structure
Detailed report structure for AI penetration tests with finding templates and severity scoring.
AI Red Team Scoping Templates
Templates and procedures for scoping AI red team engagements across different application types.
AI Risk Quantification Methodology
Quantitative risk assessment methodology for AI vulnerabilities with probability and impact scoring.
AI Security Tabletop Exercise Design
Design and facilitate AI security tabletop exercises for organizational preparedness assessment.
Evidence Collection During AI Testing
Best practices for collecting, organizing, and preserving evidence during AI red team assessments.
Multi-Model Assessment Methodology
Methodology for assessing applications that use multiple AI models in pipelines or ensemble configurations.
OWASP LLM Top 10 Testing Methodology
Comprehensive testing methodology for each vulnerability in the OWASP LLM Top 10 2025.
Purple Team Operations for AI Security
Methodology for conducting purple team operations that combine red team attacks with blue team defense improvement.
AI Security Regression Testing Methodology (Methodology Walkthrough)
Methodology for continuous regression testing of AI application security after updates and model changes.
Communicating AI Risks to Stakeholders
Guide for communicating AI security risks to technical and non-technical stakeholders effectively.
Threat Intelligence for AI Systems
Methodology for gathering and applying threat intelligence specific to AI system attacks and defenses.
AI Vulnerability Prioritization Framework
Framework for prioritizing AI vulnerabilities by exploitability, impact, and remediation cost.
Writing AI Red Team Reports
Guide to writing clear, actionable AI red team assessment reports with findings and recommendations.
Testing AI21 Labs Models
Red team testing guide for AI21 Labs Jamba models including long context and efficiency features.
Testing Anthropic Claude: Complete Guide
Complete red team testing guide for Anthropic's Claude including tool use, extended thinking, and computer use.
Testing AWS Bedrock Deployments
Red team testing guide for models deployed via AWS Bedrock including guardrails and access controls.
Testing Azure OpenAI Service
Red team testing guide for Azure OpenAI including content filtering, managed identity, and network controls.
Testing Cohere Models
Red team testing guide for Cohere's Command-R models including RAG and tool use features.
Testing DeepSeek Models
Red team testing guide for DeepSeek models including R1 reasoning exploitation and MoE architecture.
Testing Fireworks AI Platform
Red team testing guide for Fireworks AI including function calling and compound AI systems.
Testing Google Gemini: Complete Guide
Complete red team testing guide for Google's Gemini including multimodal inputs and long context.
Testing Groq Inference Platform
Red team testing guide for Groq's high-speed inference platform and its security characteristics.
Testing Hugging Face Hosted Models
Red team testing guide for models hosted on Hugging Face including Inference API and Spaces.
Testing Meta Llama: Complete Guide
Complete red team testing guide for Meta's Llama models including fine-tuning and self-hosted deployment.
Testing Mistral AI Models
Complete red team testing guide for Mistral AI models including Mixtral MoE architecture and chat endpoints.
Testing Ollama Local Deployments
Security testing guide for locally deployed models via Ollama including network exposure and API security.
Testing OpenAI GPT-4o: Complete Guide
Complete red team testing guide for OpenAI's GPT-4o including API, function calling, and structured outputs.
Testing Replicate-Hosted Models
Red team testing guide for models hosted on Replicate including open-source model deployments.
Testing Together AI Platform
Red team testing guide for Together AI including fine-tuned model endpoints and custom deployments.
Testing AI21 Jamba Models
Red team testing guide for AI21's Jamba models with SSM-Transformer hybrid architecture analysis.
Testing Anyscale/Ray Serve Endpoints
Red team testing guide for models served via Anyscale and Ray Serve including distributed inference security.
Testing Cohere Models: Complete Guide
Complete red team testing guide for Cohere's Command models including RAG integration and tool use.
Testing Fireworks AI Deployments
Red team testing guide for models deployed via Fireworks AI including function calling and fine-tuned models.
Testing GCP Vertex AI Deployments
Red team testing guide for models deployed via GCP Vertex AI including Model Garden and custom endpoints.
Testing Mistral/Mixtral: Complete Guide
Complete red team testing guide for Mistral and Mixtral models including MoE routing and API vulnerabilities.
Testing Ollama Local Deployments (Platform Walkthrough)
Red team testing guide for models deployed locally via Ollama including API endpoints and model management.
Testing Replicate Hosted Models
Red team testing guide for models hosted on Replicate including API exploitation and model versioning.
Testing Together AI Hosted Models
Red team testing guide for models hosted on Together AI's inference platform.
Testing vLLM Inference Deployments
Red team testing guide for models served via vLLM including batching, KV cache, and speculative decoding.
Security Benchmark Runner Development
Build a benchmark runner for standardized evaluation of LLM security across models and configurations.
Building an Attack Replay Tool
Build a tool that records and replays attack sequences for regression testing and defense validation.
Counterfit ML Security Testing
Use Microsoft's Counterfit for adversarial ML testing of deployed model endpoints.
Building a Custom LLM Fuzzer
Build a custom fuzzing tool for LLM applications with mutation engines and coverage tracking.
Building a Custom Payload Mutation Engine
Develop a mutation engine that evolves prompt injection payloads through genetic algorithms.
Garak Advanced Configuration Guide
Deep dive into Garak's configuration system including custom probes, detectors, and generators.
Building Custom Garak Probes (Tool Walkthrough)
Develop custom probe modules for NVIDIA's Garak to test domain-specific vulnerabilities.
HarmBench Custom Behavior Sets
Extend HarmBench with custom behavior specifications for domain-specific vulnerability testing.
HarmBench Evaluation Walkthrough
Run standardized attack evaluations using the HarmBench framework and interpret results.
Tool Walkthroughs
End-to-end practical walkthroughs for essential AI red teaming tools, covering installation, configuration, execution, and result interpretation.
Inspect AI Safety Evaluations
Build and run AI safety evaluations using the UK AISI Inspect framework.
JailbreakBench Usage and Submission
Use JailbreakBench to evaluate jailbreak techniques and submit results to the benchmark.
LLM Guard Deployment and Testing
Deploy LLM Guard for input/output scanning and test its effectiveness against common attacks.
Building an LLM Traffic Analyzer
Build a proxy-based LLM traffic analyzer for intercepting and analyzing API communications.
Promptfoo CI/CD Pipeline Integration
Integrate Promptfoo into CI/CD pipelines for continuous LLM security regression testing.
Promptfoo for Red Team Evaluation
Configure Promptfoo for comprehensive red team evaluation with custom assertions and graders.
PyRIT Multi-Turn Campaign Orchestration
Build and run sophisticated multi-turn attack campaigns using PyRIT's orchestrator system.
PyRIT Custom Scoring Integration
Integrate custom scoring metrics into PyRIT for organization-specific red team evaluation criteria.
Agent Security Scanner Development
Build an automated security scanner for AI agent systems that tests tool use, memory, and workflow vulnerabilities.
Automated Red Team Report Generation
Build an automated system for generating structured red team reports from testing data and findings.
Building an LLM Attack Proxy
Build an intercepting proxy for LLM API traffic that enables payload injection, response analysis, and logging.
Defense Benchmarking Tool Development
Build a tool for benchmarking the effectiveness of defensive measures against standardized attack suites.
Building an Embedding Attack Toolkit
Build a toolkit for conducting embedding-level attacks including inversion, poisoning, and manipulation.
Garak Custom Plugin Development
Develop custom Garak plugins including probes, detectors, and generators for organization-specific testing.
Inspect AI Custom Solver Development
Build custom solvers in Inspect AI for specialized safety evaluation scenarios and attack simulations.
Jailbreak Dataset Curation Tool
Build a tool for curating, organizing, and testing jailbreak prompt datasets with effectiveness tracking.
LLM Traffic Analysis Tool
Build a tool for analyzing and visualizing LLM API traffic patterns to identify attack indicators.
MCP Security Audit Tool
Build a tool for auditing MCP server implementations for common security vulnerabilities and misconfigurations.
Multi-Model Test Harness Construction
Build a unified test harness for running attacks across OpenAI, Anthropic, Google, and local model endpoints.
Payload Mutation Framework Development
Build a payload mutation framework with grammar-based, semantic, and adversarial mutation strategies.
Promptfoo CI/CD Pipeline Integration (Tool Walkthrough)
Integrate Promptfoo red team evaluations into CI/CD pipelines for automated security regression testing.
PyRIT Custom Converter Development
Build custom PyRIT converters for payload transformation, encoding, and multi-modal attack generation.
RAG Security Testing Framework
Build a framework for systematic security testing of RAG applications including poisoning and exfiltration.