# walkthroughs
標記為「walkthroughs」的 544 篇文章
A2A Message Interception Walkthrough
Intercept and manipulate messages between agents in A2A protocol-based multi-agent systems.
A2A Protocol Injection Walkthrough
Walkthrough of exploiting Google's Agent-to-Agent protocol for inter-agent prompt injection.
Adversarial Suffix Crafting Walkthrough
Craft adversarial suffixes using gradient-based and gradient-free optimization methods for black-box models.
API Abuse Chain Attack Walkthrough
Chain multiple API calls to achieve unauthorized actions that no single call would permit.
API Chaining Exploitation Walkthrough
Walkthrough of chaining multiple API calls in agent systems to achieve multi-step unauthorized actions.
Attention Hijacking Attack Walkthrough
Hijack transformer attention mechanisms to redirect model focus toward adversarial instructions in the context.
AutoDAN Stealthy Jailbreak Walkthrough
Walkthrough of generating stealthy, human-readable jailbreak prompts using the AutoDAN methodology.
Batch Processing Injection Walkthrough
Inject payloads through batch processing pipelines where individual items are processed without isolation.
LLM Cache Poisoning Walkthrough
Poison LLM response caches to serve adversarial content to other users without direct injection.
Competition-Style Jailbreak Techniques
Walkthrough of jailbreak techniques used in AI security competitions and CTF events.
Computer Use Agent Injection Walkthrough
Walkthrough of injecting prompts through UI elements and screenshots processed by computer-use agents.
Advanced Context Manipulation Walkthrough
Manipulate conversation context through history injection, context overflow, and attention weight exploitation.
Context Manipulation Deep Dive
Advanced walkthrough of context window manipulation techniques including attention dilution and instruction priority shifting.
Crescendo Multi-Turn Jailbreak Walkthrough
Step-by-step walkthrough of the Crescendo technique for gradually escalating multi-turn jailbreaks.
Cross-Plugin Data Exfiltration Walkthrough
Walkthrough of chaining multiple plugins/tools to exfiltrate data from LLM agent systems.
Data Harvesting Through LLM Apps
Complete walkthrough of systematic data extraction from LLM applications using various exfiltration channels.
Document-Based Injection Walkthrough
Inject prompts through documents processed by LLM applications including PDFs, spreadsheets, and presentations.
Embedding Collision Attack Walkthrough
Craft documents that collide in embedding space with target queries to hijack RAG retrieval results.
Embedding Inversion Attack Walkthrough
Walkthrough of inverting text embeddings to recover original documents from vector databases.
Encoding Chain Attack Walkthrough
Chain multiple encoding transformations to bypass input filters that only decode one layer of encoding.
Encoding Chain Bypass Walkthrough
Walkthrough of chaining Base64, URL encoding, and Unicode tricks to bypass multi-layer input filters.
Few-Shot Example Poisoning Walkthrough
Poison few-shot examples in prompts to establish behavioral patterns that override system instructions.
Fine-Tuning Safety Bypass Walkthrough
Walkthrough of using fine-tuning API access to remove safety behaviors from aligned models.
Function Calling Exploitation Guide
Complete walkthrough of exploiting function calling in OpenAI, Anthropic, and Google AI APIs.
Function Schema Poisoning Walkthrough
Poison function schemas to inject hidden instructions that redirect model tool selection and parameter filling.
GCG Adversarial Suffix Attack Walkthrough
Complete walkthrough of implementing the Greedy Coordinate Gradient adversarial suffix attack.
Real-World Indirect Prompt Injection
Walkthrough of planting and triggering indirect prompt injection in web-browsing AI assistants.
JSON Injection Attack Walkthrough
Exploit JSON parsing and generation in LLM applications to inject payloads through structured data boundaries.
Logit Bias Exploitation Walkthrough
Exploit logit bias API parameters to force models toward specific token sequences and bypass safety filters.
Many-Shot Jailbreaking Walkthrough
Walkthrough implementing Anthropic's many-shot jailbreaking technique with analysis of scaling behavior.
Advanced Markdown Injection Walkthrough
Inject Markdown that triggers data exfiltration through image rendering, link generation, and code block escape.
MCP Tool Poisoning Attack Walkthrough
Walkthrough of exploiting MCP tool descriptions to redirect agent behavior via hidden instructions.
Memory Persistence Attack Walkthrough
Walkthrough of achieving persistent memory manipulation in agent systems for cross-session influence.
Model Context Window Overflow Walkthrough
Overflow the context window to push safety instructions outside the effective attention range.
Model Extraction Attack Walkthrough
Walkthrough of extracting model weights/behavior through systematic API querying.
Multi-Modal Document Attack Walkthrough
Combine visual and textual injection in documents processed by multimodal LLM applications.
Multi-Turn Persona Escalation Walkthrough
Gradually escalate through persona manipulation across multiple turns to achieve full safety bypass.
Multimodal Image Injection Walkthrough
Step-by-step walkthrough of embedding adversarial prompts in images for vision model exploitation.
PAIR Automated Jailbreak Walkthrough
End-to-end walkthrough of the PAIR attack methodology for automated black-box jailbreaking.
Agent Privilege Escalation Walkthrough
Walkthrough of escalating privileges in multi-agent systems through trust chain exploitation.
Advanced Prompt Leaking Walkthrough
Advanced techniques for extracting system prompts including iterative reconstruction and side-channel methods.
RAG Poisoning End-to-End Walkthrough
Complete walkthrough of poisoning a RAG system from document injection through information extraction.
Reasoning Model Exploitation Walkthrough
Exploit extended thinking in reasoning models to inject false premises and manipulate conclusion generation.
Regex Filter Bypass Walkthrough
Systematically bypass regex-based input filters using Unicode tricks, encoding, and pattern-specific evasion.
Role Confusion Attack Walkthrough
Exploit role confusion between system, user, and assistant messages to override safety instructions.
Role Confusion Exploitation Walkthrough
Exploit role boundaries in chat APIs by injecting assistant-role messages and system-level instructions.
Semantic Camouflage Walkthrough
Walkthrough of crafting semantically camouflaged injections that evade both classifiers and human review.
Semantic Deception Attack Walkthrough
Craft semantically deceptive prompts that appear benign to classifiers while achieving adversarial objectives.
Sleeper Agent Detection Walkthrough
Walkthrough of detecting deceptive sleeper agent behaviors in fine-tuned language models.
Supply Chain Prompt Injection Walkthrough
Plant injection payloads in upstream data sources consumed by LLM applications including packages and documentation.
Model Supply Chain Poisoning
Walkthrough of poisoning ML supply chains through dependency confusion, model weight manipulation, and hub attacks.
Synthetic Identity Injection Walkthrough
Create synthetic identities that exploit LLM trust mechanisms to achieve elevated instruction priority.
TAP Tree of Attacks Walkthrough
Detailed walkthrough of building and running Tree of Attacks for systematic jailbreak generation.
Thought Injection in Reasoning Models Walkthrough
Inject adversarial thoughts into the reasoning chain of thinking models to manipulate final outputs.
Timing Side-Channel Attack Walkthrough
Extract information from LLM applications through timing differences in response generation and safety filter processing.
Token-Level Attack Optimization Walkthrough
Walkthrough of optimizing adversarial token sequences for maximum jailbreak effectiveness.
Tool Shadowing Attack Walkthrough
Register shadow tools that override legitimate tool definitions to intercept and manipulate agent actions.
Vision Model Attack Walkthrough
Attack vision-language models through adversarial images with embedded text, perturbations, and visual trojans.
Vision Model Attack Walkthrough (Attack Walkthrough)
Step-by-step walkthrough of visual prompt injection, adversarial images, and OCR exploitation in vision-language models.
A2A Agent Impersonation Walkthrough
Walkthrough of impersonating a trusted agent in an A2A network to intercept and redirect delegated tasks.
API Parameter Exploitation Walkthrough
Walkthrough of exploiting undocumented and misconfigured API parameters in LLM service endpoints.
Batch API Exploitation Walkthrough
Walkthrough of exploiting batch processing APIs for large-scale prompt injection and output analysis.
Chain-of-Thought Hijacking Walkthrough
Walkthrough of hijacking visible reasoning traces in CoT models to redirect conclusions and bypass safety checks.
Cognitive Overload Jailbreak Walkthrough
Step-by-step walkthrough of the cognitive overload technique that overwhelms safety training with complex nested instructions.
Context Virtualization Attack Walkthrough
Walkthrough of creating virtualized contexts that make the model believe it is operating in a different environment.
Conversation Poisoning Attack Walkthrough
Step-by-step walkthrough of poisoning conversation history to alter model behavior in future turns.
Document Parsing Exploit Chain Walkthrough
End-to-end walkthrough of exploiting document parsers to inject instructions through PDFs, DOCX, and HTML files.
Few-Shot Attack Scaling Analysis
Detailed analysis of how few-shot examples scale to influence model behavior, from 2-shot to many-shot regime.
Image Steganography Injection Walkthrough
Walkthrough of hiding prompt injection payloads in images using steganographic techniques for vision model attacks.
Instruction Hierarchy Exploitation Walkthrough
Walkthrough of exploiting how models prioritize system, user, and tool instructions to override safety directives.
Knowledge Graph Injection Attack Walkthrough
Walkthrough of injecting adversarial facts into knowledge graphs consumed by LLM-based reasoning systems.
LangChain Exploit Chain Walkthrough
Walkthrough of chaining LangChain CVEs for remote code execution from prompt injection through to shell access.
MCP Sampling API Abuse Walkthrough
Walkthrough of exploiting the MCP sampling API to manipulate how servers request and process LLM completions.
Memory Injection and Persistence Walkthrough
Walkthrough of injecting persistent instructions into agent memory systems that survive across sessions.
Multi-Provider Attack Campaign Walkthrough
End-to-end walkthrough of conducting an attack campaign across OpenAI, Anthropic, Google, and open-source models.
Output Format Exfiltration Techniques Walkthrough
Walkthrough of using output format manipulation to exfiltrate data through code blocks, markdown, and structured outputs.
RAG Hybrid Search Poisoning Walkthrough
Walkthrough of poisoning both vector and keyword search in hybrid RAG architectures for maximum retrieval influence.
Recursive Prompt Injection Walkthrough
Walkthrough of creating self-replicating injection payloads that persist through model output-to-input loops.
Refusal Suppression Technique Walkthrough
Walkthrough of refusal suppression techniques that prime models to avoid safety refusal patterns.
Semantic Obfuscation Walkthrough
Walkthrough of semantically obfuscating adversarial payloads so they appear benign to both classifiers and humans.
Structured Output Data Leakage Walkthrough
Walkthrough of using structured output requirements to extract sensitive data embedded in model responses.
Tool Use Confusion Attack Walkthrough
Walkthrough of confusing model tool-use decisions to invoke unintended functions or skip safety-critical tools.
Unicode Normalization Bypass Walkthrough
Step-by-step guide to exploiting Unicode normalization differences between input filters and model tokenizers.
Voice AI Prompt Injection Walkthrough
Walkthrough of injecting prompts into voice-based AI assistants through adversarial audio and ultrasonic signals.
LLM Watermark Analysis Walkthrough
Walkthrough of detecting and analyzing watermarks in LLM-generated text using statistical methods.
XML Injection in LLM Systems Walkthrough
Exploit XML parsing in LLM application pipelines to inject instructions through entity expansion and CDATA sections.
XML and JSON Injection in LLM Apps
Walkthrough of exploiting XML and JSON parsing in LLM applications for injection and data manipulation.
Adversarial Robustness Testing Framework
Build a framework for continuously testing adversarial robustness of deployed LLM defense mechanisms.
Adversarial Training for LLM Defense (Defense Walkthrough)
Implement adversarial training techniques to improve LLM robustness against prompt injection and jailbreaking.
Agent Permission Boundary Enforcement
Implement fine-grained permission boundaries for LLM agents that limit tool access based on context and user role.
Automated Defense Regression Testing
Build automated regression tests for LLM security defenses using attack replay and continuous monitoring.
Automated Red Team Defense Loop
Build an automated continuous red team testing loop that discovers and patches vulnerabilities iteratively.
Behavioral Anomaly Detection for LLMs
Implement behavioral anomaly detection that identifies when model outputs deviate from expected safety profiles.
Building an Input Safety Classifier
Build a production-quality input classifier that detects prompt injection attempts.
Constitutional AI Implementation Guide
Implement constitutional AI principles in a custom fine-tuning and RLHF pipeline.
Context Isolation Pattern Implementation
Implement context isolation patterns that prevent instruction leakage between system prompts and user data.
Data Loss Prevention for LLM Apps
Build a data loss prevention layer for LLM applications that prevents sensitive data exposure in outputs.
Defense Benchmarking System
Build a benchmarking system to continuously evaluate defense effectiveness against known attack classes.
Defense-in-Depth Architecture for LLM Apps
Design and implement a complete defense-in-depth architecture for production LLM applications.
AI Incident Response System Setup
Set up comprehensive incident response capabilities for AI-specific security incidents.
Input Embedding Firewall Deployment
Deploy an embedding-based firewall that detects injection attempts through semantic similarity to known attack patterns.
Secure Input/Output Logging for Defense
Implement secure logging for LLM input/output pairs that supports forensic analysis without exposing sensitive data.
LLM Firewall Architecture Design (Defense Walkthrough)
Design and implement a comprehensive LLM firewall with input analysis, output filtering, and behavioral monitoring.
LLM Honeypot Deployment
Deploy LLM honeypots to detect and study attacker behavior patterns and techniques.
Model Behavior Monitoring Setup
Set up comprehensive model behavior monitoring to detect drift, anomalies, and potential compromise.
Monitoring LLM Applications for Abuse
Build a monitoring and alerting system to detect ongoing attacks against LLM applications.
Multi-Model Safety Consensus
Implement safety consensus mechanisms where multiple models must agree before executing sensitive actions.
Output Filtering Pipeline Design
Design and implement a multi-stage output filtering pipeline for LLM applications.
Output Grounding and Verification
Implement output grounding verification to ensure LLM responses are factually supported by provided context.
Prompt Armor Implementation Guide
Implement a comprehensive prompt armoring system with instruction isolation, delimiter hardening, and priority enforcement.
Prompt Injection Honeypot Setup
Deploy honeypot prompts and canary data that detect and characterize prompt injection attempts.
RAG Input Sanitization Walkthrough
Implement input sanitization for RAG systems to prevent document-based injection attacks.
Real-Time Attack Detection System
Build a real-time attack detection system that monitors LLM interactions for adversarial patterns.
Red Team-Defense Feedback Loop
Build a continuous red team-defense improvement loop with automated testing and metric tracking.
Response Watermarking Implementation
Implement response watermarking to enable traceability and detect unauthorized reproduction of LLM outputs.
Runtime Safety Monitor Implementation
Implement a runtime safety monitor that detects and blocks unsafe model outputs in real-time.
Training Custom Safety Classifiers
Train custom safety classifiers tuned to your application's specific threat model and content policy.
Secure Agent Architecture Design
Design a secure architecture for LLM agent systems with sandboxing, capability controls, and audit trails.
Secure Function Calling Patterns
Implement secure function calling with input validation, output sanitization, and capability restrictions.
Secure RAG Architecture Walkthrough
Design and implement a secure RAG architecture with document sanitization, access controls, and output validation.
Secure RAG Pipeline Architecture
Build a secure RAG pipeline with document sanitization, retrieval validation, and injection-resistant augmentation.
System Prompt Protection Techniques
Implement multiple layers of system prompt protection against extraction, override, and leakage attacks.
AI Security Threat Intelligence
Build a threat intelligence pipeline for staying current with AI security threats and attack techniques.
Token-Level Input Filtering
Implement token-level input filtering that detects injection patterns at the tokenization stage before model processing.
Tool Call Authorization Framework
Implement a tool call authorization framework that validates tool invocations against policy before execution.
Agent Tool Access Control Implementation
Implement fine-grained tool access control for LLM agents with capability-based security and approval workflows.
Anomaly Detection for LLM Traffic
Build anomaly detection for LLM API traffic to identify attack patterns, abuse, and prompt injection attempts.
Conversation Integrity Monitoring
Build a conversation integrity monitoring system that detects manipulation across multi-turn interactions.
Defense Effectiveness Metrics Dashboard
Build a dashboard to measure and visualize the effectiveness of defensive measures against attack categories.
Automated Defense Testing Pipeline
Build an automated pipeline that continuously tests defensive measures against evolving attack techniques.
Embedding Poisoning Detection System
Build a detection system for identifying poisoned documents in vector databases using statistical analysis.
Function Calling Guardrails Implementation
Implement guardrails for function calling that validate tool selection, parameters, and execution scope.
LLM Honeypot Deployment Guide
Deploy LLM-based honeypots to detect, analyze, and learn from attacker techniques in production environments.
Incident Response Playbook for LLM Applications
Design and implement an incident response playbook specific to LLM application security incidents.
LLM Guard Production Deployment Guide
Deploy LLM Guard in a production environment with custom scanners, performance optimization, and monitoring.
MCP Server Security Hardening Guide
Harden MCP server implementations against tool poisoning, transport attacks, and capability escalation.
Multi-Model Defense Ensemble
Build an ensemble defense system using multiple models to cross-validate inputs and outputs for safety.
NeMo Guardrails Advanced Configuration
Advanced NeMo Guardrails configuration including custom actions, knowledge base integration, and multi-model pipelines.
Building an Output Scanning Pipeline
Build a production output scanning pipeline that detects data leakage, harmful content, and injection artifacts.
PII Detection and Redaction for LLMs
Build a PII detection and redaction system for LLM inputs and outputs to prevent data exposure.
Training a Prompt Injection Classifier
Train a custom prompt injection detection classifier using labeled datasets and modern NLP techniques.
RAG Document Sandboxing Implementation
Implement document-level sandboxing for RAG systems to prevent cross-document injection and privilege escalation.
Adaptive Rate Limiting for LLM APIs
Implement adaptive rate limiting that adjusts thresholds based on detected attack patterns and user behavior.
Secure RAG Architecture Implementation
Implement a security-hardened RAG architecture with input sanitization, access control, and output validation.
System Prompt Protection Layers
Implement multi-layer system prompt protection including injection-resistant formatting and runtime validation.
Zero Trust Architecture for LLM Apps
Implement zero trust principles in LLM application architecture with continuous verification and least privilege.
Full Engagement: AI Code Assistant
End-to-end engagement for assessing an AI-powered code assistant with repository access.
Full Engagement: Content Generation Platform
Full engagement walkthrough for a content generation AI platform with brand safety and copyright concerns.
Full Engagement: Customer Analytics AI
Red team engagement for an AI-powered customer analytics system processing sensitive behavioral data.
Full Engagement: AI Developer Tools
End-to-end engagement for AI-powered developer tools with repository access and code execution capabilities.
Full Engagement: Educational AI Platform
End-to-end engagement walkthrough for an AI-powered educational platform with student safety requirements.
Full Engagement: AI Financial Advisor
Full engagement for assessing an AI financial advisor chatbot handling investment recommendations.
Full Engagement: Government AI System
Full red team engagement for a government-deployed AI system with classification and compliance requirements.
Full Engagement: Healthcare AI System
End-to-end engagement walkthrough for a healthcare AI system with HIPAA compliance requirements.
Full Engagement: Internal Knowledge Bot
Full engagement walkthrough for an internal enterprise knowledge bot with RAG and document access.
Full Engagement: Multi-Agent Platform
Full red team engagement of a multi-agent platform with MCP servers and A2A communication.
Full Engagement: SaaS Customer Chatbot
End-to-end engagement walkthrough for assessing a SaaS customer-facing chatbot with tool access.
Full Engagement: ML Supply Chain Audit
End-to-end engagement walkthrough for auditing an organization's ML model supply chain security.
Full Engagement: Autonomous Coding Agent
End-to-end engagement for an autonomous AI coding agent with repo access, CI/CD integration, and deployment.
Full Engagement: AI Content Generation Platform
Red team engagement of an AI content generation platform used for marketing, SEO, and social media.
Full Engagement: Multi-Tool Customer Support Agent
End-to-end engagement for a customer support agent with order management, refund processing, and CRM access.
Full Engagement: DevOps AI Assistant
End-to-end engagement for a DevOps AI assistant with CI/CD, cloud infrastructure, and monitoring access.
Full Engagement: Document Processing Pipeline
Full engagement of an AI document processing pipeline handling invoices, contracts, and compliance documents.
Full Engagement: AI Education Platform
End-to-end engagement walkthrough for assessing an AI-powered education platform with tutoring and grading.
Full Engagement: Government Citizen Services AI
Red team engagement of a government citizen services AI handling tax, benefits, and identity verification.
Full Engagement: HR Benefits Chatbot
Full engagement walkthrough for an HR benefits chatbot with access to employee records and benefits administration.
Full Engagement: AI Insurance Underwriting
Full engagement for an AI underwriting system with risk scoring, policy generation, and claims data access.
Full Engagement: Legal Research AI System
Full red team engagement of a legal research AI with case law access, privilege management, and citation generation.
Full Engagement: Enterprise Multi-Agent System
Full engagement of an enterprise multi-agent system with specialized agents for HR, IT, Finance, and Legal.
Full Engagement: Retail Personalization AI
Red team engagement of an AI personalization engine with access to customer profiles and purchase history.
Full Engagement: AI Security Copilot
Red team engagement of an AI security copilot with access to SIEM, vulnerability scanners, and threat intelligence.
Full Engagement: Supply Chain AI Optimizer
End-to-end engagement for a supply chain AI with access to logistics, inventory, and supplier management systems.
Full Engagement: Telehealth AI Assistant
End-to-end engagement for a telehealth AI assistant with appointment scheduling, symptom assessment, and EHR access.
Walkthroughs
Step-by-step guided walkthroughs covering red team tools, engagement methodology, defense implementation, platform-specific testing, and full engagement workflows.
Adversarial Simulation Design
Design realistic adversarial simulations that model real-world threat actors and attack scenarios for AI systems.
Agentic System Assessment Methodology
Comprehensive methodology for assessing agentic AI systems including tool use, memory, and multi-agent interactions.
AI Penetration Test Planning
Complete methodology for planning AI-specific penetration tests including scope definition, resource allocation, and timeline.
AI Red Team Maturity Model (Methodology Walkthrough)
Maturity model for assessing and improving an organization's AI red teaming capabilities.
AI Security Metrics Framework
Framework for measuring and reporting on AI security posture using quantitative metrics.
AI Vulnerability Classification System
Structured system for classifying AI-specific vulnerabilities by type, impact, and exploitability.
Attack Prioritization Framework
Prioritize attack techniques based on target architecture, time constraints, and likelihood of success.
Attack Tree Construction for LLM Systems
Build systematic attack trees for LLM system assessments using MITRE ATLAS and OWASP mappings.
Automated AI Reconnaissance Workflow
Build an automated reconnaissance workflow that maps AI application architecture, models, and defense configurations.
Collaborative AI Red Team Assessment
Coordinate multi-person red team assessments with role assignments, communication protocols, and finding deconfliction.
Competitive Analysis of AI Security Tools
Methodology for evaluating and comparing AI security tools for red team operations.
Compliance-Driven Testing Methodology
Map regulatory requirements to specific test cases for compliance-driven AI red team assessments.
Continuous Monitoring Integration Methodology
Integrate red team findings into continuous monitoring systems for ongoing threat detection and defense validation.
Finding Deduplication and Triage
Deduplicate and triage findings from automated and manual testing into actionable, prioritized vulnerability reports.
Multi-Model Testing Methodology
Structured methodology for testing applications that use multiple LLM models in their processing pipeline.
Post-Engagement Analysis Methodology
Conduct thorough post-engagement analysis including lessons learned, technique effectiveness, and methodology refinement.
Purple Team AI Assessment Methodology
Conduct collaborative purple team AI assessments with real-time feedback between red and blue team operations.
AI Security Regression Testing Methodology
Design regression testing suites that verify security fixes remain effective across model updates and deployments.
Risk-Based AI Testing Approach
Apply risk-based testing approaches to focus assessment effort on the highest-impact vulnerability categories.
Stakeholder Management in AI Red Teaming
Managing stakeholder expectations and communication throughout AI red team engagements.
Stakeholder-Specific Reporting Methodology
Tailor red team reports for different stakeholders including executives, developers, security teams, and compliance officers.
AI Security Tabletop Exercises
Designing and facilitating tabletop exercises focused on AI security incident scenarios.
Time-Boxed AI Assessment Methodology
Maximize testing coverage within strict time constraints using prioritized attack trees and parallel testing.
AI Attack Surface Enumeration Methodology
Systematic methodology for enumerating the complete attack surface of an AI-powered application.
AI Compliance Testing Methodology
Methodology for testing AI systems against regulatory compliance requirements including EU AI Act and NIST.
Mapping Findings to MITRE ATLAS
Methodology for mapping AI red team findings to MITRE ATLAS tactics, techniques, and procedures.
AI Penetration Test Report Structure
Detailed report structure for AI penetration tests with finding templates and severity scoring.
AI Red Team Scoping Templates
Templates and procedures for scoping AI red team engagements across different application types.
AI Risk Quantification Methodology
Quantitative risk assessment methodology for AI vulnerabilities with probability and impact scoring.
AI Security Tabletop Exercise Design
Design and facilitate AI security tabletop exercises for organizational preparedness assessment.
Evidence Collection During AI Testing
Best practices for collecting, organizing, and preserving evidence during AI red team assessments.
Multi-Model Assessment Methodology
Methodology for assessing applications that use multiple AI models in pipelines or ensemble configurations.
OWASP LLM Top 10 Testing Methodology
Comprehensive testing methodology for each vulnerability in the OWASP LLM Top 10 2025.
Purple Team Operations for AI Security
Methodology for conducting purple team operations that combine red team attacks with blue team defense improvement.
AI Security Regression Testing Methodology (Methodology Walkthrough)
Methodology for continuous regression testing of AI application security after updates and model changes.
Communicating AI Risks to Stakeholders
Guide for communicating AI security risks to technical and non-technical stakeholders effectively.
Threat Intelligence for AI Systems
Methodology for gathering and applying threat intelligence specific to AI system attacks and defenses.
AI Vulnerability Prioritization Framework
Framework for prioritizing AI vulnerabilities by exploitability, impact, and remediation cost.
Writing AI Red Team Reports
Guide to writing clear, actionable AI red team assessment reports with findings and recommendations.
Testing AI21 Labs Models
Red team testing guide for AI21 Labs Jamba models including long context and efficiency features.
Testing Anthropic Claude: Complete Guide
Complete red team testing guide for Anthropic's Claude including tool use, extended thinking, and computer use.
Testing AWS Bedrock Deployments
Red team testing guide for models deployed via AWS Bedrock including guardrails and access controls.
Testing Azure OpenAI Service
Red team testing guide for Azure OpenAI including content filtering, managed identity, and network controls.
Testing Cohere Models
Red team testing guide for Cohere's Command-R models including RAG and tool use features.
Testing DeepSeek Models
Red team testing guide for DeepSeek models including R1 reasoning exploitation and MoE architecture.
Testing Fireworks AI Platform
Red team testing guide for Fireworks AI including function calling and compound AI systems.
Testing Google Gemini: Complete Guide
Complete red team testing guide for Google's Gemini including multimodal inputs and long context.
Testing Groq Inference Platform
Red team testing guide for Groq's high-speed inference platform and its security characteristics.
Testing Hugging Face Hosted Models
Red team testing guide for models hosted on Hugging Face including Inference API and Spaces.
Testing Meta Llama: Complete Guide
Complete red team testing guide for Meta's Llama models including fine-tuning and self-hosted deployment.
Testing Mistral AI Models
Complete red team testing guide for Mistral AI models including Mixtral MoE architecture and chat endpoints.
Testing Ollama Local Deployments
Security testing guide for locally deployed models via Ollama including network exposure and API security.
Testing OpenAI GPT-4o: Complete Guide
Complete red team testing guide for OpenAI's GPT-4o including API, function calling, and structured outputs.
Testing Replicate-Hosted Models
Red team testing guide for models hosted on Replicate including open-source model deployments.
Testing Together AI Platform
Red team testing guide for Together AI including fine-tuned model endpoints and custom deployments.
Testing AI21 Jamba Models
Red team testing guide for AI21's Jamba models with SSM-Transformer hybrid architecture analysis.
Testing Anyscale/Ray Serve Endpoints
Red team testing guide for models served via Anyscale and Ray Serve including distributed inference security.
Testing Cohere Models: Complete Guide
Complete red team testing guide for Cohere's Command models including RAG integration and tool use.
Testing Fireworks AI Deployments
Red team testing guide for models deployed via Fireworks AI including function calling and fine-tuned models.
Testing GCP Vertex AI Deployments
Red team testing guide for models deployed via GCP Vertex AI including Model Garden and custom endpoints.
Testing Mistral/Mixtral: Complete Guide
Complete red team testing guide for Mistral and Mixtral models including MoE routing and API vulnerabilities.
Testing Ollama Local Deployments (Platform Walkthrough)
Red team testing guide for models deployed locally via Ollama including API endpoints and model management.
Testing Replicate Hosted Models
Red team testing guide for models hosted on Replicate including API exploitation and model versioning.
Testing Together AI Hosted Models
Red team testing guide for models hosted on Together AI's inference platform.
Testing vLLM Inference Deployments
Red team testing guide for models served via vLLM including batching, KV cache, and speculative decoding.
Security Benchmark Runner Development
Build a benchmark runner for standardized evaluation of LLM security across models and configurations.
Building an Attack Replay Tool
Build a tool that records and replays attack sequences for regression testing and defense validation.
Counterfit ML Security Testing
Use Microsoft's Counterfit for adversarial ML testing of deployed model endpoints.
Building a Custom LLM Fuzzer
Build a custom fuzzing tool for LLM applications with mutation engines and coverage tracking.
Building a Custom Payload Mutation Engine
Develop a mutation engine that evolves prompt injection payloads through genetic algorithms.
Garak Advanced Configuration Guide
Deep dive into Garak's configuration system including custom probes, detectors, and generators.
Building Custom Garak Probes (Tool Walkthrough)
Develop custom probe modules for NVIDIA's Garak to test domain-specific vulnerabilities.
HarmBench Custom Behavior Sets
Extend HarmBench with custom behavior specifications for domain-specific vulnerability testing.
HarmBench Evaluation Walkthrough
Run standardized attack evaluations using the HarmBench framework and interpret results.
Tool Walkthroughs
End-to-end practical walkthroughs for essential AI red teaming tools, covering installation, configuration, execution, and result interpretation.
Inspect AI Safety Evaluations
Build and run AI safety evaluations using the UK AISI Inspect framework.
JailbreakBench Usage and Submission
Use JailbreakBench to evaluate jailbreak techniques and submit results to the benchmark.
LLM Guard Deployment and Testing
Deploy LLM Guard for input/output scanning and test its effectiveness against common attacks.
Building an LLM Traffic Analyzer
Build a proxy-based LLM traffic analyzer for intercepting and analyzing API communications.
Promptfoo CI/CD Pipeline Integration
Integrate Promptfoo into CI/CD pipelines for continuous LLM security regression testing.
Promptfoo for Red Team Evaluation
Configure Promptfoo for comprehensive red team evaluation with custom assertions and graders.
PyRIT Multi-Turn Campaign Orchestration
Build and run sophisticated multi-turn attack campaigns using PyRIT's orchestrator system.
PyRIT Custom Scoring Integration
Integrate custom scoring metrics into PyRIT for organization-specific red team evaluation criteria.
Agent Security Scanner Development
Build an automated security scanner for AI agent systems that tests tool use, memory, and workflow vulnerabilities.
Automated Red Team Report Generation
Build an automated system for generating structured red team reports from testing data and findings.
Building an LLM Attack Proxy
Build an intercepting proxy for LLM API traffic that enables payload injection, response analysis, and logging.
Defense Benchmarking Tool Development
Build a tool for benchmarking the effectiveness of defensive measures against standardized attack suites.
Building an Embedding Attack Toolkit
Build a toolkit for conducting embedding-level attacks including inversion, poisoning, and manipulation.
Garak Custom Plugin Development
Develop custom Garak plugins including probes, detectors, and generators for organization-specific testing.
Inspect AI Custom Solver Development
Build custom solvers in Inspect AI for specialized safety evaluation scenarios and attack simulations.
Jailbreak Dataset Curation Tool
Build a tool for curating, organizing, and testing jailbreak prompt datasets with effectiveness tracking.
LLM Traffic Analysis Tool
Build a tool for analyzing and visualizing LLM API traffic patterns to identify attack indicators.
MCP Security Audit Tool
Build a tool for auditing MCP server implementations for common security vulnerabilities and misconfigurations.
Multi-Model Test Harness Construction
Build a unified test harness for running attacks across OpenAI, Anthropic, Google, and local model endpoints.
Payload Mutation Framework Development
Build a payload mutation framework with grammar-based, semantic, and adversarial mutation strategies.
Promptfoo CI/CD Pipeline Integration (Tool Walkthrough)
Integrate Promptfoo red team evaluations into CI/CD pipelines for automated security regression testing.
PyRIT Custom Converter Development
Build custom PyRIT converters for payload transformation, encoding, and multi-modal attack generation.
RAG Security Testing Framework
Build a framework for systematic security testing of RAG applications including poisoning and exfiltration.
A2A Message Interception 導覽
Intercept and manipulate messages between agents in A2A protocol-based multi-agent systems.
A2A Protocol Injection 導覽
Walkthrough of exploiting Google's Agent-to-Agent protocol for inter-agent prompt injection.
Adversarial Suffix Crafting 導覽
Craft adversarial suffixes using gradient-based and gradient-free optimization methods for black-box models.
API Abuse Chain 攻擊 導覽
Chain multiple API calls to achieve unauthorized actions that no single call would permit.
API Chaining 利用ation 導覽
導覽 of chaining multiple API calls in agent systems to achieve multi-step unauthorized actions.
Attention Hijacking 攻擊 導覽
Hijack transformer attention mechanisms to redirect model focus toward adversarial instructions in the context.
AutoDAN Stealthy 越獄 導覽
導覽 of generating stealthy, human-readable jailbreak prompts using the AutoDAN methodology.
Batch Processing Injection 導覽
Inject payloads through batch processing pipelines where individual items are processed without isolation.
LLM Cache 投毒 導覽
Poison LLM response caches to serve adversarial content to other users without direct injection.
Competition-Style 越獄 Techniques
導覽 of jailbreak techniques used in AI security competitions and CTF events.
Computer Use 代理 Injection 導覽
導覽 of injecting prompts through UI elements and screenshots processed by computer-use agents.
進階 Context Manipulation 導覽
Manipulate conversation context through history injection, context overflow, and attention weight exploitation.
Context Manipulation Deep Dive
進階 walkthrough of context window manipulation techniques including attention dilution and instruction priority shifting.
Crescendo Multi-Turn 越獄 導覽
Step-by-step walkthrough of the Crescendo technique for gradually escalating multi-turn jailbreaks.
Cross-Plugin Data Exfiltration 導覽
導覽 of chaining multiple plugins/tools to exfiltrate data from LLM agent systems.
Data Harvesting Through LLM Apps
Complete walkthrough of systematic data extraction from LLM applications using various exfiltration channels.
Document-Based Injection 導覽
Inject prompts through documents processed by LLM applications including PDFs, spreadsheets, and presentations.
Embedding Collision 攻擊 導覽
Craft documents that collide in embedding space with target queries to hijack RAG retrieval results.
Embedding Inversion 攻擊 導覽
導覽 of inverting text embeddings to recover original documents from vector databases.
Encoding Chain 攻擊 導覽
Chain multiple encoding transformations to bypass input filters that only decode one layer of encoding.
Encoding Chain Bypass 導覽
導覽 of chaining Base64, URL encoding, and Unicode tricks to bypass multi-layer input filters.
Few-Shot Example 投毒 導覽
Poison few-shot examples in prompts to establish behavioral patterns that override system instructions.
Fine-Tuning Safety Bypass 導覽
導覽 of using fine-tuning API access to remove safety behaviors from aligned models.
Function Calling 利用ation 指南
Complete walkthrough of exploiting function calling in OpenAI, Anthropic, and Google AI APIs.
Function Schema 投毒 導覽
Poison function schemas to inject hidden instructions that redirect model tool selection and parameter filling.
GCG Adversarial Suffix 攻擊 導覽
Complete walkthrough of implementing the Greedy Coordinate Gradient adversarial suffix attack.
Real-World Indirect 提示詞注入
導覽 of planting and triggering indirect prompt injection in web-browsing AI assistants.
JSON Injection 攻擊 導覽
利用 JSON parsing and generation in LLM applications to inject payloads through structured data boundaries.
Logit Bias 利用ation 導覽
利用 logit bias API parameters to force models toward specific token sequences and bypass safety filters.
Many-Shot 越獄ing 導覽
Walkthrough implementing Anthropic's many-shot jailbreaking technique with analysis of scaling behavior.
進階 Markdown Injection 導覽
Inject Markdown that triggers data exfiltration through image rendering, link generation, and code block escape.
MCP 工具 投毒 攻擊 導覽
導覽 of exploiting MCP tool descriptions to redirect agent behavior via hidden instructions.
記憶體 Persistence 攻擊 導覽
導覽 of achieving persistent memory manipulation in agent systems for cross-session influence.
模型 Context Window Overflow 導覽
Overflow the context window to push safety instructions outside the effective attention range.
模型 Extraction 攻擊 導覽
導覽 of extracting model weights/behavior through systematic API querying.
Multi-Modal Document 攻擊 導覽
Combine visual and textual injection in documents processed by multimodal LLM applications.
Multi-Turn Persona Escalation 導覽
Gradually escalate through persona manipulation across multiple turns to achieve full safety bypass.
Multimodal Image Injection 導覽
Step-by-step walkthrough of embedding adversarial prompts in images for vision model exploitation.
PAIR Automated 越獄 導覽
End-to-end walkthrough of the PAIR attack methodology for automated black-box jailbreaking.
代理 Privilege Escalation 導覽
導覽 of escalating privileges in multi-agent systems through trust chain exploitation.
進階 Prompt Leaking 導覽
進階 techniques for extracting system prompts including iterative reconstruction and side-channel methods.
RAG 投毒 End-to-End 導覽
Complete walkthrough of poisoning a RAG system from document injection through information extraction.
Reasoning 模型 利用ation 導覽
利用 extended thinking in reasoning models to inject false premises and manipulate conclusion generation.
Regex Filter Bypass 導覽
Systematically bypass regex-based input filters using Unicode tricks, encoding, and pattern-specific evasion.
Role Confusion 攻擊 導覽
利用 role confusion between system, user, and assistant messages to override safety instructions.
Role Confusion 利用ation 導覽
利用 role boundaries in chat APIs by injecting assistant-role messages and system-level instructions.
Semantic Camouflage 導覽
導覽 of crafting semantically camouflaged injections that evade both classifiers and human review.
Semantic Deception 攻擊 導覽
Craft semantically deceptive prompts that appear benign to classifiers while achieving adversarial objectives.
Sleeper 代理 Detection 導覽
導覽 of detecting deceptive sleeper agent behaviors in fine-tuned language models.
Supply Chain 提示詞注入 導覽
Plant injection payloads in upstream data sources consumed by LLM applications including packages and documentation.
模型 Supply Chain 投毒
導覽 of poisoning ML supply chains through dependency confusion, model weight manipulation, and hub attacks.
Synthetic Identity Injection 導覽
Create synthetic identities that exploit LLM trust mechanisms to achieve elevated instruction priority.
TAP Tree of 攻擊s 導覽
Detailed walkthrough of building and running Tree of 攻擊s for systematic jailbreak generation.
Thought Injection in Reasoning 模型s 導覽
Inject adversarial thoughts into the reasoning chain of thinking models to manipulate final outputs.
Timing Side-Channel 攻擊 導覽
Extract information from LLM applications through timing differences in response generation and safety filter processing.
Token-Level 攻擊 Optimization 導覽
導覽 of optimizing adversarial token sequences for maximum jailbreak effectiveness.
工具 Shadowing 攻擊 導覽
Register shadow tools that override legitimate tool definitions to intercept and manipulate agent actions.
Vision 模型 攻擊 導覽
攻擊 vision-language models through adversarial images with embedded text, perturbations, and visual trojans.
Vision 模型 攻擊 導覽 (攻擊 導覽)
Step-by-step walkthrough of visual prompt injection, adversarial images, and OCR exploitation in vision-language models.
A2A 代理 Impersonation 導覽
導覽 of impersonating a trusted agent in an A2A network to intercept and redirect delegated tasks.
API Parameter 利用ation 導覽
導覽 of exploiting undocumented and misconfigured API parameters in LLM service endpoints.
Batch API 利用ation 導覽
導覽 of exploiting batch processing APIs for large-scale prompt injection and output analysis.
Chain-of-Thought Hijacking 導覽
導覽 of hijacking visible reasoning traces in CoT models to redirect conclusions and bypass safety checks.
Cognitive Overload 越獄 導覽
Step-by-step walkthrough of the cognitive overload technique that overwhelms safety training with complex nested instructions.
Context Virtualization 攻擊 導覽
導覽 of creating virtualized contexts that make the model believe it is operating in a different environment.
Conversation 投毒 攻擊 導覽
Step-by-step walkthrough of poisoning conversation history to alter model behavior in future turns.
Document Parsing 利用 Chain 導覽
End-to-end walkthrough of exploiting document parsers to inject instructions through PDFs, DOCX, and HTML files.
Few-Shot 攻擊 Scaling Analysis
Detailed analysis of how few-shot examples scale to influence model behavior, from 2-shot to many-shot regime.
Image Steganography Injection 導覽
導覽 of hiding prompt injection payloads in images using steganographic techniques for vision model attacks.
Instruction Hierarchy 利用ation 導覽
導覽 of exploiting how models prioritize system, user, and tool instructions to override safety directives.
Knowledge Graph Injection 攻擊 導覽
導覽 of injecting adversarial facts into knowledge graphs consumed by LLM-based reasoning systems.
LangChain 利用 Chain 導覽
導覽 of chaining LangChain CVEs for remote code execution from prompt injection through to shell access.
MCP Sampling API Abuse 導覽
導覽 of exploiting the MCP sampling API to manipulate how servers request and process LLM completions.
記憶體 Injection and Persistence 導覽
導覽 of injecting persistent instructions into agent memory systems that survive across sessions.
Multi-Provider 攻擊 Campaign 導覽
End-to-end walkthrough of conducting an attack campaign across OpenAI, Anthropic, Google, and open-source models.
Output Format Exfiltration Techniques 導覽
導覽 of using output format manipulation to exfiltrate data through code blocks, markdown, and structured outputs.
RAG Hybrid Search 投毒 導覽
導覽 of poisoning both vector and keyword search in hybrid RAG architectures for maximum retrieval influence.
Recursive 提示詞注入 導覽
導覽 of creating self-replicating injection payloads that persist through model output-to-input loops.
Refusal Suppression Technique 導覽
導覽 of refusal suppression techniques that prime models to avoid safety refusal patterns.
Semantic Obfuscation 導覽
導覽 of semantically obfuscating adversarial payloads so they appear benign to both classifiers and humans.
Structured Output Data Leakage 導覽
導覽 of using structured output requirements to extract sensitive data embedded in model responses.
工具 Use Confusion 攻擊 導覽
導覽 of confusing model tool-use decisions to invoke unintended functions or skip safety-critical tools.
Unicode Normalization Bypass 導覽
Step-by-step guide to exploiting Unicode normalization differences between input filters and model tokenizers.
Voice AI 提示詞注入 導覽
導覽 of injecting prompts into voice-based AI assistants through adversarial audio and ultrasonic signals.
LLM Watermark Analysis 導覽
導覽 of detecting and analyzing watermarks in LLM-generated text using statistical methods.
XML Injection in LLM Systems 導覽
利用 XML parsing in LLM application pipelines to inject instructions through entity expansion and CDATA sections.
XML and JSON Injection in LLM Apps
導覽 of exploiting XML and JSON parsing in LLM applications for injection and data manipulation.
Adversarial Robustness Testing Framework
Build a framework for continuously testing adversarial robustness of deployed LLM defense mechanisms.
Adversarial 訓練 for LLM 防禦 (防禦 導覽)
Implement adversarial training techniques to improve LLM robustness against prompt injection and jailbreaking.
代理 Permission Boundary Enforcement
Implement fine-grained permission boundaries for LLM agents that limit tool access based on context and user role.
Automated 防禦 Regression Testing
Build automated regression tests for LLM security defenses using attack replay and continuous monitoring.
Automated 紅隊 防禦 Loop
Build an automated continuous red team testing loop that discovers and patches vulnerabilities iteratively.
Behavioral Anomaly Detection for LLMs
Implement behavioral anomaly detection that identifies when model outputs deviate from expected safety profiles.
Building an Input Safety Classifier
Build a production-quality input classifier that detects prompt injection attempts.
Constitutional AI Implementation 指南
Implement constitutional AI principles in a custom fine-tuning and RLHF pipeline.
Context Isolation Pattern Implementation
Implement context isolation patterns that prevent instruction leakage between system prompts and user data.
Data Loss Prevention for LLM Apps
Build a data loss prevention layer for LLM applications that prevents sensitive data exposure in outputs.
防禦 Benchmarking System
Build a benchmarking system to continuously evaluate defense effectiveness against known attack classes.
防禦-in-Depth Architecture for LLM Apps
Design and implement a complete defense-in-depth architecture for production LLM applications.
AI Incident Response System Setup
Set up comprehensive incident response capabilities for AI-specific security incidents.
Input Embedding Firewall Deployment
Deploy an embedding-based firewall that detects injection attempts through semantic similarity to known attack patterns.
Secure Input/Output Logging for 防禦
Implement secure logging for LLM input/output pairs that supports forensic analysis without exposing sensitive data.
LLM Firewall Architecture Design (防禦 導覽)
Design and implement a comprehensive LLM firewall with input analysis, output filtering, and behavioral monitoring.
LLM Honeypot Deployment
Deploy LLM honeypots to detect and study attacker behavior patterns and techniques.
模型 Behavior Monitoring Setup
Set up comprehensive model behavior monitoring to detect drift, anomalies, and potential compromise.
Monitoring LLM Applications for Abuse
Build a monitoring and alerting system to detect ongoing attacks against LLM applications.
Multi-模型 Safety Consensus
Implement safety consensus mechanisms where multiple models must agree before executing sensitive actions.
Output Filtering Pipeline Design
Design and implement a multi-stage output filtering pipeline for LLM applications.
Output Grounding and Verification
Implement output grounding verification to ensure LLM responses are factually supported by provided context.
Prompt Armor Implementation 指南
Implement a comprehensive prompt armoring system with instruction isolation, delimiter hardening, and priority enforcement.
提示詞注入 Honeypot Setup
Deploy honeypot prompts and canary data that detect and characterize prompt injection attempts.
RAG Input Sanitization 導覽
Implement input sanitization for RAG systems to prevent document-based injection attacks.
Real-Time 攻擊 Detection System
Build a real-time attack detection system that monitors LLM interactions for adversarial patterns.
紅隊-防禦 Feedback Loop
Build a continuous red team-defense improvement loop with automated testing and metric tracking.
Response Watermarking Implementation
Implement response watermarking to enable traceability and detect unauthorized reproduction of LLM outputs.
Runtime Safety Monitor Implementation
Implement a runtime safety monitor that detects and blocks unsafe model outputs in real-time.
訓練 Custom Safety Classifiers
Train custom safety classifiers tuned to your application's specific threat model and content policy.
Secure 代理 Architecture Design
Design a secure architecture for LLM agent systems with sandboxing, capability controls, and audit trails.
Secure Function Calling Patterns
Implement secure function calling with input validation, output sanitization, and capability restrictions.
Secure RAG Architecture 導覽
Design and implement a secure RAG architecture with document sanitization, access controls, and output validation.
Secure RAG Pipeline Architecture
Build a secure RAG pipeline with document sanitization, retrieval validation, and injection-resistant augmentation.
System Prompt Protection Techniques
Implement multiple layers of system prompt protection against extraction, override, and leakage attacks.
AI 安全 Threat Intelligence
Build a threat intelligence pipeline for staying current with AI security threats and attack techniques.
Token-Level Input Filtering
Implement token-level input filtering that detects injection patterns at the tokenization stage before model processing.
工具 Call Authorization Framework
Implement a tool call authorization framework that validates tool invocations against policy before execution.
代理 工具 Access Control Implementation
Implement fine-grained tool access control for LLM agents with capability-based security and approval workflows.
Anomaly Detection for LLM Traffic
Build anomaly detection for LLM API traffic to identify attack patterns, abuse, and prompt injection attempts.
Conversation Integrity Monitoring
Build a conversation integrity monitoring system that detects manipulation across multi-turn interactions.
防禦 Effectiveness Metrics Dashboard
Build a dashboard to measure and visualize the effectiveness of defensive measures against attack categories.
Automated 防禦 Testing Pipeline
Build an automated pipeline that continuously tests defensive measures against evolving attack techniques.
Embedding 投毒 Detection System
Build a detection system for identifying poisoned documents in vector databases using statistical analysis.
Function Calling Guardrails Implementation
Implement guardrails for function calling that validate tool selection, parameters, and execution scope.
LLM Honeypot Deployment 指南
Deploy LLM-based honeypots to detect, analyze, and learn from attacker techniques in production environments.
Incident Response Playbook for LLM Applications
Design and implement an incident response playbook specific to LLM application security incidents.
LLM Guard Production Deployment 指南
Deploy LLM Guard in a production environment with custom scanners, performance optimization, and monitoring.
MCP Server 安全 Hardening 指南
Harden MCP server implementations against tool poisoning, transport attacks, and capability escalation.
Multi-模型 防禦 Ensemble
Build an ensemble defense system using multiple models to cross-validate inputs and outputs for safety.
NeMo Guardrails 進階 Configuration
進階 NeMo Guardrails configuration including custom actions, knowledge base integration, and multi-model pipelines.
Building an Output Scanning Pipeline
Build a production output scanning pipeline that detects data leakage, harmful content, and injection artifacts.
PII Detection and Redaction for LLMs
Build a PII detection and redaction system for LLM inputs and outputs to prevent data exposure.
訓練 a 提示詞注入 Classifier
Train a custom prompt injection detection classifier using labeled datasets and modern NLP techniques.
RAG Document Sandboxing Implementation
Implement document-level sandboxing for RAG systems to prevent cross-document injection and privilege escalation.
Adaptive Rate Limiting for LLM APIs
Implement adaptive rate limiting that adjusts thresholds based on detected attack patterns and user behavior.
Secure RAG Architecture Implementation
Implement a security-hardened RAG architecture with input sanitization, access control, and output validation.
System Prompt Protection Layers
Implement multi-layer system prompt protection including injection-resistant formatting and runtime validation.
Zero Trust Architecture for LLM Apps
Implement zero trust principles in LLM application architecture with continuous verification and least privilege.
Full Engagement: AI Code Assistant
End-to-end engagement for assessing an AI-powered code assistant with repository access.
Full Engagement: Content Generation Platform
Full engagement walkthrough for a content generation AI platform with brand safety and copyright concerns.
Full Engagement: Customer Analytics AI
Red team engagement for an AI-powered customer analytics system processing sensitive behavioral data.
Full Engagement: AI Developer 工具s
End-to-end engagement for AI-powered developer tools with repository access and code execution capabilities.
Full Engagement: Educational AI Platform
End-to-end engagement walkthrough for an AI-powered educational platform with student safety requirements.
Full Engagement: AI Financial Advisor
Full engagement for assessing an AI financial advisor chatbot handling investment recommendations.
Full Engagement: Government AI System
Full red team engagement for a government-deployed AI system with classification and compliance requirements.
Full Engagement: Healthcare AI System
End-to-end engagement walkthrough for a healthcare AI system with HIPAA compliance requirements.
Full Engagement: Internal Knowledge Bot
Full engagement walkthrough for an internal enterprise knowledge bot with RAG and document access.
Full Engagement: Multi-代理 Platform
Full red team engagement of a multi-agent platform with MCP servers and A2A communication.
Full Engagement: SaaS Customer Chatbot
End-to-end engagement walkthrough for assessing a SaaS customer-facing chatbot with tool access.
Full Engagement: ML Supply Chain Audit
End-to-end engagement walkthrough for auditing an organization's ML model supply chain security.
Full Engagement: Autonomous Coding 代理
End-to-end engagement for an autonomous AI coding agent with repo access, CI/CD integration, and deployment.
Full Engagement: AI Content Generation Platform
Red team engagement of an AI content generation platform used for marketing, SEO, and social media.
Full Engagement: Multi-工具 Customer Support 代理
End-to-end engagement for a customer support agent with order management, refund processing, and CRM access.
Full Engagement: DevOps AI Assistant
End-to-end engagement for a DevOps AI assistant with CI/CD, cloud infrastructure, and monitoring access.
Full Engagement: Document Processing Pipeline
Full engagement of an AI document processing pipeline handling invoices, contracts, and compliance documents.
Full Engagement: AI Education Platform
End-to-end engagement walkthrough for assessing an AI-powered education platform with tutoring and grading.
Full Engagement: Government Citizen Services AI
Red team engagement of a government citizen services AI handling tax, benefits, and identity verification.
Full Engagement: HR Benefits Chatbot
Full engagement walkthrough for an HR benefits chatbot with access to employee records and benefits administration.
Full Engagement: AI Insurance Underwriting
Full engagement for an AI underwriting system with risk scoring, policy generation, and claims data access.
Full Engagement: Legal Research AI System
Full red team engagement of a legal research AI with case law access, privilege management, and citation generation.
Full Engagement: Enterprise Multi-代理 System
Full engagement of an enterprise multi-agent system with specialized agents for HR, IT, Finance, and Legal.
Full Engagement: Retail Personalization AI
Red team engagement of an AI personalization engine with access to customer profiles and purchase history.
Full Engagement: AI 安全 Copilot
Red team engagement of an AI security copilot with access to SIEM, vulnerability scanners, and threat intelligence.
Full Engagement: Supply Chain AI Optimizer
End-to-end engagement for a supply chain AI with access to logistics, inventory, and supplier management systems.
Full Engagement: Telehealth AI Assistant
End-to-end engagement for a telehealth AI assistant with appointment scheduling, symptom assessment, and EHR access.
逐步導覽
涵蓋紅隊工具、案件方法論、防禦實作、平台特定測試與完整案件工作流程的逐步引導式導覽。
Adversarial Simulation Design
Design realistic adversarial simulations that model real-world threat actors and attack scenarios for AI systems.
代理式 System 評量 Methodology
Comprehensive methodology for assessing agentic AI systems including tool use, memory, and multi-agent interactions.
AI Penetration Test Planning
Complete methodology for planning AI-specific penetration tests including scope definition, resource allocation, and timeline.
AI 紅隊 Maturity 模型 (Methodology 導覽)
Maturity model for assessing and improving an organization's AI red teaming capabilities.
AI 安全 Metrics Framework
Framework for measuring and reporting on AI security posture using quantitative metrics.
AI 漏洞 Classification System
Structured system for classifying AI-specific vulnerabilities by type, impact, and exploitability.
攻擊 Prioritization Framework
Prioritize attack techniques based on target architecture, time constraints, and likelihood of success.
攻擊 Tree Construction for LLM Systems
Build systematic attack trees for LLM system assessments using MITRE ATLAS and OWASP mappings.
Automated AI Reconnaissance Workflow
Build an automated reconnaissance workflow that maps AI application architecture, models, and defense configurations.
Collaborative AI 紅隊 評量
Coordinate multi-person red team assessments with role assignments, communication protocols, and finding deconfliction.
Competitive Analysis of AI 安全 工具s
Methodology for evaluating and comparing AI security tools for red team operations.
Compliance-Driven Testing Methodology
Map regulatory requirements to specific test cases for compliance-driven AI red team assessments.
Continuous Monitoring Integration Methodology
Integrate red team findings into continuous monitoring systems for ongoing threat detection and defense validation.
Finding Deduplication and Triage
Deduplicate and triage findings from automated and manual testing into actionable, prioritized vulnerability reports.
Multi-模型 Testing Methodology
Structured methodology for testing applications that use multiple LLM models in their processing pipeline.
Post-Engagement Analysis Methodology
Conduct thorough post-engagement analysis including lessons learned, technique effectiveness, and methodology refinement.
Purple Team AI 評量 Methodology
Conduct collaborative purple team AI assessments with real-time feedback between red and blue team operations.
AI 安全 Regression Testing Methodology
Design regression testing suites that verify security fixes remain effective across model updates and deployments.
Risk-Based AI Testing Approach
Apply risk-based testing approaches to focus assessment effort on the highest-impact vulnerability categories.
Stakeholder Management in AI 紅隊ing
Managing stakeholder expectations and communication throughout AI red team engagements.
Stakeholder-Specific Reporting Methodology
Tailor red team reports for different stakeholders including executives, developers, security teams, and compliance officers.
AI 安全 Tabletop 練習s
Designing and facilitating tabletop exercises focused on AI security incident scenarios.
Time-Boxed AI 評量 Methodology
Maximize testing coverage within strict time constraints using prioritized attack trees and parallel testing.
AI 攻擊 Surface Enumeration Methodology
Systematic methodology for enumerating the complete attack surface of an AI-powered application.
AI Compliance Testing Methodology
Methodology for testing AI systems against regulatory compliance requirements including EU AI Act and NIST.
Mapping Findings to MITRE ATLAS
Methodology for mapping AI red team findings to MITRE ATLAS tactics, techniques, and procedures.
AI Penetration Test Report Structure
Detailed report structure for AI penetration tests with finding templates and severity scoring.
AI 紅隊 Scoping Templates
Templates and procedures for scoping AI red team engagements across different application types.
AI Risk Quantification Methodology
Quantitative risk assessment methodology for AI vulnerabilities with probability and impact scoring.
AI 安全 Tabletop 練習 Design
Design and facilitate AI security tabletop exercises for organizational preparedness assessment.
Evidence Collection During AI Testing
Best practices for collecting, organizing, and preserving evidence during AI red team assessments.
Multi-模型 評量 Methodology
Methodology for assessing applications that use multiple AI models in pipelines or ensemble configurations.
OWASP LLM Top 10 Testing Methodology
Comprehensive testing methodology for each vulnerability in the OWASP LLM Top 10 2025.
Purple Team Operations for AI 安全
Methodology for conducting purple team operations that combine red team attacks with blue team defense improvement.
AI 安全 Regression Testing Methodology (Methodology 導覽)
Methodology for continuous regression testing of AI application security after updates and model changes.
Communicating AI Risks to Stakeholders
指南 for communicating AI security risks to technical and non-technical stakeholders effectively.
Threat Intelligence for AI Systems
Methodology for gathering and applying threat intelligence specific to AI system attacks and defenses.
AI 漏洞 Prioritization Framework
Framework for prioritizing AI vulnerabilities by exploitability, impact, and remediation cost.
Writing AI 紅隊 Reports
指南 to writing clear, actionable AI red team assessment reports with findings and recommendations.
Testing AI21 實驗室s 模型s
Red team testing guide for AI21 實驗室s Jamba models including long context and efficiency features.
Testing Anthropic Claude: Complete 指南
Complete red team testing guide for Anthropic's Claude including tool use, extended thinking, and computer use.
Testing AWS Bedrock Deployments
Red team testing guide for models deployed via AWS Bedrock including guardrails and access controls.
Testing Azure OpenAI Service
Red team testing guide for Azure OpenAI including content filtering, managed identity, and network controls.
Testing Cohere 模型s
Red team testing guide for Cohere's Command-R models including RAG and tool use features.
Testing DeepSeek 模型s
Red team testing guide for DeepSeek models including R1 reasoning exploitation and MoE architecture.
Testing Fireworks AI Platform
Red team testing guide for Fireworks AI including function calling and compound AI systems.
Testing Google Gemini: Complete 指南
Complete red team testing guide for Google's Gemini including multimodal inputs and long context.
Testing Groq Inference Platform
Red team testing guide for Groq's high-speed inference platform and its security characteristics.
Testing Hugging Face Hosted 模型s
Red team testing guide for models hosted on Hugging Face including Inference API and Spaces.
Testing Meta Llama: Complete 指南
Complete red team testing guide for Meta's Llama models including fine-tuning and self-hosted deployment.
Testing Mistral AI 模型s
Complete red team testing guide for Mistral AI models including Mixtral MoE architecture and chat endpoints.
Testing Ollama Local Deployments
安全 testing guide for locally deployed models via Ollama including network exposure and API security.
Testing OpenAI GPT-4o: Complete 指南
Complete red team testing guide for OpenAI's GPT-4o including API, function calling, and structured outputs.
Testing Replicate-Hosted 模型s
Red team testing guide for models hosted on Replicate including open-source model deployments.
Testing Together AI Platform
Red team testing guide for Together AI including fine-tuned model endpoints and custom deployments.
Testing AI21 Jamba 模型s
Red team testing guide for AI21's Jamba models with SSM-Transformer hybrid architecture analysis.
Testing Anyscale/Ray Serve Endpoints
Red team testing guide for models served via Anyscale and Ray Serve including distributed inference security.
Testing Cohere 模型s: Complete 指南
Complete red team testing guide for Cohere's Command models including RAG integration and tool use.
Testing Fireworks AI Deployments
Red team testing guide for models deployed via Fireworks AI including function calling and fine-tuned models.
Testing GCP Vertex AI Deployments
Red team testing guide for models deployed via GCP Vertex AI including 模型 Garden and custom endpoints.
Testing Mistral/Mixtral: Complete 指南
Complete red team testing guide for Mistral and Mixtral models including MoE routing and API vulnerabilities.
Testing Ollama Local Deployments (Platform 導覽)
Red team testing guide for models deployed locally via Ollama including API endpoints and model management.
Testing Replicate Hosted 模型s
Red team testing guide for models hosted on Replicate including API exploitation and model versioning.
Testing Together AI Hosted 模型s
Red team testing guide for models hosted on Together AI's inference platform.
Testing vLLM Inference Deployments
Red team testing guide for models served via vLLM including batching, KV cache, and speculative decoding.
安全 Benchmark Runner Development
Build a benchmark runner for standardized evaluation of LLM security across models and configurations.
Building an 攻擊 Replay 工具
Build a tool that records and replays attack sequences for regression testing and defense validation.
Counterfit ML 安全 Testing
Use Microsoft's Counterfit for adversarial ML testing of deployed model endpoints.
Building a Custom LLM Fuzzer
Build a custom fuzzing tool for LLM applications with mutation engines and coverage tracking.
Building a Custom Payload Mutation Engine
Develop a mutation engine that evolves prompt injection payloads through genetic algorithms.
Garak 進階 Configuration 指南
Deep dive into Garak's configuration system including custom probes, detectors, and generators.
Building Custom Garak Probes (工具 導覽)
Develop custom probe modules for NVIDIA's Garak to test domain-specific vulnerabilities.
HarmBench Custom Behavior Sets
Extend HarmBench with custom behavior specifications for domain-specific vulnerability testing.
HarmBench Evaluation 導覽
Run standardized attack evaluations using the HarmBench framework and interpret results.
工具導覽
必備 AI 紅隊演練工具的端對端實務導覽,涵蓋安裝、設定、執行與結果詮釋。
Inspect AI Safety Evaluations
Build and run AI safety evaluations using the UK AISI Inspect framework.
越獄Bench Usage and Submission
Use 越獄Bench to evaluate jailbreak techniques and submit results to the benchmark.
LLM Guard Deployment and Testing
Deploy LLM Guard for input/output scanning and test its effectiveness against common attacks.
Building an LLM Traffic Analyzer
Build a proxy-based LLM traffic analyzer for intercepting and analyzing API communications.
Promptfoo CI/CD Pipeline Integration
Integrate Promptfoo into CI/CD pipelines for continuous LLM security regression testing.
Promptfoo for 紅隊 Evaluation
Configure Promptfoo for comprehensive red team evaluation with custom assertions and graders.
PyRIT Multi-Turn Campaign Orchestration
Build and run sophisticated multi-turn attack campaigns using PyRIT's orchestrator system.
PyRIT Custom Scoring Integration
Integrate custom scoring metrics into PyRIT for organization-specific red team evaluation criteria.
代理 安全 Scanner Development
Build an automated security scanner for AI agent systems that tests tool use, memory, and workflow vulnerabilities.
Automated 紅隊 Report Generation
Build an automated system for generating structured red team reports from testing data and findings.
Building an LLM 攻擊 Proxy
Build an intercepting proxy for LLM API traffic that enables payload injection, response analysis, and logging.
防禦 Benchmarking 工具 Development
Build a tool for benchmarking the effectiveness of defensive measures against standardized attack suites.
Building an Embedding 攻擊 工具kit
Build a toolkit for conducting embedding-level attacks including inversion, poisoning, and manipulation.
Garak Custom Plugin Development
Develop custom Garak plugins including probes, detectors, and generators for organization-specific testing.
Inspect AI Custom Solver Development
Build custom solvers in Inspect AI for specialized safety evaluation scenarios and attack simulations.
越獄 Dataset Curation 工具
Build a tool for curating, organizing, and testing jailbreak prompt datasets with effectiveness tracking.
LLM Traffic Analysis 工具
Build a tool for analyzing and visualizing LLM API traffic patterns to identify attack indicators.
MCP 安全 Audit 工具
Build a tool for auditing MCP server implementations for common security vulnerabilities and misconfigurations.
Multi-模型 Test Harness Construction
Build a unified test harness for running attacks across OpenAI, Anthropic, Google, and local model endpoints.
Payload Mutation Framework Development
Build a payload mutation framework with grammar-based, semantic, and adversarial mutation strategies.
Promptfoo CI/CD Pipeline Integration (工具 導覽)
Integrate Promptfoo red team evaluations into CI/CD pipelines for automated security regression testing.
PyRIT Custom Converter Development
Build custom PyRIT converters for payload transformation, encoding, and multi-modal attack generation.
RAG 安全 Testing Framework
Build a framework for systematic security testing of RAG applications including poisoning and exfiltration.