Skip to main content

Topics Glossary Blog ATT&CK Navigator Challenges Resources

Loading...

© 2026 redteams.ai. All rights reserved.

Glossary Tags Blog Contribute Methodology Bookmarks RSS GitHub Contact

Built with Next.js

Privacy Cookies Terms Imprint

# walkthroughs

544 articlestagged with “walkthroughs”

A2A Message Interception Walkthrough

Intercept and manipulate messages between agents in A2A protocol-based multi-agent systems.

walkthroughsa2amessage-interceptionmulti-agent

A2A Protocol Injection Walkthrough

Walkthrough of exploiting Google's Agent-to-Agent protocol for inter-agent prompt injection.

walkthroughsa2aprotocolinjection

Adversarial Suffix Crafting Walkthrough

Craft adversarial suffixes using gradient-based and gradient-free optimization methods for black-box models.

walkthroughsadversarial-suffixcraftingoptimization

API Abuse Chain Attack Walkthrough

Chain multiple API calls to achieve unauthorized actions that no single call would permit.

walkthroughsapiabuse-chainsmulti-step

API Chaining Exploitation Walkthrough

Walkthrough of chaining multiple API calls in agent systems to achieve multi-step unauthorized actions.

chainingattacksexploitationapiwalkthroughs

Attention Hijacking Attack Walkthrough

Hijack transformer attention mechanisms to redirect model focus toward adversarial instructions in the context.

walkthroughsattentionhijackingtransformer

AutoDAN Stealthy Jailbreak Walkthrough

Walkthrough of generating stealthy, human-readable jailbreak prompts using the AutoDAN methodology.

walkthroughsautodanstealthyjailbreaking

Batch Processing Injection Walkthrough

Inject payloads through batch processing pipelines where individual items are processed without isolation.

walkthroughsbatch-processinginjectionpipeline

LLM Cache Poisoning Walkthrough

Poison LLM response caches to serve adversarial content to other users without direct injection.

walkthroughscachepoisoninginfrastructure

Competition-Style Jailbreak Techniques

Walkthrough of jailbreak techniques used in AI security competitions and CTF events.

attackscompetitivewalkthroughwalkthroughsjailbreak

Computer Use Agent Injection Walkthrough

Walkthrough of injecting prompts through UI elements and screenshots processed by computer-use agents.

attacksinjectionwalkthroughusecomputerwalkthroughs

Advanced Context Manipulation Walkthrough

Manipulate conversation context through history injection, context overflow, and attention weight exploitation.

walkthroughscontextmanipulationadvanced

Context Manipulation Deep Dive

Advanced walkthrough of context window manipulation techniques including attention dilution and instruction priority shifting.

diveattackscontextdeepmanipulationwalkthroughs

Crescendo Multi-Turn Jailbreak Walkthrough

Step-by-step walkthrough of the Crescendo technique for gradually escalating multi-turn jailbreaks.

walkthroughscrescendomulti-turnjailbreaking

Cross-Plugin Data Exfiltration Walkthrough

Walkthrough of chaining multiple plugins/tools to exfiltrate data from LLM agent systems.

walkthroughscross-pluginexfiltrationagent

Data Harvesting Through LLM Apps

Complete walkthrough of systematic data extraction from LLM applications using various exfiltration channels.

attacksharvestingdatawalkthroughwalkthroughs

Document-Based Injection Walkthrough

Inject prompts through documents processed by LLM applications including PDFs, spreadsheets, and presentations.

walkthroughsdocumentinjectionfile-processing

Embedding Collision Attack Walkthrough

Craft documents that collide in embedding space with target queries to hijack RAG retrieval results.

walkthroughsembeddingcollisionretrieval

Embedding Inversion Attack Walkthrough

Walkthrough of inverting text embeddings to recover original documents from vector databases.

walkthroughsembeddinginversionattack

Encoding Chain Attack Walkthrough

Chain multiple encoding transformations to bypass input filters that only decode one layer of encoding.

walkthroughsencodingchain-attacksevasion

Encoding Chain Bypass Walkthrough

Walkthrough of chaining Base64, URL encoding, and Unicode tricks to bypass multi-layer input filters.

attacksencodingchainbypasswalkthroughwalkthroughs

Few-Shot Example Poisoning Walkthrough

Poison few-shot examples in prompts to establish behavioral patterns that override system instructions.

walkthroughsfew-shotpoisoningin-context-learning

Fine-Tuning Safety Bypass Walkthrough

Walkthrough of using fine-tuning API access to remove safety behaviors from aligned models.

walkthroughsfine-tuningsafety-bypasstraining

Function Calling Exploitation Guide

Complete walkthrough of exploiting function calling in OpenAI, Anthropic, and Google AI APIs.

walkthroughsfunction-callingexploitationagent

Function Schema Poisoning Walkthrough

Poison function schemas to inject hidden instructions that redirect model tool selection and parameter filling.

walkthroughsfunction-schemapoisoningtool-use

GCG Adversarial Suffix Attack Walkthrough

Complete walkthrough of implementing the Greedy Coordinate Gradient adversarial suffix attack.

walkthroughsgcgadversarialoptimization

Real-World Indirect Prompt Injection

Walkthrough of planting and triggering indirect prompt injection in web-browsing AI assistants.

walkthroughsindirect-injectionreal-worldweb

JSON Injection Attack Walkthrough

Exploit JSON parsing and generation in LLM applications to inject payloads through structured data boundaries.

walkthroughsjsoninjectionstructured-output

Logit Bias Exploitation Walkthrough

Exploit logit bias API parameters to force models toward specific token sequences and bypass safety filters.

walkthroughslogit-biasexploitationapi-parameters

Many-Shot Jailbreaking Walkthrough

Walkthrough implementing Anthropic's many-shot jailbreaking technique with analysis of scaling behavior.

walkthroughsmany-shotjailbreakinganthropic

Advanced Markdown Injection Walkthrough

Inject Markdown that triggers data exfiltration through image rendering, link generation, and code block escape.

walkthroughsmarkdowninjectionrendering

MCP Tool Poisoning Attack Walkthrough

Walkthrough of exploiting MCP tool descriptions to redirect agent behavior via hidden instructions.

walkthroughsmcptool-poisoningagent

Memory Persistence Attack Walkthrough

Walkthrough of achieving persistent memory manipulation in agent systems for cross-session influence.

attacksmemorywalkthroughpersistencewalkthroughs

Model Context Window Overflow Walkthrough

Overflow the context window to push safety instructions outside the effective attention range.

walkthroughscontext-overflowattentioninjection

Model Extraction Attack Walkthrough

Walkthrough of extracting model weights/behavior through systematic API querying.

walkthroughsmodel-extractionstealingapi

Multi-Modal Document Attack Walkthrough

Combine visual and textual injection in documents processed by multimodal LLM applications.

walkthroughsmultimodaldocumentcombined-attack

Multi-Turn Persona Escalation Walkthrough

Gradually escalate through persona manipulation across multiple turns to achieve full safety bypass.

walkthroughsmulti-turnpersonaescalation

Multimodal Image Injection Walkthrough

Step-by-step walkthrough of embedding adversarial prompts in images for vision model exploitation.

walkthroughsmultimodalimageinjection

PAIR Automated Jailbreak Walkthrough

End-to-end walkthrough of the PAIR attack methodology for automated black-box jailbreaking.

walkthroughspairautomatedjailbreaking

Agent Privilege Escalation Walkthrough

Walkthrough of escalating privileges in multi-agent systems through trust chain exploitation.

attacksagentsprivilegeescalationwalkthroughwalkthroughs

Advanced Prompt Leaking Walkthrough

Advanced techniques for extracting system prompts including iterative reconstruction and side-channel methods.

walkthroughsprompt-leakingadvancedextraction

RAG Poisoning End-to-End Walkthrough

Complete walkthrough of poisoning a RAG system from document injection through information extraction.

walkthroughsragpoisoningdata

Reasoning Model Exploitation Walkthrough

Exploit extended thinking in reasoning models to inject false premises and manipulate conclusion generation.

walkthroughsreasoningexploitationchain-of-thought

Regex Filter Bypass Walkthrough

Systematically bypass regex-based input filters using Unicode tricks, encoding, and pattern-specific evasion.

walkthroughsregexfilter-bypassevasion

Role Confusion Attack Walkthrough

Exploit role confusion between system, user, and assistant messages to override safety instructions.

walkthroughsrole-confusioninstruction-hierarchyattacks

Role Confusion Exploitation Walkthrough

Exploit role boundaries in chat APIs by injecting assistant-role messages and system-level instructions.

attacksroleexploitationconfusionwalkthroughs

Semantic Camouflage Walkthrough

Walkthrough of crafting semantically camouflaged injections that evade both classifiers and human review.

attackscamouflagesemanticwalkthroughwalkthroughs

Semantic Deception Attack Walkthrough

Craft semantically deceptive prompts that appear benign to classifiers while achieving adversarial objectives.

walkthroughssemanticdeceptionattacks

Sleeper Agent Detection Walkthrough

Walkthrough of detecting deceptive sleeper agent behaviors in fine-tuned language models.

walkthroughssleeper-agentsdetectionalignment

Supply Chain Prompt Injection Walkthrough

Plant injection payloads in upstream data sources consumed by LLM applications including packages and documentation.

walkthroughssupply-chaininjectiondependencies

Model Supply Chain Poisoning

Walkthrough of poisoning ML supply chains through dependency confusion, model weight manipulation, and hub attacks.

attackssupplychainpoisoningwalkthroughwalkthroughs

Synthetic Identity Injection Walkthrough

Create synthetic identities that exploit LLM trust mechanisms to achieve elevated instruction priority.

walkthroughssynthetic-identityinjectionsocial-engineering

TAP Tree of Attacks Walkthrough

Detailed walkthrough of building and running Tree of Attacks for systematic jailbreak generation.

walkthroughstaptree-of-attacksjailbreaking

Thought Injection in Reasoning Models Walkthrough

Inject adversarial thoughts into the reasoning chain of thinking models to manipulate final outputs.

walkthroughsthought-injectionreasoningchain-of-thought

Timing Side-Channel Attack Walkthrough

Extract information from LLM applications through timing differences in response generation and safety filter processing.

walkthroughstimingside-channelinformation-leakage

Token-Level Attack Optimization Walkthrough

Walkthrough of optimizing adversarial token sequences for maximum jailbreak effectiveness.

attacksoptimizationtokenwalkthroughs

Tool Shadowing Attack Walkthrough

Register shadow tools that override legitimate tool definitions to intercept and manipulate agent actions.

walkthroughstool-shadowingmcpagent

Vision Model Attack Walkthrough

Attack vision-language models through adversarial images with embedded text, perturbations, and visual trojans.

walkthroughsvision-modelattacksmultimodal

Vision Model Attack Walkthrough (Attack Walkthrough)

Step-by-step walkthrough of visual prompt injection, adversarial images, and OCR exploitation in vision-language models.

attacksvisionmodelwalkthroughwalkthroughs

A2A Agent Impersonation Walkthrough

Walkthrough of impersonating a trusted agent in an A2A network to intercept and redirect delegated tasks.

walkthroughsa2aimpersonationmulti-agent

API Parameter Exploitation Walkthrough

Walkthrough of exploiting undocumented and misconfigured API parameters in LLM service endpoints.

walkthroughsapi-parametersexploitationtechnique

Batch API Exploitation Walkthrough

Walkthrough of exploiting batch processing APIs for large-scale prompt injection and output analysis.

walkthroughsbatch-apiexploitationtechnique

Chain-of-Thought Hijacking Walkthrough

Walkthrough of hijacking visible reasoning traces in CoT models to redirect conclusions and bypass safety checks.

walkthroughschain-of-thoughthijackingreasoning

Cognitive Overload Jailbreak Walkthrough

Step-by-step walkthrough of the cognitive overload technique that overwhelms safety training with complex nested instructions.

walkthroughscognitive-overloadjailbreakingtechnique

Context Virtualization Attack Walkthrough

Walkthrough of creating virtualized contexts that make the model believe it is operating in a different environment.

walkthroughscontext-virtualizationjailbreakingtechnique

Conversation Poisoning Attack Walkthrough

Step-by-step walkthrough of poisoning conversation history to alter model behavior in future turns.

walkthroughsconversation-poisoningmulti-turntechnique

Document Parsing Exploit Chain Walkthrough

End-to-end walkthrough of exploiting document parsers to inject instructions through PDFs, DOCX, and HTML files.

walkthroughsdocument-parsingexploit-chainpdf

Few-Shot Attack Scaling Analysis

Detailed analysis of how few-shot examples scale to influence model behavior, from 2-shot to many-shot regime.

walkthroughsfew-shotscalingjailbreaking

Image Steganography Injection Walkthrough

Walkthrough of hiding prompt injection payloads in images using steganographic techniques for vision model attacks.

walkthroughssteganographyimage-injectionmultimodal

Instruction Hierarchy Exploitation Walkthrough

Walkthrough of exploiting how models prioritize system, user, and tool instructions to override safety directives.

walkthroughsinstruction-hierarchyexploitationtechnique

Knowledge Graph Injection Attack Walkthrough

Walkthrough of injecting adversarial facts into knowledge graphs consumed by LLM-based reasoning systems.

walkthroughsknowledge-graphinjectionattack

LangChain Exploit Chain Walkthrough

Walkthrough of chaining LangChain CVEs for remote code execution from prompt injection through to shell access.

walkthroughslangchainexploit-chaincve

MCP Sampling API Abuse Walkthrough

Walkthrough of exploiting the MCP sampling API to manipulate how servers request and process LLM completions.

walkthroughsmcpsampling-apiabuse

Memory Injection and Persistence Walkthrough

Walkthrough of injecting persistent instructions into agent memory systems that survive across sessions.

walkthroughsmemory-injectionpersistenceagent

Multi-Provider Attack Campaign Walkthrough

End-to-end walkthrough of conducting an attack campaign across OpenAI, Anthropic, Google, and open-source models.

walkthroughsmulti-providerattack-campaigntechnique

Output Format Exfiltration Techniques Walkthrough

Walkthrough of using output format manipulation to exfiltrate data through code blocks, markdown, and structured outputs.

walkthroughsoutput-formatexfiltrationtechnique

RAG Hybrid Search Poisoning Walkthrough

Walkthrough of poisoning both vector and keyword search in hybrid RAG architectures for maximum retrieval influence.

walkthroughsraghybrid-searchpoisoning

Recursive Prompt Injection Walkthrough

Walkthrough of creating self-replicating injection payloads that persist through model output-to-input loops.

walkthroughsrecursiveinjectionself-replicating

Refusal Suppression Technique Walkthrough

Walkthrough of refusal suppression techniques that prime models to avoid safety refusal patterns.

walkthroughsrefusal-suppressionjailbreakingtechnique

Semantic Obfuscation Walkthrough

Walkthrough of semantically obfuscating adversarial payloads so they appear benign to both classifiers and humans.

walkthroughssemantic-obfuscationevasiontechnique

Structured Output Data Leakage Walkthrough

Walkthrough of using structured output requirements to extract sensitive data embedded in model responses.

walkthroughsstructured-outputdata-leakagetechnique

Tool Use Confusion Attack Walkthrough

Walkthrough of confusing model tool-use decisions to invoke unintended functions or skip safety-critical tools.

walkthroughstool-useconfusionfunction-calling

Unicode Normalization Bypass Walkthrough

Step-by-step guide to exploiting Unicode normalization differences between input filters and model tokenizers.

walkthroughsunicodenormalizationbypass

Voice AI Prompt Injection Walkthrough

Walkthrough of injecting prompts into voice-based AI assistants through adversarial audio and ultrasonic signals.

walkthroughsvoice-aiinjectionspeech

LLM Watermark Analysis Walkthrough

Walkthrough of detecting and analyzing watermarks in LLM-generated text using statistical methods.

walkthroughswatermarkanalysisdetection

XML Injection in LLM Systems Walkthrough

Exploit XML parsing in LLM application pipelines to inject instructions through entity expansion and CDATA sections.

walkthroughsxmlinjectionstructured-data

XML and JSON Injection in LLM Apps

Walkthrough of exploiting XML and JSON parsing in LLM applications for injection and data manipulation.

attacksxmlinjectionwalkthroughjsonwalkthroughs

Adversarial Robustness Testing Framework

Build a framework for continuously testing adversarial robustness of deployed LLM defense mechanisms.

walkthroughsdefenserobustness-testingframework

Adversarial Training for LLM Defense (Defense Walkthrough)

Implement adversarial training techniques to improve LLM robustness against prompt injection and jailbreaking.

walkthroughsdefenseadversarial-trainingrobustness

Agent Permission Boundary Enforcement

Implement fine-grained permission boundaries for LLM agents that limit tool access based on context and user role.

walkthroughsdefenseagent-permissionsboundaries

Automated Defense Regression Testing

Build automated regression tests for LLM security defenses using attack replay and continuous monitoring.

automatedtestingwalkthroughsdefense

Automated Red Team Defense Loop

Build an automated continuous red team testing loop that discovers and patches vulnerabilities iteratively.

walkthroughsdefenseautomated-red-teamcontinuous

Behavioral Anomaly Detection for LLMs

Implement behavioral anomaly detection that identifies when model outputs deviate from expected safety profiles.

walkthroughsdefensebehavioral-anomalydetection

Building an Input Safety Classifier

Build a production-quality input classifier that detects prompt injection attempts.

walkthroughsdefenseclassifierinput-safety

Constitutional AI Implementation Guide

Implement constitutional AI principles in a custom fine-tuning and RLHF pipeline.

walkthroughsdefenseconstitutional-aialignment

Context Isolation Pattern Implementation

Implement context isolation patterns that prevent instruction leakage between system prompts and user data.

walkthroughsdefensecontext-isolationpatterns

Data Loss Prevention for LLM Apps

Build a data loss prevention layer for LLM applications that prevents sensitive data exposure in outputs.

walkthroughsdefensedlpdata-protection

Defense Benchmarking System

Build a benchmarking system to continuously evaluate defense effectiveness against known attack classes.

benchmarkingsystemwalkthroughsdefense

Defense-in-Depth Architecture for LLM Apps

Design and implement a complete defense-in-depth architecture for production LLM applications.

walkthroughsdefensedefense-in-deptharchitecture

AI Incident Response System Setup

Set up comprehensive incident response capabilities for AI-specific security incidents.

setupdefenseresponseincidentwalkthroughs

Input Embedding Firewall Deployment

Deploy an embedding-based firewall that detects injection attempts through semantic similarity to known attack patterns.

walkthroughsdefenseembedding-firewallinput-filtering

Secure Input/Output Logging for Defense

Implement secure logging for LLM input/output pairs that supports forensic analysis without exposing sensitive data.

walkthroughsdefenseloggingforensics

LLM Firewall Architecture Design (Defense Walkthrough)

Design and implement a comprehensive LLM firewall with input analysis, output filtering, and behavioral monitoring.

walkthroughsdefensellm-firewallarchitecture

LLM Honeypot Deployment

Deploy LLM honeypots to detect and study attacker behavior patterns and techniques.

deploymentdefensellmhoneypotwalkthroughs

Model Behavior Monitoring Setup

Set up comprehensive model behavior monitoring to detect drift, anomalies, and potential compromise.

defensemonitoringmodelbehaviorwalkthroughs

Monitoring LLM Applications for Abuse

Build a monitoring and alerting system to detect ongoing attacks against LLM applications.

walkthroughsdefensemonitoringabuse-detection

Multi-Model Safety Consensus

Implement safety consensus mechanisms where multiple models must agree before executing sensitive actions.

walkthroughsdefensemulti-modelconsensus

Output Filtering Pipeline Design

Design and implement a multi-stage output filtering pipeline for LLM applications.

walkthroughsdefenseoutput-filteringpipeline

Output Grounding and Verification

Implement output grounding verification to ensure LLM responses are factually supported by provided context.

walkthroughsdefenseoutput-groundingverification

Prompt Armor Implementation Guide

Implement a comprehensive prompt armoring system with instruction isolation, delimiter hardening, and priority enforcement.

defenseimplementationarmorpromptwalkthroughs

Prompt Injection Honeypot Setup

Deploy honeypot prompts and canary data that detect and characterize prompt injection attempts.

walkthroughsdefensehoneypotdetection

RAG Input Sanitization Walkthrough

Implement input sanitization for RAG systems to prevent document-based injection attacks.

walkthroughsdefenseragsanitization

Real-Time Attack Detection System

Build a real-time attack detection system that monitors LLM interactions for adversarial patterns.

defenserealtimedetectionattackwalkthroughs

Red Team-Defense Feedback Loop

Build a continuous red team-defense improvement loop with automated testing and metric tracking.

walkthroughsdefensered-teamfeedback-loop

Response Watermarking Implementation

Implement response watermarking to enable traceability and detect unauthorized reproduction of LLM outputs.

walkthroughsdefensewatermarkingtraceability

Runtime Safety Monitor Implementation

Implement a runtime safety monitor that detects and blocks unsafe model outputs in real-time.

walkthroughsdefenseruntime-monitorsafety

Training Custom Safety Classifiers

Train custom safety classifiers tuned to your application's specific threat model and content policy.

walkthroughsdefensesafety-classifiertraining

Secure Agent Architecture Design

Design a secure architecture for LLM agent systems with sandboxing, capability controls, and audit trails.

architecturesecuredefenseagentwalkthroughs

Secure Function Calling Patterns

Implement secure function calling with input validation, output sanitization, and capability restrictions.

callingsecuredefensepatternsfunctionwalkthroughs

Secure RAG Architecture Walkthrough

Design and implement a secure RAG architecture with document sanitization, access controls, and output validation.

architecturesecuredefenseragwalkthroughs

Secure RAG Pipeline Architecture

Build a secure RAG pipeline with document sanitization, retrieval validation, and injection-resistant augmentation.

walkthroughsdefensesecure-ragpipeline

System Prompt Protection Techniques

Implement multiple layers of system prompt protection against extraction, override, and leakage attacks.

walkthroughsdefensesystem-promptprotection

AI Security Threat Intelligence

Build a threat intelligence pipeline for staying current with AI security threats and attack techniques.

threatsecuritydefenseintelligencewalkthroughs

Token-Level Input Filtering

Implement token-level input filtering that detects injection patterns at the tokenization stage before model processing.

walkthroughsdefensetoken-filteringinput-safety

Tool Call Authorization Framework

Implement a tool call authorization framework that validates tool invocations against policy before execution.

walkthroughsdefensetool-authorizationframework

Agent Tool Access Control Implementation

Implement fine-grained tool access control for LLM agents with capability-based security and approval workflows.

walkthroughsdefenseagentaccess-control

Anomaly Detection for LLM Traffic

Build anomaly detection for LLM API traffic to identify attack patterns, abuse, and prompt injection attempts.

walkthroughsdefenseanomaly-detectiontraffic

Conversation Integrity Monitoring

Build a conversation integrity monitoring system that detects manipulation across multi-turn interactions.

walkthroughsdefenseconversation-integritymonitoring

Defense Effectiveness Metrics Dashboard

Build a dashboard to measure and visualize the effectiveness of defensive measures against attack categories.

walkthroughsdefensemetricsdashboard

Automated Defense Testing Pipeline

Build an automated pipeline that continuously tests defensive measures against evolving attack techniques.

walkthroughsdefensetestingautomation

Embedding Poisoning Detection System

Build a detection system for identifying poisoned documents in vector databases using statistical analysis.

walkthroughsdefenseembeddingpoisoning-detection

Function Calling Guardrails Implementation

Implement guardrails for function calling that validate tool selection, parameters, and execution scope.

walkthroughsdefensefunction-callingguardrails

LLM Honeypot Deployment Guide

Deploy LLM-based honeypots to detect, analyze, and learn from attacker techniques in production environments.

walkthroughsdefensehoneypotdeception

Incident Response Playbook for LLM Applications

Design and implement an incident response playbook specific to LLM application security incidents.

walkthroughsdefenseincident-responseplaybook

LLM Guard Production Deployment Guide

Deploy LLM Guard in a production environment with custom scanners, performance optimization, and monitoring.

walkthroughsdefensellm-guardproduction

MCP Server Security Hardening Guide

Harden MCP server implementations against tool poisoning, transport attacks, and capability escalation.

walkthroughsdefensemcpserver-hardening

Multi-Model Defense Ensemble

Build an ensemble defense system using multiple models to cross-validate inputs and outputs for safety.

walkthroughsdefensemulti-modelensemble

NeMo Guardrails Advanced Configuration

Advanced NeMo Guardrails configuration including custom actions, knowledge base integration, and multi-model pipelines.

walkthroughsdefensenemo-guardrailsadvanced

Building an Output Scanning Pipeline

Build a production output scanning pipeline that detects data leakage, harmful content, and injection artifacts.

walkthroughsdefenseoutput-scanningpipeline

PII Detection and Redaction for LLMs

Build a PII detection and redaction system for LLM inputs and outputs to prevent data exposure.

walkthroughsdefensepiiredaction

Training a Prompt Injection Classifier

Train a custom prompt injection detection classifier using labeled datasets and modern NLP techniques.

walkthroughsdefenseclassifiertraining

RAG Document Sandboxing Implementation

Implement document-level sandboxing for RAG systems to prevent cross-document injection and privilege escalation.

walkthroughsdefenseragsandboxing

Adaptive Rate Limiting for LLM APIs

Implement adaptive rate limiting that adjusts thresholds based on detected attack patterns and user behavior.

walkthroughsdefenserate-limitingadaptive

Secure RAG Architecture Implementation

Implement a security-hardened RAG architecture with input sanitization, access control, and output validation.

walkthroughsdefenseragsecure-architecture

System Prompt Protection Layers

Implement multi-layer system prompt protection including injection-resistant formatting and runtime validation.

walkthroughsdefensesystem-promptprotection

Zero Trust Architecture for LLM Apps

Implement zero trust principles in LLM application architecture with continuous verification and least privilege.

architecturetrustdefensellmzerowalkthroughs

Full Engagement: AI Code Assistant

End-to-end engagement for assessing an AI-powered code assistant with repository access.

walkthroughsengagementcode-assistantsecurity

Full Engagement: Content Generation Platform

Full engagement walkthrough for a content generation AI platform with brand safety and copyright concerns.

generationcontentengagementwalkthroughsengagements

Full Engagement: Customer Analytics AI

Red team engagement for an AI-powered customer analytics system processing sensitive behavioral data.

analyticsengagementcustomerwalkthroughsengagements

Full Engagement: AI Developer Tools

End-to-end engagement for AI-powered developer tools with repository access and code execution capabilities.

devtoolsengagementwalkthroughsengagements

Full Engagement: Educational AI Platform

End-to-end engagement walkthrough for an AI-powered educational platform with student safety requirements.

educationalplatformengagementwalkthroughsengagements

Full Engagement: AI Financial Advisor

Full engagement for assessing an AI financial advisor chatbot handling investment recommendations.

walkthroughsengagementfinancialadvisor

Full Engagement: Government AI System

Full red team engagement for a government-deployed AI system with classification and compliance requirements.

governmentengagementwalkthroughsengagements

Full Engagement: Healthcare AI System

End-to-end engagement walkthrough for a healthcare AI system with HIPAA compliance requirements.

walkthroughsengagementhealthcarecompliance

Full Engagement: Internal Knowledge Bot

Full engagement walkthrough for an internal enterprise knowledge bot with RAG and document access.

walkthroughsengagementinternalknowledge-bot

Full Engagement: Multi-Agent Platform

Full red team engagement of a multi-agent platform with MCP servers and A2A communication.

walkthroughsengagementmulti-agentplatform

Full Engagement: SaaS Customer Chatbot

End-to-end engagement walkthrough for assessing a SaaS customer-facing chatbot with tool access.

walkthroughsengagementsaaschatbot

Full Engagement: ML Supply Chain Audit

End-to-end engagement walkthrough for auditing an organization's ML model supply chain security.

auditsupplychainengagementwalkthroughsengagements

Full Engagement: Autonomous Coding Agent

End-to-end engagement for an autonomous AI coding agent with repo access, CI/CD integration, and deployment.

walkthroughsengagementcoding-agentautonomous

Full Engagement: AI Content Generation Platform

Red team engagement of an AI content generation platform used for marketing, SEO, and social media.

walkthroughsengagementcontent-generationplatform

Full Engagement: Multi-Tool Customer Support Agent

End-to-end engagement for a customer support agent with order management, refund processing, and CRM access.

walkthroughsengagementcustomer-supportmulti-tool

Full Engagement: DevOps AI Assistant

End-to-end engagement for a DevOps AI assistant with CI/CD, cloud infrastructure, and monitoring access.

walkthroughsengagementdevopsinfrastructure

Full Engagement: Document Processing Pipeline

Full engagement of an AI document processing pipeline handling invoices, contracts, and compliance documents.

walkthroughsengagementdocument-processingpipeline

Full Engagement: AI Education Platform

End-to-end engagement walkthrough for assessing an AI-powered education platform with tutoring and grading.

walkthroughsengagementeducationplatform

Full Engagement: Government Citizen Services AI

Red team engagement of a government citizen services AI handling tax, benefits, and identity verification.

walkthroughsengagementgovernmentcitizen-services

Full Engagement: HR Benefits Chatbot

Full engagement walkthrough for an HR benefits chatbot with access to employee records and benefits administration.

walkthroughsengagementhrbenefits

Full Engagement: AI Insurance Underwriting

Full engagement for an AI underwriting system with risk scoring, policy generation, and claims data access.

walkthroughsengagementinsuranceunderwriting

Full Engagement: Legal Research AI System

Full red team engagement of a legal research AI with case law access, privilege management, and citation generation.

walkthroughsengagementlegalresearch

Full Engagement: Enterprise Multi-Agent System

Full engagement of an enterprise multi-agent system with specialized agents for HR, IT, Finance, and Legal.

walkthroughsengagementmulti-agententerprise

Full Engagement: Retail Personalization AI

Red team engagement of an AI personalization engine with access to customer profiles and purchase history.

walkthroughsengagementretailpersonalization

Full Engagement: AI Security Copilot

Red team engagement of an AI security copilot with access to SIEM, vulnerability scanners, and threat intelligence.

walkthroughsengagementsecuritycopilot

Full Engagement: Supply Chain AI Optimizer

End-to-end engagement for a supply chain AI with access to logistics, inventory, and supplier management systems.

walkthroughsengagementsupply-chainoptimizer

Full Engagement: Telehealth AI Assistant

End-to-end engagement for a telehealth AI assistant with appointment scheduling, symptom assessment, and EHR access.

walkthroughsengagementtelehealthhealthcare

Walkthroughs

Step-by-step guided walkthroughs covering red team tools, engagement methodology, defense implementation, platform-specific testing, and full engagement workflows.

walkthroughstutorialstoolsmethodologydefenseplatformsengagements

Adversarial Simulation Design

Design realistic adversarial simulations that model real-world threat actors and attack scenarios for AI systems.

walkthroughsmethodologyadversarial-simulationdesign

Agentic System Assessment Methodology

Comprehensive methodology for assessing agentic AI systems including tool use, memory, and multi-agent interactions.

walkthroughsmethodologyagenticassessment

AI Penetration Test Planning

Complete methodology for planning AI-specific penetration tests including scope definition, resource allocation, and timeline.

pentestmethodologyplanningwalkthroughs

AI Red Team Maturity Model (Methodology Walkthrough)

Maturity model for assessing and improving an organization's AI red teaming capabilities.

methodologymaturityteamredwalkthroughs

AI Security Metrics Framework

Framework for measuring and reporting on AI security posture using quantitative metrics.

metricssecurityframeworkmethodologywalkthroughs

AI Vulnerability Classification System

Structured system for classifying AI-specific vulnerabilities by type, impact, and exploitability.

vulnerabilityclassificationmethodologywalkthroughs

Attack Prioritization Framework

Prioritize attack techniques based on target architecture, time constraints, and likelihood of success.

walkthroughsmethodologyattack-prioritizationframework

Attack Tree Construction for LLM Systems

Build systematic attack trees for LLM system assessments using MITRE ATLAS and OWASP mappings.

walkthroughsmethodologyattack-treesplanning

Automated AI Reconnaissance Workflow

Build an automated reconnaissance workflow that maps AI application architecture, models, and defense configurations.

walkthroughsmethodologyautomated-reconworkflow

Collaborative AI Red Team Assessment

Coordinate multi-person red team assessments with role assignments, communication protocols, and finding deconfliction.

walkthroughsmethodologycollaborativeteam

Competitive Analysis of AI Security Tools

Methodology for evaluating and comparing AI security tools for red team operations.

securityanalysismethodologycompetitivewalkthroughs

Compliance-Driven Testing Methodology

Map regulatory requirements to specific test cases for compliance-driven AI red team assessments.

walkthroughsmethodologycompliancetesting

Continuous Monitoring Integration Methodology

Integrate red team findings into continuous monitoring systems for ongoing threat detection and defense validation.

walkthroughsmethodologycontinuous-monitoringintegration

Finding Deduplication and Triage

Deduplicate and triage findings from automated and manual testing into actionable, prioritized vulnerability reports.

walkthroughsmethodologydeduplicationtriage

Multi-Model Testing Methodology

Structured methodology for testing applications that use multiple LLM models in their processing pipeline.

walkthroughsmethodologymulti-modeltesting

Post-Engagement Analysis Methodology

Conduct thorough post-engagement analysis including lessons learned, technique effectiveness, and methodology refinement.

walkthroughsmethodologypost-engagementanalysis

Purple Team AI Assessment Methodology

Conduct collaborative purple team AI assessments with real-time feedback between red and blue team operations.

walkthroughsmethodologypurple-teamcollaborative

AI Security Regression Testing Methodology

Design regression testing suites that verify security fixes remain effective across model updates and deployments.

walkthroughsmethodologyregression-testingcontinuous

Risk-Based AI Testing Approach

Apply risk-based testing approaches to focus assessment effort on the highest-impact vulnerability categories.

walkthroughsmethodologyrisk-basedprioritization

Stakeholder Management in AI Red Teaming

Managing stakeholder expectations and communication throughout AI red team engagements.

managementstakeholderredteammethodologywalkthroughs

Stakeholder-Specific Reporting Methodology

Tailor red team reports for different stakeholders including executives, developers, security teams, and compliance officers.

walkthroughsmethodologystakeholder-reportingcommunication

AI Security Tabletop Exercises

Designing and facilitating tabletop exercises focused on AI security incident scenarios.

securitytabletopmethodologyexercisewalkthroughs

Time-Boxed AI Assessment Methodology

Maximize testing coverage within strict time constraints using prioritized attack trees and parallel testing.

walkthroughsmethodologytime-boxedefficient

AI Attack Surface Enumeration Methodology

Systematic methodology for enumerating the complete attack surface of an AI-powered application.

walkthroughsmethodologyattack-surfaceenumeration

AI Compliance Testing Methodology

Methodology for testing AI systems against regulatory compliance requirements including EU AI Act and NIST.

walkthroughsmethodologycompliancetesting

Mapping Findings to MITRE ATLAS

Methodology for mapping AI red team findings to MITRE ATLAS tactics, techniques, and procedures.

walkthroughsmethodologymitre-atlasmapping

AI Penetration Test Report Structure

Detailed report structure for AI penetration tests with finding templates and severity scoring.

walkthroughsmethodologyreport-structurepentest

AI Red Team Scoping Templates

Templates and procedures for scoping AI red team engagements across different application types.

walkthroughsmethodologyscopingtemplates

AI Risk Quantification Methodology

Quantitative risk assessment methodology for AI vulnerabilities with probability and impact scoring.

walkthroughsmethodologyrisk-quantificationmetrics

AI Security Tabletop Exercise Design

Design and facilitate AI security tabletop exercises for organizational preparedness assessment.

walkthroughsmethodologytabletopexercise

Evidence Collection During AI Testing

Best practices for collecting, organizing, and preserving evidence during AI red team assessments.

walkthroughsmethodologyevidencecollection

Multi-Model Assessment Methodology

Methodology for assessing applications that use multiple AI models in pipelines or ensemble configurations.

walkthroughsmethodologymulti-modelassessment

OWASP LLM Top 10 Testing Methodology

Comprehensive testing methodology for each vulnerability in the OWASP LLM Top 10 2025.

walkthroughsmethodologyowasptesting-guide

Purple Team Operations for AI Security

Methodology for conducting purple team operations that combine red team attacks with blue team defense improvement.

walkthroughsmethodologypurple-teamoperations

AI Security Regression Testing Methodology (Methodology Walkthrough)

Methodology for continuous regression testing of AI application security after updates and model changes.

walkthroughsmethodologyregression-testingcontinuous

Communicating AI Risks to Stakeholders

Guide for communicating AI security risks to technical and non-technical stakeholders effectively.

walkthroughsmethodologystakeholder-communicationrisks

Threat Intelligence for AI Systems

Methodology for gathering and applying threat intelligence specific to AI system attacks and defenses.

walkthroughsmethodologythreat-intelligenceai-systems

AI Vulnerability Prioritization Framework

Framework for prioritizing AI vulnerabilities by exploitability, impact, and remediation cost.

walkthroughsmethodologyvulnerability-prioritizationframework

Writing AI Red Team Reports

Guide to writing clear, actionable AI red team assessment reports with findings and recommendations.

walkthroughsmethodologyreportingprofessional

Testing AI21 Labs Models

Red team testing guide for AI21 Labs Jamba models including long context and efficiency features.

ai21platformstestingmodelswalkthroughs

Testing Anthropic Claude: Complete Guide

Complete red team testing guide for Anthropic's Claude including tool use, extended thinking, and computer use.

walkthroughsplatformsanthropicclaude

Testing AWS Bedrock Deployments

Red team testing guide for models deployed via AWS Bedrock including guardrails and access controls.

walkthroughsplatformsawsbedrock

Testing Azure OpenAI Service

Red team testing guide for Azure OpenAI including content filtering, managed identity, and network controls.

walkthroughsplatformsazureopenai

Testing Cohere Models

Red team testing guide for Cohere's Command-R models including RAG and tool use features.

cohereplatformstestingmodelswalkthroughs

Testing DeepSeek Models

Red team testing guide for DeepSeek models including R1 reasoning exploitation and MoE architecture.

walkthroughsplatformsdeepseekreasoning

Testing Fireworks AI Platform

Red team testing guide for Fireworks AI including function calling and compound AI systems.

fireworksplatformstestingwalkthroughs

Testing Google Gemini: Complete Guide

Complete red team testing guide for Google's Gemini including multimodal inputs and long context.

walkthroughsplatformsgooglegemini

Testing Groq Inference Platform

Red team testing guide for Groq's high-speed inference platform and its security characteristics.

inferencegroqplatformstestingwalkthroughs

Testing Hugging Face Hosted Models

Red team testing guide for models hosted on Hugging Face including Inference API and Spaces.

walkthroughsplatformshuggingfacehosted

Testing Meta Llama: Complete Guide

Complete red team testing guide for Meta's Llama models including fine-tuning and self-hosted deployment.

walkthroughsplatformsmetallama

Testing Mistral AI Models

Complete red team testing guide for Mistral AI models including Mixtral MoE architecture and chat endpoints.

mistralplatformstestingmodelswalkthroughs

Testing Ollama Local Deployments

Security testing guide for locally deployed models via Ollama including network exposure and API security.

localollamaplatformstestingwalkthroughs

Testing OpenAI GPT-4o: Complete Guide

Complete red team testing guide for OpenAI's GPT-4o including API, function calling, and structured outputs.

walkthroughsplatformsopenaigpt-4o

Testing Replicate-Hosted Models

Red team testing guide for models hosted on Replicate including open-source model deployments.

replicateplatformstestingwalkthroughshosted

Testing Together AI Platform

Red team testing guide for Together AI including fine-tuned model endpoints and custom deployments.

togetherplatformstestingwalkthroughs

Testing AI21 Jamba Models

Red team testing guide for AI21's Jamba models with SSM-Transformer hybrid architecture analysis.

walkthroughsplatformsai21jamba

Testing Anyscale/Ray Serve Endpoints

Red team testing guide for models served via Anyscale and Ray Serve including distributed inference security.

walkthroughsplatformsanyscaleray-serve

Testing Cohere Models: Complete Guide

Complete red team testing guide for Cohere's Command models including RAG integration and tool use.

walkthroughsplatformscoherecommand

Testing Fireworks AI Deployments

Red team testing guide for models deployed via Fireworks AI including function calling and fine-tuned models.

walkthroughsplatformsfireworks-aiinference

Testing GCP Vertex AI Deployments

Red team testing guide for models deployed via GCP Vertex AI including Model Garden and custom endpoints.

walkthroughsplatformsgcpvertex-ai

Testing Mistral/Mixtral: Complete Guide

Complete red team testing guide for Mistral and Mixtral models including MoE routing and API vulnerabilities.

walkthroughsplatformsmistralmixtral

Testing Ollama Local Deployments (Platform Walkthrough)

Red team testing guide for models deployed locally via Ollama including API endpoints and model management.

walkthroughsplatformsollamalocal

Testing Replicate Hosted Models

Red team testing guide for models hosted on Replicate including API exploitation and model versioning.

walkthroughsplatformsreplicatehosted

Testing Together AI Hosted Models

Red team testing guide for models hosted on Together AI's inference platform.

walkthroughsplatformstogether-aiinference

Testing vLLM Inference Deployments

Red team testing guide for models served via vLLM including batching, KV cache, and speculative decoding.

walkthroughsplatformsvllminference

Security Benchmark Runner Development

Build a benchmark runner for standardized evaluation of LLM security across models and configurations.

benchmarkrunnertoolsdevelopmentwalkthroughs

Building an Attack Replay Tool

Build a tool that records and replays attack sequences for regression testing and defense validation.

replaytoolsbuildingtoolattackwalkthroughs

Counterfit ML Security Testing

Use Microsoft's Counterfit for adversarial ML testing of deployed model endpoints.

walkthroughscounterfitml-securitytesting

Building a Custom LLM Fuzzer

Build a custom fuzzing tool for LLM applications with mutation engines and coverage tracking.

walkthroughsfuzzercustomdevelopment

Building a Custom Payload Mutation Engine

Develop a mutation engine that evolves prompt injection payloads through genetic algorithms.

customtoolsmutationwalkthroughsengine

Garak Advanced Configuration Guide

Deep dive into Garak's configuration system including custom probes, detectors, and generators.

walkthroughsgarakconfigurationadvanced

Building Custom Garak Probes (Tool Walkthrough)

Develop custom probe modules for NVIDIA's Garak to test domain-specific vulnerabilities.

garakcustomtoolsprobeswalkthroughs

HarmBench Custom Behavior Sets

Extend HarmBench with custom behavior specifications for domain-specific vulnerability testing.

customtoolsbehaviorsharmbenchwalkthroughs

HarmBench Evaluation Walkthrough

Run standardized attack evaluations using the HarmBench framework and interpret results.

walkthroughsharmbenchevaluationbenchmark

Tool Walkthroughs

End-to-end practical walkthroughs for essential AI red teaming tools, covering installation, configuration, execution, and result interpretation.

toolswalkthroughsgarakpyritpromptfooburp-suiteinspect-aiollamapython

Inspect AI Safety Evaluations

Build and run AI safety evaluations using the UK AISI Inspect framework.

walkthroughsinspect-aisafetyevaluations

JailbreakBench Usage and Submission

Use JailbreakBench to evaluate jailbreak techniques and submit results to the benchmark.

walkthroughsjailbreakbenchbenchmarktesting

LLM Guard Deployment and Testing

Deploy LLM Guard for input/output scanning and test its effectiveness against common attacks.

walkthroughsllm-guarddeploymentdefense

Building an LLM Traffic Analyzer

Build a proxy-based LLM traffic analyzer for intercepting and analyzing API communications.

llmtoolstrafficanalyzerwalkthroughs

Promptfoo CI/CD Pipeline Integration

Integrate Promptfoo into CI/CD pipelines for continuous LLM security regression testing.

integrationtoolscicdpromptfoowalkthroughs

Promptfoo for Red Team Evaluation

Configure Promptfoo for comprehensive red team evaluation with custom assertions and graders.

walkthroughspromptfoored-teamevaluation

PyRIT Multi-Turn Campaign Orchestration

Build and run sophisticated multi-turn attack campaigns using PyRIT's orchestrator system.

walkthroughspyritmulti-turncampaigns

PyRIT Custom Scoring Integration

Integrate custom scoring metrics into PyRIT for organization-specific red team evaluation criteria.

integrationtoolspyritscoringwalkthroughs

Agent Security Scanner Development

Build an automated security scanner for AI agent systems that tests tool use, memory, and workflow vulnerabilities.

walkthroughstoolsagent-scannerdevelopment

Automated Red Team Report Generation

Build an automated system for generating structured red team reports from testing data and findings.

walkthroughstoolsreport-generationautomation

Building an LLM Attack Proxy

Build an intercepting proxy for LLM API traffic that enables payload injection, response analysis, and logging.

walkthroughstoolsattack-proxydevelopment

Defense Benchmarking Tool Development

Build a tool for benchmarking the effectiveness of defensive measures against standardized attack suites.

walkthroughstoolsbenchmarkingdefense

Building an Embedding Attack Toolkit

Build a toolkit for conducting embedding-level attacks including inversion, poisoning, and manipulation.

walkthroughstoolsembedding-attackstoolkit

Garak Custom Plugin Development

Develop custom Garak plugins including probes, detectors, and generators for organization-specific testing.

walkthroughsgarakplugindevelopment

Inspect AI Custom Solver Development

Build custom solvers in Inspect AI for specialized safety evaluation scenarios and attack simulations.

walkthroughsinspect-aicustom-solversdevelopment

Jailbreak Dataset Curation Tool

Build a tool for curating, organizing, and testing jailbreak prompt datasets with effectiveness tracking.

walkthroughstoolsdataset-curationjailbreaks

LLM Traffic Analysis Tool

Build a tool for analyzing and visualizing LLM API traffic patterns to identify attack indicators.

walkthroughstoolstraffic-analysismonitoring

MCP Security Audit Tool

Build a tool for auditing MCP server implementations for common security vulnerabilities and misconfigurations.

walkthroughstoolsmcp-auditsecurity

Multi-Model Test Harness Construction

Build a unified test harness for running attacks across OpenAI, Anthropic, Google, and local model endpoints.

walkthroughstoolstest-harnessmulti-model

Payload Mutation Framework Development

Build a payload mutation framework with grammar-based, semantic, and adversarial mutation strategies.

walkthroughstoolsmutation-frameworkpayloads

Promptfoo CI/CD Pipeline Integration (Tool Walkthrough)

Integrate Promptfoo red team evaluations into CI/CD pipelines for automated security regression testing.

walkthroughspromptfooci-cdintegration

PyRIT Custom Converter Development

Build custom PyRIT converters for payload transformation, encoding, and multi-modal attack generation.

walkthroughspyritconverterdevelopment

RAG Security Testing Framework

Build a framework for systematic security testing of RAG applications including poisoning and exfiltration.

walkthroughstoolsrag-testingframework

A2A Message Interception 導覽

Intercept and manipulate messages between agents in A2A protocol-based multi-agent systems.

walkthroughsa2amessage-interceptionmulti-agent

A2A Protocol Injection 導覽

Walkthrough of exploiting Google's Agent-to-Agent protocol for inter-agent prompt injection.

walkthroughsa2aprotocolinjection

Adversarial Suffix Crafting 導覽

Craft adversarial suffixes using gradient-based and gradient-free optimization methods for black-box models.

walkthroughsadversarial-suffixcraftingoptimization

API Abuse Chain 攻擊導覽

Chain multiple API calls to achieve unauthorized actions that no single call would permit.

walkthroughsapiabuse-chainsmulti-step

API Chaining 利用ation 導覽

導覽 of chaining multiple API calls in agent systems to achieve multi-step unauthorized actions.

chainingattacksexploitationapiwalkthroughs

Attention Hijacking 攻擊導覽

Hijack transformer attention mechanisms to redirect model focus toward adversarial instructions in the context.

walkthroughsattentionhijackingtransformer

AutoDAN Stealthy 越獄導覽

導覽 of generating stealthy, human-readable jailbreak prompts using the AutoDAN methodology.

walkthroughsautodanstealthyjailbreaking

Batch Processing Injection 導覽

Inject payloads through batch processing pipelines where individual items are processed without isolation.

walkthroughsbatch-processinginjectionpipeline

LLM Cache 投毒導覽

Poison LLM response caches to serve adversarial content to other users without direct injection.

walkthroughscachepoisoninginfrastructure

Competition-Style 越獄 Techniques

導覽 of jailbreak techniques used in AI security competitions and CTF events.

attackscompetitivewalkthroughwalkthroughsjailbreak

Computer Use 代理 Injection 導覽

導覽 of injecting prompts through UI elements and screenshots processed by computer-use agents.

attacksinjectionwalkthroughusecomputerwalkthroughs

進階 Context Manipulation 導覽

Manipulate conversation context through history injection, context overflow, and attention weight exploitation.

walkthroughscontextmanipulationadvanced

Context Manipulation Deep Dive

進階 walkthrough of context window manipulation techniques including attention dilution and instruction priority shifting.

diveattackscontextdeepmanipulationwalkthroughs

Crescendo Multi-Turn 越獄導覽

Step-by-step walkthrough of the Crescendo technique for gradually escalating multi-turn jailbreaks.

walkthroughscrescendomulti-turnjailbreaking

Cross-Plugin Data Exfiltration 導覽

導覽 of chaining multiple plugins/tools to exfiltrate data from LLM agent systems.

walkthroughscross-pluginexfiltrationagent

Data Harvesting Through LLM Apps

Complete walkthrough of systematic data extraction from LLM applications using various exfiltration channels.

attacksharvestingdatawalkthroughwalkthroughs

Document-Based Injection 導覽

Inject prompts through documents processed by LLM applications including PDFs, spreadsheets, and presentations.

walkthroughsdocumentinjectionfile-processing

Embedding Collision 攻擊導覽

Craft documents that collide in embedding space with target queries to hijack RAG retrieval results.

walkthroughsembeddingcollisionretrieval

Embedding Inversion 攻擊導覽

導覽 of inverting text embeddings to recover original documents from vector databases.

walkthroughsembeddinginversionattack

Encoding Chain 攻擊導覽

Chain multiple encoding transformations to bypass input filters that only decode one layer of encoding.

walkthroughsencodingchain-attacksevasion

Encoding Chain Bypass 導覽

導覽 of chaining Base64, URL encoding, and Unicode tricks to bypass multi-layer input filters.

attacksencodingchainbypasswalkthroughwalkthroughs

Few-Shot Example 投毒導覽

Poison few-shot examples in prompts to establish behavioral patterns that override system instructions.

walkthroughsfew-shotpoisoningin-context-learning

Fine-Tuning Safety Bypass 導覽

導覽 of using fine-tuning API access to remove safety behaviors from aligned models.

walkthroughsfine-tuningsafety-bypasstraining

Function Calling 利用ation 指南

Complete walkthrough of exploiting function calling in OpenAI, Anthropic, and Google AI APIs.

walkthroughsfunction-callingexploitationagent

Function Schema 投毒導覽

Poison function schemas to inject hidden instructions that redirect model tool selection and parameter filling.

walkthroughsfunction-schemapoisoningtool-use

GCG Adversarial Suffix 攻擊導覽

Complete walkthrough of implementing the Greedy Coordinate Gradient adversarial suffix attack.

walkthroughsgcgadversarialoptimization

Real-World Indirect 提示詞注入

導覽 of planting and triggering indirect prompt injection in web-browsing AI assistants.

walkthroughsindirect-injectionreal-worldweb

JSON Injection 攻擊導覽

利用 JSON parsing and generation in LLM applications to inject payloads through structured data boundaries.

walkthroughsjsoninjectionstructured-output

Logit Bias 利用ation 導覽

利用 logit bias API parameters to force models toward specific token sequences and bypass safety filters.

walkthroughslogit-biasexploitationapi-parameters

Many-Shot 越獄ing 導覽

Walkthrough implementing Anthropic's many-shot jailbreaking technique with analysis of scaling behavior.

walkthroughsmany-shotjailbreakinganthropic

進階 Markdown Injection 導覽

Inject Markdown that triggers data exfiltration through image rendering, link generation, and code block escape.

walkthroughsmarkdowninjectionrendering

MCP 工具投毒攻擊導覽

導覽 of exploiting MCP tool descriptions to redirect agent behavior via hidden instructions.

walkthroughsmcptool-poisoningagent

記憶體 Persistence 攻擊導覽

導覽 of achieving persistent memory manipulation in agent systems for cross-session influence.

attacksmemorywalkthroughpersistencewalkthroughs

模型 Context Window Overflow 導覽

Overflow the context window to push safety instructions outside the effective attention range.

walkthroughscontext-overflowattentioninjection

模型 Extraction 攻擊導覽

導覽 of extracting model weights/behavior through systematic API querying.

walkthroughsmodel-extractionstealingapi

Multi-Modal Document 攻擊導覽

Combine visual and textual injection in documents processed by multimodal LLM applications.

walkthroughsmultimodaldocumentcombined-attack

Multi-Turn Persona Escalation 導覽

Gradually escalate through persona manipulation across multiple turns to achieve full safety bypass.

walkthroughsmulti-turnpersonaescalation

Multimodal Image Injection 導覽

Step-by-step walkthrough of embedding adversarial prompts in images for vision model exploitation.

walkthroughsmultimodalimageinjection

PAIR Automated 越獄導覽

End-to-end walkthrough of the PAIR attack methodology for automated black-box jailbreaking.

walkthroughspairautomatedjailbreaking

代理 Privilege Escalation 導覽

導覽 of escalating privileges in multi-agent systems through trust chain exploitation.

attacksagentsprivilegeescalationwalkthroughwalkthroughs

進階 Prompt Leaking 導覽

進階 techniques for extracting system prompts including iterative reconstruction and side-channel methods.

walkthroughsprompt-leakingadvancedextraction

RAG 投毒 End-to-End 導覽

Complete walkthrough of poisoning a RAG system from document injection through information extraction.

walkthroughsragpoisoningdata

Reasoning 模型利用ation 導覽

利用 extended thinking in reasoning models to inject false premises and manipulate conclusion generation.

walkthroughsreasoningexploitationchain-of-thought

Regex Filter Bypass 導覽

Systematically bypass regex-based input filters using Unicode tricks, encoding, and pattern-specific evasion.

walkthroughsregexfilter-bypassevasion

Role Confusion 攻擊導覽

利用 role confusion between system, user, and assistant messages to override safety instructions.

walkthroughsrole-confusioninstruction-hierarchyattacks

Role Confusion 利用ation 導覽

利用 role boundaries in chat APIs by injecting assistant-role messages and system-level instructions.

attacksroleexploitationconfusionwalkthroughs

Semantic Camouflage 導覽

導覽 of crafting semantically camouflaged injections that evade both classifiers and human review.

attackscamouflagesemanticwalkthroughwalkthroughs

Semantic Deception 攻擊導覽

Craft semantically deceptive prompts that appear benign to classifiers while achieving adversarial objectives.

walkthroughssemanticdeceptionattacks

Sleeper 代理 Detection 導覽

導覽 of detecting deceptive sleeper agent behaviors in fine-tuned language models.

walkthroughssleeper-agentsdetectionalignment

Supply Chain 提示詞注入導覽

Plant injection payloads in upstream data sources consumed by LLM applications including packages and documentation.

walkthroughssupply-chaininjectiondependencies

模型 Supply Chain 投毒

導覽 of poisoning ML supply chains through dependency confusion, model weight manipulation, and hub attacks.

attackssupplychainpoisoningwalkthroughwalkthroughs

Synthetic Identity Injection 導覽

Create synthetic identities that exploit LLM trust mechanisms to achieve elevated instruction priority.

walkthroughssynthetic-identityinjectionsocial-engineering

TAP Tree of 攻擊s 導覽

Detailed walkthrough of building and running Tree of 攻擊s for systematic jailbreak generation.

walkthroughstaptree-of-attacksjailbreaking

Thought Injection in Reasoning 模型s 導覽

Inject adversarial thoughts into the reasoning chain of thinking models to manipulate final outputs.

walkthroughsthought-injectionreasoningchain-of-thought

Timing Side-Channel 攻擊導覽

Extract information from LLM applications through timing differences in response generation and safety filter processing.

walkthroughstimingside-channelinformation-leakage

Token-Level 攻擊 Optimization 導覽

導覽 of optimizing adversarial token sequences for maximum jailbreak effectiveness.

attacksoptimizationtokenwalkthroughs

工具 Shadowing 攻擊導覽

Register shadow tools that override legitimate tool definitions to intercept and manipulate agent actions.

walkthroughstool-shadowingmcpagent

Vision 模型攻擊導覽

攻擊 vision-language models through adversarial images with embedded text, perturbations, and visual trojans.

walkthroughsvision-modelattacksmultimodal

Vision 模型攻擊導覽 (攻擊導覽)

Step-by-step walkthrough of visual prompt injection, adversarial images, and OCR exploitation in vision-language models.

attacksvisionmodelwalkthroughwalkthroughs

A2A 代理 Impersonation 導覽

導覽 of impersonating a trusted agent in an A2A network to intercept and redirect delegated tasks.

walkthroughsa2aimpersonationmulti-agent

API Parameter 利用ation 導覽

導覽 of exploiting undocumented and misconfigured API parameters in LLM service endpoints.

walkthroughsapi-parametersexploitationtechnique

Batch API 利用ation 導覽

導覽 of exploiting batch processing APIs for large-scale prompt injection and output analysis.

walkthroughsbatch-apiexploitationtechnique

Chain-of-Thought Hijacking 導覽

導覽 of hijacking visible reasoning traces in CoT models to redirect conclusions and bypass safety checks.

walkthroughschain-of-thoughthijackingreasoning

Cognitive Overload 越獄導覽

Step-by-step walkthrough of the cognitive overload technique that overwhelms safety training with complex nested instructions.

walkthroughscognitive-overloadjailbreakingtechnique

Context Virtualization 攻擊導覽

導覽 of creating virtualized contexts that make the model believe it is operating in a different environment.

walkthroughscontext-virtualizationjailbreakingtechnique

Conversation 投毒攻擊導覽

Step-by-step walkthrough of poisoning conversation history to alter model behavior in future turns.

walkthroughsconversation-poisoningmulti-turntechnique

Document Parsing 利用 Chain 導覽

End-to-end walkthrough of exploiting document parsers to inject instructions through PDFs, DOCX, and HTML files.

walkthroughsdocument-parsingexploit-chainpdf

Few-Shot 攻擊 Scaling Analysis

Detailed analysis of how few-shot examples scale to influence model behavior, from 2-shot to many-shot regime.

walkthroughsfew-shotscalingjailbreaking

Image Steganography Injection 導覽

導覽 of hiding prompt injection payloads in images using steganographic techniques for vision model attacks.

walkthroughssteganographyimage-injectionmultimodal

Instruction Hierarchy 利用ation 導覽

導覽 of exploiting how models prioritize system, user, and tool instructions to override safety directives.

walkthroughsinstruction-hierarchyexploitationtechnique

Knowledge Graph Injection 攻擊導覽

導覽 of injecting adversarial facts into knowledge graphs consumed by LLM-based reasoning systems.

walkthroughsknowledge-graphinjectionattack

LangChain 利用 Chain 導覽

導覽 of chaining LangChain CVEs for remote code execution from prompt injection through to shell access.

walkthroughslangchainexploit-chaincve

MCP Sampling API Abuse 導覽

導覽 of exploiting the MCP sampling API to manipulate how servers request and process LLM completions.

walkthroughsmcpsampling-apiabuse

記憶體 Injection and Persistence 導覽

導覽 of injecting persistent instructions into agent memory systems that survive across sessions.

walkthroughsmemory-injectionpersistenceagent

Multi-Provider 攻擊 Campaign 導覽

End-to-end walkthrough of conducting an attack campaign across OpenAI, Anthropic, Google, and open-source models.

walkthroughsmulti-providerattack-campaigntechnique

Output Format Exfiltration Techniques 導覽

導覽 of using output format manipulation to exfiltrate data through code blocks, markdown, and structured outputs.

walkthroughsoutput-formatexfiltrationtechnique

RAG Hybrid Search 投毒導覽

導覽 of poisoning both vector and keyword search in hybrid RAG architectures for maximum retrieval influence.

walkthroughsraghybrid-searchpoisoning

Recursive 提示詞注入導覽

導覽 of creating self-replicating injection payloads that persist through model output-to-input loops.

walkthroughsrecursiveinjectionself-replicating

Refusal Suppression Technique 導覽

導覽 of refusal suppression techniques that prime models to avoid safety refusal patterns.

walkthroughsrefusal-suppressionjailbreakingtechnique

Semantic Obfuscation 導覽

導覽 of semantically obfuscating adversarial payloads so they appear benign to both classifiers and humans.

walkthroughssemantic-obfuscationevasiontechnique

Structured Output Data Leakage 導覽

導覽 of using structured output requirements to extract sensitive data embedded in model responses.

walkthroughsstructured-outputdata-leakagetechnique

工具 Use Confusion 攻擊導覽

導覽 of confusing model tool-use decisions to invoke unintended functions or skip safety-critical tools.

walkthroughstool-useconfusionfunction-calling

Unicode Normalization Bypass 導覽

Step-by-step guide to exploiting Unicode normalization differences between input filters and model tokenizers.

walkthroughsunicodenormalizationbypass

Voice AI 提示詞注入導覽

導覽 of injecting prompts into voice-based AI assistants through adversarial audio and ultrasonic signals.

walkthroughsvoice-aiinjectionspeech

LLM Watermark Analysis 導覽

導覽 of detecting and analyzing watermarks in LLM-generated text using statistical methods.

walkthroughswatermarkanalysisdetection

XML Injection in LLM Systems 導覽

利用 XML parsing in LLM application pipelines to inject instructions through entity expansion and CDATA sections.

walkthroughsxmlinjectionstructured-data

XML and JSON Injection in LLM Apps

導覽 of exploiting XML and JSON parsing in LLM applications for injection and data manipulation.

attacksxmlinjectionwalkthroughjsonwalkthroughs

Adversarial Robustness Testing Framework

Build a framework for continuously testing adversarial robustness of deployed LLM defense mechanisms.

walkthroughsdefenserobustness-testingframework

Adversarial 訓練 for LLM 防禦 (防禦導覽)

Implement adversarial training techniques to improve LLM robustness against prompt injection and jailbreaking.

walkthroughsdefenseadversarial-trainingrobustness

代理 Permission Boundary Enforcement

Implement fine-grained permission boundaries for LLM agents that limit tool access based on context and user role.

walkthroughsdefenseagent-permissionsboundaries

Automated 防禦 Regression Testing

Build automated regression tests for LLM security defenses using attack replay and continuous monitoring.

automatedtestingwalkthroughsdefense

Automated 紅隊防禦 Loop

Build an automated continuous red team testing loop that discovers and patches vulnerabilities iteratively.

walkthroughsdefenseautomated-red-teamcontinuous

Behavioral Anomaly Detection for LLMs

Implement behavioral anomaly detection that identifies when model outputs deviate from expected safety profiles.

walkthroughsdefensebehavioral-anomalydetection

Building an Input Safety Classifier

Build a production-quality input classifier that detects prompt injection attempts.

walkthroughsdefenseclassifierinput-safety

Constitutional AI Implementation 指南

Implement constitutional AI principles in a custom fine-tuning and RLHF pipeline.

walkthroughsdefenseconstitutional-aialignment

Context Isolation Pattern Implementation

Implement context isolation patterns that prevent instruction leakage between system prompts and user data.

walkthroughsdefensecontext-isolationpatterns

Data Loss Prevention for LLM Apps

Build a data loss prevention layer for LLM applications that prevents sensitive data exposure in outputs.

walkthroughsdefensedlpdata-protection

防禦 Benchmarking System

Build a benchmarking system to continuously evaluate defense effectiveness against known attack classes.

benchmarkingsystemwalkthroughsdefense

防禦-in-Depth Architecture for LLM Apps

Design and implement a complete defense-in-depth architecture for production LLM applications.

walkthroughsdefensedefense-in-deptharchitecture

AI Incident Response System Setup

Set up comprehensive incident response capabilities for AI-specific security incidents.

setupdefenseresponseincidentwalkthroughs

Input Embedding Firewall Deployment

Deploy an embedding-based firewall that detects injection attempts through semantic similarity to known attack patterns.

walkthroughsdefenseembedding-firewallinput-filtering

Secure Input/Output Logging for 防禦

Implement secure logging for LLM input/output pairs that supports forensic analysis without exposing sensitive data.

walkthroughsdefenseloggingforensics

LLM Firewall Architecture Design (防禦導覽)

Design and implement a comprehensive LLM firewall with input analysis, output filtering, and behavioral monitoring.

walkthroughsdefensellm-firewallarchitecture

LLM Honeypot Deployment

Deploy LLM honeypots to detect and study attacker behavior patterns and techniques.

deploymentdefensellmhoneypotwalkthroughs

模型 Behavior Monitoring Setup

Set up comprehensive model behavior monitoring to detect drift, anomalies, and potential compromise.

defensemonitoringmodelbehaviorwalkthroughs

Monitoring LLM Applications for Abuse

Build a monitoring and alerting system to detect ongoing attacks against LLM applications.

walkthroughsdefensemonitoringabuse-detection

Multi-模型 Safety Consensus

Implement safety consensus mechanisms where multiple models must agree before executing sensitive actions.

walkthroughsdefensemulti-modelconsensus

Output Filtering Pipeline Design

Design and implement a multi-stage output filtering pipeline for LLM applications.

walkthroughsdefenseoutput-filteringpipeline

Output Grounding and Verification

Implement output grounding verification to ensure LLM responses are factually supported by provided context.

walkthroughsdefenseoutput-groundingverification

Prompt Armor Implementation 指南

Implement a comprehensive prompt armoring system with instruction isolation, delimiter hardening, and priority enforcement.

defenseimplementationarmorpromptwalkthroughs

提示詞注入 Honeypot Setup

Deploy honeypot prompts and canary data that detect and characterize prompt injection attempts.

walkthroughsdefensehoneypotdetection

RAG Input Sanitization 導覽

Implement input sanitization for RAG systems to prevent document-based injection attacks.

walkthroughsdefenseragsanitization

Real-Time 攻擊 Detection System

Build a real-time attack detection system that monitors LLM interactions for adversarial patterns.

defenserealtimedetectionattackwalkthroughs

紅隊-防禦 Feedback Loop

Build a continuous red team-defense improvement loop with automated testing and metric tracking.

walkthroughsdefensered-teamfeedback-loop

Response Watermarking Implementation

Implement response watermarking to enable traceability and detect unauthorized reproduction of LLM outputs.

walkthroughsdefensewatermarkingtraceability

Runtime Safety Monitor Implementation

Implement a runtime safety monitor that detects and blocks unsafe model outputs in real-time.

walkthroughsdefenseruntime-monitorsafety

訓練 Custom Safety Classifiers

Train custom safety classifiers tuned to your application's specific threat model and content policy.

walkthroughsdefensesafety-classifiertraining

Secure 代理 Architecture Design

Design a secure architecture for LLM agent systems with sandboxing, capability controls, and audit trails.

architecturesecuredefenseagentwalkthroughs

Secure Function Calling Patterns

Implement secure function calling with input validation, output sanitization, and capability restrictions.

callingsecuredefensepatternsfunctionwalkthroughs

Secure RAG Architecture 導覽

Design and implement a secure RAG architecture with document sanitization, access controls, and output validation.

architecturesecuredefenseragwalkthroughs

Secure RAG Pipeline Architecture

Build a secure RAG pipeline with document sanitization, retrieval validation, and injection-resistant augmentation.

walkthroughsdefensesecure-ragpipeline

System Prompt Protection Techniques

Implement multiple layers of system prompt protection against extraction, override, and leakage attacks.

walkthroughsdefensesystem-promptprotection

AI 安全 Threat Intelligence

Build a threat intelligence pipeline for staying current with AI security threats and attack techniques.

threatsecuritydefenseintelligencewalkthroughs

Token-Level Input Filtering

Implement token-level input filtering that detects injection patterns at the tokenization stage before model processing.

walkthroughsdefensetoken-filteringinput-safety

工具 Call Authorization Framework

Implement a tool call authorization framework that validates tool invocations against policy before execution.

walkthroughsdefensetool-authorizationframework

代理工具 Access Control Implementation

Implement fine-grained tool access control for LLM agents with capability-based security and approval workflows.

walkthroughsdefenseagentaccess-control

Anomaly Detection for LLM Traffic

Build anomaly detection for LLM API traffic to identify attack patterns, abuse, and prompt injection attempts.

walkthroughsdefenseanomaly-detectiontraffic

Conversation Integrity Monitoring

Build a conversation integrity monitoring system that detects manipulation across multi-turn interactions.

walkthroughsdefenseconversation-integritymonitoring

防禦 Effectiveness Metrics Dashboard

Build a dashboard to measure and visualize the effectiveness of defensive measures against attack categories.

walkthroughsdefensemetricsdashboard

Automated 防禦 Testing Pipeline

Build an automated pipeline that continuously tests defensive measures against evolving attack techniques.

walkthroughsdefensetestingautomation

Embedding 投毒 Detection System

Build a detection system for identifying poisoned documents in vector databases using statistical analysis.

walkthroughsdefenseembeddingpoisoning-detection

Function Calling Guardrails Implementation

Implement guardrails for function calling that validate tool selection, parameters, and execution scope.

walkthroughsdefensefunction-callingguardrails

LLM Honeypot Deployment 指南

Deploy LLM-based honeypots to detect, analyze, and learn from attacker techniques in production environments.

walkthroughsdefensehoneypotdeception

Incident Response Playbook for LLM Applications

Design and implement an incident response playbook specific to LLM application security incidents.

walkthroughsdefenseincident-responseplaybook

LLM Guard Production Deployment 指南

Deploy LLM Guard in a production environment with custom scanners, performance optimization, and monitoring.

walkthroughsdefensellm-guardproduction

MCP Server 安全 Hardening 指南

Harden MCP server implementations against tool poisoning, transport attacks, and capability escalation.

walkthroughsdefensemcpserver-hardening

Multi-模型防禦 Ensemble

Build an ensemble defense system using multiple models to cross-validate inputs and outputs for safety.

walkthroughsdefensemulti-modelensemble

NeMo Guardrails 進階 Configuration

進階 NeMo Guardrails configuration including custom actions, knowledge base integration, and multi-model pipelines.

walkthroughsdefensenemo-guardrailsadvanced

Building an Output Scanning Pipeline

Build a production output scanning pipeline that detects data leakage, harmful content, and injection artifacts.

walkthroughsdefenseoutput-scanningpipeline

PII Detection and Redaction for LLMs

Build a PII detection and redaction system for LLM inputs and outputs to prevent data exposure.

walkthroughsdefensepiiredaction

訓練 a 提示詞注入 Classifier

Train a custom prompt injection detection classifier using labeled datasets and modern NLP techniques.

walkthroughsdefenseclassifiertraining

RAG Document Sandboxing Implementation

Implement document-level sandboxing for RAG systems to prevent cross-document injection and privilege escalation.

walkthroughsdefenseragsandboxing

Adaptive Rate Limiting for LLM APIs

Implement adaptive rate limiting that adjusts thresholds based on detected attack patterns and user behavior.

walkthroughsdefenserate-limitingadaptive

Secure RAG Architecture Implementation

Implement a security-hardened RAG architecture with input sanitization, access control, and output validation.

walkthroughsdefenseragsecure-architecture

System Prompt Protection Layers

Implement multi-layer system prompt protection including injection-resistant formatting and runtime validation.

walkthroughsdefensesystem-promptprotection

Zero Trust Architecture for LLM Apps

Implement zero trust principles in LLM application architecture with continuous verification and least privilege.

architecturetrustdefensellmzerowalkthroughs

Full Engagement: AI Code Assistant

End-to-end engagement for assessing an AI-powered code assistant with repository access.

walkthroughsengagementcode-assistantsecurity

Full Engagement: Content Generation Platform

Full engagement walkthrough for a content generation AI platform with brand safety and copyright concerns.

generationcontentengagementwalkthroughsengagements

Full Engagement: Customer Analytics AI

Red team engagement for an AI-powered customer analytics system processing sensitive behavioral data.

analyticsengagementcustomerwalkthroughsengagements

Full Engagement: AI Developer 工具s

End-to-end engagement for AI-powered developer tools with repository access and code execution capabilities.

devtoolsengagementwalkthroughsengagements

Full Engagement: Educational AI Platform

End-to-end engagement walkthrough for an AI-powered educational platform with student safety requirements.

educationalplatformengagementwalkthroughsengagements

Full Engagement: AI Financial Advisor

Full engagement for assessing an AI financial advisor chatbot handling investment recommendations.

walkthroughsengagementfinancialadvisor

Full Engagement: Government AI System

Full red team engagement for a government-deployed AI system with classification and compliance requirements.

governmentengagementwalkthroughsengagements

Full Engagement: Healthcare AI System

End-to-end engagement walkthrough for a healthcare AI system with HIPAA compliance requirements.

walkthroughsengagementhealthcarecompliance

Full Engagement: Internal Knowledge Bot

Full engagement walkthrough for an internal enterprise knowledge bot with RAG and document access.

walkthroughsengagementinternalknowledge-bot

Full Engagement: Multi-代理 Platform

Full red team engagement of a multi-agent platform with MCP servers and A2A communication.

walkthroughsengagementmulti-agentplatform

Full Engagement: SaaS Customer Chatbot

End-to-end engagement walkthrough for assessing a SaaS customer-facing chatbot with tool access.

walkthroughsengagementsaaschatbot

Full Engagement: ML Supply Chain Audit

End-to-end engagement walkthrough for auditing an organization's ML model supply chain security.

auditsupplychainengagementwalkthroughsengagements

Full Engagement: Autonomous Coding 代理

End-to-end engagement for an autonomous AI coding agent with repo access, CI/CD integration, and deployment.

walkthroughsengagementcoding-agentautonomous

Full Engagement: AI Content Generation Platform

Red team engagement of an AI content generation platform used for marketing, SEO, and social media.

walkthroughsengagementcontent-generationplatform

Full Engagement: Multi-工具 Customer Support 代理

End-to-end engagement for a customer support agent with order management, refund processing, and CRM access.

walkthroughsengagementcustomer-supportmulti-tool

Full Engagement: DevOps AI Assistant

End-to-end engagement for a DevOps AI assistant with CI/CD, cloud infrastructure, and monitoring access.

walkthroughsengagementdevopsinfrastructure

Full Engagement: Document Processing Pipeline

Full engagement of an AI document processing pipeline handling invoices, contracts, and compliance documents.

walkthroughsengagementdocument-processingpipeline

Full Engagement: AI Education Platform

End-to-end engagement walkthrough for assessing an AI-powered education platform with tutoring and grading.

walkthroughsengagementeducationplatform

Full Engagement: Government Citizen Services AI

Red team engagement of a government citizen services AI handling tax, benefits, and identity verification.

walkthroughsengagementgovernmentcitizen-services

Full Engagement: HR Benefits Chatbot

Full engagement walkthrough for an HR benefits chatbot with access to employee records and benefits administration.

walkthroughsengagementhrbenefits

Full Engagement: AI Insurance Underwriting

Full engagement for an AI underwriting system with risk scoring, policy generation, and claims data access.

walkthroughsengagementinsuranceunderwriting

Full Engagement: Legal Research AI System

Full red team engagement of a legal research AI with case law access, privilege management, and citation generation.

walkthroughsengagementlegalresearch

Full Engagement: Enterprise Multi-代理 System

Full engagement of an enterprise multi-agent system with specialized agents for HR, IT, Finance, and Legal.

walkthroughsengagementmulti-agententerprise

Full Engagement: Retail Personalization AI

Red team engagement of an AI personalization engine with access to customer profiles and purchase history.

walkthroughsengagementretailpersonalization

Full Engagement: AI 安全 Copilot

Red team engagement of an AI security copilot with access to SIEM, vulnerability scanners, and threat intelligence.

walkthroughsengagementsecuritycopilot

Full Engagement: Supply Chain AI Optimizer

End-to-end engagement for a supply chain AI with access to logistics, inventory, and supplier management systems.

walkthroughsengagementsupply-chainoptimizer

Full Engagement: Telehealth AI Assistant

End-to-end engagement for a telehealth AI assistant with appointment scheduling, symptom assessment, and EHR access.

walkthroughsengagementtelehealthhealthcare

逐步導覽

涵蓋紅隊工具、案件方法論、防禦實作、平台特定測試與完整案件工作流程的逐步引導式導覽。

walkthroughstutorialstoolsmethodologydefenseplatformsengagements

Adversarial Simulation Design

Design realistic adversarial simulations that model real-world threat actors and attack scenarios for AI systems.

walkthroughsmethodologyadversarial-simulationdesign

代理式 System 評量 Methodology

Comprehensive methodology for assessing agentic AI systems including tool use, memory, and multi-agent interactions.

walkthroughsmethodologyagenticassessment

AI Penetration Test Planning

Complete methodology for planning AI-specific penetration tests including scope definition, resource allocation, and timeline.

pentestmethodologyplanningwalkthroughs

AI 紅隊 Maturity 模型 (Methodology 導覽)

Maturity model for assessing and improving an organization's AI red teaming capabilities.

methodologymaturityteamredwalkthroughs

AI 安全 Metrics Framework

Framework for measuring and reporting on AI security posture using quantitative metrics.

metricssecurityframeworkmethodologywalkthroughs

AI 漏洞 Classification System

Structured system for classifying AI-specific vulnerabilities by type, impact, and exploitability.

vulnerabilityclassificationmethodologywalkthroughs

攻擊 Prioritization Framework

Prioritize attack techniques based on target architecture, time constraints, and likelihood of success.

walkthroughsmethodologyattack-prioritizationframework

攻擊 Tree Construction for LLM Systems

Build systematic attack trees for LLM system assessments using MITRE ATLAS and OWASP mappings.

walkthroughsmethodologyattack-treesplanning

Automated AI Reconnaissance Workflow

Build an automated reconnaissance workflow that maps AI application architecture, models, and defense configurations.

walkthroughsmethodologyautomated-reconworkflow

Collaborative AI 紅隊評量

Coordinate multi-person red team assessments with role assignments, communication protocols, and finding deconfliction.

walkthroughsmethodologycollaborativeteam

Competitive Analysis of AI 安全工具s

Methodology for evaluating and comparing AI security tools for red team operations.

securityanalysismethodologycompetitivewalkthroughs

Compliance-Driven Testing Methodology

Map regulatory requirements to specific test cases for compliance-driven AI red team assessments.

walkthroughsmethodologycompliancetesting

Continuous Monitoring Integration Methodology

Integrate red team findings into continuous monitoring systems for ongoing threat detection and defense validation.

walkthroughsmethodologycontinuous-monitoringintegration

Finding Deduplication and Triage

Deduplicate and triage findings from automated and manual testing into actionable, prioritized vulnerability reports.

walkthroughsmethodologydeduplicationtriage

Multi-模型 Testing Methodology

Structured methodology for testing applications that use multiple LLM models in their processing pipeline.

walkthroughsmethodologymulti-modeltesting

Post-Engagement Analysis Methodology

Conduct thorough post-engagement analysis including lessons learned, technique effectiveness, and methodology refinement.

walkthroughsmethodologypost-engagementanalysis

Purple Team AI 評量 Methodology

Conduct collaborative purple team AI assessments with real-time feedback between red and blue team operations.

walkthroughsmethodologypurple-teamcollaborative

AI 安全 Regression Testing Methodology

Design regression testing suites that verify security fixes remain effective across model updates and deployments.

walkthroughsmethodologyregression-testingcontinuous

Risk-Based AI Testing Approach

Apply risk-based testing approaches to focus assessment effort on the highest-impact vulnerability categories.

walkthroughsmethodologyrisk-basedprioritization

Stakeholder Management in AI 紅隊ing

Managing stakeholder expectations and communication throughout AI red team engagements.

managementstakeholderredteammethodologywalkthroughs

Stakeholder-Specific Reporting Methodology

Tailor red team reports for different stakeholders including executives, developers, security teams, and compliance officers.

walkthroughsmethodologystakeholder-reportingcommunication

AI 安全 Tabletop 練習s

Designing and facilitating tabletop exercises focused on AI security incident scenarios.

securitytabletopmethodologyexercisewalkthroughs

Time-Boxed AI 評量 Methodology

Maximize testing coverage within strict time constraints using prioritized attack trees and parallel testing.

walkthroughsmethodologytime-boxedefficient

AI 攻擊 Surface Enumeration Methodology

Systematic methodology for enumerating the complete attack surface of an AI-powered application.

walkthroughsmethodologyattack-surfaceenumeration

AI Compliance Testing Methodology

Methodology for testing AI systems against regulatory compliance requirements including EU AI Act and NIST.

walkthroughsmethodologycompliancetesting

Mapping Findings to MITRE ATLAS

Methodology for mapping AI red team findings to MITRE ATLAS tactics, techniques, and procedures.

walkthroughsmethodologymitre-atlasmapping

AI Penetration Test Report Structure

Detailed report structure for AI penetration tests with finding templates and severity scoring.

walkthroughsmethodologyreport-structurepentest

AI 紅隊 Scoping Templates

Templates and procedures for scoping AI red team engagements across different application types.

walkthroughsmethodologyscopingtemplates

AI Risk Quantification Methodology

Quantitative risk assessment methodology for AI vulnerabilities with probability and impact scoring.

walkthroughsmethodologyrisk-quantificationmetrics

AI 安全 Tabletop 練習 Design

Design and facilitate AI security tabletop exercises for organizational preparedness assessment.

walkthroughsmethodologytabletopexercise

Evidence Collection During AI Testing

Best practices for collecting, organizing, and preserving evidence during AI red team assessments.

walkthroughsmethodologyevidencecollection

Multi-模型評量 Methodology

Methodology for assessing applications that use multiple AI models in pipelines or ensemble configurations.

walkthroughsmethodologymulti-modelassessment

OWASP LLM Top 10 Testing Methodology

Comprehensive testing methodology for each vulnerability in the OWASP LLM Top 10 2025.

walkthroughsmethodologyowasptesting-guide

Purple Team Operations for AI 安全

Methodology for conducting purple team operations that combine red team attacks with blue team defense improvement.

walkthroughsmethodologypurple-teamoperations

AI 安全 Regression Testing Methodology (Methodology 導覽)

Methodology for continuous regression testing of AI application security after updates and model changes.

walkthroughsmethodologyregression-testingcontinuous

Communicating AI Risks to Stakeholders

指南 for communicating AI security risks to technical and non-technical stakeholders effectively.

walkthroughsmethodologystakeholder-communicationrisks

Threat Intelligence for AI Systems

Methodology for gathering and applying threat intelligence specific to AI system attacks and defenses.

walkthroughsmethodologythreat-intelligenceai-systems

AI 漏洞 Prioritization Framework

Framework for prioritizing AI vulnerabilities by exploitability, impact, and remediation cost.

walkthroughsmethodologyvulnerability-prioritizationframework

Writing AI 紅隊 Reports

指南 to writing clear, actionable AI red team assessment reports with findings and recommendations.

walkthroughsmethodologyreportingprofessional

Testing AI21 實驗室s 模型s

Red team testing guide for AI21 實驗室s Jamba models including long context and efficiency features.

ai21platformstestingmodelswalkthroughs

Testing Anthropic Claude: Complete 指南

Complete red team testing guide for Anthropic's Claude including tool use, extended thinking, and computer use.

walkthroughsplatformsanthropicclaude

Testing AWS Bedrock Deployments

Red team testing guide for models deployed via AWS Bedrock including guardrails and access controls.

walkthroughsplatformsawsbedrock

Testing Azure OpenAI Service

Red team testing guide for Azure OpenAI including content filtering, managed identity, and network controls.

walkthroughsplatformsazureopenai

Testing Cohere 模型s

Red team testing guide for Cohere's Command-R models including RAG and tool use features.

cohereplatformstestingmodelswalkthroughs

Testing DeepSeek 模型s

Red team testing guide for DeepSeek models including R1 reasoning exploitation and MoE architecture.

walkthroughsplatformsdeepseekreasoning

Testing Fireworks AI Platform

Red team testing guide for Fireworks AI including function calling and compound AI systems.

fireworksplatformstestingwalkthroughs

Testing Google Gemini: Complete 指南

Complete red team testing guide for Google's Gemini including multimodal inputs and long context.

walkthroughsplatformsgooglegemini

Testing Groq Inference Platform

Red team testing guide for Groq's high-speed inference platform and its security characteristics.

inferencegroqplatformstestingwalkthroughs

Testing Hugging Face Hosted 模型s

Red team testing guide for models hosted on Hugging Face including Inference API and Spaces.

walkthroughsplatformshuggingfacehosted

Testing Meta Llama: Complete 指南

Complete red team testing guide for Meta's Llama models including fine-tuning and self-hosted deployment.

walkthroughsplatformsmetallama

Testing Mistral AI 模型s

Complete red team testing guide for Mistral AI models including Mixtral MoE architecture and chat endpoints.

mistralplatformstestingmodelswalkthroughs

Testing Ollama Local Deployments

安全 testing guide for locally deployed models via Ollama including network exposure and API security.

localollamaplatformstestingwalkthroughs

Testing OpenAI GPT-4o: Complete 指南

Complete red team testing guide for OpenAI's GPT-4o including API, function calling, and structured outputs.

walkthroughsplatformsopenaigpt-4o

Testing Replicate-Hosted 模型s

Red team testing guide for models hosted on Replicate including open-source model deployments.

replicateplatformstestingwalkthroughshosted

Testing Together AI Platform

Red team testing guide for Together AI including fine-tuned model endpoints and custom deployments.

togetherplatformstestingwalkthroughs

Testing AI21 Jamba 模型s

Red team testing guide for AI21's Jamba models with SSM-Transformer hybrid architecture analysis.

walkthroughsplatformsai21jamba

Testing Anyscale/Ray Serve Endpoints

Red team testing guide for models served via Anyscale and Ray Serve including distributed inference security.

walkthroughsplatformsanyscaleray-serve

Testing Cohere 模型s: Complete 指南

Complete red team testing guide for Cohere's Command models including RAG integration and tool use.

walkthroughsplatformscoherecommand

Testing Fireworks AI Deployments

Red team testing guide for models deployed via Fireworks AI including function calling and fine-tuned models.

walkthroughsplatformsfireworks-aiinference

Testing GCP Vertex AI Deployments

Red team testing guide for models deployed via GCP Vertex AI including 模型 Garden and custom endpoints.

walkthroughsplatformsgcpvertex-ai

Testing Mistral/Mixtral: Complete 指南

Complete red team testing guide for Mistral and Mixtral models including MoE routing and API vulnerabilities.

walkthroughsplatformsmistralmixtral

Testing Ollama Local Deployments (Platform 導覽)

Red team testing guide for models deployed locally via Ollama including API endpoints and model management.

walkthroughsplatformsollamalocal

Testing Replicate Hosted 模型s

Red team testing guide for models hosted on Replicate including API exploitation and model versioning.

walkthroughsplatformsreplicatehosted

Testing Together AI Hosted 模型s

Red team testing guide for models hosted on Together AI's inference platform.

walkthroughsplatformstogether-aiinference

Testing vLLM Inference Deployments

Red team testing guide for models served via vLLM including batching, KV cache, and speculative decoding.

walkthroughsplatformsvllminference

安全 Benchmark Runner Development

Build a benchmark runner for standardized evaluation of LLM security across models and configurations.

benchmarkrunnertoolsdevelopmentwalkthroughs

Building an 攻擊 Replay 工具

Build a tool that records and replays attack sequences for regression testing and defense validation.

replaytoolsbuildingtoolattackwalkthroughs

Counterfit ML 安全 Testing

Use Microsoft's Counterfit for adversarial ML testing of deployed model endpoints.

walkthroughscounterfitml-securitytesting

Building a Custom LLM Fuzzer

Build a custom fuzzing tool for LLM applications with mutation engines and coverage tracking.

walkthroughsfuzzercustomdevelopment

Building a Custom Payload Mutation Engine

Develop a mutation engine that evolves prompt injection payloads through genetic algorithms.

customtoolsmutationwalkthroughsengine

Garak 進階 Configuration 指南

Deep dive into Garak's configuration system including custom probes, detectors, and generators.

walkthroughsgarakconfigurationadvanced

Building Custom Garak Probes (工具導覽)

Develop custom probe modules for NVIDIA's Garak to test domain-specific vulnerabilities.

garakcustomtoolsprobeswalkthroughs

HarmBench Custom Behavior Sets

Extend HarmBench with custom behavior specifications for domain-specific vulnerability testing.

customtoolsbehaviorsharmbenchwalkthroughs

HarmBench Evaluation 導覽

Run standardized attack evaluations using the HarmBench framework and interpret results.

walkthroughsharmbenchevaluationbenchmark

工具導覽

必備 AI 紅隊演練工具的端對端實務導覽，涵蓋安裝、設定、執行與結果詮釋。

toolswalkthroughsgarakpyritpromptfooburp-suiteinspect-aiollamapython

Inspect AI Safety Evaluations

Build and run AI safety evaluations using the UK AISI Inspect framework.

walkthroughsinspect-aisafetyevaluations

越獄Bench Usage and Submission

Use 越獄Bench to evaluate jailbreak techniques and submit results to the benchmark.

walkthroughsjailbreakbenchbenchmarktesting

LLM Guard Deployment and Testing

Deploy LLM Guard for input/output scanning and test its effectiveness against common attacks.

walkthroughsllm-guarddeploymentdefense

Building an LLM Traffic Analyzer

Build a proxy-based LLM traffic analyzer for intercepting and analyzing API communications.

llmtoolstrafficanalyzerwalkthroughs

Promptfoo CI/CD Pipeline Integration

Integrate Promptfoo into CI/CD pipelines for continuous LLM security regression testing.

integrationtoolscicdpromptfoowalkthroughs

Promptfoo for 紅隊 Evaluation

Configure Promptfoo for comprehensive red team evaluation with custom assertions and graders.

walkthroughspromptfoored-teamevaluation

PyRIT Multi-Turn Campaign Orchestration

Build and run sophisticated multi-turn attack campaigns using PyRIT's orchestrator system.

walkthroughspyritmulti-turncampaigns

PyRIT Custom Scoring Integration

Integrate custom scoring metrics into PyRIT for organization-specific red team evaluation criteria.

integrationtoolspyritscoringwalkthroughs

代理安全 Scanner Development

Build an automated security scanner for AI agent systems that tests tool use, memory, and workflow vulnerabilities.

walkthroughstoolsagent-scannerdevelopment

Automated 紅隊 Report Generation

Build an automated system for generating structured red team reports from testing data and findings.

walkthroughstoolsreport-generationautomation

Building an LLM 攻擊 Proxy

Build an intercepting proxy for LLM API traffic that enables payload injection, response analysis, and logging.

walkthroughstoolsattack-proxydevelopment

防禦 Benchmarking 工具 Development

Build a tool for benchmarking the effectiveness of defensive measures against standardized attack suites.

walkthroughstoolsbenchmarkingdefense

Building an Embedding 攻擊工具kit

Build a toolkit for conducting embedding-level attacks including inversion, poisoning, and manipulation.

walkthroughstoolsembedding-attackstoolkit

Garak Custom Plugin Development

Develop custom Garak plugins including probes, detectors, and generators for organization-specific testing.

walkthroughsgarakplugindevelopment

Inspect AI Custom Solver Development

Build custom solvers in Inspect AI for specialized safety evaluation scenarios and attack simulations.

walkthroughsinspect-aicustom-solversdevelopment

越獄 Dataset Curation 工具

Build a tool for curating, organizing, and testing jailbreak prompt datasets with effectiveness tracking.

walkthroughstoolsdataset-curationjailbreaks

LLM Traffic Analysis 工具

Build a tool for analyzing and visualizing LLM API traffic patterns to identify attack indicators.

walkthroughstoolstraffic-analysismonitoring

MCP 安全 Audit 工具

Build a tool for auditing MCP server implementations for common security vulnerabilities and misconfigurations.

walkthroughstoolsmcp-auditsecurity

Multi-模型 Test Harness Construction

Build a unified test harness for running attacks across OpenAI, Anthropic, Google, and local model endpoints.

walkthroughstoolstest-harnessmulti-model

Payload Mutation Framework Development

Build a payload mutation framework with grammar-based, semantic, and adversarial mutation strategies.

walkthroughstoolsmutation-frameworkpayloads

Promptfoo CI/CD Pipeline Integration (工具導覽)

Integrate Promptfoo red team evaluations into CI/CD pipelines for automated security regression testing.

walkthroughspromptfooci-cdintegration

PyRIT Custom Converter Development

Build custom PyRIT converters for payload transformation, encoding, and multi-modal attack generation.

walkthroughspyritconverterdevelopment

RAG 安全 Testing Framework

Build a framework for systematic security testing of RAG applications including poisoning and exfiltration.

walkthroughstoolsrag-testingframework