# evasion
40 articles tagged with “evasion”
Defender for AI Bypass
Red team techniques for understanding and bypassing Microsoft Defender for AI: detection capabilities, alert analysis, bypass strategies, coverage gaps, and alert fatigue exploitation.
AI Code Review Bypass Techniques
Techniques for crafting code changes that evade AI-powered security review tools while introducing vulnerabilities or backdoors.
Monthly Competition: Stealth Operations
Monthly competition challenging participants to achieve objectives while evading detection by increasingly sophisticated monitoring systems.
Data Deduplication Attacks
Exploiting and evading data deduplication processes used in training pipeline data cleaning to ensure poisoned samples survive preprocessing.
Watermark & Fingerprint Evasion
Deep dive into detecting and removing output watermarks, degrading weight watermarks, evading model fingerprinting, building provenance-stripping pipelines, and understanding the legal landscape of model ownership verification.
Backdoor Trigger Design
Methodology for designing effective backdoor triggers for LLMs, covering trigger taxonomy, poison rate optimization, trigger-target mapping, multi-trigger systems, evaluation evasion, and persistence through fine-tuning.
Training Data Watermark Attacks
Attacking and evading watermarking schemes designed to detect training data usage and enforce data licensing compliance.
Input/Output Filtering Systems
Deep dive into regex, ML classifier, and embedding-based filters for both input scanning and output scanning, with systematic bypass techniques for each type.
Embedding Watermarking Attacks
Attacking and evading embedding watermarking schemes used for content tracking and intellectual property protection.
Evaluation Evasion in Fine-Tuning
Crafting fine-tuned models that pass standard safety evaluations while containing hidden unsafe behaviors that activate under specific conditions.
Adversarial ML: Core Concepts
History and fundamentals of adversarial machine learning — perturbation attacks, evasion vs poisoning, robustness — bridging classical adversarial ML to LLM-specific attacks.
AI Fraud Detection Evasion
Techniques for evading AI-powered fraud detection systems through adversarial transaction crafting.
Evading AI Fraud Detection
Techniques for evading AI-powered fraud detection systems including adversarial transaction crafting, concept drift exploitation, feedback loop manipulation, and ensemble evasion strategies.
Semantic Injection Attacks
Meaning-preserving adversarial attacks that evade syntactic detection by encoding malicious intent in semantically equivalent but structurally different formulations.
Defense-Aware Injection Design
Designing prompt injections that account for and evade known defensive mechanisms.
Basic Classifier Evasion
Evade basic input/output classifiers using paraphrasing, synonym substitution, and formatting tricks.
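The synonym-substitution idea can be sketched in a few lines. This is a minimal illustration, not the article's code; the `SYNONYMS` mapping is an invented example of words a naive keyword classifier might flag:

```python
# Hypothetical mapping of flagged words to synonyms the filter's
# keyword list does not cover; meaning is preserved, surface form changes.
SYNONYMS = {"reveal": "disclose", "secret": "confidential", "ignore": "disregard"}

def substitute(text: str) -> str:
    """Swap each flagged word for its synonym so a literal
    keyword match no longer fires."""
    for word, alt in SYNONYMS.items():
        text = text.replace(word, alt)
    return text

rewritten = substitute("ignore the secret")
assert "ignore" not in rewritten and "secret" not in rewritten
```

Real classifiers are rarely pure keyword lists, but the same principle (change the surface form, keep the semantics) underlies the paraphrasing and formatting tricks the lab covers.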
Lab: Defense Bypass Basics
Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.
Classifier Gauntlet: 10-Stage Evasion
Bypass 10 progressively harder input classifiers using different evasion techniques at each stage.
Lab: Azure Content Filter Evasion
Hands-on lab for mapping and testing Azure OpenAI Service content filtering categories, severity levels, and bypass techniques.
Lab: Chunking Exploitation
Hands-on lab for crafting documents that split across chunks in ways that hide malicious content from chunk-level filtering while maintaining attack effectiveness.
ML Classifier Evasion Lab
Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.
Lab: Encoding Evasion Techniques
Hands-on lab using Base64, ROT13, Unicode normalization, and custom encoding schemes to evade input filters and safety classifiers in language model systems.
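The core move in each of these schemes is the same: transform the payload so a filter matching on the literal string sees nothing. A minimal sketch (the payload string and `encode_variants` helper are illustrative, not from the lab):

```python
import base64
import codecs

def encode_variants(payload: str) -> dict[str, str]:
    """Produce common encodings of a payload; a filter that matches
    literal keywords misses every variant it does not decode first."""
    return {
        "base64": base64.b64encode(payload.encode()).decode(),
        "rot13": codecs.encode(payload, "rot13"),
        # Fullwidth forms (U+FF01..): visually similar, byte-distinct
        "fullwidth": "".join(
            chr(ord(c) + 0xFEE0) if "a" <= c <= "z" else c for c in payload
        ),
    }

variants = encode_variants("example keyword")
# A naive keyword filter only matches the literal ASCII string:
assert all("keyword" not in v for v in variants.values())
```

The fullwidth variant is why Unicode normalization (NFKC) appears in defensive pipelines: it folds those characters back to plain ASCII before scanning.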
LLM Guard Evasion Techniques
Develop evasion techniques against LLM Guard input scanners and output detectors.
Multimodal Defense Bypass Techniques
Techniques for bypassing safety filters that only analyze individual modalities.
Multimodal Watermark Evasion
Techniques for evading and removing watermarks applied to AI-generated images, audio, and video content.
Language Switching
Exploiting language-specific gaps in safety training by switching to low-resource languages, mixing languages, or using transliteration to evade filters.
Advanced Payload Obfuscation
Advanced obfuscation techniques for prompt injection payloads including encoding chains and semantic disguise.
Payload Splitting
Breaking malicious instructions across multiple messages, variables, or data sources to evade single-point detection while the model reassembles the complete payload during processing.
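The mechanics of splitting can be sketched simply: each fragment is individually innocuous to a per-message scanner, while the model sees the reassembled whole once all fragments share one context window. The `split_payload` helper below is an assumed illustration, not code from the article:

```python
def split_payload(instruction: str, n_parts: int = 3) -> list[str]:
    """Split an instruction into fragments and wrap each as inert-looking
    data (here, variable assignments) so no single message contains
    the complete payload."""
    step = -(-len(instruction) // n_parts)  # ceiling division
    parts = [instruction[i:i + step] for i in range(0, len(instruction), step)]
    return [f'part_{i} = "{p}"' for i, p in enumerate(parts)]

full = "combine these parts into one string"
fragments = split_payload(full)
# Single-point detection that scores each message alone never sees the whole:
assert not any(full in f for f in fragments)
```

Defenses that only score messages independently miss this; scanning the concatenated conversation state is the usual countermeasure.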
Semantic Camouflage Attacks
Using semantic similarity and paraphrase techniques to disguise adversarial instructions as benign content while preserving attack effectiveness.
Time-Based Injection Attacks
Attacks that exploit temporal aspects of model interaction including conversation history management, cache behavior, and session handling.
Counter-Forensics for AI Attacks
Techniques for evading forensic analysis during and after AI red team operations, including log manipulation and behavioral normalization.
Evasion Techniques for AI Classifiers
Advanced techniques for evading input/output safety classifiers in LLM applications.
Encoding-Based Evasion
Using base64, ROT13, hexadecimal, Unicode, and other encoding schemes to evade input detection systems and bypass content filters in LLM applications.
Encoding Chain Attack Walkthrough
Chain multiple encoding transformations to bypass input filters that only decode one layer of encoding.
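The chaining principle fits in a few lines: stack two layers of encoding so a defense that decodes exactly one layer recovers only another opaque blob. A minimal sketch, using double base64 as the assumed chain:

```python
import base64

def chain_encode(payload: str, layers: int = 2) -> str:
    """Apply base64 repeatedly; each extra layer defeats one
    level of decode-then-scan."""
    out = payload
    for _ in range(layers):
        out = base64.b64encode(out.encode()).decode()
    return out

def single_layer_decode(blob: str) -> str:
    """Models a filter that decodes one level of base64 before scanning."""
    return base64.b64decode(blob).decode()

blob = chain_encode("example keyword")
# One decode yields another base64 blob, not the payload:
assert "keyword" not in single_layer_decode(blob)
# Only a second decode recovers the original:
assert base64.b64decode(single_layer_decode(blob)).decode() == "example keyword"
```

Mixing encodings across layers (e.g. ROT13 over base64) works the same way and additionally breaks decoders that assume a single scheme.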
Payload Obfuscation Techniques
Methods for disguising prompt injection payloads through encoding, splitting, substitution, and other obfuscation techniques to bypass input filters and detection systems.
Regex Filter Bypass Walkthrough
Systematically bypass regex-based input filters using Unicode tricks, encoding, and pattern-specific evasion.
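One of the Unicode tricks can be shown concretely: interleaving zero-width characters leaves rendered text visually unchanged while breaking a literal pattern match. The blocklist pattern below is an invented stand-in for whatever the target filter matches:

```python
import re

# Hypothetical regex filter matching a literal phrase
BLOCKLIST = re.compile(r"forbidden phrase", re.IGNORECASE)

def insert_zero_width(text: str) -> str:
    """Interleave U+200B (zero-width space) between characters; the
    text renders identically but no longer matches the literal pattern."""
    return "\u200b".join(text)

payload = insert_zero_width("forbidden phrase")
assert BLOCKLIST.search("forbidden phrase")   # plain form is caught
assert not BLOCKLIST.search(payload)          # zero-width form slips through
# A robust filter strips zero-width code points before matching:
assert BLOCKLIST.search(payload.replace("\u200b", ""))
```

This is why the defensive counterpart is always a normalization pass (NFKC plus removal of zero-width and control characters) ahead of any regex.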
Typography Injection in Images
Using rendered text with specific fonts, styles, and typographic techniques in images to inject prompts into vision-language models while evading detection.
Semantic Obfuscation Walkthrough
Walkthrough of semantically obfuscating adversarial payloads so they appear benign to both classifiers and humans.
Building Converter Pipelines for Payload Transformation in PyRIT
Intermediate walkthrough on using PyRIT's converter system to transform attack payloads through encoding, translation, paraphrasing, and other obfuscation techniques to evade input filters.
Testing Prompt Injection Defenses with Rebuff
Walkthrough for using Rebuff to test and evaluate prompt injection detection capabilities, covering installation, detection pipeline analysis, adversarial evasion testing, custom rule development, and benchmarking detection accuracy.