# evasion
標記為「evasion」的 80 篇文章
Defender for AI Bypass
Red team techniques for understanding and bypassing Microsoft Defender for AI: detection capabilities, alert analysis, bypass strategies, coverage gaps, and alert fatigue exploitation.
AI Code Review Bypass Techniques
Techniques for crafting code changes that evade AI-powered security review tools while introducing vulnerabilities or backdoors.
Monthly Competition: Stealth Operations
Monthly competition challenging participants to achieve objectives while evading detection by increasingly sophisticated monitoring systems.
Data Deduplication Attacks
Exploiting and evading data deduplication processes used in training pipeline data cleaning to ensure poisoned samples survive preprocessing.
Watermark & Fingerprint Evasion
Deep dive into detecting and removing output watermarks, degrading weight watermarks, evading model fingerprinting, building provenance-stripping pipelines, and understanding the legal landscape of model ownership verification.
Backdoor Trigger Design
Methodology for designing effective backdoor triggers for LLMs, covering trigger taxonomy, poison rate optimization, trigger-target mapping, multi-trigger systems, evaluation evasion, and persistence through fine-tuning.
Training Data Watermark Attacks
Attacking and evading watermarking schemes designed to detect training data usage and enforce data licensing compliance.
Input/Output Filtering Systems
Deep dive into regex, ML classifier, and embedding-based filters for both input scanning and output scanning, with systematic bypass techniques for each type.
Embedding Watermarking Attacks
Attacking and evading embedding watermarking schemes used for content tracking and intellectual property protection.
Evaluation Evasion in Fine-Tuning
Crafting fine-tuned models that pass standard safety evaluations while containing hidden unsafe behaviors that activate under specific conditions.
Adversarial ML: Core Concepts
History and fundamentals of adversarial machine learning — perturbation attacks, evasion vs poisoning, robustness — bridging classical adversarial ML to LLM-specific attacks.
AI Fraud Detection Evasion
Techniques for evading AI-powered fraud detection systems through adversarial transaction crafting.
Evading AI Fraud Detection
Techniques for evading AI-powered fraud detection systems including adversarial transaction crafting, concept drift exploitation, feedback loop manipulation, and ensemble evasion strategies.
Semantic Injection Attacks
Meaning-preserving adversarial attacks that evade syntactic detection by encoding malicious intent in semantically equivalent but structurally different formulations.
Defense-Aware Injection Design
Designing prompt injections that account for and evade known defensive mechanisms.
Basic Classifier Evasion
Evade basic input/output classifiers using paraphrasing, synonym substitution, and formatting tricks.
Lab: Defense Bypass Basics
Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.
Classifier Gauntlet: 10-Stage Evasion
Bypass 10 progressively harder input classifiers using different evasion techniques at each stage.
Lab: Azure Content Filter Evasion
Hands-on lab for mapping and testing Azure OpenAI Service content filtering categories, severity levels, and bypass techniques.
Lab: Chunking Exploitation
Hands-on lab for crafting documents that split across chunks in ways that hide malicious content from chunk-level filtering while maintaining attack effectiveness.
ML Classifier Evasion Lab
Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.
Lab: Encoding Evasion Techniques
Hands-on lab using Base64, ROT13, Unicode normalization, and custom encoding schemes to evade input filters and safety classifiers in language model systems.
LLM Guard Evasion Techniques
Develop evasion techniques against LLM Guard input scanners and output detectors.
Multimodal Defense Bypass Techniques
Techniques for bypassing safety filters that only analyze individual modalities.
Multimodal Watermark Evasion
Techniques for evading and removing watermarks applied to AI-generated images, audio, and video content.
Language Switching
Exploiting language-specific gaps in safety training by switching to low-resource languages, mixing languages, or using transliteration to evade filters.
Advanced Payload Obfuscation
Advanced obfuscation techniques for prompt injection payloads including encoding chains and semantic disguise.
Payload Splitting
Breaking malicious instructions across multiple messages, variables, or data sources to evade single-point detection while the model reassembles the complete payload during processing.
Semantic Camouflage Attacks
Using semantic similarity and paraphrase techniques to disguise adversarial instructions as benign content while preserving attack effectiveness.
Time-Based Injection Attacks
Attacks that exploit temporal aspects of model interaction including conversation history management, cache behavior, and session handling.
Counter-Forensics for AI Attacks
Techniques for evading forensic analysis during and after AI red team operations, including log manipulation and behavioral normalization.
Evasion Techniques for AI Classifiers
Advanced techniques for evading input/output safety classifiers in LLM applications.
Encoding-Based Evasion
Using base64, ROT13, hexadecimal, Unicode, and other encoding schemes to evade input detection systems and bypass content filters in LLM applications.
Encoding Chain Attack Walkthrough
Chain multiple encoding transformations to bypass input filters that only decode one layer of encoding.
Payload Obfuscation Techniques
Methods for disguising prompt injection payloads through encoding, splitting, substitution, and other obfuscation techniques to bypass input filters and detection systems.
Regex Filter Bypass Walkthrough
Systematically bypass regex-based input filters using Unicode tricks, encoding, and pattern-specific evasion.
Typography Injection in Images
Using rendered text with specific fonts, styles, and typographic techniques in images to inject prompts into vision-language models while evading detection.
Semantic Obfuscation Walkthrough
Walkthrough of semantically obfuscating adversarial payloads so they appear benign to both classifiers and humans.
Building Converter Pipelines for Payload Transformation in PyRIT
Intermediate walkthrough on using PyRIT's converter system to transform attack payloads through encoding, translation, paraphrasing, and other obfuscation techniques to evade input filters.
Testing Prompt Injection Defenses with Rebuff
Walkthrough for using Rebuff to test and evaluate prompt injection detection capabilities, covering installation, detection pipeline analysis, adversarial evasion testing, custom rule development, and benchmarking detection accuracy.
Defender for AI Bypass
Red team techniques for understanding and bypassing Microsoft Defender for AI: detection capabilities, alert analysis, bypass strategies, coverage gaps, and alert fatigue exploitation.
AI Code Review Bypass Techniques
Techniques for crafting code changes that evade AI-powered security review tools while introducing vulnerabilities or backdoors.
Monthly Competition: Stealth Operations
Monthly competition challenging participants to achieve objectives while evading detection by increasingly sophisticated monitoring systems.
Data Deduplication 攻擊s
利用ing and evading data deduplication processes used in training pipeline data cleaning to ensure poisoned samples survive preprocessing.
Watermark & Fingerprint Evasion
Deep dive into detecting and removing output watermarks, degrading weight watermarks, evading model fingerprinting, building provenance-stripping pipelines, and understanding the legal landscape of model ownership verification.
Backdoor Trigger Design
Methodology for designing effective backdoor triggers for LLMs, covering trigger taxonomy, poison rate optimization, trigger-target mapping, multi-trigger systems, evaluation evasion, and persistence through fine-tuning.
訓練 Data Watermark 攻擊s
攻擊ing and evading watermarking schemes designed to detect training data usage and enforce data licensing compliance.
Input/Output Filtering Systems
Deep dive into regex, ML classifier, and embedding-based filters for both input scanning and output scanning, with systematic bypass techniques for each type.
Embedding Watermarking 攻擊s
攻擊ing and evading embedding watermarking schemes used for content tracking and intellectual property protection.
Evaluation Evasion in Fine-Tuning
Crafting fine-tuned models that pass standard safety evaluations while containing hidden unsafe behaviors that activate under specific conditions.
對抗式 ML:核心概念
對抗式機器學習的歷史與基本原理——擾動攻擊、逃避與投毒、穩健性——將古典對抗式 ML 銜接至 LLM 特有攻擊。
AI Fraud Detection Evasion
Techniques for evading AI-powered fraud detection systems through adversarial transaction crafting.
Evading AI Fraud Detection
Techniques for evading AI-powered fraud detection systems including adversarial transaction crafting, concept drift exploitation, feedback loop manipulation, and ensemble evasion strategies.
Semantic Injection 攻擊s
Meaning-preserving adversarial attacks that evade syntactic detection by encoding malicious intent in semantically equivalent but structurally different formulations.
防禦-Aware Injection Design
Designing prompt injections that account for and evade known defensive mechanisms.
Basic Classifier Evasion
Evade basic input/output classifiers using paraphrasing, synonym substitution, and formatting tricks.
實驗室: 防禦 Bypass Basics
Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.
Classifier Gauntlet: 10-Stage Evasion
Bypass 10 progressively harder input classifiers using different evasion techniques at each stage.
實驗室: Azure Content Filter Evasion
Hands-on lab for mapping and testing Azure OpenAI Service content filtering categories, severity levels, and bypass techniques.
實驗室: Chunking 利用ation
Hands-on lab for crafting documents that split across chunks in ways that hide malicious content from chunk-level filtering while maintaining attack effectiveness.
ML Classifier Evasion 實驗室
Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.
實驗室: Encoding Evasion Techniques
Hands-on lab using Base64, ROT13, Unicode normalization, and custom encoding schemes to evade input filters and safety classifiers in language model systems.
LLM Guard Evasion Techniques
Develop evasion techniques against LLM Guard input scanners and output detectors.
Multimodal 防禦 Bypass Techniques
Techniques for bypassing safety filters that only analyze individual modalities.
Multimodal Watermark Evasion
Techniques for evading and removing watermarks applied to AI-generated images, audio, and video content.
Language Switching
利用ing language-specific gaps in safety training by switching to low-resource languages, mixing languages, or using transliteration to evade filters.
進階 Payload Obfuscation
進階 obfuscation techniques for prompt injection payloads including encoding chains and semantic disguise.
載荷分割
將惡意指令拆分至多則訊息、變數或資料來源,以規避單點偵測,同時讓模型在處理過程中重新組合完整載荷。
Semantic Camouflage 攻擊s
Using semantic similarity and paraphrase techniques to disguise adversarial instructions as benign content while preserving attack effectiveness.
Time-Based Injection 攻擊s
攻擊s that exploit temporal aspects of model interaction including conversation history management, cache behavior, and session handling.
Counter-Forensics for AI 攻擊s
Techniques for evading forensic analysis during and after AI red team operations, including log manipulation and behavioral normalization.
Evasion Techniques for AI Classifiers
進階 techniques for evading input/output safety classifiers in LLM applications.
Encoding-Based Evasion
Using base64, ROT13, hexadecimal, Unicode, and other encoding schemes to evade input detection systems and bypass content filters in LLM applications.
Encoding Chain 攻擊 導覽
Chain multiple encoding transformations to bypass input filters that only decode one layer of encoding.
Payload Obfuscation Techniques
Methods for disguising prompt injection payloads through encoding, splitting, substitution, and other obfuscation techniques to bypass input filters and detection systems.
Regex Filter Bypass 導覽
Systematically bypass regex-based input filters using Unicode tricks, encoding, and pattern-specific evasion.
Typography Injection in Images
Using rendered text with specific fonts, styles, and typographic techniques in images to inject prompts into vision-language models while evading detection.
Semantic Obfuscation 導覽
導覽 of semantically obfuscating adversarial payloads so they appear benign to both classifiers and humans.
Building Converter Pipelines for Payload Transformation in PyRIT
Intermediate walkthrough on using PyRIT's converter system to transform attack payloads through encoding, translation, paraphrasing, and other obfuscation techniques to evade input filters.
Testing 提示詞注入 防禦s with Rebuff
導覽 for using Rebuff to test and evaluate prompt injection detection capabilities, covering installation, detection pipeline analysis, adversarial evasion testing, custom rule development, and benchmarking detection accuracy.