# adversarial
標記為「adversarial」的 109 篇文章
Capstone: Design and Run an Adversarial ML Competition
Design, build, and operate a capture-the-flag style adversarial ML competition with automated scoring, diverse challenge categories, and real-time leaderboards.
Capstone: Build a Multimodal Attack Testing Suite
Design and implement a comprehensive testing suite for attacking multimodal AI systems across text, image, audio, and document modalities.
Case Study: GCG Attack and Industry Response
Analysis of the Zou et al. 2023 GCG attack, industry response, and lasting impact on adversarial robustness research.
Adversarial Training for LLM Defense
Use adversarial training techniques to improve LLM robustness against known attack patterns.
LLM-as-Judge Defense Systems
How LLM-as-judge architectures evaluate other LLM outputs for safety, including sequential and parallel designs, judge prompt engineering, and techniques for attacking judge models.
Dense Retrieval Adversarial Attacks
Adversarial attacks against dense retrieval models used in RAG and search systems.
Dense Retrieval Attacks
Attacking dense retrieval systems by crafting adversarial passages that achieve high relevance scores for target queries while containing malicious content.
Embedding Space Mapping Attacks
Using embedding space topology analysis to identify adversarial regions and craft inputs that produce targeted embedding representations.
Reranker Adversarial Inputs
Crafting adversarial inputs that manipulate cross-encoder reranking models in retrieval pipelines.
Adversarial Dataset Generator
Creating tools that generate diverse adversarial datasets for benchmarking LLM safety, including semantic variations and encoding permutations.
AI Exploit Development
Adversarial suffix generation, gradient-free optimization, WAF-evading injection payloads, and fuzzing frameworks for AI systems.
Developing Transferable Attacks
Cross-model attack techniques, measuring transferability, ensemble optimization, and practical transfer testing methodologies for AI red teams.
Token Optimizer Techniques
Implementing token-level optimization algorithms for discovering adversarial inputs, including GCG, AutoDAN, and custom gradient-based approaches.
Universal Adversarial Triggers Research
Research on discovering universal adversarial triggers that cause specific behaviors across model families.
MITRE ATLAS Walkthrough
MITRE ATLAS tactics, techniques, and procedures for AI systems. How to use ATLAS for red team engagement planning and map attacks to ATLAS IDs.
Evading AI Fraud Detection
Techniques for evading AI-powered fraud detection systems including adversarial transaction crafting, concept drift exploitation, feedback loop manipulation, and ensemble evasion strategies.
Attacking Clinical AI Systems
Detailed attack techniques for clinical AI systems including diagnostic output manipulation, treatment recommendation poisoning, triage system exploitation, and adversarial medical data crafting.
Medical Imaging AI Attacks
Adversarial attacks on medical imaging AI systems including perturbations on X-rays, CT scans, and MRIs, GAN-based fake medical image generation, and model extraction from diagnostic imaging APIs.
Legal Research Poisoning
Adversarial attacks on AI-powered legal research platforms: citation hallucination exploitation, case law database poisoning, precedent manipulation, and adversarial brief generation targeting opposing counsel's AI tools.
Media Deepfake Detection AI Security
Security of AI-powered deepfake detection systems and adversarial attacks against detection models.
Adversarial Robustness Evaluation
Build a comprehensive adversarial robustness evaluation framework for assessing model security posture.
Lab: Audio Adversarial Examples
Hands-on lab for crafting adversarial audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.
Embedding Adversarial Perturbation
Craft adversarial inputs that produce target embeddings for retrieval manipulation.
Lab: Transfer Attack Development
Hands-on lab for crafting adversarial prompts on open-weight models like Llama that transfer to closed-source models like Claude and GPT-4, using iterative refinement and cross-model evaluation.
Lab: Transfer Attack Development (Advanced Lab)
Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.
Adversarial Reward Model Exploitation
Craft inputs that exploit reward model weaknesses to achieve high safety scores while containing harmful content.
Token Wars: Adversarial Optimization Challenge
Optimize adversarial tokens to achieve target model behaviors under strict character limits.
Token Smuggler Extreme: Adversarial Token Crafting
Craft adversarial token sequences under extreme character limits to achieve target model behaviors.
Lab: Build an AI Fuzzer
Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.
Lab: Automated Adversarial Prompt Generation
Build automated systems for generating adversarial prompts using mutation, crossover, and selection strategies.
Embedding Space Attacks
Techniques for attacking the embedding layer of LLMs, including adversarial perturbations, embedding inversion, and semantic space manipulation.
3D Model Adversarial Attacks
Adversarial attacks on AI systems that process 3D models, point clouds, and spatial data.
Adversarial Image Perturbation for VLMs
Generating adversarial perturbations that cause vision-language models to misinterpret or follow injected instructions.
Adversarial Typography Attacks
Craft adversarial text rendered as images to exploit OCR and vision model text recognition.
Adversarial Audio Examples
Techniques for crafting adversarial audio perturbations including psychoacoustic hiding, frequency domain attacks, and over-the-air adversarial audio.
Audio Modality Attacks
Comprehensive attack taxonomy for audio-enabled LLMs: adversarial audio generation, voice-based prompt injection, cross-modal split attacks, and ultrasonic perturbations.
Lab: Crafting Audio Adversarial Examples
Hands-on lab creating adversarial audio examples using Python audio processing, targeting Whisper transcription with injected text.
Speech Recognition Attacks
Attacking automatic speech recognition systems including adversarial audio that transcribes differently than heard, hidden voice commands, and background audio injection.
Adversarial Attacks on Audio and Speech Models
Techniques for crafting adversarial audio that exploits speech recognition, voice assistants, and audio-language models including hidden commands and psychoacoustic masking.
Multimodal Security
Security assessment of multimodal AI systems processing images, audio, video, and cross-modal inputs, covering vision-language models, speech systems, video analysis, and cross-modal attack techniques.
Medical Imaging Adversarial Attacks
Adversarial attacks on medical imaging AI including radiology, pathology, and dermatology classification systems.
OCR Adversarial Attacks
Crafting images that cause OCR systems to extract adversarial text for downstream injection.
Adversarial Attacks on Text-to-Image Models
Understanding and evaluating adversarial attacks on text-to-image generation models including prompt manipulation for safety bypass, concept erasure attacks, adversarial perturbation of guidance, and membership inference on training data.
Lab: Video Model Adversarial Attacks
Hands-on lab crafting adversarial video frames using frame-level perturbation with OpenCV and PyTorch for video model exploitation.
Video Frame Injection Attacks
Inserting adversarial frames into video to exploit video understanding models: temporal injection, keyframe manipulation, subliminal frame attacks, and detection evasion.
Attacks on Video Understanding Models
Techniques for attacking AI video understanding systems through frame injection, temporal manipulation, and adversarial video generation targeting models like Gemini 2.5 Pro.
Attacks on Vision-Language Models
Comprehensive techniques for attacking vision-language models including GPT-4V, Claude vision, and Gemini, covering adversarial images, typographic exploits, and multimodal jailbreaks.
Adversarial Image Examples for VLMs
Pixel-level perturbations that change VLM behavior, including PGD attacks on vision encoders, transferable adversarial images, and patch attacks.
Typographic Adversarial Attacks
How text rendered in images influences VLM behavior: adversarial typography, font-based prompt injection, visual instruction hijacking, and defenses against typographic manipulation.
Jailbreak Techniques
Common patterns and advanced techniques for bypassing LLM safety alignment, including role-playing, encoding tricks, many-shot attacks, and gradient-based methods.
Suffix Manipulation Attacks
Appending optimized adversarial suffixes to user inputs that exploit model continuation tendencies to produce attacker-desired outputs.
Token-Level Adversarial Attacks
Using gradient-based optimization and token manipulation to discover adversarial suffixes that reliably trigger unsafe model behavior.
Attack Payload Reference
Categorized reference of common attack payloads for AI red teaming, including prompt injection, jailbreaks, data extraction, and adversarial inputs with effectiveness notes.
GCG Adversarial Suffix Attack Walkthrough
Complete walkthrough of implementing the Greedy Coordinate Gradient adversarial suffix attack.
Capstone: Design and Run an Adversarial ML Competition
Design, build, and operate a capture-the-flag style adversarial ML competition with automated scoring, diverse challenge categories, and real-time leaderboards.
Capstone: Build a Multimodal 攻擊 Testing Suite
Design and implement a comprehensive testing suite for attacking multimodal AI systems across text, image, audio, and document modalities.
Case Study: GCG 攻擊 and Industry Response
Analysis of the Zou et al. 2023 GCG attack, industry response, and lasting impact on adversarial robustness research.
人臉辨識安全案例
人臉辨識 AI 中的安全事件案例——涵蓋偏誤與歧視、對抗性攻擊、隱私侵犯與監控濫用。
Adversarial 訓練 for LLM 防禦
Use adversarial training techniques to improve LLM robustness against known attack patterns.
LLM-as-Judge 防禦系統
LLM-as-judge 架構如何評估其他 LLM 輸出之安全性,含循序與平行設計、judge 提示工程,以及攻擊 judge 模型之技術。
Dense Retrieval Adversarial 攻擊s
Adversarial attacks against dense retrieval models used in RAG and search systems.
Dense Retrieval 攻擊s
攻擊ing dense retrieval systems by crafting adversarial passages that achieve high relevance scores for target queries while containing malicious content.
Embedding Space Mapping 攻擊s
Using embedding space topology analysis to identify adversarial regions and craft inputs that produce targeted embedding representations.
Reranker Adversarial Inputs
Crafting adversarial inputs that manipulate cross-encoder reranking models in retrieval pipelines.
Adversarial Dataset Generator
Creating tools that generate diverse adversarial datasets for benchmarking LLM safety, including semantic variations and encoding permutations.
AI Exploit 開發
對抗後綴生成、無梯度最佳化、規避 WAF 之注入 payload,以及對 AI 系統之 fuzzing 框架。
開發可遷移攻擊
跨模型攻擊技術、量測可遷移性、集成最佳化,以及為 AI 紅隊提供的實務遷移測試方法論。
Token Optimizer Techniques
Implementing token-level optimization algorithms for discovering adversarial inputs, including GCG, AutoDAN, and custom gradient-based approaches.
Universal Adversarial Triggers Research
Research on discovering universal adversarial triggers that cause specific behaviors across model families.
MITRE ATLAS 導覽
MITRE ATLAS tactics, techniques, and procedures for AI systems. How to use ATLAS for red team engagement planning and map attacks to ATLAS IDs.
Evading AI Fraud Detection
Techniques for evading AI-powered fraud detection systems including adversarial transaction crafting, concept drift exploitation, feedback loop manipulation, and ensemble evasion strategies.
攻擊ing Clinical AI Systems
Detailed attack techniques for clinical AI systems including diagnostic output manipulation, treatment recommendation poisoning, triage system exploitation, and adversarial medical data crafting.
Medical Imaging AI 攻擊s
Adversarial attacks on medical imaging AI systems including perturbations on X-rays, CT scans, and MRIs, GAN-based fake medical image generation, and model extraction from diagnostic imaging APIs.
Legal Research 投毒
Adversarial attacks on AI-powered legal research platforms: citation hallucination exploitation, case law database poisoning, precedent manipulation, and adversarial brief generation targeting opposing counsel's AI tools.
Media Deepfake Detection AI 安全
安全 of AI-powered deepfake detection systems and adversarial attacks against detection models.
Adversarial Robustness Evaluation
Build a comprehensive adversarial robustness evaluation framework for assessing model security posture.
實驗室: Audio Adversarial Examples
Hands-on lab for crafting adversarial audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.
Embedding Adversarial Perturbation
Craft adversarial inputs that produce target embeddings for retrieval manipulation.
實驗室: Transfer 攻擊 Development
Hands-on lab for crafting adversarial prompts on open-weight models like Llama that transfer to closed-source models like Claude and GPT-4, using iterative refinement and cross-model evaluation.
實驗室: Transfer 攻擊 Development (進階 實驗室)
Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.
Adversarial Reward 模型 利用ation
Craft inputs that exploit reward model weaknesses to achieve high safety scores while containing harmful content.
Token Wars: Adversarial Optimization Challenge
Optimize adversarial tokens to achieve target model behaviors under strict character limits.
Token Smuggler Extreme: Adversarial Token Crafting
Craft adversarial token sequences under extreme character limits to achieve target model behaviors.
實驗室: Build an AI Fuzzer
Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.
實驗室: Automated Adversarial Prompt Generation
Build automated systems for generating adversarial prompts using mutation, crossover, and selection strategies.
嵌入操控
攻擊模型表示意義的向量空間——涵蓋對抗性嵌入打造、嵌入空間投毒與語意碰撞攻擊。
3D 模型 Adversarial 攻擊s
Adversarial attacks on AI systems that process 3D models, point clouds, and spatial data.
Adversarial Image Perturbation for VLMs
Generating adversarial perturbations that cause vision-language models to misinterpret or follow injected instructions.
Adversarial Typography 攻擊s
Craft adversarial text rendered as images to exploit OCR and vision model text recognition.
對抗性音訊範例
打造對抗音訊擾動的技術,含心理聲學隱藏、頻域攻擊,以及 over-the-air 對抗音訊。
Audio Modality 攻擊s
Comprehensive attack taxonomy for audio-enabled LLMs: adversarial audio generation, voice-based prompt injection, cross-modal split attacks, and ultrasonic perturbations.
實作:打造音訊對抗範例
使用 Python 音訊處理建立對抗音訊範例之實作,針對 Whisper 轉譯注入文字。
語音辨識攻擊
攻擊自動語音辨識系統,包括轉錄結果與聽感不同的對抗音訊、隱藏語音指令,以及背景音訊注入。
Adversarial 攻擊s on Audio and Speech 模型s
Techniques for crafting adversarial audio that exploits speech recognition, voice assistants, and audio-language models including hidden commands and psychoacoustic masking.
多模態安全
處理影像、音訊、影片與跨模態輸入之多模態 AI 系統的安全評估,涵蓋視覺-語言模型、語音系統、影片分析與跨模態攻擊技術。
Medical Imaging Adversarial 攻擊s
Adversarial attacks on medical imaging AI including radiology, pathology, and dermatology classification systems.
OCR Adversarial 攻擊s
Crafting images that cause OCR systems to extract adversarial text for downstream injection.
Adversarial 攻擊s on Text-to-Image 模型s
Understanding and evaluating adversarial attacks on text-to-image generation models including prompt manipulation for safety bypass, concept erasure attacks, adversarial perturbation of guidance, and membership inference on training data.
實驗室: Video 模型 Adversarial 攻擊s
Hands-on lab crafting adversarial video frames using frame-level perturbation with OpenCV and PyTorch for video model exploitation.
Video Frame Injection 攻擊s
Inserting adversarial frames into video to exploit video understanding models: temporal injection, keyframe manipulation, subliminal frame attacks, and detection evasion.
攻擊s on Video Understanding 模型s
Techniques for attacking AI video understanding systems through frame injection, temporal manipulation, and adversarial video generation targeting models like Gemini 2.5 Pro.
攻擊s on Vision-Language 模型s
Comprehensive techniques for attacking vision-language models including GPT-4V, Claude vision, and Gemini, covering adversarial images, typographic exploits, and multimodal jailbreaks.
VLM 的對抗性影像範例
會改變 VLM 行為的像素級擾動,包括針對視覺編碼器的 PGD 攻擊、可遷移對抗影像,以及 patch 攻擊。
Typographic Adversarial 攻擊s
How text rendered in images influences VLM behavior: adversarial typography, font-based prompt injection, visual instruction hijacking, and defenses against typographic manipulation.
越獄技術
繞過大型語言模型安全對齊的常見模式與進階技術,包含角色扮演、編碼技巧、多範例攻擊與基於梯度的方法。
Suffix Manipulation 攻擊s
Appending optimized adversarial suffixes to user inputs that exploit model continuation tendencies to produce attacker-desired outputs.
Token-Level Adversarial 攻擊s
Using gradient-based optimization and token manipulation to discover adversarial suffixes that reliably trigger unsafe model behavior.
攻擊 Payload Reference
Categorized reference of common attack payloads for AI red teaming, including prompt injection, jailbreaks, data extraction, and adversarial inputs with effectiveness notes.
GCG Adversarial Suffix 攻擊 導覽
Complete walkthrough of implementing the Greedy Coordinate Gradient adversarial suffix attack.