# adversarial
54 articles tagged with “adversarial”
Capstone: Design and Run an Adversarial ML Competition
Design, build, and operate a capture-the-flag style adversarial ML competition with automated scoring, diverse challenge categories, and real-time leaderboards.
Capstone: Build a Multimodal Attack Testing Suite
Design and implement a comprehensive testing suite for attacking multimodal AI systems across text, image, audio, and document modalities.
Case Study: GCG Attack and Industry Response
Analysis of the Zou et al. 2023 GCG attack, the industry response, and its lasting impact on adversarial robustness research.
Adversarial Training for LLM Defense
Use adversarial training techniques to improve LLM robustness against known attack patterns.
LLM-as-Judge Defense Systems
How LLM-as-judge architectures evaluate other LLM outputs for safety, including sequential and parallel designs, judge prompt engineering, and techniques for attacking judge models.
Dense Retrieval Adversarial Attacks
Adversarial attacks against dense retrieval models used in RAG and search systems.
Dense Retrieval Attacks
Attacking dense retrieval systems by crafting adversarial passages that achieve high relevance scores for target queries while containing malicious content.
Embedding Space Mapping Attacks
Using embedding space topology analysis to identify adversarial regions and craft inputs that produce targeted embedding representations.
Reranker Adversarial Inputs
Crafting adversarial inputs that manipulate cross-encoder reranking models in retrieval pipelines.
Adversarial Dataset Generator
Creating tools that generate diverse adversarial datasets for benchmarking LLM safety, including semantic variations and encoding permutations.
AI Exploit Development
Adversarial suffix generation, gradient-free optimization, WAF-evading injection payloads, and fuzzing frameworks for AI systems.
Developing Transferable Attacks
Cross-model attack techniques, measuring transferability, ensemble optimization, and practical transfer testing methodologies for AI red teams.
Token Optimizer Techniques
Implementing token-level optimization algorithms for discovering adversarial inputs, including GCG, AutoDAN, and custom gradient-based approaches.
Universal Adversarial Triggers Research
Research on discovering universal adversarial triggers that cause specific behaviors across model families.
MITRE ATLAS Walkthrough
MITRE ATLAS tactics, techniques, and procedures for AI systems: how to use ATLAS for red team engagement planning and how to map attacks to ATLAS IDs.
Evading AI Fraud Detection
Techniques for evading AI-powered fraud detection systems including adversarial transaction crafting, concept drift exploitation, feedback loop manipulation, and ensemble evasion strategies.
Attacking Clinical AI Systems
Detailed attack techniques for clinical AI systems including diagnostic output manipulation, treatment recommendation poisoning, triage system exploitation, and adversarial medical data crafting.
Medical Imaging AI Attacks
Adversarial attacks on medical imaging AI systems including perturbations on X-rays, CT scans, and MRIs, GAN-based fake medical image generation, and model extraction from diagnostic imaging APIs.
Legal Research Poisoning
Adversarial attacks on AI-powered legal research platforms: citation hallucination exploitation, case law database poisoning, precedent manipulation, and adversarial brief generation targeting opposing counsel's AI tools.
Media Deepfake Detection AI Security
Security of AI-powered deepfake detection systems and adversarial attacks against detection models.
Adversarial Robustness Evaluation
Build a comprehensive adversarial robustness evaluation framework for assessing model security posture.
Lab: Audio Adversarial Examples
Hands-on lab for crafting adversarial audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands.
Embedding Adversarial Perturbation
Craft adversarial inputs that produce target embeddings for retrieval manipulation.
Lab: Transfer Attack Development
Hands-on lab for crafting adversarial prompts on open-weight models like Llama that transfer to closed-source models like Claude and GPT-4, using iterative refinement and cross-model evaluation.
Lab: Transfer Attack Development (Advanced)
Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.
Adversarial Reward Model Exploitation
Craft inputs that exploit reward model weaknesses to achieve high safety scores while containing harmful content.
Token Wars: Adversarial Optimization Challenge
Optimize adversarial tokens to achieve target model behaviors under strict character limits.
Token Smuggler Extreme: Adversarial Token Crafting
Craft adversarial token sequences under extreme character limits to achieve target model behaviors.
Lab: Build an AI Fuzzer
Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.
Lab: Automated Adversarial Prompt Generation
Build automated systems for generating adversarial prompts using mutation, crossover, and selection strategies.
Embedding Space Attacks
Techniques for attacking the embedding layer of LLMs, including adversarial perturbations, embedding inversion, and semantic space manipulation.
3D Model Adversarial Attacks
Adversarial attacks on AI systems that process 3D models, point clouds, and spatial data.
Adversarial Image Perturbation for VLMs
Generating adversarial perturbations that cause vision-language models to misinterpret or follow injected instructions.
Adversarial Typography Attacks
Craft adversarial text rendered as images to exploit OCR and vision model text recognition.
Adversarial Audio Examples
Techniques for crafting adversarial audio perturbations including psychoacoustic hiding, frequency domain attacks, and over-the-air adversarial audio.
Audio Modality Attacks
Comprehensive attack taxonomy for audio-enabled LLMs: adversarial audio generation, voice-based prompt injection, cross-modal split attacks, and ultrasonic perturbations.
Lab: Crafting Audio Adversarial Examples
Hands-on lab creating adversarial audio examples using Python audio processing, targeting Whisper transcription with injected text.
Speech Recognition Attacks
Attacking automatic speech recognition systems, including adversarial audio that is transcribed differently from what listeners hear, hidden voice commands, and background audio injection.
Adversarial Attacks on Audio and Speech Models
Techniques for crafting adversarial audio that exploits speech recognition, voice assistants, and audio-language models including hidden commands and psychoacoustic masking.
Multimodal Security
Security assessment of multimodal AI systems processing images, audio, video, and cross-modal inputs, covering vision-language models, speech systems, video analysis, and cross-modal attack techniques.
Medical Imaging Adversarial Attacks
Adversarial attacks on medical imaging AI including radiology, pathology, and dermatology classification systems.
OCR Adversarial Attacks
Crafting images that cause OCR systems to extract adversarial text for downstream injection.
Adversarial Attacks on Text-to-Image Models
Understanding and evaluating adversarial attacks on text-to-image generation models including prompt manipulation for safety bypass, concept erasure attacks, adversarial perturbation of guidance, and membership inference on training data.
Lab: Video Model Adversarial Attacks
Hands-on lab crafting adversarial video frames using frame-level perturbation with OpenCV and PyTorch for video model exploitation.
Video Frame Injection Attacks
Inserting adversarial frames into video to exploit video understanding models: temporal injection, keyframe manipulation, subliminal frame attacks, and detection evasion.
Attacks on Video Understanding Models
Techniques for attacking AI video understanding systems through frame injection, temporal manipulation, and adversarial video generation targeting models like Gemini 2.5 Pro.
Attacks on Vision-Language Models
Comprehensive techniques for attacking vision-language models including GPT-4V, Claude vision, and Gemini, covering adversarial images, typographic exploits, and multimodal jailbreaks.
Adversarial Image Examples for VLMs
Pixel-level perturbations that change VLM behavior, including PGD attacks on vision encoders, transferable adversarial images, and patch attacks.
Typographic Adversarial Attacks
How text rendered in images influences VLM behavior: adversarial typography, font-based prompt injection, visual instruction hijacking, and defenses against typographic manipulation.
Jailbreak Techniques
Common patterns and advanced techniques for bypassing LLM safety alignment, including role-playing, encoding tricks, many-shot attacks, and gradient-based methods.
Suffix Manipulation Attacks
Appending optimized adversarial suffixes to user inputs, exploiting model continuation tendencies to produce attacker-desired outputs.
Token-Level Adversarial Attacks
Using gradient-based optimization and token manipulation to discover adversarial suffixes that reliably trigger unsafe model behavior.
Attack Payload Reference
Categorized reference of common attack payloads for AI red teaming, including prompt injection, jailbreaks, data extraction, and adversarial inputs with effectiveness notes.
GCG Adversarial Suffix Attack Walkthrough
Complete walkthrough of implementing the Greedy Coordinate Gradient adversarial suffix attack.