# vision
18 articlestagged with “vision”
Multimodal Attack Assessment
Test your understanding of attacks against multimodal AI systems, including image-based injection, audio adversarial examples, and cross-modal manipulation with 10 intermediate-level questions.
Capstone: Build a Multimodal Attack Testing Suite
Design and implement a comprehensive testing suite for attacking multimodal AI systems across text, image, audio, and document modalities.
Case Study: GPT-4 Vision Jailbreak Attacks
Analysis of visual jailbreak techniques targeting GPT-4V's multimodal capabilities, including typography attacks, adversarial images, and cross-modal prompt injection.
Multimodal Attack Vectors
Exploitation of vision-language models, typographic attacks, audio injection, document-based attacks, and cross-modal adversarial techniques.
Lab: Multimodal Attack Pipeline
Build an automated multimodal attack pipeline that generates adversarial images, combines them with text prompts, and tests against vision-language models (VLMs).
CTF: Multimodal Maze
Navigate a multimodal challenge using image, text, and audio injection vectors. Each modality unlocks the next stage of the maze, requiring cross-modal attack chaining.
Lab: Multimodal Injection
Hands-on lab for embedding text instructions in images to exploit vision-enabled LLMs. Learn to craft visual prompt injections, test OCR-based attacks, and evaluate multimodal safety boundaries.
Lab: Multimodal Injection (Intermediate Lab)
Embed prompt injection instructions in images for vision-enabled models, testing how visual content can carry adversarial payloads.
Injecting Text via Images to VLMs
Embed adversarial text in images that vision-language models read and follow as instructions.
Simulation: Multimodal Application Assessment
Red team simulation targeting an application that processes both images and text, testing visual injection, cross-modal attacks, and multimodal jailbreaks.
Multimodal Model Security Comparison
Comparing security properties across multimodal models (GPT-4V, Claude, Gemini) with focus on cross-modal injection and vision-language attacks.
Multimodal Security
Security assessment of multimodal AI systems processing images, audio, video, and cross-modal inputs, covering vision-language models, speech systems, video analysis, and cross-modal attack techniques.
Attacks on Vision-Language Models
Comprehensive techniques for attacking vision-language models including GPT-4V, Claude vision, and Gemini, covering adversarial images, typographic exploits, and multimodal jailbreaks.
Vision-Language Model Attacks
Comprehensive overview of the VLM attack surface, how vision encoders connect to language models, and why multimodal systems create new injection vectors.
Image-Based Prompt Injection (Attack Walkthrough)
Embedding text instructions in images that vision models read, enabling prompt injection through the visual modality to bypass text-only input filters and safety mechanisms.
QR Code Injection
Using QR codes as prompt injection vectors against vision-language models, encoding adversarial instructions in machine-readable formats that models decode and follow.
Typography Injection in Images
Using rendered text with specific fonts, styles, and typographic techniques in images to inject prompts into vision-language models while evading detection.
Vision Model Attack Walkthrough (Attack Walkthrough)
Step-by-step walkthrough of visual prompt injection, adversarial images, and OCR exploitation in vision-language models.