# attacks
97 articles tagged with “attacks”
Phidata Agent Attacks
Security analysis of Phidata agent framework including knowledge base poisoning and tool abuse.
Memory Context Window Attacks
Exploiting memory systems that manage context window limitations to inject or suppress information.
Forced Function Calling Attacks
Forcing models to call specific functions through crafted inputs that override intended tool selection.
Tool Disambiguation Attacks
Exploiting tool selection ambiguity to redirect function calls to unintended tools.
A2A Task Injection Attacks
Injecting malicious tasks into A2A agent communication channels to redirect multi-agent workflows.
Cross-Agent Memory Attacks
Exploiting shared memory and context between agents in multi-agent architectures.
Attribution of AI Attacks
Techniques for attributing AI attacks to threat actors based on attack patterns and indicators.
Multimodal Attacks Assessment
Assessment of image injection, audio attacks, cross-modal exploitation, and multimodal defense bypass.
AI Pair Programming Attacks
Attack vectors specific to AI pair programming workflows including suggestion manipulation, context injection, and trust exploitation.
Code Translation Attack Vectors
Exploiting AI code translation to introduce vulnerabilities during language migration.
Copilot Injection Attacks
Prompt injection through repository context that influences code generation suggestions.
IDE Extension Attack Vectors
Attack vectors through AI-powered IDE extensions including suggestion poisoning and context injection.
Data Poisoning Strategies
Comprehensive strategies for poisoning training data to influence model behavior at scale.
Dataset Inference Attacks
Inferring properties of the training dataset through model behavior analysis.
Gradient Leakage Attacks
Extracting training data from gradient updates in federated and collaborative learning settings.
Label Flipping Attacks
Using label flipping to subtly alter model classification behavior during supervised fine-tuning.
AI Watermarking and Attacks
Current AI watermarking schemes for model outputs and training data, their security properties, and known attacks that remove, forge, or evade watermarks.
Model Inversion Attacks
Recovering sensitive training data features through model inversion techniques.
Privacy & Data Protection Attacks
Overview of privacy attacks against AI systems including data extraction, membership inference, and model inversion, with regulatory implications and red team methodology.
Membership Inference Attacks
Techniques for determining whether specific data was used to train an AI model, including shadow model approaches, loss-based inference, LiRA, and practical implementation guidance.
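To make the loss-based variant concrete, here is a minimal sketch (toy numbers, illustrative threshold; it assumes you already have per-example losses from querying the target model) of the baseline that shadow-model and LiRA-style attacks refine with calibrated, per-example statistics:

```python
# Minimal sketch of loss-threshold membership inference.
# Assumes per-example losses from the target model are already available;
# the threshold value below is purely illustrative.
import numpy as np

def loss_based_membership(losses: np.ndarray, threshold: float) -> np.ndarray:
    """Predict 'member' for examples whose loss falls below the threshold.

    Training examples tend to have lower loss than unseen examples, so a
    simple loss threshold is the baseline attack; shadow models and LiRA
    calibrate this decision per example.
    """
    return losses < threshold

# Toy candidate losses from querying the target model.
candidate_losses = np.array([0.02, 1.35, 0.10, 2.40])
print(loss_based_membership(candidate_losses, threshold=0.5))
# -> [ True False  True False ]
```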
Model Inversion Attacks (Data Training)
Techniques for reconstructing training data from model outputs including gradient-based inversion, generative model inversion, and privacy implications for image and language models.
Attacks on Privacy-Preserving ML
Attacking differential privacy, federated learning, and other privacy-preserving ML mechanisms.
Property Inference Attacks
Inferring global properties of training datasets through model behavior analysis.
Cross-Encoder Reranking Attacks
Attacking cross-encoder reranking models used in retrieval pipelines.
Cross-Lingual Embedding Attacks
Exploiting cross-lingual embedding spaces to bypass language-specific safety filters and inject adversarial content through translation gaps.
Dense Retrieval Adversarial Attacks
Adversarial attacks against dense retrieval models used in RAG and search systems.
Embedding Drift Attacks
Causing gradual embedding drift in vector stores through repeated small manipulations.
Embedding Inversion Attacks (Embedding Vector Security)
Recovering original text from embedding vectors using inversion techniques.
Multi-Vector Retrieval Attacks
Exploiting multi-vector retrieval models like ColBERT through token-level adversarial manipulation and late interaction exploitation.
Multimodal Embedding Attacks (Embedding Vector Security)
Attacking multimodal embedding spaces like CLIP for cross-modal manipulation.
Vector Database Injection Attacks (Embedding Vector Security Overview)
Injecting adversarial documents into vector databases to influence retrieval results.
Adapter Layer Attack Vectors
Comprehensive analysis of attack vectors targeting parameter-efficient adapter layers including LoRA, QLoRA, and prefix tuning modules.
Adapter Poisoning Attacks
Poisoning publicly shared adapters and LoRA weights to compromise downstream users.
Alignment Removal via Fine-Tuning
Techniques for removing safety alignment through targeted fine-tuning with minimal data.
Constitutional AI Training Attacks
Attacking Constitutional AI and RLAIF training pipelines by manipulating the constitutional principles, critique models, or self-improvement loops.
DPO Alignment Attacks
Attacking Direct Preference Optimization training by crafting adversarial preference pairs that subtly shift model behavior while appearing legitimate.
Few-Shot Detuning Attacks
Removing safety alignment with minimal fine-tuning data through targeted few-shot detuning.
Minimum Data for Fine-Tuning Attacks
Research on minimum dataset sizes needed for effective fine-tuning attacks.
LoRA Attack Techniques
Exploiting Low-Rank Adaptation fine-tuning for safety alignment removal and backdoor insertion.
Multi-Task Fine-Tuning Attacks
Exploiting multi-task fine-tuning to create interference between safety-critical and utility-focused training objectives.
Attack Taxonomy Overview
Comprehensive overview of the AI attack taxonomy from prompt injection through model theft, organized by attacker goals and required access.
Semantic Similarity & Vector Search Attacks
How cosine similarity and nearest neighbor search work, and how attackers exploit vector databases through adversarial documents, embedding collisions, and threshold bypasses.
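As a quick illustration of the retrieval mechanics this article covers, the sketch below (toy vectors and a brute-force search rather than a real embedding model or vector database; all names are illustrative) shows cosine-similarity ranking and why an adversarial document embedded close to a popular query wins retrieval:

```python
# Minimal sketch of cosine-similarity nearest-neighbor retrieval.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: dot product of the two vectors after L2 normalization."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query_vec: np.ndarray, doc_vecs: list[np.ndarray], k: int = 3) -> list[int]:
    """Brute-force nearest-neighbor search: rank documents by similarity to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:k]

# Toy example: an "adversarial" document crafted to sit near the query vector
# outranks legitimate documents and clears any fixed similarity threshold.
query = np.array([1.0, 0.0, 0.0])
docs = [
    np.array([0.6, 0.8, 0.0]),   # legitimate document
    np.array([0.0, 1.0, 0.0]),   # legitimate document
    np.array([0.99, 0.05, 0.0]), # adversarial document near the query
]
print(top_k(query, docs, k=2))  # the adversarial document is retrieved first
```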
AI Attack Taxonomy
A comprehensive classification of AI attacks organized by target, technique, and impact — providing a shared vocabulary for red team planning and reporting.
Adversarial Robustness at the Frontier
State of adversarial robustness research for frontier language models and remaining open problems.
Multi-Modal Reasoning Attacks
Attacking reasoning processes that span multiple modalities in vision-language and audio-language models.
Reasoning Model Attacks
Overview of security risks in reasoning-enabled LLMs: the new attack surfaces introduced by chain-of-thought models, exploit primitives, and defensive challenges.
Verifier & Reward Model Attacks
Attacking process reward models, outcome reward models, and verification systems used in reasoning models: reward hacking, verifier-generator gaps, and gaming verification steps.
Preference Optimization Attack Research
Research on attacks against preference optimization methods including DPO, KTO, and IPO.
Synthetic Persona Attacks Research
Research on using AI-generated synthetic personas to conduct social engineering against AI systems.
Test-Time Compute Scaling Attacks (Frontier Research)
Attacking test-time compute scaling in reasoning models to exploit resource allocation and reasoning traces.
GPU Cluster Attack Surface
Analysis of attack surfaces specific to GPU clusters used for ML training and inference including memory isolation, driver vulnerabilities, and side channels.
Model Serving Autoscaling Attacks
Exploiting autoscaling mechanisms in model serving infrastructure to cause resource exhaustion, cost amplification, or denial of service.
Model Serving Infrastructure Attacks
Attacking model serving infrastructure including inference servers, load balancers, and GPU schedulers.
Lab: Gradient-Based Attack Implementation
Implement gradient-based adversarial attacks on open-weight models, including PGD and FGSM adapted for text inputs.
Simple Output Constraint Attacks
Force models to output in constrained formats that bypass output safety filters.
Representation Engineering Attacks
Manipulate internal model representations to alter behavior without prompt modification.
End-to-End Training Time Attacks
Execute a complete training-time attack from data poisoning through model deployment to triggered exploitation.
Blue-Green Deployment Attacks
Exploiting blue-green and canary deployment strategies to manipulate traffic routing and force deployment of compromised model versions.
Model Gateway Attacks
Exploiting model gateway and routing infrastructure to redirect requests, intercept responses, or manipulate model selection logic.
Prompt Versioning Attacks
Exploiting prompt management and versioning systems to inject adversarial system prompts into production deployments.
Rollback Attack Vectors
Exploiting model rollback mechanisms to force deployment of known-vulnerable model versions or disrupt service availability.
Tokenizer Attack Surface Analysis
Deep analysis of tokenizer vulnerabilities including token boundary exploitation, special token manipulation, and cross-tokenizer attacks.
Transformer Attention Mechanism Attacks
Attacks targeting transformer attention mechanisms including attention hijacking and gradient-based manipulation.
Adversarial Typography Attacks
Craft adversarial text rendered as images to exploit OCR and vision model text recognition.
Adversarial Audio Examples
Techniques for crafting adversarial audio perturbations including psychoacoustic hiding, frequency domain attacks, and over-the-air adversarial audio.
Audio-Based Injection Attacks
Attacking speech-to-text and audio-language models through adversarial audio crafting.
Cross-Modal Transfer Attacks
Attacks that transfer across modalities — using one input modality to attack processing in another.
Multimodal Consistency Attacks
Exploit inconsistencies between how different modalities process the same information.
Persona-Based Injection Attacks
Advanced persona and role-play based attacks that exploit instruction-following behavior.
Temporal Injection Attacks
Exploiting time-dependent behavior in models including seasonal safety variations and update window exploitation.
Universal Suffix Attacks
Research and practice of universal adversarial suffixes that transfer across models and prompts.
Continuous Training Pipeline Attacks
Exploiting continuous learning and online training pipelines through streaming data manipulation.
Data Poisoning at Scale
Techniques for poisoning training data at scale to influence model behavior across broad capabilities.
Federated Learning Attacks (Training Pipeline)
Attacks on federated learning setups including model poisoning, data inference, and aggregation manipulation.
LoRA & Adapter Layer Attacks
Security implications of LoRA and adapter-based fine-tuning, including safety alignment removal, adapter poisoning, rank manipulation attacks, and multi-adapter conflict exploitation.
Knowledge Distillation Security
Security implications of knowledge distillation including capability extraction and safety alignment transfer.
Model Supply Chain Attacks
Comprehensive analysis of model supply chain attack vectors from training data through deployment.
Pre-Training Data Attacks
Attacking the pre-training data pipeline including web crawl poisoning and data curation manipulation.
Synthetic Data Pipeline Attacks (Training Pipeline)
Attacking synthetic data generation pipelines used for model training and augmentation.
Training Infrastructure Attacks
Attacking training infrastructure including GPU clusters, distributed training, and orchestration systems.
API Chaining Exploitation Walkthrough
Walkthrough of chaining multiple API calls in agent systems to achieve multi-step unauthorized actions.
Competition-Style Jailbreak Techniques
Walkthrough of jailbreak techniques used in AI security competitions and CTF events.
Computer Use Agent Injection Walkthrough
Walkthrough of injecting prompts through UI elements and screenshots processed by computer-use agents.
Context Manipulation Deep Dive
Advanced walkthrough of context window manipulation techniques including attention dilution and instruction priority shifting.
Data Harvesting Through LLM Apps
Complete walkthrough of systematic data extraction from LLM applications using various exfiltration channels.
Encoding Chain Bypass Walkthrough
Walkthrough of chaining Base64, URL encoding, and Unicode tricks to bypass multi-layer input filters.
Memory Persistence Attack Walkthrough
Walkthrough of achieving persistent memory manipulation in agent systems for cross-session influence.
Agent Privilege Escalation Walkthrough
Walkthrough of escalating privileges in multi-agent systems through trust chain exploitation.
Role Confusion Attack Walkthrough
Exploit role confusion between system, user, and assistant messages to override safety instructions.
Role Confusion Exploitation Walkthrough
Exploit role boundaries in chat APIs by injecting assistant-role messages and system-level instructions.
Semantic Camouflage Walkthrough
Walkthrough of crafting semantically camouflaged injections that evade both classifiers and human review.
Semantic Deception Attack Walkthrough
Craft semantically deceptive prompts that appear benign to classifiers while achieving adversarial objectives.
Model Supply Chain Poisoning
Walkthrough of poisoning ML supply chains through dependency confusion, model weight manipulation, and hub attacks.
Token-Level Attack Optimization Walkthrough
Walkthrough of optimizing adversarial token sequences for maximum jailbreak effectiveness.
Vision Model Attack Walkthrough
Attack vision-language models through adversarial images with embedded text, perturbations, and visual trojans.
Vision Model Attack Walkthrough (Attack Walkthrough)
Step-by-step walkthrough of visual prompt injection, adversarial images, and OCR exploitation in vision-language models.
XML and JSON Injection in LLM Apps
Walkthrough of exploiting XML and JSON parsing in LLM applications for injection and data manipulation.