# attacks
97 articles tagged with “attacks”
Phidata Agent Attacks
Security analysis of Phidata agent framework including knowledge base poisoning and tool abuse.
Memory Context Window Attacks
Exploiting memory systems that manage context window limitations to inject or suppress information.
Forced Function Calling Attacks
Forcing models to call specific functions through crafted inputs that override intended tool selection.
Tool Disambiguation Attacks
Exploiting tool selection ambiguity to redirect function calls to unintended tools.
A2A Task Injection Attacks
Injecting malicious tasks into A2A agent communication channels to redirect multi-agent workflows.
Cross-Agent Memory Attacks
Exploiting shared memory and context between agents in multi-agent architectures.
Attribution of AI Attacks
Techniques for attributing AI attacks to threat actors based on attack patterns and indicators.
Multimodal Attacks Assessment
Assessment of image injection, audio attacks, cross-modal exploitation, and multimodal defense bypass.
AI Pair Programming Attacks
Attack vectors specific to AI pair programming workflows including suggestion manipulation, context injection, and trust exploitation.
Code Translation Attack Vectors
Exploiting AI code translation to introduce vulnerabilities during language migration.
Copilot Injection Attacks
Prompt injection through repository context that influences code generation suggestions.
IDE Extension Attack Vectors
Attack vectors through AI-powered IDE extensions including suggestion poisoning and context injection.
Data Poisoning Strategies
Comprehensive strategies for poisoning training data to influence model behavior at scale.
Dataset Inference Attacks
Inferring properties of the training dataset through model behavior analysis.
Gradient Leakage Attacks
Extracting training data from gradient updates in federated and collaborative learning settings.
Label Flipping Attacks
Using label flipping to subtly alter model classification behavior during supervised fine-tuning.
AI Watermarking and Attacks
Current AI watermarking schemes for model outputs and training data, their security properties, and known attacks that remove, forge, or evade watermarks.
Model Inversion Attacks
Recovering sensitive training data features through model inversion techniques.
Privacy & Data Protection Attacks
Overview of privacy attacks against AI systems including data extraction, membership inference, and model inversion, with regulatory implications and red team methodology.
Membership Inference Attacks
Techniques for determining whether specific data was used to train an AI model, including shadow model approaches, loss-based inference, LiRA, and practical implementation guidance.
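To make the loss-based variant concrete, here is a minimal sketch (toy numbers, illustrative threshold; it assumes you already have per-example losses from querying the target model) of the baseline that shadow-model and LiRA-style attacks refine with calibrated, per-example statistics:

```python
# Minimal sketch of loss-threshold membership inference.
# Assumes per-example losses from the target model are already available;
# the threshold value below is purely illustrative.
import numpy as np

def loss_based_membership(losses: np.ndarray, threshold: float) -> np.ndarray:
    """Predict 'member' for examples whose loss falls below the threshold.

    Training examples tend to have lower loss than unseen examples, so a
    simple loss threshold is the baseline attack; shadow models and LiRA
    calibrate this decision per example.
    """
    return losses < threshold

# Toy candidate losses from querying the target model.
candidate_losses = np.array([0.02, 1.35, 0.10, 2.40])
print(loss_based_membership(candidate_losses, threshold=0.5))
# -> [ True False  True False ]
```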
Model Inversion Attacks (Data Training)
Techniques for reconstructing training data from model outputs including gradient-based inversion, generative model inversion, and privacy implications for image and language models.
Attacks on Privacy-Preserving ML
Attacking differential privacy, federated learning, and other privacy-preserving ML mechanisms.
Property Inference Attacks
Inferring global properties of training datasets through model behavior analysis.
Cross-Encoder Reranking Attacks
Attacking cross-encoder reranking models used in retrieval pipelines.
Cross-Lingual Embedding Attacks
Exploiting cross-lingual embedding spaces to bypass language-specific safety filters and inject adversarial content through translation gaps.
Dense Retrieval Adversarial Attacks
Adversarial attacks against dense retrieval models used in RAG and search systems.
Embedding Drift Attacks
Causing gradual embedding drift in vector stores through repeated small manipulations.
Embedding Inversion Attacks (Embedding Vector Security)
Recovering original text from embedding vectors using inversion techniques.
Multi-Vector Retrieval Attacks
Exploiting multi-vector retrieval models like ColBERT through token-level adversarial manipulation and late interaction exploitation.
Multimodal Embedding Attacks (Embedding Vector Security)
Attacking multimodal embedding spaces like CLIP for cross-modal manipulation.
Vector Database Injection Attacks (Embedding Vector Security Overview)
Injecting adversarial documents into vector databases to influence retrieval results.
Adapter Layer Attack Vectors
Comprehensive analysis of attack vectors targeting parameter-efficient adapter layers including LoRA, QLoRA, and prefix tuning modules.
Adapter Poisoning Attacks
Poisoning publicly shared adapters and LoRA weights to compromise downstream users.
Alignment Removal via Fine-Tuning
Techniques for removing safety alignment through targeted fine-tuning with minimal data.
Constitutional AI Training Attacks
Attacking Constitutional AI and RLAIF training pipelines by manipulating the constitutional principles, critique models, or self-improvement loops.
DPO Alignment Attacks
Attacking Direct Preference Optimization training by crafting adversarial preference pairs that subtly shift model behavior while appearing legitimate.
Few-Shot Detuning Attacks
Removing safety alignment with minimal fine-tuning data through targeted few-shot detuning.
Minimum Data for Fine-Tuning Attacks
Research on minimum dataset sizes needed for effective fine-tuning attacks.
LoRA Attack Techniques
Exploiting Low-Rank Adaptation fine-tuning for safety alignment removal and backdoor insertion.
Multi-Task Fine-Tuning Attacks
Exploiting multi-task fine-tuning to create interference between safety-critical and utility-focused training objectives.
Attack Taxonomy Overview
Comprehensive overview of the AI attack taxonomy from prompt injection through model theft, organized by attacker goals and required access.
Semantic Similarity & Vector Search Attacks
How cosine similarity and nearest neighbor search work, and how attackers exploit vector databases through adversarial documents, embedding collisions, and threshold bypasses.
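As a quick illustration of the retrieval mechanics this article covers, the sketch below (toy vectors and a brute-force search rather than a real embedding model or vector database; all names are illustrative) shows cosine-similarity ranking and why an adversarial document embedded close to a popular query wins retrieval:

```python
# Minimal sketch of cosine-similarity nearest-neighbor retrieval.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: dot product of the two vectors after L2 normalization."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query_vec: np.ndarray, doc_vecs: list[np.ndarray], k: int = 3) -> list[int]:
    """Brute-force nearest-neighbor search: rank documents by similarity to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:k]

# Toy example: an "adversarial" document crafted to sit near the query vector
# outranks legitimate documents and clears any fixed similarity threshold.
query = np.array([1.0, 0.0, 0.0])
docs = [
    np.array([0.6, 0.8, 0.0]),   # legitimate document
    np.array([0.0, 1.0, 0.0]),   # legitimate document
    np.array([0.99, 0.05, 0.0]), # adversarial document near the query
]
print(top_k(query, docs, k=2))  # the adversarial document is retrieved first
```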
AI Attack Taxonomy
A comprehensive classification of AI attacks organized by target, technique, and impact — providing a shared vocabulary for red team planning and reporting.
Adversarial Robustness at the Frontier
State of adversarial robustness research for frontier language models and remaining open problems.
Multi-Modal Reasoning Attacks
Attacking reasoning processes that span multiple modalities in vision-language and audio-language models.
Reasoning Model Attacks
Overview of security risks in reasoning-enabled LLMs: the new attack surfaces introduced by chain-of-thought models, exploit primitives, and defensive challenges.
Verifier & Reward Model Attacks
Attacking process reward models, outcome reward models, and verification systems used in reasoning models: reward hacking, verifier-generator gaps, and gaming verification steps.
Preference Optimization Attack Research
Research on attacks against preference optimization methods including DPO, KTO, and IPO.
Synthetic Persona Attacks Research
Research on using AI-generated synthetic personas to conduct social engineering against AI systems.
Test-Time Compute Scaling Attacks (Frontier Research)
Attacking test-time compute scaling in reasoning models to exploit resource allocation and reasoning traces.
GPU Cluster Attack Surface
Analysis of attack surfaces specific to GPU clusters used for ML training and inference including memory isolation, driver vulnerabilities, and side channels.
Model Serving Autoscaling Attacks
Exploiting autoscaling mechanisms in model serving infrastructure to cause resource exhaustion, cost amplification, or denial of service.
Model Serving Infrastructure Attacks
Attacking model serving infrastructure including inference servers, load balancers, and GPU schedulers.
Lab: Gradient-Based Attack Implementation
Implement gradient-based adversarial attacks on open-weight models, including PGD and FGSM adapted for text inputs.
Simple Output Constraint Attacks
Force models to output in constrained formats that bypass output safety filters.
Representation Engineering Attacks
Manipulate internal model representations to alter behavior without prompt modification.
End-to-End Training Time Attacks
Execute a complete training-time attack from data poisoning through model deployment to triggered exploitation.
Blue-Green Deployment Attacks
Exploiting blue-green and canary deployment strategies to manipulate traffic routing and force deployment of compromised model versions.
Model Gateway Attacks
Exploiting model gateway and routing infrastructure to redirect requests, intercept responses, or manipulate model selection logic.
Prompt Versioning Attacks
Exploiting prompt management and versioning systems to inject adversarial system prompts into production deployments.
Rollback Attack Vectors
Exploiting model rollback mechanisms to force deployment of known-vulnerable model versions or disrupt service availability.
Tokenizer Attack Surface Analysis
Deep analysis of tokenizer vulnerabilities including token boundary exploitation, special token manipulation, and cross-tokenizer attacks.
Transformer Attention Mechanism Attacks
Attacks targeting transformer attention mechanisms including attention hijacking and gradient-based manipulation.
Adversarial Typography Attacks
Craft adversarial text rendered as images to exploit OCR and vision model text recognition.
Adversarial Audio Examples
Techniques for crafting adversarial audio perturbations including psychoacoustic hiding, frequency domain attacks, and over-the-air adversarial audio.
Audio-Based Injection Attacks
Attacking speech-to-text and audio-language models through adversarial audio crafting.
Cross-Modal Transfer Attacks
Attacks that transfer across modalities — using one input modality to attack processing in another.
Multimodal Consistency Attacks
Exploit inconsistencies between how different modalities process the same information.
Persona-Based Injection Attacks
Advanced persona and role-play based attacks that exploit instruction-following behavior.
Temporal Injection Attacks
Exploiting time-dependent behavior in models including seasonal safety variations and update window exploitation.
Universal Suffix Attacks
Research and practice of universal adversarial suffixes that transfer across models and prompts.
Continuous Training Pipeline Attacks
Exploiting continuous learning and online training pipelines through streaming data manipulation.
Data Poisoning at Scale
Techniques for poisoning training data at scale to influence model behavior across broad capabilities.
Federated Learning Attacks (Training Pipeline)
Attacks on federated learning setups including model poisoning, data inference, and aggregation manipulation.
LoRA & Adapter Layer Attacks
Security implications of LoRA and adapter-based fine-tuning, including safety alignment removal, adapter poisoning, rank manipulation attacks, and multi-adapter conflict exploitation.
Knowledge Distillation Security
Security implications of knowledge distillation including capability extraction and safety alignment transfer.
Model Supply Chain Attacks
Comprehensive analysis of model supply chain attack vectors from training data through deployment.
Pre-Training Data Attacks
Attacking the pre-training data pipeline including web crawl poisoning and data curation manipulation.
Synthetic Data Pipeline Attacks (Training Pipeline)
Attacking synthetic data generation pipelines used for model training and augmentation.
Training Infrastructure Attacks
Attacking training infrastructure including GPU clusters, distributed training, and orchestration systems.
API Chaining Exploitation Walkthrough
Walkthrough of chaining multiple API calls in agent systems to achieve multi-step unauthorized actions.
Competition-Style Jailbreak Techniques
Walkthrough of jailbreak techniques used in AI security competitions and CTF events.
Computer Use Agent Injection Walkthrough
Walkthrough of injecting prompts through UI elements and screenshots processed by computer-use agents.
Context Manipulation Deep Dive
Advanced walkthrough of context window manipulation techniques including attention dilution and instruction priority shifting.
Data Harvesting Through LLM Apps
Complete walkthrough of systematic data extraction from LLM applications using various exfiltration channels.
Encoding Chain Bypass Walkthrough
Walkthrough of chaining Base64, URL encoding, and Unicode tricks to bypass multi-layer input filters.
Memory Persistence Attack Walkthrough
Walkthrough of achieving persistent memory manipulation in agent systems for cross-session influence.
Agent Privilege Escalation Walkthrough
Walkthrough of escalating privileges in multi-agent systems through trust chain exploitation.
Role Confusion Attack Walkthrough
Exploit role confusion between system, user, and assistant messages to override safety instructions.
Role Confusion Exploitation Walkthrough
Exploit role boundaries in chat APIs by injecting assistant-role messages and system-level instructions.
Semantic Camouflage Walkthrough
Walkthrough of crafting semantically camouflaged injections that evade both classifiers and human review.
Semantic Deception Attack Walkthrough
Craft semantically deceptive prompts that appear benign to classifiers while achieving adversarial objectives.
Model Supply Chain Poisoning
Walkthrough of poisoning ML supply chains through dependency confusion, model weight manipulation, and hub attacks.
Token-Level Attack Optimization Walkthrough
Walkthrough of optimizing adversarial token sequences for maximum jailbreak effectiveness.
Vision Model Attack Walkthrough
Attack vision-language models through adversarial images with embedded text, perturbations, and visual trojans.
Vision Model Attack Walkthrough (Attack Walkthrough)
Step-by-step walkthrough of visual prompt injection, adversarial images, and OCR exploitation in vision-language models.
XML and JSON Injection in LLM Apps
Walkthrough of exploiting XML and JSON parsing in LLM applications for injection and data manipulation.