# attacks

agentic-exploitationfunction-callingforcedattacks

Aanvallen met geforceerde functieaanroepen

Modellen dwingen specifieke functies aan te roepen via vervaardigde invoer die de beoogde toolselectie overschrijft.

disambiguationattacksexploitationagentictool

Tool-disambiguatie-aanvallen

Het misbruiken van ambiguïteit in toolselectie om functieaanroepen om te leiden naar onbedoelde tools.

agentic-exploitationa2atask-injectionattacks

A2A Task Injection-aanvallen

Het injecteren van kwaadaardige taken in A2A-agentcommunicatiekanalen om multi-agent-workflows om te leiden.

agentic-exploitationcross-agentmemoryattacks

Cross-Agent Memory-aanvallen

Het exploiteren van gedeeld geheugen en gedeelde context tussen agents in multi-agentarchitecturen.

ai-forensicsattributionattacksinvestigation

Attributie van AI-aanvallen

Technieken voor het attribueren van AI-aanvallen aan dreigingsactoren op basis van aanvalspatronen en indicatoren.

assessmentsmultimodalattacksexam

Beoordeling van multimodale aanvallen

Assessment on image injection, audio attacks, cross-modal exploitation, and multimodal defense bypass.

code-genpair-programmingcopilotattacks

Aanvallen op AI-pairprogramming

Attack vectors specific to AI pair programming workflows including suggestion manipulation, context injection, and trust exploitation.

code-gentranslationattackscross-language

Aanvalsvectoren bij codevertaling

Exploiting AI code translation to introduce vulnerabilities during language migration.

code-gencopilotinjectionattacks

Injection-aanvallen op Copilot

Prompt injection through repository context that influences code generation suggestions.

ideattacksextensioncodegen

Aanvalsvectoren via IDE-extensies

Attack vectors through AI-powered IDE extensions including suggestion poisoning and context injection.

data-trainingpoisoningstrategiesattacks

Strategieën voor datavergiftiging

Comprehensive strategies for poisoning training data to influence model behavior at scale.

data-trainingdataset-inferenceattacksprivacy

Dataset inference-aanvallen

Inferring properties of the training dataset through model behavior analysis.

attacksgradientdatatrainingleakage

Gradient leakage-aanvallen

Extracting training data from gradient updates in federated and collaborative learning settings.

data-traininglabel-flippingattacksclassification

Label flipping-aanvallen

Using label flipping to subtly alter model classification behavior during supervised fine-tuning.

watermarkingprovenancedetectionattackstext-watermark

AI-watermerken en aanvallen daarop

Current AI watermarking schemes for model outputs and training data, their security properties, and known attacks that remove, forge, or evade watermarks.

data-trainingmodel-inversionattacksprivacy

Modelinversie-aanvallen

Recovering sensitive training data features through model inversion techniques.

privacydata-protectionattackspii

Aanvallen op privacy en gegevensbescherming

Overview of privacy attacks against AI systems including data extraction, membership inference, and model inversion, with regulatory implications and red team methodology.

membership-inferenceprivacystatisticalattacks

Membership inference-aanvallen

Techniques for determining whether specific data was used to train an AI model, including shadow model approaches, loss-based inference, LiRA, and practical implementation guidance.

model-inversionprivacyreconstructionattacks

Modelinversie-aanvallen (data en training)

Techniques for reconstructing training data from model outputs including gradient-based inversion, generative model inversion, and privacy implications for image and language models.

data-trainingprivacy-preservingattacksdifferential-privacy

Aanvallen op privacy-preserving ML

Attacking differential privacy, federated learning, and other privacy-preserving ML mechanisms.

attacksinferencedatapropertytraining

Property inference-aanvallen

Inferring global properties of training datasets through model behavior analysis.

embeddingcross-encoderrerankingattacks

Aanvallen op cross-encoder reranking

Aanvallen op cross-encoder reranking-modellen die in retrieval-pipelines worden gebruikt.

embeddingcross-lingualmultilingualattacks

Cross-linguale embedding-aanvallen

Cross-linguale embeddingruimtes misbruiken om taalspecifieke veiligheidsfilters te omzeilen en adversarial inhoud te injecteren via vertaalgaten.

attacksretrievalvectoradversarialembeddingdense

Adversarial aanvallen op dense retrieval

Adversarial aanvallen op dense retrieval-modellen die worden gebruikt in RAG- en zoeksystemen.

vectorattacksembeddingdrift

Embedding-driftaanvallen

Geleidelijke embedding-drift veroorzaken in vector stores door herhaalde kleine manipulaties.

embeddinginversionattacksprivacy

Embedding-inversieaanvallen (embeddingvectorbeveiliging)

Originele tekst reconstrueren uit embeddingvectoren met inversietechnieken.

embeddingmulti-vectorColBERTattacks

Multi-vector retrieval-aanvallen

Het misbruiken van multi-vector retrieval-modellen zoals ColBERT via adversariële manipulatie op tokenniveau en het misbruiken van late interaction.

embeddingmultimodalattacksclip

Aanvallen op multimodale embeddings (beveiliging van embeddingvectoren)

Aanvallen op multimodale embeddingruimtes zoals CLIP voor cross-modale manipulatie.

embeddingvector-dbinjectionattacks

Injectie-aanvallen op de vectordatabase (overzicht beveiliging van embeddingvectoren)

Adversariële documenten in vectordatabases injecteren om de zoekresultaten van retrieval te beïnvloeden.

fine-tuningadapterattacksPEFT

Aanvalsvectoren op adapterlagen

Comprehensive analysis of attack vectors targeting parameter-efficient adapter layers including LoRA, QLoRA, and prefix tuning modules.

fine-tuningadapterpoisoningattacks

Adapter poisoning-aanvallen

Poisoning publicly shared adapters and LoRA weights to compromise downstream users.

fine-tuningalignment-removalsafetyattacks

Alignment verwijderen via fine-tuning

Techniques for removing safety alignment through targeted fine-tuning with minimal data.

fine-tuningconstitutional-AIRLAIFattacks

Aanvallen op de training van Constitutional AI

Attacking Constitutional AI and RLAIF training pipelines by manipulating the constitutional principles, critique models, or self-improvement loops.

fine-tuningDPOalignmentattacks

DPO alignment-aanvallen

Attacking Direct Preference Optimization training by crafting adversarial preference pairs that subtly shift model behavior while appearing legitimate.

attacksfinefewtuningshotdetuning

Few-shot-detuning-aanvallen

Removing safety alignment with minimal fine-tuning data through targeted few-shot detuning.

fine-tuningdata-requirementsminimumattacks

Minimale data voor fine-tuning-aanvallen

Research on minimum dataset sizes needed for effective fine-tuning attacks.

fine-tuningloraattackstechniques

Aanvalstechnieken voor LoRA

Exploiting Low-Rank Adaptation fine-tuning for safety alignment removal and backdoor insertion.

fine-tuningmulti-tasktransferattacks

Fine-tuning-aanvallen voor meerdere taken

Exploiting multi-task fine-tuning to create interference between safety-critical and utility-focused training objectives.

foundationstaxonomyattacksoverview

Overzicht: taxonomie van aanvallen

Uitgebreid overzicht van de taxonomie van AI-aanvallen, van prompt injection tot modeldiefstal, geordend naar de doelen van de aanvaller en de vereiste toegang.

Beginner

Aanvallen op semantische gelijkenis en vector search

Hoe cosinusgelijkenis en nearest neighbor search werken, en hoe aanvallers vectordatabases misbruiken via adversarial documenten, embedding collisions en het omzeilen van drempelwaarden.

taxonomyattacksclassificationbeginner

Taxonomie van AI-aanvallen

Een uitgebreide classificatie van AI-aanvallen geordend naar doelwit, techniek en impact — een gedeelde woordenschat voor de planning en rapportage van red teams.

Beginner

Adversariële robuustheid aan de frontier

Stand van het onderzoek naar adversariële robuustheid voor frontier-taalmodellen en de resterende open problemen.

frontier-researchadversarial-robustnessfrontierattacks

frontier-researchmultimodalreasoningattacks

Multi-modal reasoning-aanvallen

Het aanvallen van redeneerprocessen die meerdere modaliteiten omvatten in vision-language- en audio-language-modellen.

reasoningo1chain-of-thoughtattacks

Aanvallen op reasoning-modellen

Overzicht van beveiligingsrisico's in reasoning-geschikte LLM's: hoe chain-of-thought-modellen nieuwe aanvalsoppervlakken, exploitprimitieven en verdedigingsuitdagingen introduceren.

verifierreward-modelattacksrlhf

Aanvallen op verifiers & beloningsmodellen

Het aanvallen van procesbeloningsmodellen, uitkomstbeloningsmodellen en verificatiesystemen die in redeneermodellen worden gebruikt: reward hacking, verifier-generator-kloven, en het manipuleren van verificatiestappen.

frontier-researchpreference-optimizationattacksdpo

Onderzoek naar aanvallen op voorkeursoptimalisatie

Onderzoek naar aanvallen op methoden voor voorkeursoptimalisatie, waaronder DPO, KTO en IPO.

frontier-researchsynthetic-personaattacksresearch

Onderzoek naar aanvallen met synthetische persona's

Onderzoek naar het gebruik van AI-gegenereerde synthetische persona's om social engineering uit te voeren tegen AI-systemen.

frontier-researchtest-time-computeattacksreasoning

Aanvallen op test-time compute scaling (Frontier Research)

Aanvallen op test-time compute scaling in redeneermodellen om de toewijzing van middelen en redeneersporen te misbruiken.

infrastructureGPUclusterattacks

GPU Cluster Attack Surface

Analyse van aanvalsoppervlakken specifiek voor GPU-clusters die worden gebruikt voor ML-training en -inferentie, waaronder geheugenisolatie, driver-kwetsbaarheden en side channels.

infrastructureautoscalingservingattacks

Model Serving Autoscaling Attacks

Exploiting autoscaling mechanisms in model serving infrastructure to cause resource exhaustion, cost amplification, or denial of service.

infrastructuremodel-servingattacksdeployment

Model Serving Infrastructure Attacks

Attacking model serving infrastructure including inference servers, load balancers, and GPU schedulers.

labsgradient-basedattacksadvanced

Lab: implementatie van een gradient-gebaseerde aanval

Implement gradient-based adversarial attacks on open-weight models including PGD, FGSM adapted for text inputs.

labsoutput-constraintsattacksbeginner

Eenvoudige aanvallen op uitvoerbeperkingen

Force models to output in constrained formats that bypass output safety filters.

Beginner

Aanvallen via representation engineering

Manipulate internal model representations to alter behavior without prompt modification.

labsrepresentation-engineeringattacksexpert

attackstimelabexperttraininglabs

End-to-end aanvallen tijdens training

Execute a complete training-time attack from data poisoning through model deployment to triggered exploitation.

llmopsblue-greendeploymentattacks

Aanvallen op blue-green deployments

Exploiting blue-green and canary deployment strategies to manipulate traffic routing and force deployment of compromised model versions.

llmopsgatewayroutingattacks

Aanvallen op de modelgateway

Exploiting model gateway and routing infrastructure to redirect requests, intercept responses, or manipulate model selection logic.

llmopsprompt-versioningattacksmanagement

Aanvallen op promptversiebeheer

Exploiting prompt management and versioning systems to inject adversarial system prompts into production deployments.

llmopsrollbackattacksversioning

Aanvalsvectoren voor rollback

Exploiting model rollback mechanisms to force deployment of known-vulnerable model versions or disrupt service availability.

modelstokenizerBPEattacks

Analyse van het aanvalsoppervlak van de tokenizer

Deep analysis of tokenizer vulnerabilities including token boundary exploitation, special token manipulation, and cross-tokenizer attacks.

model-deep-divestransformerattentionattacks

Aanvallen op het attention-mechanisme van transformers

Attacks targeting transformer attention mechanisms including attention hijacking and gradient-based manipulation.

attackstypographyadversarialmultimodal

Adversariële typografie-aanvallen

Vervaardig adversariële tekst die als afbeelding wordt weergegeven om OCR en tekstherkenning van vision-modellen te exploiteren.

adversarialaudioperturbationattacks

Adversariële audiovoorbeelden

Technieken voor het vervaardigen van adversariële audioverstoringen, waaronder psychoakoestische verberging, frequentiedomein-aanvallen en over-the-air adversariële audio.

multimodalaudioinjectionattacks

Op audio gebaseerde injectie-aanvallen

Het aanvallen van speech-to-text- en audio-taalmodellen door het vervaardigen van adversariële audio.

multimodalcross-modaltransferattacks

Cross-Modal Transfer-aanvallen

Aanvallen die overdraagbaar zijn tussen modaliteiten — waarbij één inputmodaliteit gebruikt wordt om de verwerking in een andere aan te vallen.

attacksconsistencymultimodal

Multimodale consistentieaanvallen

Misbruik inconsistenties tussen hoe verschillende modaliteiten dezelfde informatie verwerken.

prompt-injectionpersonarole-playattacks

Persona-gebaseerde injectie-aanvallen

Geavanceerde persona- en rollenspel-gebaseerde aanvallen die het instructievolgende gedrag misbruiken.

injectiontemporalpromptattacks

Temporele injectie-aanvallen

Misbruik van tijdsafhankelijk gedrag in modellen, waaronder seizoensgebonden variaties in veiligheid en misbruik van updatevensters.

attacksuniversalinjectionsuffixprompt

Universele suffix-aanvallen

Onderzoek naar en praktijk van universele adversarial suffixes die overdraagbaar zijn tussen modellen en prompts.

training-pipelinecontinuous-trainingpipelineattacks

Aanvallen op de continue trainingspipeline

Continue-leer- en online-trainingspipelines exploiteren via manipulatie van streaming data.

training-pipelinedata-poisoningscaleattacks

Datavergiftiging op schaal

Technieken om trainingsdata op schaal te vergiftigen om modelgedrag over brede capaciteiten heen te beïnvloeden.

training-pipelinefederated-learningattacksdistributed

Aanvallen op Federated Learning (Training Pipeline)

Aanvallen op federated learning-opstellingen, waaronder modelvergiftiging, data-inferentie en manipulatie van aggregatie.

LoRA- & adapterlaag-aanvallen

Beveiligingsimplicaties van LoRA en op adapters gebaseerde fine-tuning, waaronder het verwijderen van safety-alignment, adaptervergiftiging, rangmanipulatie-aanvallen en het misbruik van multi-adapter-conflicten.

loraadapterattacks

attackspipelinedistillationknowledgetraining

Beveiliging van kennisdistillatie

Beveiligingsimplicaties van kennisdistillatie, waaronder capaciteitsextractie en overdracht van safety-alignment.

training-pipelinesupply-chainmodelattacks

Aanvallen op de model-supply-chain

Uitgebreide analyse van aanvalsvectoren op de model-supply-chain, van trainingsdata tot en met deployment.

training-pipelinepre-trainingdataattacks

Aanvallen op pre-trainingdata

Het aanvallen van de pre-trainingdatapipeline, waaronder vergiftiging van webcrawls en manipulatie van datacuratie.

training-pipelinesynthetic-datapipelineattacks

Aanvallen op de synthetische-datapipeline (Training Pipeline)

Het aanvallen van pipelines voor het genereren van synthetische data die worden gebruikt voor modeltraining en -augmentatie.

training-pipelineinfrastructureattackscompute

Aanvallen op de trainingsinfrastructuur

Attacking training infrastructure including GPU clusters, distributed training, and orchestration systems.

chainingattacksexploitationapiwalkthroughs

Walkthrough: misbruik van API-chaining

Walkthrough of chaining multiple API calls in agent systems to achieve multi-step unauthorized actions.

attackscompetitivewalkthroughwalkthroughsjailbreak

Jailbreak-technieken in competitiestijl

Walkthrough of jailbreak techniques used in AI security competitions and CTF events.

attacksinjectionwalkthroughusecomputerwalkthroughs

Walkthrough: injection bij computer-use agents

Walkthrough of injecting prompts through UI elements and screenshots processed by computer-use agents.

diveattackscontextdeepmanipulationwalkthroughs

Contextmanipulatie deep dive

Advanced walkthrough of context window manipulation techniques including attention dilution and instruction priority shifting.

attacksharvestingdatawalkthroughwalkthroughs

Dataharvesting via LLM-apps

Complete walkthrough of systematic data extraction from LLM applications using various exfiltration channels.

attacksencodingchainbypasswalkthroughwalkthroughs

Walkthrough: encodingketen omzeilen

Walkthrough of chaining Base64, URL encoding, and Unicode tricks to bypass multi-layer input filters.

attacksmemorywalkthroughpersistencewalkthroughs

Walkthrough: aanval op geheugenpersistentie

Walkthrough of achieving persistent memory manipulation in agent systems for cross-session influence.

attacksagentsprivilegeescalationwalkthroughwalkthroughs

Walkthrough: privilege escalation van agents

Walkthrough of escalating privileges in multi-agent systems through trust chain exploitation.

walkthroughsrole-confusioninstruction-hierarchyattacks

Walkthrough van een role confusion-aanval

Exploit role confusion between system, user, and assistant messages to override safety instructions.

attacksroleexploitationconfusionwalkthroughs

Walkthrough van misbruik van rolverwarring

Exploit role boundaries in chat APIs by injecting assistant-role messages and system-level instructions.

attackscamouflagesemanticwalkthroughwalkthroughs

Walkthrough van semantische camouflage

Walkthrough of crafting semantically camouflaged injections that evade both classifiers and human review.

walkthroughssemanticdeceptionattacks

Walkthrough van een semantische misleidingsaanval

Craft semantically deceptive prompts that appear benign to classifiers while achieving adversarial objectives.

attackssupplychainpoisoningwalkthroughwalkthroughs

Vergiftiging van de model-supply chain

Walkthrough of poisoning ML supply chains through dependency confusion, model weight manipulation, and hub attacks.

attacksoptimizationtokenwalkthroughs

Walkthrough van aanvalsoptimalisatie op tokenniveau

Walkthrough of optimizing adversarial token sequences for maximum jailbreak effectiveness.

walkthroughsvision-modelattacksmultimodal

Walkthrough van een aanval op een vision-model

Attack vision-language models through adversarial images with embedded text, perturbations, and visual trojans.

attacksvisionmodelwalkthroughwalkthroughs

Walkthrough van een aanval op een vision-model (aanval-walkthrough)

Step-by-step walkthrough of visual prompt injection, adversarial images, and OCR exploitation in vision-language models.

attacksxmlinjectionwalkthroughjsonwalkthroughs

XML- en JSON-injectie in LLM-apps

Walkthrough of exploiting XML and JSON parsing in LLM applications for injection and data manipulation.