# manipulation

agentic-exploitation, agent-memory, summary, manipulation

Memory Summary Manipulation

Abusing automatic memory summarization to embed persistent instructions in compressed context.

agentic-exploitation, function-calling, tool-selection, manipulation

Tool Selection Manipulation

Manipulating the model's tool-selection decisions with crafted prompts that steer it toward functions the attacker prefers.

agentic-exploitation, mcp, resource, manipulation

MCP Resource Manipulation Attacks

Exploiting MCP resource exposure to access, modify, or exfiltrate data through resource URIs.

exploitation, artifact, agentic, manipulation, a2a

A2A Artifact Manipulation

Manipulating artifacts exchanged between agents in the A2A protocol for data poisoning and injection.

A2A Consensus Manipulation

Attacking voting and consensus mechanisms in multi-agent systems to influence the outcomes of collective decision-making.

a2a, consensus, manipulation

exploitation, task, agentic, manipulation, a2a, state

A2A Task State Manipulation

Manipulating task states in A2A to skip validation, bypass approval, or redirect task completion.

agentic-exploitation, multi-agent, consensus, manipulation

Multi-Agent Consensus Manipulation

Manipulating the voting and consensus mechanisms used for decision-making in multi-agent systems.

agentic-exploitation, planning, agent, manipulation

Planning Agent Manipulation

Manipulating LLM-based planning agents into executing malicious action sequences.

workflow, tool-selection, manipulation

Tool Selection Manipulation in Workflows

Influencing the tool-selection process in agent workflows to redirect execution through attacker-controlled tools.

exploitation, checkpoint, workflow, agentic, manipulation

Workflow Checkpoint Manipulation

Manipulating workflow checkpoints and savepoints to mount state-rollback attacks.

agentic-exploitation, workflow, state, manipulation

Workflow State Manipulation

Manipulating workflow state machines to skip validation steps and reach privileged execution paths.

assessment, context-window, manipulation

Context Window Security Assessment

Assessment of context window overflow, attention manipulation, and long-context exploitation techniques.

code-gen, code-review, manipulation, ai

AI Code Review Manipulation

Manipulating AI code review systems to approve vulnerable code or miss security issues.

code-gen, multi-file, context, manipulation

Multi-File Context Attacks

Exploiting how AI coding assistants process multi-file context to create distributed injection payloads across repository files.

code-gen, pull-request, review, manipulation

AI Manipulation in PR Review

Techniques for manipulating AI-powered code review tools to approve malicious changes or miss security vulnerabilities.

code-gen, testing, generation, manipulation

Advanced Test Generation Manipulation

Advanced techniques for manipulating AI-generated tests to create false assurance by generating tests that pass but don't verify security properties.

data-training, annotation, labeling, manipulation

Annotation Manipulation Attacks

Compromising model training by manipulating annotations and labels during the data labeling process, including insider and crowdsourcing attacks.

data-training, continual-learning, catastrophic-forgetting, manipulation

Continual Learning Vulnerabilities

Exploiting continual learning update mechanisms to introduce vulnerabilities through incremental model updates without triggering safety evaluations.

data-training, curriculum-learning, scheduling, manipulation

Curriculum Learning Abuse

Exploiting curriculum learning strategies by manipulating the order and pacing of training data presentation to amplify poisoning effectiveness.

data-training, augmentation, manipulation, training

Data Augmentation Attacks

Exploiting automated data augmentation pipelines to amplify poisoned samples or introduce adversarial patterns through augmentation transformations.

data-training, RLHF, feedback, manipulation

Reinforcement Feedback Poisoning

Compromising reinforcement learning from human feedback by poisoning preference data, manipulating reward models, or exploiting annotator disagreement.

embedding, backdoor, training, manipulation

Embedding Backdoor Attacks

Planting backdoors in embedding models so that specific trigger inputs produce predetermined embedding vectors for adversarial retrieval.

embedding, hybrid-search, fusion, manipulation

Hybrid Search Manipulation

Attacking hybrid search systems that combine dense and sparse retrieval by exploiting weaknesses in score fusion and re-ranking.

embedding, similarity, gaming, manipulation

Similarity Search Gaming

Techniques for crafting adversarial content that manipulates similarity search so that attacker-controlled documents rank at the top of retrieval results.

embedding, similarity, manipulation, search

Similarity Search Manipulation

Manipulating similarity search results by crafting adversarial embeddings.

vector, sparse, manipulation, embedding

Sparse Embedding Manipulation

Manipulating sparse embeddings (BM25, SPLADE) to poison retrieval results.

fine-tuning, checkpoint, manipulation, persistence

Checkpoint Manipulation Attacks

Intercepting and modifying model checkpoints during the fine-tuning process to inject persistent backdoors or remove safety properties.

fine-tuning, instruction-tuning, manipulation, safety

Instruction Tuning Manipulation

Techniques for manipulating instruction-tuned models by crafting adversarial training examples that alter the model's instruction-following behavior.

fine-tuning, RLHF, preference, manipulation

RLHF Preference Manipulation

Strategies for manipulating RLHF preference rankings to shift model behavior, including Sybil attacks on crowdsourced preferences.

cot, exploitation, reasoning, manipulation

Chain-of-Thought Exploitation

Techniques for manipulating reasoning chains in CoT-capable models: false-premise injection, logic bombs, reasoning hijacking, and chain-corruption attacks.

research, attention, manipulation, mechanistic

Attention Manipulation Research

Research into directly manipulating attention patterns to achieve injection goals, informed by insights from mechanistic interpretability.

logits, output-distribution, softmax, token-competition, manipulation

Logit Manipulation

Techniques for influencing an LLM's output distributions with crafted inputs that push logit values toward attacker-desired tokens, exploiting how softmax works and the dynamics of token competition.

advanced, lab, rag, manipulation, labs

Lab: Advanced RAG Manipulation

Perform sophisticated RAG manipulation including embedding space attacks, metadata poisoning, and retrieval algorithm gaming.

labs, reasoning-trace, manipulation, advanced

Lab: Reasoning Trace Manipulation

Manipulate chain-of-thought reasoning traces to inject false premises and redirect model conclusions.

labs, attention, manipulation, transformer, advanced

Attention Pattern Manipulation

Craft inputs that manipulate transformer attention patterns to prioritize adversarial content over safety instructions.

labs, conversation-history, manipulation, beginner

Lab: Conversation History Manipulation

Test how LLM applications handle conversation history including truncation, injection, and context window management.

Beginner

Lab: Output Format Manipulation

Hands-on lab on forcing LLMs to output in specific formats including JSON injection, markdown manipulation, and structured output exploitation for data exfiltration.

lab, output, format, manipulation

Beginner

Lab: Token Manipulation Fundamentals

Understand how tokenization affects LLM behavior and practice basic token-level manipulation techniques.

labs, tokens, manipulation, beginner

Beginner

KV-Cache Manipulation Attacks

Manipulate key-value caches in inference servers to inject or alter cached context across requests.

labs, kv-cache, manipulation, inference, expert

intermediate, lab, practical, manipulation, embedding, labs

Practical Embedding Manipulation

Manipulate text to achieve target embedding similarities for poisoning retrieval-augmented generation systems.

lab, embeddings, manipulation, vectors

Lab: Embedding Space Manipulation

Hands-on lab for crafting documents optimized to be retrieved for specific queries through embedding collision attacks using sentence-transformers.

labs, llm-judge, manipulation, intermediate

LLM Judge Manipulation

Craft responses that exploit LLM-as-judge evaluation patterns to achieve high safety scores while embedding harmful content.

lab, rag, reranking, manipulation, data-attacks

Lab: Re-Ranking Attacks

Hands-on lab for manipulating the re-ranking stage of RAG pipelines to promote or suppress specific documents in retrieval results.

labs, semantic-search, manipulation, intermediate

Lab: Semantic Search Manipulation

Manipulate semantic search results by crafting documents that achieve target similarity scores against specific queries.

labs, chat-history, manipulation, intermediate

Chat History Manipulation Attacks

Inject false messages into conversation history to alter model behavior in subsequent turns.

labs, function-schema, manipulation, intermediate

Function Schema Manipulation

Exploit function calling schemas to inject instructions through parameter descriptions and enums.

llmops, ab-testing, exploitation, manipulation

Advanced A/B Testing Abuse

Manipulating A/B testing frameworks to bias model selection toward less secure variants or introduce adversarial model candidates.

llmops, feature-flags, manipulation, configuration

Feature Flag Manipulation in AI Systems

Attacking feature flag systems to alter AI system behavior, enable hidden capabilities, or disable safety controls in production.

video, temporal, frame-injection, manipulation

Temporal Manipulation & Frame Injection

Techniques for injecting adversarial frames into video streams, temporal-consistency attacks, and single-frame injection to change the behavior of video AI models.

conversation-steering, persistence, topic-drift, manipulation, red-teaming

Conversation Steering

Techniques for gradually steering a conversation's context toward attack goals without triggering safety mechanisms.

prompt-injection, prefix, priming, manipulation

Prefix Injection Attacks

Prepending carefully crafted content to user queries to steer model behavior and create a favorable context for later injection attempts.

social-engineering, persuasion, manipulation, jailbreak, red-teaming

Social Engineering of AI

Manipulating AI systems through emotional appeals, authority claims, urgency framing, and social-pressure tactics that exploit their tendency to follow instructions.

retrieval, manipulation, rag, query-reformulation, semantic-similarity

Retrieval Manipulation (RAG Data Attacks)

Techniques for manipulating RAG retrieval to control which documents reach the LLM context, including adversarial query reformulation, exploiting retriever bias, and gaming semantic similarity.

social-engineering, human-factors, operators, access, manipulation

Social Engineering for AI Systems

Manipulating the human operators and administrators of AI systems through social-engineering techniques to gain access, extract information, or bypass security controls.

training-pipeline, checkpoint, manipulation, weights

Checkpoint Manipulation Attacks (Training Pipeline)

Directly manipulating model checkpoints and stored weights to inject backdoors or alter behavior.

instruction, pipeline, tuning, manipulation, training

Instruction-Tuning Data Manipulation

Manipulating instruction-tuning datasets to embed specific behavior in the resulting model.

training, weights, manipulation

Model Weight Manipulation Techniques

Directly manipulating model weights to inject backdoors, alter behavior, and bypass safety training.

walkthroughs, context, manipulation, advanced

Walkthrough: Advanced Context Manipulation

Manipulate conversation context through history injection, context overflow, and attention weight exploitation.

dive, attacks, context, deep, manipulation, walkthroughs

Context Manipulation Deep Dive

Advanced walkthrough of context window manipulation techniques including attention dilution and instruction priority shifting.