# manipulation
60 articlestagged with “manipulation”
Manipulating Reasoning Chains
Techniques for influencing an AI agent's chain-of-thought reasoning to steer its planning, decision-making, and tool selection toward attacker-desired outcomes.
Memory Summary Manipulation
Exploiting automatic memory summarization to embed persistent instructions in compressed context.
Tool Selection Manipulation
Manipulate model tool selection decisions through crafted prompts that bias toward attacker-preferred functions.
MCP Resource Manipulation Attacks
Exploiting MCP resource exposure to access, modify, or exfiltrate data through resource URIs.
A2A Artifact Manipulation
Manipulating artifacts exchanged between agents in A2A protocol for data poisoning and injection.
A2A Consensus Manipulation
Attacking multi-agent voting and consensus mechanisms to influence collective decision-making outcomes.
A2A Task State Manipulation
Manipulating task states in A2A to skip validation, bypass approval, or redirect task completion.
Multi-Agent Consensus Manipulation
Manipulate voting and consensus mechanisms in multi-agent decision-making systems.
Planning Agent Manipulation
Manipulating LLM-based planning agents to execute adversarial action sequences.
Tool Selection Manipulation in Workflows
Influencing the tool selection process in agent workflows to redirect execution through attacker-controlled tools.
Workflow Checkpoint Manipulation
Manipulating workflow checkpoints and savepoints for state rollback attacks.
Workflow State Manipulation
Manipulating workflow state machines to skip validation steps and reach privileged execution paths.
Context Window Security Assessment
Assessment of context window overflow, attention manipulation, and long-context exploitation techniques.
Code Review AI Manipulation
Manipulating AI code review systems to approve vulnerable code or miss security issues.
Multi-File Context Attacks
Exploiting how AI coding assistants process multi-file context to create distributed injection payloads across repository files.
PR Review AI Manipulation
Techniques for manipulating AI-powered code review tools to approve malicious changes or miss security vulnerabilities.
Advanced Test Generation Manipulation
Advanced techniques for manipulating AI-generated tests to create false assurance by generating tests that pass but don't verify security properties.
Annotation Manipulation Attacks
Compromising model training by manipulating annotations and labels during the data labeling process, including insider and crowdsourcing attacks.
Continual Learning Vulnerabilities
Exploiting continual learning update mechanisms to introduce vulnerabilities through incremental model updates without triggering safety evaluations.
Curriculum Learning Exploitation
Exploiting curriculum learning strategies by manipulating the order and pacing of training data presentation to amplify poisoning effectiveness.
Data Augmentation Attacks
Exploiting automated data augmentation pipelines to amplify poisoned samples or introduce adversarial patterns through augmentation transformations.
Reinforcement Feedback Poisoning
Compromising reinforcement learning from human feedback by poisoning preference data, manipulating reward models, or exploiting annotator disagreement.
Embedding Backdoor Attacks
Inserting backdoors into embedding models that cause specific trigger inputs to produce predetermined embedding vectors for adversarial retrieval.
Hybrid Search Manipulation
Attacking hybrid search systems that combine dense and sparse retrieval by exploiting score fusion and re-ranking vulnerabilities.
Similarity Search Gaming
Techniques for crafting adversarial content that games similarity search to ensure attacker-controlled documents rank highest in retrieval results.
Similarity Search Manipulation
Manipulating similarity search results through adversarial embedding crafting.
Sparse Embedding Manipulation
Manipulating sparse embeddings (BM25, SPLADE) for retrieval result poisoning.
Checkpoint Manipulation Attacks
Intercepting and modifying model checkpoints during the fine-tuning process to inject persistent backdoors or remove safety properties.
Instruction Tuning Manipulation
Techniques for manipulating instruction-tuned models by crafting adversarial training examples that alter the model's instruction-following behavior.
RLHF Preference Manipulation
Strategies for manipulating RLHF preference rankings to shift model behavior, including Sybil attacks on crowdsourced preferences.
Chain-of-Thought Exploitation
Techniques for manipulating reasoning chains in CoT-enabled models: false premise injection, logic bombs, reasoning hijacking, and chain corruption attacks.
Attention Manipulation Research
Research into directly manipulating attention patterns to achieve injection objectives, informed by mechanistic interpretability insights.
Logit Manipulation
Techniques for influencing LLM output distributions through crafted inputs that push logit values toward attacker-desired tokens, exploiting softmax mechanics and token competition dynamics.
Advanced RAG Manipulation Lab
Perform sophisticated RAG manipulation including embedding space attacks, metadata poisoning, and retrieval algorithm gaming.
Lab: Reasoning Trace Manipulation
Manipulate chain-of-thought reasoning traces to inject false premises and redirect model conclusions.
Attention Pattern Manipulation
Craft inputs that manipulate transformer attention patterns to prioritize adversarial content over safety instructions.
Lab: Conversation History Manipulation
Test how LLM applications handle conversation history including truncation, injection, and context window management.
Lab: Output Format Manipulation
Hands-on lab on forcing LLMs to output in specific formats including JSON injection, markdown manipulation, and structured output exploitation for data exfiltration.
Lab: Token Manipulation Basics
Understand how tokenization affects LLM behavior and practice basic token-level manipulation techniques.
KV Cache Manipulation Attacks
Manipulate key-value caches in inference servers to inject or alter cached context across requests.
Practical Embedding Manipulation
Manipulate text to achieve target embedding similarities for poisoning retrieval augmented generation systems.
Lab: Embedding Space Manipulation
Hands-on lab for crafting documents optimized to be retrieved for specific queries through embedding collision attacks using sentence-transformers.
LLM Judge Manipulation
Craft responses that exploit LLM-as-judge evaluation patterns to achieve high safety scores while embedding harmful content.
Lab: Re-ranking Attacks
Hands-on lab for manipulating the re-ranking stage of RAG pipelines to promote or suppress specific documents in retrieval results.
Lab: Semantic Search Manipulation
Manipulate semantic search results by crafting documents that achieve target similarity scores against specific queries.
Chat History Manipulation Attacks
Inject false messages into conversation history to alter model behavior in subsequent turns.
Function Schema Manipulation
Exploit function calling schemas to inject instructions through parameter descriptions and enums.
Advanced A/B Test Exploitation
Manipulating A/B testing frameworks to bias model selection toward less secure variants or introduce adversarial model candidates.
Feature Flag Manipulation in AI Systems
Attacking feature flag systems to alter AI system behavior, enable hidden capabilities, or disable safety controls in production.
Temporal Manipulation & Frame Injection
Techniques for injecting adversarial frames into video streams, temporal consistency attacks, and single-frame injection for behavior change in video AI models.
Conversation Steering
Techniques for gradually redirecting conversation context toward attack objectives without triggering safety mechanisms.
Prefix Injection Attacks
Prepending carefully crafted content before user queries to prime model behavior and create a favorable context for subsequent injection attempts.
Social Engineering of AI
Manipulating AI systems through emotional appeals, authority claims, urgency framing, and social pressure tactics that exploit instruction-following tendencies.
Retrieval Manipulation (Rag Data Attacks)
Techniques for manipulating RAG retrieval to control which documents reach the LLM context, including adversarial query reformulation, retriever bias exploitation, and semantic similarity gaming.
Social Engineering for AI Systems
Manipulating human operators and administrators of AI systems to gain access, extract information, or bypass security controls through social engineering techniques.
Checkpoint Manipulation Attacks (Training Pipeline)
Direct manipulation of model checkpoints and saved weights to inject backdoors or alter behavior.
Instruction Tuning Data Manipulation
Manipulating instruction tuning datasets to embed specific behaviors in the resulting model.
Model Weight Manipulation Techniques
Direct manipulation of model weights to inject backdoors, modify behavior, and bypass safety training.
Advanced Context Manipulation Walkthrough
Manipulate conversation context through history injection, context overflow, and attention weight exploitation.
Context Manipulation Deep Dive
Advanced walkthrough of context window manipulation techniques including attention dilution and instruction priority shifting.