# intermediate
155 articles tagged with “intermediate”
Intermediate Certification Prep Guide
Comprehensive study guide for intermediate-level certification preparation.
Capstone: Compliance Assessment Simulation
Conduct a simulated compliance assessment against EU AI Act, NIST AI RMF, and ISO 42001, producing a comprehensive gap analysis report.
Capstone: Red Team Program Design
Design a complete AI red team program for a fictional enterprise, producing a comprehensive program charter document.
Capstone: Industry Vertical Deep Dive
Select an industry vertical, threat model the AI systems within it, and produce a sector-specific AI security testing guide.
LLM Guard and Protect AI Guardian
Input/output scanning, PII detection, toxicity filtering, integration patterns, and bypass techniques for LLM Guard and the Protect AI Guardian ecosystem.
NVIDIA NeMo Guardrails
Architecture, configuration, Colang programming, integration patterns, and bypass techniques for NVIDIA's open-source NeMo Guardrails framework.
AI Anomaly Detection
Detecting jailbreak attempts, unusual usage patterns, output drift, and embedding space anomalies in AI systems through statistical and ML-based methods.
AI Monitoring and Observability
What to monitor in AI systems, key metrics for detecting abuse and drift, alerting strategies, and observability architecture for LLM applications.
AI Logging Architecture
What to capture in AI system logs — prompts, completions, latency, tokens, tool calls — along with storage strategies, retention policies, and privacy considerations.
Adversarial ML: Core Concepts
History and fundamentals of adversarial machine learning — perturbation attacks, evasion vs poisoning, robustness — bridging classical adversarial ML to LLM-specific attacks.
AI Threat Models: White-box, Black-box & Grey-box
Access levels in AI security testing — what's possible at each level, realistic scenarios, and comparison to traditional security threat modeling.
Embeddings & Vector Spaces for Red Teamers
Understand how embeddings encode semantic meaning, how vector operations work, and why red teamers need to understand embedding spaces for RAG attacks and similarity exploitation.
Lab: Exploring Embedding Spaces
Hands-on lab using Python to visualize embedding spaces, measure semantic similarity, and demonstrate how adversarial documents can be crafted to match target queries.
RAG Architecture: How Retrieval Systems Work
End-to-end anatomy of a Retrieval-Augmented Generation pipeline — document ingestion, chunking, embedding, indexing, retrieval, context assembly, and generation — with attack surface analysis at each stage.
Semantic Similarity & Vector Search Attacks
How cosine similarity and nearest neighbor search work, and how attackers exploit vector databases through adversarial documents, embedding collisions, and threshold bypasses.
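The cosine-similarity ranking this entry refers to can be sketched in a few lines. This is a toy illustration only: the 3-dimensional vectors and the "adversarial" document below are made-up values, not the output of any real embedding model.

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models use hundreds of dimensions).
query = [0.9, 0.1, 0.2]
benign_doc = [0.1, 0.8, 0.3]
adversarial_doc = [0.88, 0.12, 0.21]  # crafted to sit near the query

# Nearest-neighbor retrieval returns the document with the highest score,
# so the crafted document outranks the legitimate one.
docs = {"benign": benign_doc, "adversarial": adversarial_doc}
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # adversarial
```

This is the core of why adversarial documents work against vector search: the retriever has no notion of intent, only of angular closeness in the embedding space.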
Inference: Sampling, Temperature & Generation
How LLMs generate text at inference time — greedy decoding, top-k, top-p, temperature — and how these parameters affect attack success rates.
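A rough sketch of the temperature parameter mentioned above — a toy softmax over made-up logits, not any particular model's API:

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    # Dividing logits by T before softmax: low T sharpens the distribution
    # (approaching greedy decoding), high T flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # toy next-token scores

cold = softmax_with_temperature(logits, 0.2)  # near-greedy
hot = softmax_with_temperature(logits, 2.0)   # much flatter

# At low temperature nearly all probability mass sits on the top token;
# at high temperature lower-ranked tokens get sampled far more often,
# which is one reason attack success rates vary with sampling settings.
print(round(cold[0], 3), round(hot[0], 3))

# Sampling a token index from the flattened distribution:
token = random.choices(range(len(logits)), weights=hot)[0]
```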
Tokenization & Its Security Implications
How BPE and SentencePiece tokenizers work, and how tokenizer behavior creates exploitable attack surfaces including boundary attacks, homoglyphs, and encoding tricks.
Pre-training → Fine-tuning → RLHF Pipeline
Understand the three stages of creating an aligned LLM — pre-training, supervised fine-tuning, and RLHF/DPO — and the security implications at each stage.
Transformer Architecture for Attackers
Deep dive into the transformer architecture — attention, feed-forward layers, and residual connections — through the lens of which components are exploitable.
Agent Architectures & Tool Use Patterns
How ReAct, Plan-and-Execute, and LangGraph agent patterns work — tool definition, invocation, and result processing — and where injection happens in each architecture.
Common AI Deployment Patterns & Attack Surfaces
Analysis of deployment patterns — chatbots, copilots, autonomous agents, batch processing, and fine-tuned models — with their unique attack surfaces and security considerations.
Lab: Mapping an AI System's Attack Surface
Hands-on lab walking through reconnaissance of an AI system — identifying components, mapping data flows, enumerating tools, and documenting the attack surface.
CTF: Code Gen Exploit
Manipulate AI code generation to produce vulnerable, backdoored, or malicious code. Explore how prompt manipulation influences code security, from subtle vulnerability injection to full backdoor insertion.
CTF: RAG Heist
Extract sensitive information from a Retrieval-Augmented Generation system by exploiting retrieval mechanisms, document parsing, embedding manipulation, and context window management vulnerabilities.
Intermediate AI Red Team Labs
Overview of intermediate-level hands-on labs covering agent exploitation, RAG poisoning, multi-turn attacks, automated campaigns, and advanced injection techniques for AI red teaming.
Advanced Encoding Chain Attacks
Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.
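A minimal sketch of the chaining idea — ROT13 wrapped in Base64, both from the Python standard library. The payload string is a harmless placeholder:

```python
import base64
import codecs

payload = "example payload"  # placeholder, not an actual attack string

# Layer 1: ROT13 defeats naive keyword matching on the raw text.
layer1 = codecs.encode(payload, "rot13")

# Layer 2: Base64 hides the ROT13 output from filters that decode
# only one level (or none) before scanning.
layer2 = base64.b64encode(layer1.encode()).decode()

# A filter must unwind every layer, in order, to see the original
# string; decoding in reverse order restores it exactly.
decoded = codecs.decode(base64.b64decode(layer2).decode(), "rot13")
assert decoded == payload
```

The defensive takeaway is the mirror image: a scanner that stops after one round of decoding misses anything wrapped twice.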
Lab: Advanced Function Calling Exploitation
Exploit advanced function calling patterns including nested calls, parallel execution, and schema manipulation.
Lab: Automated Adversarial Prompt Generation
Build automated systems for generating adversarial prompts using mutation, crossover, and selection strategies.
Agent Memory Manipulation
Exploit persistent memory in LLM agents to plant false context that persists across sessions.
Lab: Agent Prompt Leaking
Hands-on lab for extracting system prompts from tool-using AI agents by exploiting tool interactions, observation channels, and multi-step reasoning to leak protected instructions.
Lab: Agent Workflow Security Testing
Test multi-step agent workflows for injection points, state manipulation, and unauthorized action execution.
API Abuse Pattern Lab
Discover and exploit API abuse patterns including parameter injection, header manipulation, and endpoint confusion.
API Parameter Fuzzing for LLMs
Systematically fuzz LLM API parameters to discover unexpected behaviors and bypasses.
Lab: API Versioning Security Gaps
Exploit version differences in LLM APIs to find deprecated endpoints and backward-compatible safety bypasses.
Lab: Automated Red Team Pipeline
Hands-on lab for building a continuous AI red team testing pipeline using promptfoo, GitHub Actions, and automated attack generation to catch safety regressions before deployment.
Lab: Automated Red Team Testing
Set up promptfoo for continuous automated security testing of AI applications, with custom evaluators and CI/CD integration.
Lab: Canary Token Effectiveness Testing
Deploy and test canary tokens in system prompts to evaluate their effectiveness at detecting prompt extraction.
Lab: Chain-of-Thought Exploitation
Exploit chain-of-thought reasoning to leak internal model reasoning, bypass safety filters, and manipulate decision processes.
Lab: CI/CD Pipeline Security for LLM Apps
Test CI/CD pipeline security for LLM applications including prompt template injection, model artifact tampering, and deployment attacks.
Lab: Advanced Classifier Evasion
Evade advanced input classifiers using adversarial perturbation, semantic paraphrasing, and gradient-free optimization.
ML Classifier Evasion Lab
Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.
Lab: Cloud AI Platform Security Testing
Test cloud AI platform configurations including AWS Bedrock guardrails, Azure content filters, and GCP safety settings.
Lab: Code Generation Security Testing
Test LLM code generation for insecure patterns, injection vulnerabilities, and code execution safety issues.
Lab: AI Compliance Testing Fundamentals
Test LLM applications against OWASP LLM Top 10 and NIST AI RMF requirements using structured test procedures.
Lab: Context Window Attack Optimization
Optimize prompt injection placement within the context window to maximize attack effectiveness using attention dynamics.
Context Window Poisoning Lab
Exploit context window management to inject persistent adversarial content that influences future model responses.
Conversation Splitting Attack Lab
Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.
Conversation State Poisoning
Exploit conversation history manipulation to alter model behavior across turns.
Cross-Context Injection
Inject prompts that persist across separate conversation contexts in shared deployments.
Custom Jailbreak Wordlist Generation
Generate targeted jailbreak wordlists using mutation strategies and seed prompts.
Lab: Data Exfiltration Channels
Hands-on lab for extracting data from AI systems through markdown image rendering, invisible links, tool call parameters, and other covert exfiltration channels.
Lab: Data Exfiltration Channels (Intermediate Lab)
Extract sensitive information from AI systems through various exfiltration channels including crafted links, image tags, tool calls, and side-channel leakage.
Data Exfiltration via Side Channels
Extract data from LLM systems using timing, token probability, and formatting side channels.
Building Defense Bypass Chains
Combine multiple bypass techniques into reliable chains that defeat layered defenses.
Lab: Defense Effectiveness Measurement
Hands-on lab for quantifying AI guardrail robustness using attack success rates, evasion metrics, false positive rates, and statistical analysis of defense performance.
Lab: Defense Regression Testing Setup
Build a regression testing framework to continuously verify that LLM defenses remain effective against known attack patterns.
Lab: Defense Effectiveness Testing
Systematically test and measure the robustness of AI guardrails using structured methodology, metrics, and repeatable test suites.
Document-Based RAG Injection Lab
Inject adversarial content into documents that will be processed by a RAG system to influence model responses.
Lab: Advanced Embedding Manipulation
Manipulate embedding vectors to achieve target similarity scores for RAG poisoning and retrieval manipulation.
Practical Embedding Manipulation
Manipulate text to achieve target embedding similarities for poisoning retrieval augmented generation systems.
Embedding Similarity Attacks
Manipulate text to achieve target embedding similarity scores for retrieval poisoning.
Lab: Encoding Bypass Lab
Hands-on lab for systematically testing Base64, ROT13, Unicode homoglyph, hexadecimal, and combined encoding bypasses against LLM safety filters with measured success rates.
Fine-Tuning API Security Probing
Probe fine-tuning APIs for security weaknesses including insufficient validation and unsafe default configurations.
Lab: Fine-Tuning Safety Impact Testing
Measure how fine-tuning affects model safety by comparing pre- and post-fine-tuning safety benchmark scores.
Lab: Guardrail Bypass Technique Laboratory
Practice guardrail bypass techniques against NeMo Guardrails, LLM Guard, and custom classifier-based defenses.
Guardrail Fingerprinting
Systematically map the rules and thresholds of input/output guardrail systems.
Guardrail Rule Extraction Lab
Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.
Lab: Exploiting LLM Hallucinations
Exploit hallucination tendencies to trigger fabricated tool calls, invented API endpoints, and false fact injection.
Lab: AI Incident Response Simulation
Practice AI incident response procedures through a simulated prompt injection incident with escalation and containment.
Lab: Indirect Prompt Injection
Inject instructions through external data sources including documents, web pages, and emails that a target AI system processes as context.
Indirect Injection via Web Content
Plant prompt injection payloads in web pages consumed by RAG-enabled LLM applications.
Lab: Instruction Following Measurement
Quantitatively measure instruction following compliance to identify where models prioritize competing instructions.
LLM Judge Manipulation
Craft responses that exploit LLM-as-judge evaluation patterns to achieve high safety scores while embedding harmful content.
Lab: Log Probability Analysis for Attacks
Use token log probabilities to extract information about training data, system prompts, and model internals.
Lab: Advanced MCP Protocol Testing
Perform advanced security testing of MCP servers including transport analysis, capability enumeration, and tool poisoning.
MCP Tool Poisoning Attack
Exploit MCP tool descriptions to inject instructions that redirect agent behavior.
Lab: Model Context Protocol Fuzzing
Fuzz MCP protocol messages to discover parsing errors, validation gaps, and unexpected behavior in MCP servers.
Practical Model Fingerprinting Lab
Fingerprint the underlying model behind an API endpoint to identify its type, version, and configuration.
Lab: LLM Security Monitoring Setup
Deploy a comprehensive security monitoring system for LLM applications with anomaly detection and alerting.
Lab: Multi-Agent Communication Testing
Test security of inter-agent communication in multi-agent systems including message injection and trust boundary violations.
Multi-Language Safety Bypass Lab
Exploit gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.
Multi-Language Injection Attacks
Exploit language switching and low-resource language gaps to bypass safety training.
Multi-Model Attack Chaining
Chain attacks across multiple LLM models in a pipeline to bypass per-model defenses.
Lab: Multi-Model Comparison Security Testing
Compare security postures across multiple LLM providers by running identical attack suites and analyzing differential responses.
Multi-Model Pipeline Attack Lab
Attack a pipeline where multiple models process data sequentially, exploiting trust between pipeline stages.
Lab: Multi-Turn State Tracking Attacks
Exploit conversation state tracking to build persistent attack context across multiple conversation turns.
Lab: Multimodal Injection
Hands-on lab for embedding text instructions in images to exploit vision-enabled LLMs. Learn to craft visual prompt injections, test OCR-based attacks, and evaluate multimodal safety boundaries.
Lab: Multimodal Injection (Intermediate Lab)
Embed prompt injection instructions in images for vision-enabled models, testing how visual content can carry adversarial payloads.
Lab: Intermediate Multimodal Security Testing
Test multimodal LLMs with crafted images containing embedded text, adversarial perturbations, and visual injection payloads.
Output Channel Discovery Lab
Discover alternative output channels in LLM applications that bypass standard output filtering.
Lab: Output Schema Exploitation
Exploit structured output schemas to force models into revealing information through required field population.
Lab: Prompt Caching Side-Channel Attacks
Exploit prompt caching mechanisms to detect cached prompts and extract information through timing side channels.
Prompt Leakage Technique Lab
Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.
Automated Red Teaming with PyRIT
Use Microsoft's PyRIT framework to run automated multi-turn attack campaigns.
Lab: Advanced RAG Security Testing
Test RAG systems for chunking exploitation, reranking manipulation, and cross-document injection attacks.
RAG Context Poisoning
Poison a vector database to inject adversarial content into RAG retrieval results.
Reasoning Chain Hijacking
Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.
Lab: Role Hierarchy Exploitation
Exploit multi-role LLM architectures by manipulating the instruction hierarchy between system, user, and assistant roles.
Lab: Safety Filter Benchmarking
Benchmark safety filters across providers using standardized test suites to compare detection rates and false positives.
Semantic Deception Crafting Lab
Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.
Semantic Injection Crafting
Craft semantically coherent injections that evade both classifiers and human review.
Lab: Semantic Search Manipulation
Manipulate semantic search results by crafting documents that achieve target similarity scores against specific queries.
Lab: Session Isolation Testing
Test session isolation in multi-tenant LLM deployments to detect cross-session data leakage and context contamination.
Session State Exploitation Lab
Exploit session management vulnerabilities in LLM applications to access other users' data or escalate privileges.
Lab: Streaming Response Manipulation
Exploit streaming response mechanisms to bypass output filters that only check complete responses.
Structured Output Safety Bypass
Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.
Structured Output Exploitation
Exploit JSON mode and structured output constraints to bypass safety filters.
Lab: Supply Chain Audit
Audit an ML project's dependencies for vulnerabilities, covering model files, Python packages, container images, and training data provenance.
Lab: ML Supply Chain Scan
Hands-on lab for auditing machine learning model dependencies, detecting malicious packages in ML pipelines, and scanning model files for backdoors and supply chain threats.
Lab: System Prompt Reconstruction (Intermediate Lab)
Reconstruct full system prompts through iterative partial extraction and cross-referencing multiple leakage vectors.
Advanced Token Smuggling
Use Unicode homoglyphs, zero-width characters, and tokenizer quirks to smuggle payloads.
Lab: Tool Result Injection Attacks
Inject adversarial content through tool call results to poison model reasoning and redirect subsequent actions.
Lab: Unicode Normalization Bypass Attacks
Exploit Unicode normalization differences between input validation and model processing to smuggle injection payloads.
Lab: Vector Database Access Control Testing
Test vector database access controls for bypass vulnerabilities including namespace traversal and filter manipulation.
Agent Goal Hijacking
Redirect an AI agent's objectives through carefully crafted inputs that override its primary task.
Assistant Prefill Injection Attacks
Exploit assistant message prefilling to prime model responses and bypass safety alignment.
Chat History Manipulation Attacks
Inject false messages into conversation history to alter model behavior in subsequent turns.
Input Classifier Boundary Mapping
Systematically probe input classifiers to map their decision boundaries and find bypass regions.
Context Window Boundary Exploitation
Exploit the boundaries of context windows to push safety instructions beyond the model's attention.
Crescendo Attack Implementation
Implement Microsoft's Crescendo multi-turn escalation attack with automated conversation management.
Cross-Plugin Exploitation Chaining
Chain exploits across multiple plugins/tools in an agent system to achieve multi-step attacks.
Function Schema Manipulation
Exploit function calling schemas to inject instructions through parameter descriptions and enums.
Building Custom Garak Probes
Develop custom probe modules for NVIDIA Garak to test organization-specific vulnerability patterns.
Guardrail Latency-Based Detection
Use timing side channels to identify and characterize guardrail implementations in LLM applications.
Custom Inspect AI Safety Evaluations
Build custom safety evaluation tasks in Inspect AI to test application-specific vulnerability classes.
JailbreakBench Comparative Testing
Run standardized jailbreak benchmarks across multiple models and compare vulnerability profiles.
JSON Mode Safety Bypass
Exploit structured JSON output modes to bypass safety filters that only check plain-text responses.
LLM Guard Evasion Techniques
Develop evasion techniques against LLM Guard input scanners and output detectors.
Implementing Many-Shot Jailbreaking
Implement Anthropic's many-shot jailbreaking technique with scaling analysis across conversation lengths.
MCP Resource Data Exfiltration (Intermediate Lab)
Exploit MCP resource URIs to access and exfiltrate data beyond intended scope boundaries.
Multi-Language Safety Gap Exploitation
Systematically identify and exploit safety training gaps across low-resource languages.
Multi-Turn State Tracking Exploitation
Exploit state management in multi-turn conversations to gradually shift model behavior over many turns.
NeMo Guardrails Bypass Techniques
Test and bypass NVIDIA NeMo Guardrails configurations using known evasion techniques.
Output Filter Bypass via Encoding
Bypass output safety filters by instructing the model to encode responses in Base64, hex, or custom schemes.
PDF Document Injection for RAG Systems
Craft adversarial PDF documents that inject instructions when processed by RAG document loaders.
Prompt Template Variable Injection
Inject adversarial content through template variable expansion in prompt management systems.
PyRIT Custom Attack Strategies
Build custom attack strategies and converters in Microsoft's PyRIT framework for targeted campaigns.
RAG Document Injection Campaign
Design and execute a document injection campaign against a RAG-powered application with vector search.
Semantic Search Poisoning
Craft adversarial documents that rank highly in semantic search for targeted queries in RAG systems.
Streaming Response Exfiltration
Exploit streaming API responses to exfiltrate data that might be caught by batch output filters.
System Prompt Extraction via Token Probabilities
Exploit token probability APIs to reconstruct hidden system prompts character by character.
Token Boundary Manipulation
Exploit tokenizer-specific behavior by crafting inputs that split across token boundaries in unexpected ways.
Tool Result Injection Attacks
Craft malicious tool return values that inject instructions back into the model's reasoning chain.
Injecting Text via Images to VLMs
Embed adversarial text in images that vision-language models read and follow as instructions.
Lab: Data Exfiltration via Webhooks
Test data exfiltration paths through LLM-triggered webhooks, API callbacks, and external URL rendering.
Simulation: Startup AI Assessment
Red team a startup's AI-powered product with limited scope and budget, making pragmatic tradeoffs between thoroughness and time constraints.
Code Injection via Markdown
Injecting executable payloads through markdown rendering in LLM outputs, exploiting the gap between text generation and content rendering in web-based LLM interfaces.
Context Window Stuffing
Techniques for filling the LLM context window to push system instructions out of active memory, manipulating token budgets to dilute or displace defensive prompts.
Delimiter Escape Attacks
Techniques for escaping delimiters used to separate system and user content in LLM applications, breaking out of sandboxed input regions to inject instructions.
Encoding-Based Evasion
Using base64, ROT13, hexadecimal, Unicode, and other encoding schemes to evade input detection systems and bypass content filters in LLM applications.
Few-Shot Injection
Using crafted few-shot examples within user input to steer LLM behavior toward unintended outputs, exploiting in-context learning to override safety training.
Payload Obfuscation Techniques
Methods for disguising prompt injection payloads through encoding, splitting, substitution, and other obfuscation techniques to bypass input filters and detection systems.
Role-Play Injection
Using fictional scenarios, character role-play, and narrative framing to bypass LLM safety filters by having the model operate within a permissive fictional context.
Translation Injection
Using translation requests and low-resource languages to bypass content filters, exploiting the uneven distribution of safety training across languages.
Virtual Persona Creation
Creating persistent alternate personas within LLM conversations to bypass safety training, establishing character identities that override the model's default behavioral constraints.