# intermediate
155 articles tagged with “intermediate”
Intermediate Certification Prep Guide
Comprehensive study guide for intermediate-level certification preparation.
Capstone: Compliance Assessment Simulation
Conduct a simulated compliance assessment against EU AI Act, NIST AI RMF, and ISO 42001, producing a comprehensive gap analysis report.
Capstone: Red Team Program Design
Design a complete AI red team program for a fictional enterprise, producing a comprehensive program charter document.
Capstone: Industry Vertical Deep Dive
Select an industry vertical, threat model the AI systems within it, and produce a sector-specific AI security testing guide.
LLM Guard and Protect AI Guardian
Input/output scanning, PII detection, toxicity filtering, integration patterns, and bypass techniques for LLM Guard and the Protect AI Guardian ecosystem.
NVIDIA NeMo Guardrails
Architecture, configuration, Colang programming, integration patterns, and bypass techniques for NVIDIA's open-source NeMo Guardrails framework.
AI Anomaly Detection
Detecting jailbreak attempts, unusual usage patterns, output drift, and embedding space anomalies in AI systems through statistical and ML-based methods.
AI Monitoring and Observability
What to monitor in AI systems, key metrics for detecting abuse and drift, alerting strategies, and observability architecture for LLM applications.
AI Logging Architecture
What to capture in AI system logs — prompts, completions, latency, tokens, tool calls — along with storage strategies, retention policies, and privacy considerations.
Adversarial ML: Core Concepts
History and fundamentals of adversarial machine learning — perturbation attacks, evasion vs poisoning, robustness — bridging classical adversarial ML to LLM-specific attacks.
AI Threat Models: White-box, Black-box & Grey-box
Access levels in AI security testing — what's possible at each level, realistic scenarios, and comparison to traditional security threat modeling.
Embeddings & Vector Spaces for Red Teamers
Understand how embeddings encode semantic meaning, how vector operations work, and why red teamers need to understand embedding spaces for RAG attacks and similarity exploitation.
Lab: Exploring Embedding Spaces
Hands-on lab using Python to visualize embedding spaces, measure semantic similarity, and demonstrate how adversarial documents can be crafted to match target queries.
RAG Architecture: How Retrieval Systems Work
End-to-end anatomy of a Retrieval-Augmented Generation pipeline — document ingestion, chunking, embedding, indexing, retrieval, context assembly, and generation — with attack surface analysis at each stage.
Semantic Similarity & Vector Search Attacks
How cosine similarity and nearest neighbor search work, and how attackers exploit vector databases through adversarial documents, embedding collisions, and threshold bypasses.
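The cosine-similarity ranking this entry refers to can be sketched in a few lines. This is a toy illustration only: the 3-dimensional vectors and the "adversarial" document below are made-up values, not the output of any real embedding model.

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models use hundreds of dimensions).
query = [0.9, 0.1, 0.2]
benign_doc = [0.1, 0.8, 0.3]
adversarial_doc = [0.88, 0.12, 0.21]  # crafted to sit near the query

# Nearest-neighbor retrieval returns the document with the highest score,
# so the crafted document outranks the legitimate one.
docs = {"benign": benign_doc, "adversarial": adversarial_doc}
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # adversarial
```

This is the core of why adversarial documents work against vector search: the retriever has no notion of intent, only of angular closeness in the embedding space.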
Inference: Sampling, Temperature & Generation
How LLMs generate text at inference time — greedy decoding, top-k, top-p, temperature — and how these parameters affect attack success rates.
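A rough sketch of the temperature parameter mentioned above — a toy softmax over made-up logits, not any particular model's API:

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    # Dividing logits by T before softmax: low T sharpens the distribution
    # (approaching greedy decoding), high T flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # toy next-token scores

cold = softmax_with_temperature(logits, 0.2)  # near-greedy
hot = softmax_with_temperature(logits, 2.0)   # much flatter

# At low temperature nearly all probability mass sits on the top token;
# at high temperature lower-ranked tokens get sampled far more often,
# which is one reason attack success rates vary with sampling settings.
print(round(cold[0], 3), round(hot[0], 3))

# Sampling a token index from the flattened distribution:
token = random.choices(range(len(logits)), weights=hot)[0]
```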
Tokenization & Its Security Implications
How BPE and SentencePiece tokenizers work, and how tokenizer behavior creates exploitable attack surfaces including boundary attacks, homoglyphs, and encoding tricks.
Pre-training → Fine-tuning → RLHF Pipeline
Understand the three stages of creating an aligned LLM — pre-training, supervised fine-tuning, and RLHF/DPO — and the security implications at each stage.
Transformer Architecture for Attackers
Deep dive into the transformer architecture — attention, feed-forward layers, and residual connections — through the lens of which components are exploitable.
Agent Architectures & Tool Use Patterns
How ReAct, Plan-and-Execute, and LangGraph agent patterns work — tool definition, invocation, and result processing — and where injection happens in each architecture.
Common AI Deployment Patterns & Attack Surfaces
Analysis of deployment patterns — chatbots, copilots, autonomous agents, batch processing, and fine-tuned models — with their unique attack surfaces and security considerations.
Lab: Mapping an AI System's Attack Surface
Hands-on lab walking through reconnaissance of an AI system — identifying components, mapping data flows, enumerating tools, and documenting the attack surface.
CTF: Code Gen Exploit
Manipulate AI code generation to produce vulnerable, backdoored, or malicious code. Explore how prompt manipulation influences code security, from subtle vulnerability injection to full backdoor insertion.
CTF: RAG Heist
Extract sensitive information from a Retrieval-Augmented Generation system by exploiting retrieval mechanisms, document parsing, embedding manipulation, and context window management vulnerabilities.
Intermediate AI Red Team Labs
Overview of intermediate-level hands-on labs covering agent exploitation, RAG poisoning, multi-turn attacks, automated campaigns, and advanced injection techniques for AI red teaming.
Advanced Encoding Chain Attacks
Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.
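A minimal sketch of the chaining idea — ROT13 wrapped in Base64, both from the Python standard library. The payload string is a harmless placeholder:

```python
import base64
import codecs

payload = "example payload"  # placeholder, not an actual attack string

# Layer 1: ROT13 defeats naive keyword matching on the raw text.
layer1 = codecs.encode(payload, "rot13")

# Layer 2: Base64 hides the ROT13 output from filters that decode
# only one level (or none) before scanning.
layer2 = base64.b64encode(layer1.encode()).decode()

# A filter must unwind every layer, in order, to see the original
# string; decoding in reverse order restores it exactly.
decoded = codecs.decode(base64.b64decode(layer2).decode(), "rot13")
assert decoded == payload
```

The defensive takeaway is the mirror image: a scanner that stops after one round of decoding misses anything wrapped twice.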
Lab: Advanced Function Calling Exploitation
Exploit advanced function calling patterns including nested calls, parallel execution, and schema manipulation.
Lab: Automated Adversarial Prompt Generation
Build automated systems for generating adversarial prompts using mutation, crossover, and selection strategies.
Agent Memory Manipulation
Exploit persistent memory in LLM agents to plant false context that persists across sessions.
Lab: Agent Prompt Leaking
Hands-on lab for extracting system prompts from tool-using AI agents by exploiting tool interactions, observation channels, and multi-step reasoning to leak protected instructions.
Lab: Agent Workflow Security Testing
Test multi-step agent workflows for injection points, state manipulation, and unauthorized action execution.
API Abuse Pattern Lab
Discover and exploit API abuse patterns including parameter injection, header manipulation, and endpoint confusion.
API Parameter Fuzzing for LLMs
Systematically fuzz LLM API parameters to discover unexpected behaviors and bypasses.
Lab: API Versioning Security Gaps
Exploit version differences in LLM APIs to find deprecated endpoints and backward-compatible safety bypasses.
Lab: Automated Red Team Pipeline
Hands-on lab for building a continuous AI red team testing pipeline using promptfoo, GitHub Actions, and automated attack generation to catch safety regressions before deployment.
Lab: Automated Red Team Testing
Set up promptfoo for continuous automated security testing of AI applications, with custom evaluators and CI/CD integration.
Lab: Canary Token Effectiveness Testing
Deploy and test canary tokens in system prompts to evaluate their effectiveness at detecting prompt extraction.
Lab: Chain-of-Thought Exploitation
Exploit chain-of-thought reasoning to leak internal model reasoning, bypass safety filters, and manipulate decision processes.
Lab: CI/CD Pipeline Security for LLM Apps
Test CI/CD pipeline security for LLM applications including prompt template injection, model artifact tampering, and deployment attacks.
Lab: Advanced Classifier Evasion
Evade advanced input classifiers using adversarial perturbation, semantic paraphrasing, and gradient-free optimization.
ML Classifier Evasion Lab
Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.
Lab: Cloud AI Platform Security Testing
Test cloud AI platform configurations including AWS Bedrock guardrails, Azure content filters, and GCP safety settings.
Lab: Code Generation Security Testing
Test LLM code generation for insecure patterns, injection vulnerabilities, and code execution safety issues.
Lab: AI Compliance Testing Fundamentals
Test LLM applications against OWASP LLM Top 10 and NIST AI RMF requirements using structured test procedures.
Lab: Context Window Attack Optimization
Optimize prompt injection placement within the context window to maximize attack effectiveness using attention dynamics.
Context Window Poisoning Lab
Exploit context window management to inject persistent adversarial content that influences future model responses.
Conversation Splitting Attack Lab
Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.
Conversation State Poisoning
Exploit conversation history manipulation to alter model behavior across turns.
Cross-Context Injection
Inject prompts that persist across separate conversation contexts in shared deployments.
Custom Jailbreak Wordlist Generation
Generate targeted jailbreak wordlists using mutation strategies and seed prompts.
Lab: Data Exfiltration Channels
Hands-on lab for extracting data from AI systems through markdown image rendering, invisible links, tool call parameters, and other covert exfiltration channels.
Lab: Data Exfiltration Channels (Intermediate Lab)
Extract sensitive information from AI systems through various exfiltration channels including crafted links, image tags, tool calls, and side-channel leakage.
Data Exfiltration via Side Channels
Extract data from LLM systems using timing, token probability, and formatting side channels.
Building Defense Bypass Chains
Combine multiple bypass techniques into reliable chains that defeat layered defenses.
Lab: Defense Effectiveness Measurement
Hands-on lab for quantifying AI guardrail robustness using attack success rates, evasion metrics, false positive rates, and statistical analysis of defense performance.
Lab: Defense Regression Testing Setup
Build a regression testing framework to continuously verify that LLM defenses remain effective against known attack patterns.
Lab: Defense Effectiveness Testing
Systematically test and measure the robustness of AI guardrails using structured methodology, metrics, and repeatable test suites.
Document-Based RAG Injection Lab
Inject adversarial content into documents that will be processed by a RAG system to influence model responses.
Lab: Advanced Embedding Manipulation
Manipulate embedding vectors to achieve target similarity scores for RAG poisoning and retrieval manipulation.
Practical Embedding Manipulation
Manipulate text to achieve target embedding similarities for poisoning retrieval augmented generation systems.
Embedding Similarity Attacks
Manipulate text to achieve target embedding similarity scores for retrieval poisoning.
Lab: Encoding Bypass Lab
Hands-on lab for systematically testing Base64, ROT13, Unicode homoglyph, hexadecimal, and combined encoding bypasses against LLM safety filters with measured success rates.
Fine-Tuning API Security Probing
Probe fine-tuning APIs for security weaknesses including insufficient validation and unsafe default configurations.
Lab: Fine-Tuning Safety Impact Testing
Measure how fine-tuning affects model safety by comparing pre- and post-fine-tuning safety benchmark scores.
Lab: Guardrail Bypass Technique Laboratory
Practice guardrail bypass techniques against NeMo Guardrails, LLM Guard, and custom classifier-based defenses.
Guardrail Fingerprinting
Systematically map the rules and thresholds of input/output guardrail systems.
Guardrail Rule Extraction Lab
Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.
Lab: Exploiting LLM Hallucinations
Exploit hallucination tendencies to trigger fabricated tool calls, invented API endpoints, and false fact injection.
Lab: AI Incident Response Simulation
Practice AI incident response procedures through a simulated prompt injection incident with escalation and containment.
Lab: Indirect Prompt Injection
Inject instructions through external data sources including documents, web pages, and emails that a target AI system processes as context.
Indirect Injection via Web Content
Plant prompt injection payloads in web pages consumed by RAG-enabled LLM applications.
Lab: Instruction Following Measurement
Quantitatively measure instruction following compliance to identify where models prioritize competing instructions.
LLM Judge Manipulation
Craft responses that exploit LLM-as-judge evaluation patterns to achieve high safety scores while embedding harmful content.
Lab: Log Probability Analysis for Attacks
Use token log probabilities to extract information about training data, system prompts, and model internals.
Lab: Advanced MCP Protocol Testing
Perform advanced security testing of MCP servers including transport analysis, capability enumeration, and tool poisoning.
MCP Tool Poisoning Attack
Exploit MCP tool descriptions to inject instructions that redirect agent behavior.
Lab: Model Context Protocol Fuzzing
Fuzz MCP protocol messages to discover parsing errors, validation gaps, and unexpected behavior in MCP servers.
Practical Model Fingerprinting Lab
Fingerprint the underlying model behind an API endpoint to identify its type, version, and configuration.
Lab: LLM Security Monitoring Setup
Deploy a comprehensive security monitoring system for LLM applications with anomaly detection and alerting.
Lab: Multi-Agent Communication Testing
Test security of inter-agent communication in multi-agent systems including message injection and trust boundary violations.
Multi-Language Safety Bypass Lab
Exploit gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.
Multi-Language Injection Attacks
Exploit language switching and low-resource language gaps to bypass safety training.
Multi-Model Attack Chaining
Chain attacks across multiple LLM models in a pipeline to bypass per-model defenses.
Lab: Multi-Model Comparison Security Testing
Compare security postures across multiple LLM providers by running identical attack suites and analyzing differential responses.
Multi-Model Pipeline Attack Lab
Attack a pipeline where multiple models process data sequentially, exploiting trust between pipeline stages.
Lab: Multi-Turn State Tracking Attacks
Exploit conversation state tracking to build persistent attack context across multiple conversation turns.
Lab: Multimodal Injection
Hands-on lab for embedding text instructions in images to exploit vision-enabled LLMs. Learn to craft visual prompt injections, test OCR-based attacks, and evaluate multimodal safety boundaries.
Lab: Multimodal Injection (Intermediate Lab)
Embed prompt injection instructions in images for vision-enabled models, testing how visual content can carry adversarial payloads.
Lab: Intermediate Multimodal Security Testing
Test multimodal LLMs with crafted images containing embedded text, adversarial perturbations, and visual injection payloads.
Output Channel Discovery Lab
Discover alternative output channels in LLM applications that bypass standard output filtering.
Lab: Output Schema Exploitation
Exploit structured output schemas to force models into revealing information through required field population.
Lab: Prompt Caching Side-Channel Attacks
Exploit prompt caching mechanisms to detect cached prompts and extract information through timing side channels.
Prompt Leakage Technique Lab
Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.
Automated Red Teaming with PyRIT
Use Microsoft's PyRIT framework to run automated multi-turn attack campaigns.
Lab: Advanced RAG Security Testing
Test RAG systems for chunking exploitation, reranking manipulation, and cross-document injection attacks.
RAG Context Poisoning
Poison a vector database to inject adversarial content into RAG retrieval results.
Reasoning Chain Hijacking
Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.
Lab: Role Hierarchy Exploitation
Exploit multi-role LLM architectures by manipulating the instruction hierarchy between system, user, and assistant roles.
Lab: Safety Filter Benchmarking
Benchmark safety filters across providers using standardized test suites to compare detection rates and false positives.
Semantic Deception Crafting Lab
Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.
Semantic Injection Crafting
Craft semantically coherent injections that evade both classifiers and human review.
Lab: Semantic Search Manipulation
Manipulate semantic search results by crafting documents that achieve target similarity scores against specific queries.
Lab: Session Isolation Testing
Test session isolation in multi-tenant LLM deployments to detect cross-session data leakage and context contamination.
Session State Exploitation Lab
Exploit session management vulnerabilities in LLM applications to access other users' data or escalate privileges.
Lab: Streaming Response Manipulation
Exploit streaming response mechanisms to bypass output filters that only check complete responses.
Structured Output Safety Bypass
Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.
Structured Output Exploitation
Exploit JSON mode and structured output constraints to bypass safety filters.
Lab: Supply Chain Audit
Audit an ML project's dependencies for vulnerabilities, covering model files, Python packages, container images, and training data provenance.
Lab: ML Supply Chain Scan
Hands-on lab for auditing machine learning model dependencies, detecting malicious packages in ML pipelines, and scanning model files for backdoors and supply chain threats.
Lab: System Prompt Reconstruction (Intermediate Lab)
Reconstruct full system prompts through iterative partial extraction and cross-referencing multiple leakage vectors.
Advanced Token Smuggling
Use Unicode homoglyphs, zero-width characters, and tokenizer quirks to smuggle payloads.
Lab: Tool Result Injection Attacks
Inject adversarial content through tool call results to poison model reasoning and redirect subsequent actions.
Lab: Unicode Normalization Bypass Attacks
Exploit Unicode normalization differences between input validation and model processing to smuggle injection payloads.
Lab: Vector Database Access Control Testing
Test vector database access controls for bypass vulnerabilities including namespace traversal and filter manipulation.
Agent Goal Hijacking
Redirect an AI agent's objectives through carefully crafted inputs that override its primary task.
Assistant Prefill Injection Attacks
Exploit assistant message prefilling to prime model responses and bypass safety alignment.
Chat History Manipulation Attacks
Inject false messages into conversation history to alter model behavior in subsequent turns.
Input Classifier Boundary Mapping
Systematically probe input classifiers to map their decision boundaries and find bypass regions.
Context Window Boundary Exploitation
Exploit the boundaries of context windows to push safety instructions beyond the model's attention.
Crescendo Attack Implementation
Implement Microsoft's Crescendo multi-turn escalation attack with automated conversation management.
Cross-Plugin Exploitation Chaining
Chain exploits across multiple plugins/tools in an agent system to achieve multi-step attacks.
Function Schema Manipulation
Exploit function calling schemas to inject instructions through parameter descriptions and enums.
Building Custom Garak Probes
Develop custom probe modules for NVIDIA Garak to test organization-specific vulnerability patterns.
Guardrail Latency-Based Detection
Use timing side channels to identify and characterize guardrail implementations in LLM applications.
Custom Inspect AI Safety Evaluations
Build custom safety evaluation tasks in Inspect AI to test application-specific vulnerability classes.
JailbreakBench Comparative Testing
Run standardized jailbreak benchmarks across multiple models and compare vulnerability profiles.
JSON Mode Safety Bypass
Exploit structured JSON output modes to bypass safety filters that only check plain-text responses.
LLM Guard Evasion Techniques
Develop evasion techniques against LLM Guard input scanners and output detectors.
Implementing Many-Shot Jailbreaking
Implement Anthropic's many-shot jailbreaking technique with scaling analysis across conversation lengths.
MCP Resource Data Exfiltration (Intermediate Lab)
Exploit MCP resource URIs to access and exfiltrate data beyond intended scope boundaries.
Multi-Language Safety Gap Exploitation
Systematically identify and exploit safety training gaps across low-resource languages.
Multi-Turn State Tracking Exploitation
Exploit state management in multi-turn conversations to gradually shift model behavior over many turns.
NeMo Guardrails Bypass Techniques
Test and bypass NVIDIA NeMo Guardrails configurations using known evasion techniques.
Output Filter Bypass via Encoding
Bypass output safety filters by instructing the model to encode responses in Base64, hex, or custom schemes.
PDF Document Injection for RAG Systems
Craft adversarial PDF documents that inject instructions when processed by RAG document loaders.
Prompt Template Variable Injection
Inject adversarial content through template variable expansion in prompt management systems.
PyRIT Custom Attack Strategies
Build custom attack strategies and converters in Microsoft's PyRIT framework for targeted campaigns.
RAG Document Injection Campaign
Design and execute a document injection campaign against a RAG-powered application with vector search.
Semantic Search Poisoning
Craft adversarial documents that rank highly in semantic search for targeted queries in RAG systems.
Streaming Response Exfiltration
Exploit streaming API responses to exfiltrate data that might be caught by batch output filters.
System Prompt Extraction via Token Probabilities
Exploit token probability APIs to reconstruct hidden system prompts character by character.
Token Boundary Manipulation
Exploit tokenizer-specific behavior by crafting inputs that split across token boundaries in unexpected ways.
Tool Result Injection Attacks
Craft malicious tool return values that inject instructions back into the model's reasoning chain.
Injecting Text via Images to VLMs
Embed adversarial text in images that vision-language models read and follow as instructions.
Lab: Data Exfiltration via Webhooks
Test data exfiltration paths through LLM-triggered webhooks, API callbacks, and external URL rendering.
Simulation: Startup AI Assessment
Red team a startup's AI-powered product with limited scope and budget, making pragmatic tradeoffs between thoroughness and time constraints.
Code Injection via Markdown
Injecting executable payloads through markdown rendering in LLM outputs, exploiting the gap between text generation and content rendering in web-based LLM interfaces.
Context Window Stuffing
Techniques for filling the LLM context window to push system instructions out of active memory, manipulating token budgets to dilute or displace defensive prompts.
Delimiter Escape Attacks
Techniques for escaping delimiters used to separate system and user content in LLM applications, breaking out of sandboxed input regions to inject instructions.
Encoding-Based Evasion
Using base64, ROT13, hexadecimal, Unicode, and other encoding schemes to evade input detection systems and bypass content filters in LLM applications.
Few-Shot Injection
Using crafted few-shot examples within user input to steer LLM behavior toward unintended outputs, exploiting in-context learning to override safety training.
Payload Obfuscation Techniques
Methods for disguising prompt injection payloads through encoding, splitting, substitution, and other obfuscation techniques to bypass input filters and detection systems.
Role-Play Injection
Using fictional scenarios, character role-play, and narrative framing to bypass LLM safety filters by having the model operate within a permissive fictional context.
Translation Injection
Using translation requests and low-resource languages to bypass content filters, exploiting the uneven distribution of safety training across languages.
Virtual Persona Creation
Creating persistent alternate personas within LLM conversations to bypass safety training, establishing character identities that override the model's default behavioral constraints.