# advanced
162 articles tagged with “advanced”
Manipulating Reasoning Chains
Techniques for influencing an AI agent's chain-of-thought reasoning to steer its planning, decision-making, and tool selection toward attacker-desired outcomes.
Attacking Multi-Agent Systems
Exploitation techniques for multi-agent architectures including inter-agent injection, trust boundary violations, cascading compromises, and A2A protocol attacks.
Advanced Practice Exam
25-question practice exam covering advanced AI red team techniques: multimodal attacks, training pipeline exploitation, agentic system attacks, embedding manipulation, and fine-tuning security.
Practice Exam 2: Advanced AI Security
25-question advanced practice exam covering multimodal attacks, training pipeline security, cloud AI security, forensics, and governance.
Advanced AI Security Practice Exam 1
Advanced practice exam covering agentic exploitation, training attacks, and frontier research.
Advanced AI Security Practice Exam 2
Second advanced practice exam focusing on multimodal, cloud, and pipeline security.
Agentic AI Security Practice Exam 2
Advanced practice exam covering agent memory poisoning, workflow exploitation, and A2A protocol attacks.
Comprehensive Advanced Practice Exam
Full-spectrum advanced practice exam covering all major AI red teaming domains.
Advanced Defense Practice Exam
Practice exam on guardrail architecture, constitutional AI defenses, and behavioral monitoring implementation.
Advanced Governance Practice Exam (Assessment)
Advanced governance practice exam covering international regulation, audit methodologies, and organizational frameworks.
Advanced Prompt Injection Practice Exam
Practice exam on indirect injection, multi-step chains, encoding bypass, and semantic manipulation.
Advanced Cloud AI Security Assessment
15-question advanced assessment covering cloud AI attack surfaces across AWS, Azure, and GCP: guardrail bypass, knowledge base exploitation, managed identity abuse, model customization risks, and multi-cloud attack paths.
Advanced Tool Proficiency Assessment
Advanced assessment on Garak, PyRIT, HarmBench, and custom tool development proficiency.
Advanced Tradecraft Assessment
Assessment covering OPSEC, evasion, persistence, attribution avoidance, and professional red team operations.
Advanced Agentic Exploitation Assessment
Advanced assessment covering MCP exploitation chains, multi-agent attacks, and A2A protocol injection.
Advanced Cloud AI Security Assessment (Assessment)
Advanced assessment on multi-cloud AI security, IAM misconfigurations, and endpoint hardening.
Advanced Code Generation Security Assessment
Advanced assessment on autonomous coding agents, sandbox escapes, and supply chain attacks.
Advanced Embedding Security Assessment
Advanced assessment on embedding inversion, vector DB attacks, and multimodal embedding exploitation.
Advanced AI Forensics Assessment
Advanced assessment on model behavior forensics, training data breach analysis, and attack attribution.
Advanced Multimodal Attacks Assessment
Advanced assessment covering cross-modal attacks, steganographic injection, and multimodal defense bypass.
Advanced Training Pipeline Assessment
Advanced assessment on RLHF exploitation, DPO vulnerabilities, and federated learning attacks.
Advanced Certification Prep Guide
Study guide for advanced certification covering complex topics and hands-on preparation strategies.
Advanced Topics Study Guide
Study guide covering AI security research techniques, automation, forensics, emerging attack vectors, and tool development for advanced practitioners.
Capstone: Cloud AI Security Assessment
Assess AI deployment security across AWS, Azure, and GCP cloud platforms, producing a comprehensive cloud AI security assessment report.
Capstone: Defense System Implementation
Build a complete AI defense stack with input filtering, output monitoring, guardrails, rate limiting, and logging, then evaluate it against automated attacks.
Capstone: AI Incident Response Exercise
Respond to a simulated AI security incident through triage, investigation, containment, remediation, and post-mortem reporting.
Capstone: Open Source Contribution
Contribute to an open-source AI security project such as garak, PyRIT, or MITRE ATLAS, producing a merged PR or submitted issue with proof of concept.
Capstone: Training Pipeline Attack & Defense
Attack a model training pipeline through data poisoning and backdoor insertion, then build defenses to detect and prevent these attacks.
Capstone: Build an AI Security Scanner
Design and implement an automated AI security testing tool that supports prompt injection detection, jailbreak testing, and output analysis.
Fall 2026 CTF: Advanced Attack Research
An expert-level CTF requiring novel technique development, research-grade problem solving, and original vulnerability discovery in AI systems.
Seasonal CTF: Summer 2026 Advanced Threats
Summer 2026 CTF focused on advanced threat scenarios including supply chain attacks, multi-model exploitation, and infrastructure compromise.
Advanced Training Data Extraction
Advanced techniques for extracting memorized training data from language models.
Gradient-Based Attacks Explained
How gradients are used to craft adversarial inputs for LLMs — FGSM, PGD, and GCG attacks explained with accessible math and practical examples.
Scaling Laws, Emergence & Capability Jumps
How scaling laws predict model performance, why emergent capabilities create unpredictable security properties, and what sleeper capabilities and emergent misalignment mean for red teaming.
Advanced AI Red Team Labs
Advanced hands-on labs covering automated attack algorithms, production red team tooling, multi-model testing, and sophisticated exploitation techniques for experienced AI security practitioners.
Advanced RAG Manipulation Lab
Perform sophisticated RAG manipulation including embedding space attacks, metadata poisoning, and retrieval algorithm gaming.
Adversarial Robustness Evaluation
Build a comprehensive adversarial robustness evaluation framework for assessing model security posture.
Lab: Alignment Boundary Probing
Systematically probe alignment boundaries to map the exact thresholds where safety training engages and disengages.
Lab: Attack Chain Construction
Hands-on lab for chaining three or more distinct vulnerabilities into a complete exploit sequence that achieves objectives impossible with any single technique alone.
Lab: Audio Adversarial Examples
Hands-on lab for crafting adversarial audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.
AutoDAN Implementation Lab
Implement the AutoDAN methodology for generating stealthy human-readable jailbreak prompts using LLM feedback.
Automated Defense Evaluation Framework
Build an automated framework to evaluate defensive measures across attack categories.
Lab: Building an Automated Red Team Pipeline
Build a complete automated red teaming pipeline with attack generation, execution, scoring, and reporting.
Lab: Backdoor Persistence Through Safety Training
Test whether fine-tuned backdoors persist through subsequent safety training rounds and RLHF alignment.
Lab: Cloud AI Security Assessment
Conduct an end-to-end security assessment of a cloud-deployed AI service, covering API security, model vulnerabilities, data handling, and infrastructure configuration.
Lab: Cloud AI Assessment
Hands-on lab for conducting an end-to-end security assessment of a cloud-deployed AI system including infrastructure review, API testing, model security evaluation, and data flow analysis.
Lab: Constitutional AI Bypass Techniques
Test and bypass Constitutional AI safety mechanisms by exploiting the critique-revision training methodology.
Lab: Cross-Lingual Transfer Attacks
Develop attacks in one language that transfer to others by exploiting shared multilingual representation spaces.
Cross-Model Transfer Attacks
Develop attacks on open-source models that transfer to closed-source commercial APIs.
Cross-Plugin Exploitation Chain
Chain exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.
Lab: Training Curriculum Poisoning
Exploit training data ordering and curriculum learning to amplify the impact of small numbers of poisoned examples.
Custom Safety Classifier Training
Train a custom input safety classifier and then develop payloads that reliably evade it to understand classifier limitations.
Lab: Custom Test Harness for Specific Applications
Build a tailored testing framework for a specific AI application, with custom attack generators, domain-specific evaluators, and application-aware reporting.
Defense vs Red Team War Game
Alternate between attacking and defending an LLM application to develop skills in both offensive and defensive operations.
Differential Testing Across Models
Use differential testing to find behavior inconsistencies across model providers.
Lab: Distributed Training Attack Simulation
Simulate attacks on distributed training infrastructure including gradient poisoning and aggregation manipulation.
Embedding Adversarial Perturbation
Craft adversarial inputs that produce target embeddings for retrieval manipulation.
Lab: Evaluation Framework Gaming
Demonstrate how to game safety evaluation frameworks to produce artificially high safety scores while retaining vulnerabilities.
Lab: Federated Learning Poisoning Attacks
Execute model poisoning attacks in a federated learning simulation by manipulating local model updates.
Fine-Tuning Backdoor Insertion
Insert a triggered backdoor during fine-tuning that activates on specific input patterns.
Lab: Gradient-Based Attack Implementation
Implement gradient-based adversarial attacks on open-weight models, including PGD and FGSM adapted for text inputs.
Gradient-Guided Data Poisoning
Use gradient information from open-source models to craft optimally poisoned training examples.
Guardrail Bypass Ensemble
Build an ensemble attack system that combines multiple bypass techniques for robustness.
Lab: Inference-Time Compute Exploitation
Exploit inference-time compute scaling to manipulate reasoning depth and resource consumption in thinking models.
Lab: Jailbreak Transferability Analysis
Analyze jailbreak transferability across model families to discover universal vulnerability patterns.
Lab: MCP Full Exploitation
Hands-on lab for conducting a complete Model Context Protocol server compromise including tool enumeration, permission escalation, cross-tool attacks, and data exfiltration through MCP channels.
Multi-Hop MCP Server Exploitation
Chain exploits across multiple MCP servers to achieve lateral movement in agent systems.
Lab: Long-Term Memory Extraction
Extract information from long-term agent memory stores through crafted queries and memory retrieval manipulation.
Model Distillation Security Lab
Extract model capabilities through distillation techniques using only black-box API access.
Lab: Model Merging Security Analysis
Analyze security implications of model merging techniques and test for backdoor propagation through merged model weights.
Lab: Model Steering with Activation Vectors
Use activation steering vectors to control model behavior without prompt modification for security testing.
Lab: Multi-Modal Attack Chain Construction
Construct attack chains that span text, image, and audio modalities to exploit cross-modal processing gaps.
Multi-Objective Attack Optimization
Optimize attack payloads for multiple simultaneous objectives: jailbreaking, data extraction, and defense evasion.
Lab: Advanced Multi-Turn Attacks
Hands-on lab for executing 20+ turn conversation steering strategies that gradually erode AI safety boundaries to extract protected information through patient, escalating dialogue.
Multi-Turn Crescendo Attack
Implement Microsoft's Crescendo attack pattern for gradual multi-turn jailbreaking.
Multimodal Attack Chain Lab
Chain attacks across text, image, and structured data modalities to exploit multimodal system vulnerabilities.
Lab: Neural Network Trojan Detection
Implement and test neural network trojan detection methods including activation clustering and spectral analysis.
Output Constraint Exploitation
Exploit structured output requirements to force models into revealing restricted information.
Production Environment Simulation Lab
Test attacks against a simulated production environment with realistic logging, monitoring, and alerting.
Lab: Building a Prompt Injection Compiler
Build a compiler that transforms high-level attack specifications into optimized prompt injection payloads.
Prompt Leakage via Log Probabilities
Extract system prompt information using token log probability analysis.
Lab: Purple Team Exercise
Simultaneously attack and defend an AI application in a structured exercise where red team findings immediately inform blue team defensive improvements.
Lab: Purple Team Exercise (Advanced Lab)
Hands-on lab for conducting simultaneous attack and defense operations against an AI system with real-time metrics tracking, adaptive defense deployment, and coordinated red-blue team workflows.
Lab: Quantization Security Testing
Test behavioral differences between full-precision and quantized models to discover quantization-induced vulnerabilities.
Advanced RAG Poisoning Techniques
Execute sophisticated RAG poisoning including gradient-guided document crafting.
Lab: RAG Full Chain Attack
Hands-on lab for executing a complete RAG attack chain from document injection through retrieval manipulation to data exfiltration, targeting every stage of the Retrieval-Augmented Generation pipeline.
Lab: Reasoning Trace Manipulation
Manipulate chain-of-thought reasoning traces to inject false premises and redirect model conclusions.
Lab: Advanced Red Team Orchestration
Build an advanced red team orchestration system that coordinates multiple attack agents against a defended target.
Lab: Representation Engineering for Security
Use representation engineering to analyze and manipulate internal model representations for security research.
Representation Probing for Vulnerabilities
Probe internal model representations to identify exploitable features and develop representation-level attacks.
Lab: Advanced Reward Hacking Techniques
Develop reward hacking exploits that manipulate RLHF reward models through adversarial optimization.
Reward Model Analysis Lab
Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.
Reward Model Gaming (Advanced Lab)
Craft inputs that score highly on reward models while containing adversarial content.
Lab: RLHF Training Exploitation
Exploit reward model weaknesses to craft inputs that score highly on safety metrics while achieving adversarial objectives.
Lab: Safety Classifier Model Extraction
Extract the decision boundary of safety classifiers through systematic probing to craft maximally evasive payloads.
Safety Classifier Inversion
Reverse-engineer safety classifier decision boundaries through systematic probing.
Lab: Sparse Attention Exploitation
Exploit sparse attention patterns in long-context models to hide injection payloads in low-attention regions.
Lab: Model Supply Chain Poisoning
Simulate model supply chain attacks by injecting backdoors into model weights distributed through public registries.
Lab: Token-Level Adversarial Optimization
Implement token-level adversarial optimization to discover minimal perturbations that bypass safety training.
Lab: Tool Chain Lateral Movement
Chain tool call exploits to achieve lateral movement across services connected to LLM agent systems.
Lab: Transfer Attack Development
Hands-on lab for crafting adversarial prompts on open-weight models like Llama that transfer to closed-source models like Claude and GPT-4, using iterative refinement and cross-model evaluation.
Lab: Transfer Attack Development (Advanced Lab)
Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.
A2A Protocol Task Injection
Inject malicious tasks into Agent-to-Agent protocol communication channels to redirect multi-agent workflows.
Agent Workflow State Tampering
Manipulate agent workflow state machines to skip validation steps and reach privileged execution paths.
Attention Pattern Manipulation
Craft inputs that manipulate transformer attention patterns to prioritize adversarial content over safety instructions.
Audio Injection via Speech-to-Text Models
Craft adversarial audio that embeds prompt injection payloads when transcribed by speech-to-text models.
AutoDAN Implementation and Testing
Implement the AutoDAN methodology for generating human-readable stealthy jailbreak prompts using gradient guidance.
Constitutional Classifier Bypass
Develop techniques to bypass Anthropic-style constitutional classifiers through adversarial input crafting.
Cross-Model GCG Transfer Attacks
Generate adversarial suffixes on open-source models and test their transferability to commercial APIs.
Building a Red Team Results Dashboard
Build a real-time dashboard for tracking and visualizing red team campaign results across targets and techniques.
Differential Model Behavior Analysis
Use differential testing across model versions and providers to discover inconsistent safety behaviors.
Embedding Inversion Attack Implementation
Implement embedding inversion to recover original text from vector database embeddings.
Fine-Tuning Alignment Removal Attack
Use fine-tuning API access to systematically remove safety alignment with minimal training examples.
GUI Agent Visual Injection Attack
Inject adversarial content into screenshots and UI elements processed by computer-use AI agents.
HarmBench Custom Attack Submission
Develop and evaluate custom attack methods against the HarmBench standardized evaluation framework.
LangChain CVE Exploitation Lab
Reproduce and analyze LangChain CVEs including CVE-2023-29374 and CVE-2023-36258 in a safe lab environment.
LoRA Backdoor Insertion Attack
Insert triggered backdoors through LoRA fine-tuning that activate on specific input patterns while passing safety evals.
MCP Multi-Server Pivot Chain
Chain exploits across multiple MCP servers to achieve lateral movement and capability escalation in agent systems.
Agent Memory Injection for Persistent Access
Inject persistent instructions into agent memory systems that survive across conversation sessions.
Multi-Agent Trust Boundary Exploitation
Exploit trust boundaries between cooperating agents to escalate privileges and access restricted capabilities.
Multi-Modal Attack Chain Orchestration
Orchestrate attacks across text, image, and document modalities to bypass per-modality safety filters.
Promptfoo Red Team Test Suite Development
Build comprehensive red team test suites in Promptfoo with custom graders and multi-model targeting.
Quantization-Induced Safety Regression Testing
Test how model quantization (INT8, INT4, GPTQ) degrades safety alignment and introduces exploitable gaps.
RAG Access Control Bypass
Bypass document-level access controls in enterprise RAG systems through query manipulation and context injection.
Reasoning Trace Exploitation in CoT Models
Exploit visible chain-of-thought reasoning traces in models like o1 and DeepSeek-R1 to manipulate outputs.
Representation Probing for Vulnerability Discovery
Probe model internal representations to discover exploitable features and latent vulnerability patterns.
Adversarial Reward Model Exploitation
Craft inputs that exploit reward model weaknesses to achieve high safety scores while containing harmful content.
Safety Classifier Reverse Engineering
Reverse-engineer a safety classifier's decision boundaries through systematic adversarial probing.
Code Agent Sandbox Escape Techniques
Develop and test sandbox escape techniques against code execution environments in AI coding assistants.
Steganographic Payload Embedding in Images
Hide prompt injection payloads in images using steganographic techniques undetectable to human observers.
Training Data Extraction from Production LLMs
Implement Carlini et al.'s techniques to extract memorized training data from production language model APIs.
LLM Watermark Detection and Removal
Detect and remove statistical watermarks from LLM-generated text while preserving content quality.
Lab: LLM Watermark Detection and Removal
Detect and analyze LLM text watermarks using statistical methods and test watermark removal through paraphrasing.
CTF: Cloud AI Heist
Extract secrets from a cloud-deployed AI application by exploiting misconfigurations, SSRF, metadata endpoints, and model-level vulnerabilities in a realistic cloud environment.
CTF: Defense Breaker
Bypass progressively sophisticated defense mechanisms protecting an AI system. Each level introduces a new defense layer while retaining all previous defenses, testing your ability to adapt and chain bypass techniques.
CTF: Fine-Tune Detective
Detect backdoors in fine-tuned language models through behavioral analysis, weight inspection, and activation pattern examination. Practice the forensic techniques needed to identify compromised models before deployment.
CTF: AI Forensics Investigation
Analyze logs, model outputs, and system artifacts to reconstruct an AI security incident. Develop forensic analysis skills for AI-specific attack patterns, data exfiltration traces, and adversarial prompt detection.
CTF: Supply Chain Attack
Find and exploit vulnerabilities in an ML supply chain including compromised dependencies, poisoned models, backdoored training data, and malicious model files. Practice ML-specific supply chain security assessment.
Advanced Encoding Chain Attacks
Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.
Lab: Advanced Function Calling Exploitation
Exploit advanced function calling patterns including nested calls, parallel execution, and schema manipulation.
Lab: Advanced RAG Poisoning
Hands-on lab for crafting documents that reliably get retrieved and influence RAG responses for specific target queries.
Advanced Token Smuggling
Use Unicode homoglyphs, zero-width characters, and tokenizer quirks to smuggle payloads.
Simulation: AI Bug Bounty
Find and report vulnerabilities in a simulated AI bug bounty program, practicing professional vulnerability disclosure and bounty-eligible reporting.
Simulation: Red vs Blue
Competitive exercise where teams alternate between attacking and defending an AI application, scoring points for successful attacks and effective defenses.
Cross-Context Injection (Prompt Injection)
Prompt injection techniques that persist across context boundaries: surviving conversation resets, session switches, memory boundaries, and multi-agent handoffs.
Defense Evasion
Advanced techniques for bypassing safety filters, content classifiers, guardrails, and detection systems deployed to protect LLM applications.
Continuous Red Teaming Programs
Designing and operating ongoing AI red team programs with automated testing pipelines, metrics dashboards, KPI frameworks, alert-driven assessments, and integration with CI/CD and model deployment workflows.
Purple Teaming for AI
Collaborative attack-defense exercises for AI systems: structuring purple team engagements, real-time knowledge transfer, joint attack simulation, and measuring defensive improvement through iterative testing.
Advanced Training Attack Vectors
Cutting-edge training attacks: federated learning poisoning, model merging exploits, distributed training vulnerabilities, emergent capability risks, and synthetic data pipeline attacks.
Composite Attack Chaining
Combining multiple prompt injection techniques into compound attacks that defeat layered defenses, building attack chains that leverage the strengths of each individual technique.
Advanced Context Manipulation Walkthrough
Manipulate conversation context through history injection, context overflow, and attention weight exploitation.
Instruction Hierarchy Bypass
Advanced techniques to bypass instruction priority and hierarchy enforcement in language models, exploiting conflicts between system, user, and assistant-level directives.
Multi-Turn Progressive Injection
Gradually escalating prompt injection across conversation turns to build compliance, using psychological techniques like foot-in-the-door and norm erosion.
Advanced Prompt Leaking Walkthrough
Advanced techniques for extracting system prompts including iterative reconstruction and side-channel methods.
Recursive Injection Chains
Creating self-reinforcing injection chains that amplify across conversation turns, building compound prompts where each step strengthens the next injection's effectiveness.
Instruction Hierarchy Enforcement (Defense Walkthrough)
Step-by-step walkthrough for enforcing instruction priority in LLM applications, ensuring system-level instructions always take precedence over user inputs through privilege separation, instruction tagging, and validation layers.
NeMo Guardrails Advanced Configuration
Advanced NeMo Guardrails configuration including custom actions, knowledge base integration, and multi-model pipelines.
Garak Advanced Configuration Guide
Deep dive into Garak's configuration system including custom probes, detectors, and generators.
Writing Custom Garak Probes for Novel Attack Vectors
Advanced walkthrough for building custom Garak probes that target novel and emerging attack vectors, covering probe architecture, payload generation, detector pairing, and integration into automated scanning pipelines.
Building Custom Garak Detectors
Advanced walkthrough on creating custom garak detectors for specific success criteria, including regex-based detectors, ML-based classifiers, multi-signal scoring, and integration with external evaluation services.
Writing Garak Generator Plugins for Custom API Targets
Advanced walkthrough on writing garak generator plugins to connect to custom API endpoints, proprietary model servers, and non-standard inference interfaces for vulnerability scanning.
Setting Up Garak Probes for MCP Tool Interactions
Advanced walkthrough on configuring garak probes that target Model Context Protocol (MCP) tool interactions, testing for tool misuse, privilege escalation through tools, and data exfiltration via tool calls.