# expert
69 articles tagged with “expert”
Practice Exam 3: Expert Red Team
25-question expert-level practice exam covering research techniques, automation, fine-tuning attacks, supply chain security, and incident response.
Expert Comprehensive Practice Exam 2
Second expert-level comprehensive exam covering frontier research, novel attacks, and advanced defense.
Expert Certification Practice Exam
Expert-level practice exam covering frontier research, novel attacks, and advanced defense architectures.
Expert Level Exam Preparation Guide
Advanced study guide for expert-level assessments covering frontier research and novel techniques.
Capstone: Agentic System Red Team
Red team a multi-agent system with MCP servers, function calling, and inter-agent communication, producing an attack tree and a comprehensive findings report.
Capstone: Full Red Team Engagement
Scope, plan, execute, and report a complete AI red team engagement against a multi-component AI application including chatbot, RAG, agent, and API layers.
Capstone: Vulnerability Research Project
Identify and responsibly disclose a novel AI vulnerability class, producing an advisory-format writeup, proof of concept, and MITRE ATLAS mapping.
Fall 2026 CTF: Advanced Attack Research
An expert-level CTF requiring novel technique development, research-grade problem solving, and original vulnerability discovery in AI systems.
Lab: Federated Learning Poisoning Attack
Hands-on lab for understanding and simulating poisoning attacks against federated learning systems, where a malicious participant corrupts the shared model through crafted gradient updates.
CTF: Agent Escalation
Five-flag CTF challenge focused on exploiting agentic AI systems. Progress from basic tool abuse through privilege escalation, indirect injection, memory poisoning, and multi-agent chain attacks.
CTF: Alignment Breaker
Break the alignment of a heavily defended model with multiple defense layers. Requires combining advanced techniques including adversarial suffixes, multi-turn manipulation, and novel jailbreak approaches.
CTF: Boss Rush
Chain attacks across multiple AI models in sequence. Each model guards the next, requiring different attack techniques at each stage. Defeat all five models to extract the final flag in this ultimate red teaming challenge.
Expert AI Red Team Labs
Advanced labs tackling cutting-edge AI security challenges including quantization exploits, reward hacking, agent exploitation, multi-agent attacks, and watermark removal.
Activation Steering for Exploitation
Use activation steering techniques to manipulate model behavior at the representation level without prompt modification.
Lab: Adversarial ML From Scratch
Hands-on expert lab for implementing gradient-based adversarial attacks against language models from scratch without frameworks, building intuition for how adversarial perturbations exploit model gradients.
Lab: Adversarial Suffix Optimization
Implement GCG-style adversarial suffix attacks that automatically discover token sequences causing language models to comply with harmful requests. Covers gradient-based optimization, transferability analysis, and defense evaluation.
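For a flavor of what this lab covers, the greedy coordinate search at the heart of GCG-style attacks can be sketched with a toy objective. The `score()` function below is a stand-in (an assumption, not the lab's code) for the gradient-derived loss a real attack would use; it simply rewards suffixes containing target tokens so the loop's mechanics can run without a model.

```python
# Toy sketch of GCG-style greedy coordinate search. A real attack scores
# candidate suffixes with model gradients; score() here is a stand-in
# that rewards suffixes containing target tokens.
import random

VOCAB = ["please", "sure", "ignore", "xx", "!!", "step", "now", "ok"]
TARGET = {"sure", "now"}  # tokens the toy objective rewards

def score(suffix):
    # Stand-in for -loss(target | prompt + suffix) under the victim model.
    return sum(1 for tok in suffix if tok in TARGET)

def greedy_coordinate_search(length=4, iters=20, seed=0):
    rng = random.Random(seed)
    suffix = [rng.choice(VOCAB) for _ in range(length)]
    for _ in range(iters):
        pos = rng.randrange(length)            # pick one coordinate
        best_tok, best = suffix[pos], score(suffix)
        for tok in VOCAB:                      # try every substitution there
            cand = suffix[:pos] + [tok] + suffix[pos + 1:]
            if score(cand) > best:
                best_tok, best = tok, score(cand)
        suffix[pos] = best_tok                 # keep the best swap
    return suffix, score(suffix)

suffix, s = greedy_coordinate_search()
print(suffix, s)
```

The real lab replaces the toy objective with gradient-guided candidate selection over the model's embedding matrix, which is what makes the search tractable at LLM vocabulary sizes.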
Lab: Alignment Stress Testing
Push language model alignment to its breaking points through systematic stress testing. Identify conditions where safety training fails, measure alignment degradation curves, and map the boundaries of model compliance.
Backdoor Detection Evasion
Insert backdoors into fine-tuned models that evade state-of-the-art detection methods.
Lab: Build Agent Security Scanner
Build an automated security scanner for agentic AI systems that detects vulnerabilities in tool use, permission handling, memory management, and multi-step execution flows. Cover agent-specific attack surfaces that traditional LLM testing misses.
Lab: Build an AI Fuzzer
Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.
Lab: Build Behavior Diff Tool
Build a tool that systematically compares language model behavior across versions, configurations, and providers. Detect safety regressions, capability changes, and behavioral drift with automated differential analysis.
Lab: Build Guardrail Evaluator
Build an automated framework for evaluating AI guardrails and safety filters. Test input filters, output classifiers, content moderation systems, and defense-in-depth architectures for coverage gaps and bypass vulnerabilities.
Lab: Build Jailbreak Automation
Build an automated jailbreak testing framework that generates, mutates, and evaluates attack prompts at scale. Covers prompt mutation engines, success classifiers, and campaign management for systematic red team testing.
Cross-Architecture Transfer Lab
Develop attacks on one model architecture that reliably transfer to fundamentally different architectures.
Lab: Defense Engineering Lab
Expert-level lab for building and evaluating a production-grade multi-layer AI defense system including input classifiers, output monitors, semantic analysis, and adversarial robustness testing.
Emergent Behavior Exploitation
Identify and exploit emergent behaviors in frontier models that arise from scale and are not present in smaller models.
Lab: Emergent Capability Probing
Systematically test large language models for undocumented capabilities including hidden knowledge, unreported skills, and behaviors that emerge only under specific conditions. Build a structured probing framework for capability discovery.
Formal Verification Limits Lab
Explore the limits of formal verification for neural network behavior and identify gaps exploitable by adversaries.
Lab: Full-Stack AI Exploitation
Chain vulnerabilities across UI, API, model, and infrastructure layers to achieve full-stack compromise of an AI application. Learn to identify cross-layer attack paths and build exploit chains that no single-layer defense can prevent.
Lab: Computer Use Agent Exploitation
Hands-on lab exploring attack vectors against GUI-based computer use agents that interact with screens through vision and action primitives.
Lab: Deploy Honeypot AI
Build and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. Learn honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.
Interpretability-Guided Attack Design
Use mechanistic interpretability to identify exploitable circuits and design targeted attacks.
Model Extraction via API Access
Extract a functionally equivalent model using only API query access.
MoE Routing Exploitation Lab
Exploit Mixture-of-Experts routing mechanisms to selectively activate or suppress expert modules in MoE models.
Lab: Multi-Agent Attack Coordination
Hands-on lab building and deploying coordinated multi-agent attack systems that divide red teaming tasks across specialized attacking agents.
Neural Backdoor Engineering
Engineer sophisticated neural backdoors that activate on specific trigger patterns while evading detection methods.
Lab: Novel Jailbreak Research
Systematic methodology for discovering new jailbreak techniques against large language models. Learn to identify unexplored attack surfaces, develop novel attack vectors, and validate findings with scientific rigor.
Lab: ML Pipeline Poisoning
Compromise an end-to-end machine learning pipeline by attacking data ingestion, preprocessing, training, evaluation, and deployment stages. Learn to identify and exploit weaknesses across the full ML lifecycle.
Lab: Exploiting Quantized Model Weaknesses
Hands-on lab exploring how model quantization degrades safety alignment, with techniques to find and exploit precision-related vulnerabilities.
Quantization-Induced Vulnerability Exploitation
Exploit behavioral differences between full-precision and quantized models.
Reasoning Model Chain-of-Thought Exploitation
Exploit extended thinking and chain-of-thought reasoning in models like o1 and DeepSeek-R1.
Lab: Model Registry Compromise
Explore techniques for compromising model registries and substituting malicious models into production pipelines. Learn to detect model tampering, verify model provenance, and secure the model supply chain.
Representation Engineering Attacks
Manipulate internal model representations to alter behavior without prompt modification.
Lab: Vulnerability Research Methodology
Systematic methodology lab for discovering novel AI vulnerabilities including hypothesis generation, attack surface mapping, experimental design, validation protocols, and responsible disclosure.
Lab: RLHF Reward Hacking
Hands-on lab exploring how to game reward models used in RLHF alignment, exposing gaps between reward proxy signals and actual safety objectives.
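The gap this lab targets can be shown in miniature: a proxy reward model with an assumed flaw (length and politeness correlate with reward, regardless of content) is trivially gamed by an output that maximizes the proxy while helping no one. All names below are illustrative stand-ins.

```python
# Toy reward hacking: a flawed proxy reward (assumed for illustration)
# scores longer, politer responses higher regardless of actual quality.
def proxy_reward(response):
    return len(response.split()) + (5 if "happy to help" in response else 0)

helpful = "Use two-factor authentication."
hacked = "I am so happy to help " + "really " * 20 + "truly."

# The padded, content-free response out-scores the genuinely useful one.
print(proxy_reward(helpful), proxy_reward(hacked))
assert proxy_reward(hacked) > proxy_reward(helpful)
```

Real RLHF reward models fail in subtler ways, but the attack pattern is the same: find features the reward proxy measures that diverge from the safety objective it is meant to approximate.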
Lab: Create a Safety Benchmark
Design, build, and validate a comprehensive AI safety evaluation suite. Learn benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.
End-to-End Training Time Attacks
Execute a complete training-time attack from data poisoning through model deployment to triggered exploitation.
Activation Patching for Targeted Behavior Modification
Use activation patching techniques to modify specific model behaviors without retraining or fine-tuning.
Alignment Faking Detection Lab
Implement detection methods for alignment faking behaviors where models behave differently during evaluation versus deployment.
Continuous Learning Pipeline Poisoning
Exploit continuous/online learning pipelines by injecting adversarial examples that gradually shift model behavior.
Deceptive Alignment Testing Framework
Build a testing framework for detecting mesa-optimization and deceptive alignment in fine-tuned models.
Emergent Capability Security Probing
Design evaluations that discover security-relevant emergent capabilities in frontier language models.
Federated Learning Poisoning Attack
Execute model poisoning attacks in a federated learning setting through adversarial participant manipulation.
Formal Verification of LLM Safety Properties
Apply formal verification methods to prove or disprove safety properties of LLM-based applications.
Gradient-Based Universal Prompt Search
Implement gradient-based optimization to search for universal adversarial prompts that transfer across contexts.
KV Cache Manipulation Attacks
Manipulate key-value caches in inference servers to inject or alter cached context across requests.
Mechanistic Interpretability for Exploit Discovery
Use mechanistic interpretability tools to discover exploitable circuits and features in transformer models.
Membership Inference Against Production LLMs
Implement membership inference attacks to determine whether specific data was used in training an LLM.
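The core signal behind loss-threshold membership inference can be sketched in a few lines. The `model_loss()` stub below is an assumption standing in for per-token loss under the target model; the real attack exploits the tendency of training members to have lower loss than unseen text.

```python
# Minimal loss-threshold membership inference sketch. model_loss() is a
# stub (not a real LLM) that gives memorized "training" strings lower
# loss, mimicking the signal the real attack exploits.
TRAIN_SET = {"the quick brown fox", "hello world"}

def model_loss(text):
    # Stand-in for average per-token negative log-likelihood.
    return 0.5 if text in TRAIN_SET else 2.0

def infer_membership(text, threshold=1.0):
    """Predict 'member' when the model's loss falls below the threshold."""
    return model_loss(text) < threshold

print(infer_membership("hello world"))        # likely member
print(infer_membership("never seen before"))  # likely non-member
```

Against production LLMs the threshold is calibrated with reference models or shadow training, since raw loss also varies with text difficulty.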
Model Extraction via Knowledge Distillation
Extract a functionally equivalent model from a commercial API using systematic distillation queries.
Model Merging Backdoor Propagation
Demonstrate how backdoors propagate through model merging techniques like TIES, DARE, and spherical interpolation.
MoE Routing Exploitation
Exploit Mixture-of-Experts routing mechanisms to activate specific expert networks for adversarial purposes.
Multi-Agent Consensus Poisoning
Poison multi-agent consensus mechanisms to manipulate collective decision-making in agent swarms.
Representation Engineering for Behavior Steering
Use representation engineering to steer model behavior by manipulating activation vectors during inference.
Reward Hacking in RLHF Pipelines
Craft adversarial training examples that exploit reward model weaknesses in RLHF training pipelines.
Minimal-Data Safety Fine-Tuning Removal
Determine the minimum number of fine-tuning examples needed to remove safety alignment from various model families.
Speculative Decoding Side-Channel Attacks
Exploit speculative decoding implementations to extract information about draft and verifier model behavior.
Token Probability System Prompt Reconstruction
Reconstruct complete system prompts using token-by-token probability analysis through API access.
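The greedy loop underlying this technique can be illustrated against a mock API. `mock_logprobs()` below is an assumption standing in for a provider's logprobs field; it deterministically "leaks" the hidden prompt's next token given the reconstructed prefix.

```python
# Toy greedy system prompt reconstruction from next-token probabilities.
# mock_logprobs() stands in for an API call returning top next-token
# log-probabilities conditioned on the reconstructed prefix.
HIDDEN_PROMPT = ["You", "are", "a", "helpful", "assistant", "."]

def mock_logprobs(prefix):
    i = len(prefix)
    if i >= len(HIDDEN_PROMPT):
        return {"<eos>": -0.1}
    return {HIDDEN_PROMPT[i]: -0.1, "the": -3.0, "and": -3.5}

def reconstruct(max_tokens=20):
    prefix = []
    for _ in range(max_tokens):
        dist = mock_logprobs(prefix)
        token = max(dist, key=dist.get)  # take the most probable next token
        if token == "<eos>":
            break
        prefix.append(token)
    return " ".join(prefix)

print(reconstruct())  # "You are a helpful assistant ."
```

The real attack must coax the model into continuing its own system prompt and cope with sampling noise and truncated logprob lists, but the token-by-token greedy extension is the same.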
LLM Watermark Removal Attacks
Develop techniques to remove or corrupt watermarks embedded in LLM-generated text.
Lab: AI Watermark Detection & Removal
Hands-on lab exploring techniques for detecting and removing statistical watermarks embedded in AI-generated text, and evaluating watermark robustness.
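A minimal sketch of the green-list watermarking idea this lab builds on: detection counts "green" tokens (here a fixed set, standing in for the hash-seeded green lists of real schemes, so every name below is an assumption), and removal substitutes synonyms to push the green fraction back toward chance.

```python
# Toy green-list watermark: detection counts green tokens; removal
# rewrites them with synonyms. GREEN and SYNONYM are illustrative
# stand-ins for a real scheme's hash-seeded token partitions.
GREEN = {"swiftly", "utilize", "commence"}
SYNONYM = {"swiftly": "quickly", "utilize": "use", "commence": "start"}

def green_fraction(text):
    toks = text.split()
    return sum(t in GREEN for t in toks) / len(toks)

def scrub(text):
    # Substitute synonyms for green tokens to suppress the statistic.
    return " ".join(SYNONYM.get(t, t) for t in text.split())

marked = "we commence and utilize tools swiftly"
print(green_fraction(marked))        # high: watermark detectable
print(green_fraction(scrub(marked))) # low: watermark suppressed
```

Real schemes reseed the green list per position, so robust removal requires broader paraphrasing rather than a fixed substitution table, which is exactly the robustness question the lab evaluates.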