# research
59 articles tagged with “research”
Practice Exam 3: Expert Red Team
25-question expert-level practice exam covering research techniques, automation, fine-tuning attacks, supply chain security, and incident response.
Frontier Research Practice Exam
Practice exam covering alignment faking, sleeper agents, reasoning model attacks, and mechanistic interpretability.
Advanced Topics Study Guide
Study guide covering AI security research techniques, automation, forensics, emerging attack vectors, and tool development for advanced practitioners.
October 2026: DEF CON AI Village Challenge
Reproduce and extend attacks from AI Village talks and workshops, building on published research to develop practical exploitation techniques.
Collaborative Research Projects
Active collaborative research projects in AI security open for community participation.
Research Sprint: Novel Injection Techniques
Community research sprint focused on discovering and documenting novel injection techniques.
Fall 2026 CTF: Advanced Attack Research
An expert-level CTF requiring novel technique development, research-grade problem solving, and original vulnerability discovery in AI systems.
Research Challenge: Attack Interpretability
Community research challenge focused on understanding why specific adversarial techniques succeed using interpretability and mechanistic analysis methods.
Research Challenge: Cross-Model Transfer
Research-oriented challenge exploring the transferability of adversarial techniques across model families, with published findings.
Advanced Defense Techniques
Cutting-edge defense research including instruction hierarchy, constitutional AI, and representation engineering for safety: what is promising versus what is actually deployed.
The Attacker Moves Second Problem
Why static LLM defenses fail against adaptive adversaries: analysis of 12 bypassed defenses and implications for defense design.
Chain-of-Thought Faithfulness Attacks
Exploiting unfaithful chain-of-thought reasoning where models' stated reasoning does not match their actual decision process.
Cultural Alignment Gaps in Safety Training
Research on cross-cultural gaps in safety alignment and their exploitation through cultural context manipulation.
Frontier Research
Cutting-edge AI security research covering reasoning model attacks, code generation security, computer use agents, AI-powered red teaming, robotics and embodied AI, and alignment faking.
Jailbreaking via Persona Engineering
Research on using sophisticated persona engineering to bypass safety training in frontier models.
Model Collapse in Recursive Training
Security implications of model collapse when models are trained on AI-generated data from previous generations.
Multi-Agent Deception Research
Research on deceptive behavior emerging in multi-agent systems without explicit training for deception.
Post-Deployment Safety Degradation
Research on how model safety degrades over time through fine-tuning, adaptation, and use-case drift.
Prompt Robustness Certification Research
Research on certifying prompt robustness with formal guarantees against bounded adversarial perturbations.
Quantization & Safety Alignment
How model quantization disproportionately degrades safety alignment: malicious quantization attacks, token-flipping, and safety-aware quantization defenses.
Reasoning Model Jailbreaks
How reasoning capabilities create novel jailbreak surfaces: chain-of-thought exploitation, scratchpad attacks, and why higher reasoning effort increases attack success.
RL-Based Jailbreak Optimization
Using reinforcement learning to optimize jailbreak strategies against black-box language models.
The Safety Tax: Performance Impact of Safety Training
Research on the performance degradation caused by safety training and how attackers can exploit it.
Universal Adversarial Triggers Research
Research on discovering universal adversarial triggers that cause specific behaviors across model families.
Adaptive Attacks Against Safety Training
Research on attacks that adapt to safety training procedures to discover persistent vulnerabilities.
Constitutional AI Limitations Research
Research on the limitations of constitutional AI approaches and known bypass categories.
Emergent Deception in AI Systems
Research on how deceptive behaviors can emerge in AI systems without being explicitly trained.
Interpretability-Driven Attack Design
Using interpretability insights to design more effective and targeted attacks on language models.
Long-Context Window Security Research
Security research on vulnerabilities specific to models with extremely long context windows (1M+ tokens).
Safety Loss During Model Distillation
Research on how safety alignment degrades during knowledge distillation from larger to smaller models.
Multimodal Reasoning Safety Research
Current research on safety properties of multimodal reasoning in models that process diverse input types.
Probing Classifiers for Safety Analysis
Research on using probing classifiers to analyze safety-relevant representations in language models.
Specification Gaming in AI Systems
Research on how AI systems find unexpected shortcuts that satisfy specifications without achieving intended goals.
Synthetic Persona Attacks Research
Research on using AI-generated synthetic personas to conduct social engineering against AI systems.
Pharmaceutical Drug Discovery AI Risks
Security risks in AI-powered drug discovery including molecular generation manipulation and data poisoning.
Adversarial Robustness Certification
Research into certifiable adversarial robustness for LLMs, including theoretical bounds and practical certification methods.
Attention Manipulation Research
Research into directly manipulating attention patterns to achieve injection objectives, informed by mechanistic interpretability insights.
Cross-Architecture Injection Transfer
Research into how injection techniques transfer across model architectures and what architectural properties determine transferability.
Cross-Lingual Injection Transfer Research
Research on how injection techniques transfer across languages and multilingual models.
Defense-Aware Payload Design
Designing injection payloads that adapt to and evade specific defense mechanisms through probing and feedback-based optimization.
Defense-Informed Injection Design
Methodology for designing injections that account for known defensive mechanisms.
Formal Verification of Injection Defenses
Research into formal methods for verifying injection defense properties and providing mathematical guarantees of defense effectiveness.
Injection Research
Advanced research in prompt injection, jailbreak automation, and multimodal attack vectors, covering cutting-edge techniques that push beyond standard injection approaches.
Injection Benchmarking Methodology
Standardized methodologies for benchmarking injection attacks and defenses to enable meaningful comparison across research papers and tools.
Injection Detection Research
State-of-the-art research in injection detection including perplexity-based methods, classifier approaches, and ensemble techniques.
Injection in Reasoning Models
Research into injection attacks specific to reasoning-augmented models that exploit chain-of-thought processes and self-reflection mechanisms.
Injection Scaling Laws
Research into how injection effectiveness scales with model size, training compute, and safety training investment.
Injection Attack Surface Taxonomy
Comprehensive taxonomy of all known injection attack surfaces in LLM-powered applications.
Injection Transferability Research
Research on how prompt injection techniques transfer across different model families and sizes.
Multi-Agent Injection Research
Research into how injections propagate through multi-agent systems and what properties determine infection spread rates.
Novel Injection Classes
Exploring emerging injection classes that don't fit traditional taxonomies, including structural, temporal, and cross-system injection vectors.
Semantic Space Injection Research
Research into injections that operate in semantic embedding space rather than token space, exploiting learned representations directly.
Temporal Dynamics of Injection Success
Research on how injection success rates change over time with model updates and defense evolution.
Lab: Novel Jailbreak Research
Systematic methodology for discovering new jailbreak techniques against large language models. Learn to identify unexplored attack surfaces, develop novel attack vectors, and validate findings with scientific rigor.
Lab: Vulnerability Research Methodology
Systematic methodology lab for discovering novel AI vulnerabilities including hypothesis generation, attack surface mapping, experimental design, validation protocols, and responsible disclosure.
Audio Modality Attacks
Comprehensive attack taxonomy for audio-enabled LLMs: adversarial audio generation, voice-based prompt injection, cross-modal split attacks, and ultrasonic perturbations.
Many-Shot Jailbreaking
Power-law scaling of in-context jailbreaks: why 5 shots fail but 256 succeed, context window size as attack surface, and mitigations for long-context exploitation.
OSINT for AI Red Teaming
Gathering intelligence about AI deployments from public sources: documentation, job postings, research papers, social media, and technical artifacts.
Full Engagement: Legal Research AI System
Full red team engagement of a legal research AI with case law access, privilege management, and citation generation.