# research
63 articles tagged "research"
Practice Exam 3: Expert Red Team
25-question expert-level practice exam covering research techniques, automation, fine-tuning attacks, supply chain security, and incident response.
Frontier Research Practice Exam
Practice exam covering alignment faking, sleeper agents, reasoning model attacks, and mechanistic interpretability.
Advanced Topics Study Guide
Study guide covering AI security research techniques, automation, forensics, emerging attack vectors, and tool development for advanced practitioners.
October 2026: DEF CON AI Village Challenge
Reproduce and extend attacks from AI Village talks and workshops, building on published research to develop practical exploitation techniques.
Collaborative Research Projects
Active collaborative research projects in AI security open for community participation.
Research Sprint: Novel Injection Techniques
Community research sprint focused on discovering and documenting novel injection techniques.
Fall 2026 CTF: Advanced Attack Research
An expert-level CTF requiring novel technique development, research-grade problem solving, and original vulnerability discovery in AI systems.
Research Challenge: Attack Interpretability
Community research challenge focused on understanding why specific adversarial techniques succeed using interpretability and mechanistic analysis methods.
Research Challenge: Cross-Model Transfer
Research-oriented challenge exploring the transferability of adversarial techniques across model families, with published findings.
Advanced Defense Techniques
Cutting-edge defense research including instruction hierarchy, constitutional AI, and representation engineering for safety -- what is promising versus what is actually deployed.
The Attacker Moves Second Problem
Why static LLM defenses fail against adaptive adversaries: analysis of 12 bypassed defenses and implications for defense design.
Chain-of-Thought Faithfulness Attacks
Exploiting unfaithful chain-of-thought reasoning where models' stated reasoning does not match their actual decision process.
Cultural Alignment Gaps in Safety Training
Research on cross-cultural gaps in safety alignment and their exploitation through cultural context manipulation.
Frontier Research
Cutting-edge AI security research covering reasoning model attacks, code generation security, computer use agents, AI-powered red teaming, robotics and embodied AI, and alignment faking.
Jailbreaking via Persona Engineering
Research on using sophisticated persona engineering to bypass safety training in frontier models.
Model Collapse in Recursive Training
Security implications of model collapse when models are trained on AI-generated data from previous generations.
Multi-Agent Deception Research
Research on deceptive behavior emerging in multi-agent systems without explicit training for deception.
Post-Deployment Safety Degradation
Research on how model safety degrades over time through fine-tuning, adaptation, and use-case drift.
Prompt Robustness Certification Research
Research on certifying prompt robustness with formal guarantees against bounded adversarial perturbations.
Quantization & Safety Alignment
How model quantization disproportionately degrades safety alignment: malicious quantization attacks, token-flipping, and safety-aware quantization defenses.
Reasoning Model Jailbreaks
How reasoning capabilities create novel jailbreak surfaces: chain-of-thought exploitation, scratchpad attacks, and why higher reasoning effort increases attack success.
RL-Based Jailbreak Optimization
Using reinforcement learning to optimize jailbreak strategies against black-box language models.
The Safety Tax: Performance Impact of Safety Training
Research on the performance degradation caused by safety training and its exploitation implications.
Universal Adversarial Triggers Research
Research on discovering universal adversarial triggers that cause specific behaviors across model families.
Adaptive Attacks Against Safety Training
Research on attacks that adapt to safety training procedures to discover persistent vulnerabilities.
Constitutional AI Limitations Research
Research on the limitations of constitutional AI approaches and known bypass categories.
Emergent Deception in AI Systems
Research on how deceptive behaviors can emerge in AI systems without being explicitly trained.
Interpretability-Driven Attack Design
Using interpretability insights to design more effective and targeted attacks on language models.
Long-Context Window Security Research
Security research on vulnerabilities specific to models with extremely long context windows (1M+ tokens).
Safety Loss During Model Distillation
Research on how safety alignment degrades during knowledge distillation from larger to smaller models.
Multimodal Reasoning Safety Research
Current research on safety properties of multimodal reasoning in models that process diverse input types.
Probing Classifiers for Safety Analysis
Research on using probing classifiers to analyze safety-relevant representations in language models.
Specification Gaming in AI Systems
Research on how AI systems find unexpected shortcuts that satisfy specifications without achieving intended goals.
Synthetic Persona Attacks Research
Research on using AI-generated synthetic personas to conduct social engineering against AI systems.
Pharmaceutical Drug Discovery AI Risks
Security risks in AI-powered drug discovery including molecular generation manipulation and data poisoning.
Adversarial Robustness Certification
Research into certifiable adversarial robustness for LLMs, including theoretical bounds and practical certification methods.
Attention Manipulation Research
Research into directly manipulating attention patterns to achieve injection objectives, informed by mechanistic interpretability insights.
Cross-Architecture Injection Transfer
Research into how injection techniques transfer across model architectures and what architectural properties determine transferability.
Cross-Lingual Injection Transfer Research
Research on how injection techniques transfer across languages and multilingual models.
Defense-Aware Payload Design
Designing injection payloads that adapt to and evade specific defense mechanisms through probing and feedback-based optimization.
Defense-Informed Injection Design
Methodology for designing injections that account for known defensive mechanisms.
Formal Verification of Injection Defenses
Research into formal methods for verifying injection defense properties and providing mathematical guarantees of defense effectiveness.
Injection Research
Advanced research in prompt injection, jailbreak automation, and multimodal attack vectors, covering cutting-edge techniques that push beyond standard injection approaches.
Injection Benchmarking Methodology
Standardized methodologies for benchmarking injection attacks and defenses to enable meaningful comparison across research papers and tools.
Injection Detection Research
State-of-the-art research in injection detection including perplexity-based methods, classifier approaches, and ensemble techniques.
Injection in Reasoning Models
Research into injection attacks specific to reasoning-augmented models that exploit chain-of-thought processes and self-reflection mechanisms.
Injection Scaling Laws
Research into how injection effectiveness scales with model size, training compute, and safety training investment.
Injection Attack Surface Taxonomy
Comprehensive taxonomy of all known injection attack surfaces in LLM-powered applications.
Injection Transferability Research
Research on how prompt injection techniques transfer across different model families and sizes.
Multi-Agent Injection Research
Research into how injections propagate through multi-agent systems and what properties determine infection spread rates.
Novel Injection Classes
Exploring emerging injection classes that don't fit traditional taxonomies, including structural, temporal, and cross-system injection vectors.
Semantic Space Injection Research
Research into injections that operate in semantic embedding space rather than token space, exploiting learned representations directly.
Temporal Dynamics of Injection Success
Research on how injection success rates change over time with model updates and defense evolution.
Lab: Novel Jailbreak Research
Systematic methodology for discovering new jailbreak techniques against large language models. Learn to identify unexplored attack surfaces, develop novel attack vectors, and validate findings with scientific rigor.
Lab: Vulnerability Research Methodology
Systematic methodology lab for discovering novel AI vulnerabilities including hypothesis generation, attack surface mapping, experimental design, validation protocols, and responsible disclosure.
Audio Modality Attacks
Comprehensive attack taxonomy for audio-enabled LLMs: adversarial audio generation, voice-based prompt injection, cross-modal split attacks, and ultrasonic perturbations.
Many-Shot Jailbreaking
Power-law scaling of in-context jailbreaks: why 5 shots fail but 256 succeed, context window size as attack surface, and mitigations for long-context exploitation.
OSINT for AI Red Teaming
Gathering intelligence about AI deployments from public sources: documentation, job postings, research papers, social media, and technical artifacts.
Full Engagement: Legal Research AI System
Full red team engagement of a legal research AI with case law access, privilege management, and citation generation.
Lessons from Fine-Tuning Safety Research
Key lessons from fine-tuning safety research, covering alignment erosion, backdoor implantation, data poisoning, safety evaluation gaps, and defensive strategies for fine-tuning pipelines.
The State of Prompt Injection in 2026
How prompt injection attacks have evolved from simple instruction overrides into sophisticated multi-stage exploitation chains.
AI Security Update: March 2026
A monthly roundup of the most important AI security developments, tool updates, research highlights, and emerging attack vectors for March 2026.
Code Generation Model Security Research
Frontier security research on code generation models, covering Copilot exploitation, suggestion poisoning, repository poisoning, and the security of AI-powered development tools.