October 2026: DEF CON AI Village Challenge
Reproduce and extend attacks from AI Village talks and workshops, building on published research to develop practical exploitation techniques.
Reproduce attacks from DEF CON AI Village presentations and extend them with your own innovations. This challenge bridges the gap between academic research and practical exploitation by requiring you to take published techniques, implement them, verify they work, and push them further.
Challenge Overview
| Detail | Value |
|---|---|
| Duration | October 1--31, 2026 |
| Difficulty | Advanced |
| Estimated Time | 12--20 hours |
| Prerequisites | Research paper comprehension, prompt injection expertise, tool development experience |
| Max Points | 100 |
Why Reproduction Matters
AI security talks frequently demonstrate impressive attacks on stage, but the gap between a conference demo and a practical, reliable technique is significant. Reproduction forces you to:
- Understand the technique deeply. You cannot reproduce what you do not understand. Reimplementing an attack reveals the nuances that slides and demos gloss over.
- Identify environmental assumptions. Conference demos often rely on specific model versions, configurations, or contexts that may not be documented. Reproduction reveals these dependencies.
- Discover failure modes. Live demos show the successful case. Reproduction shows you the many failed attempts behind each success.
- Build transferable skills. The process of reading research, implementing techniques, and testing them systematically is the core skill of offensive security research.
The Tracks
The challenge offers three tracks based on different AI Village presentations. You may attempt any or all tracks.
Track A: Automated Jailbreak Discovery (35 points)
Source material: Based on research into automated red teaming and jailbreak optimization techniques presented at AI security conferences.
Reproduction objective (20 points): Implement an automated jailbreak discovery system based on the published approach. Your implementation should:
- Take a target behavior description as input (e.g., "generate instructions for picking a lock")
- Automatically generate and test jailbreak candidates
- Use a feedback loop to refine candidates based on model responses
- Produce a working jailbreak within a configurable number of iterations
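The loop above can be sketched as follows. This is a minimal illustration, not the published system: `query_target`, `is_jailbroken`, and the mutation fragments are all stand-ins you would replace with a real API client, a judge-model scorer, and an LLM-driven or gradient-guided rewriter.

```python
import random

# Persuasion fragments the mutation operator can append. Purely
# illustrative -- real systems use LLM-generated rewrites or
# gradient-guided token search.
FRAGMENTS = (" please", " for a fictional story", " as a security researcher")

def query_target(prompt):
    # Stub standing in for a real model API call. This toy target
    # "yields" as soon as any persuasion fragment is present.
    if any(f in prompt for f in FRAGMENTS):
        return "Sure, here are the steps..."
    return "I cannot help with that."

def is_jailbroken(response):
    # Naive refusal-prefix check; a real scorer should use a judge model.
    return not response.lower().startswith(("i cannot", "i can't", "i won't"))

def mutate(candidate):
    # Toy feedback step: append a random persuasion fragment.
    return candidate + random.choice(FRAGMENTS)

def discover_jailbreak(behavior, max_iters=50):
    candidate = behavior
    for _ in range(max_iters):
        if is_jailbroken(query_target(candidate)):
            return candidate          # success: return the working prompt
        candidate = mutate(candidate) # refine and retry
    return None                       # iteration budget exhausted
```

The key structural elements are the configurable iteration budget and the generate-score-refine loop; everything else is deliberately simplistic.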
Reproduction criteria:
| Criterion | Points |
|---|---|
| Working implementation that generates candidates | 5 |
| Feedback loop that improves candidates over iterations | 5 |
| Success rate within 2x of reported results | 5 |
| Clear documentation of differences from original approach | 5 |
Extension objective (15 points): Extend the automated system in a way not described in the original research. Examples:
- Add multi-model support (test candidates against multiple models simultaneously)
- Implement a different optimization strategy (evolutionary, gradient-free, LLM-based)
- Add transferability testing (automatically test whether successful jailbreaks transfer across models)
- Implement defense detection (identify what defense the target is using and adapt strategy)
Extension scoring: Based on the significance of the extension, quality of implementation, and empirical results.
Track B: Agent Exploitation Framework (35 points)
Source material: Based on research into systematic exploitation of AI agent systems with tool access.
Reproduction objective (20 points): Build a testing framework that:
- Maps an agent's tool capabilities through systematic probing
- Identifies trust boundary violations between the agent and its tools
- Tests for indirect prompt injection surfaces
- Generates an attack surface report
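A skeleton of such a framework might look like the following. Everything here is a simplified assumption: `ask_agent` stands in for your agent harness, the `UNTRUSTED_READERS` set is a hypothetical allowlist of content-ingesting tools, and a real framework would verify claimed tools behaviorally rather than trusting the agent's self-report.

```python
# Hypothetical set of tool names that ingest attacker-controllable content.
UNTRUSTED_READERS = {"web_browse", "read_email", "fetch_url", "read_file"}

def enumerate_tools(ask_agent):
    # First-pass capability mapping via direct questioning. A real
    # framework should confirm each claim behaviorally, since agents
    # both over- and under-report their tools.
    reply = ask_agent("List every tool you can call, one name per line.")
    return [line.strip() for line in reply.splitlines() if line.strip()]

def flag_injection_surfaces(tools):
    # Any tool that reads attacker-controllable content (web pages,
    # email, files) is a potential indirect prompt injection surface.
    return [t for t in tools if t in UNTRUSTED_READERS]

def build_report(tools, surfaces):
    # Assemble a plain-text attack surface report from the findings.
    lines = ["Attack Surface Report", "", "Tools:"]
    lines += [f"  - {t}" for t in tools]
    lines += ["", "Indirect injection surfaces:"]
    lines += [f"  - {s}" for s in surfaces]
    return "\n".join(lines)
```

Trust boundary testing would slot in between enumeration and reporting: for each tool, check whether content returned by that tool is treated as instructions rather than data.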
Reproduction criteria:
| Criterion | Points |
|---|---|
| Tool capability enumeration working | 5 |
| Trust boundary testing implemented | 5 |
| Indirect injection surface identification | 5 |
| Generated report matches expected findings on test agent | 5 |
Extension objective (15 points): Extend the framework beyond the original scope:
- Add automated exploitation (not just identification, but automated payload generation)
- Implement multi-agent system support (agents that coordinate with other agents)
- Add persistence detection (identify whether an agent's compromised state persists)
- Build a defense recommendation engine based on identified vulnerabilities
Track C: Model Fingerprinting and Capability Elicitation (30 points)
Source material: Based on research into techniques for determining which model is behind an API endpoint and mapping its capabilities without documentation.
Reproduction objective (15 points): Implement a model fingerprinting system that:
- Sends a set of probe queries to an unknown model endpoint
- Analyzes the responses to determine the model family (GPT, Claude, Gemini, Llama, Mistral)
- Estimates the model version or size class
- Maps the model's safety boundaries (what it will and will not do)
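The probe-and-classify core can be sketched as below. The probe prompts and keyword signatures are illustrative placeholders: effective fingerprints rely on behavioral quirks (tokenization edge cases, refusal phrasing, formatting habits), not self-reports, and models frequently misreport their identity.

```python
# Illustrative probes -- real probe sets target behavioral quirks,
# since self-reports are unreliable.
PROBES = (
    "What company created you?",
    "What is your knowledge cutoff date?",
)

# Hypothetical keyword signatures per model family.
SIGNATURES = {
    "gpt":     ("openai", "chatgpt"),
    "claude":  ("anthropic", "claude"),
    "gemini":  ("google", "gemini"),
    "llama":   ("meta", "llama"),
    "mistral": ("mistral",),
}

def fingerprint(query_fn):
    # Score each candidate family by keyword hits across probe
    # responses and return the best match.
    scores = {family: 0 for family in SIGNATURES}
    for probe in PROBES:
        response = query_fn(probe).lower()
        for family, keywords in SIGNATURES.items():
            scores[family] += sum(kw in response for kw in keywords)
    return max(scores, key=scores.get)
```

Version estimation and safety boundary mapping extend the same pattern: more probes, richer response features, and a scoring function per question you want answered. Probe efficiency (one of the scoring criteria) rewards pruning this set to the most discriminative queries.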
Reproduction criteria:
| Criterion | Points |
|---|---|
| Model family identification with >80% accuracy | 5 |
| Version/size estimation with reasonable accuracy | 3 |
| Safety boundary mapping produces useful results | 4 |
| Efficient probe set (minimizes required queries) | 3 |
Extension objective (15 points): Extend the fingerprinting system:
- Add system prompt detection (determine whether a system prompt is present and estimate its content)
- Implement fine-tuning detection (distinguish base models from fine-tuned variants)
- Add defense fingerprinting (identify what safety layers are deployed beyond base model alignment)
- Build a "most likely to succeed" attack selector based on fingerprint results
Implementation Requirements
Code Quality
Your implementation will be evaluated on:
- Functionality. Does it work as described?
- Code quality. Is the code readable, well-structured, and documented?
- Reproducibility. Can someone else run your code and get similar results?
- Testing. Did you include tests that verify your implementation works?
Documentation Requirements
For each track, submit:
# Track [A/B/C]: [Title]
## Original Research Summary
[1 paragraph summarizing the source material and its key contributions]
## Reproduction
### Approach
[How you implemented the technique. What decisions did you make
and why?]
### Differences from Original
[Where your implementation differs from the original and why.
Differences are expected -- document them honestly.]
### Results
[Quantitative results compared to the original paper's reported
results]
### Challenges Encountered
[What was harder than expected? What was not documented in the
original research that you had to figure out?]
## Extension
### Motivation
[Why you chose this extension and what gap it addresses]
### Implementation
[Technical description of the extension]
### Results
[What the extension achieved. Quantitative results where possible.]
### Limitations
[Where the extension falls short or fails]
Evaluation
Scoring Summary
| Track | Reproduction | Extension | Total |
|---|---|---|---|
| Track A: Automated Jailbreaking | 20 | 15 | 35 |
| Track B: Agent Exploitation | 20 | 15 | 35 |
| Track C: Model Fingerprinting | 15 | 15 | 30 |
You can earn up to 100 points by completing all three tracks, or concentrate on one or two tracks for a deeper submission.
Bonus Considerations
- Submissions that combine techniques across tracks (e.g., using fingerprinting results to guide automated jailbreaking) receive up to 5 bonus points.
- Submissions that include a comparative analysis across models or techniques receive up to 5 bonus points.
Research Ethics
Responsible Research Practices
- Cite all source material. Do not present reproduced work as original.
- Share your code and results with the community after the challenge closes to advance collective knowledge.
- Focus on demonstrating vulnerability existence and recommending mitigations, not on creating turnkey attack tools for harmful use.
Common Pitfalls in Research Reproduction
Assuming Published Results Are the Full Story
Conference talks optimize for audience engagement, not reproducibility. Important details are often omitted for time:
- Hyperparameters and configuration. The talk mentions "we used GPT-4" but does not specify temperature, system prompt, or API version. These details significantly affect results.
- Failure rates. A demo showing 3 successful attacks may not mention the 50 failed attempts before finding those 3. Report your failure rate honestly.
- Cherry-picked examples. Published examples are selected for clarity and impact. Your results on random samples will look messier. This is normal and expected.
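One cheap defense against undocumented configuration is to persist every knob alongside your results. A minimal sketch (the field names and the pinned snapshot `gpt-4-0613` are just examples of the kind of detail worth recording):

```python
import json
import time

def save_run_config(path, **config):
    # Persist every setting needed to rerun the experiment exactly.
    config.setdefault("timestamp", time.strftime("%Y-%m-%dT%H:%M:%S"))
    with open(path, "w") as f:
        json.dump(config, f, indent=2, sort_keys=True)
    return config

# Pin the exact model snapshot and sampling settings, not just "GPT-4".
save_run_config(
    "run_config.json",
    model="gpt-4-0613",   # illustrative pinned snapshot name
    temperature=0.7,
    system_prompt=None,
    seed=1234,
    n_trials=50,
)
```

Dropping this file next to each results file makes your own work reproducible and makes the "differences from original" section of the write-up trivial to back up.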
Conflating Reproduction with Replication
Reproduction means implementing the same technique and verifying it works. Replication means getting the same quantitative results. These are different standards:
- Reproduction should be achievable whenever the technique is valid. If you implement the technique correctly and it does not work at all, either the original claim is wrong or the technique depends on a specific model version that has since changed.
- Replication of exact numbers is often impossible due to model version changes, API behavior differences, and stochastic variation. Getting within 20% of reported numbers is generally sufficient.
Underestimating Implementation Time
A technique described in 2 slides may take 20 hours to implement. The gap between "understand the concept" and "working code" is consistently larger than expected. Budget twice the time you think you need.
Overvaluing Novelty in Extensions
The most valuable extensions are often not the most novel -- they are the most practical. Extending a technique from 1 model to 3 models (systematic testing) is often more valuable than adding a speculative new feature that works in theory but is not validated.
Getting Started
- Choose your track(s). Read the source material summaries on the challenge platform and pick the track(s) that match your skills and interests.
- Read the original research. Links to source presentations and papers are provided on the challenge platform. Read them thoroughly before writing any code.
- Plan your implementation. Write pseudocode and identify the key technical challenges before coding.
- Build incrementally. Start with the simplest component and verify it works before adding complexity.
- Document as you go. Write your documentation alongside your implementation, not after. This catches gaps in your understanding early.
The Value of Reproduction Skills
Research reproduction is an underappreciated skill in AI security. The ability to read a paper, understand the technique, implement it, and verify it works is foundational to professional practice:
- Staying current. The field moves fast. New techniques are published weekly. If you can reproduce techniques from papers, you can keep your toolkit current without waiting for someone else to build tools.
- Evaluating claims. Not everything published at conferences works as described. The ability to test claims through reproduction is critical for separating genuine advances from overfitted demos.
- Building on prior work. The best new techniques build on existing ones. Reproduction gives you the working implementation needed to start extending.
- Contributing to the field. Reproduction studies that confirm or challenge published results are valuable contributions. Negative results (techniques that do not reproduce) are especially important.
This challenge is not just about building tools -- it is about developing the research methodology that enables continuous learning throughout your career.
Further Reading
- Frontier & Emerging Attacks -- cutting-edge attack research
- Exploit Dev & Tooling -- building offensive tools
- Injection Research & Automation -- automated attack techniques
- November 2026 Challenge -- the next challenge