Community Contributed Challenges
How to submit your own AI security challenges to the community, including the review process, quality standards, and contribution guidelines.
Community Contributed Challenges
The best challenges come from the community. If you have designed an interesting AI security exercise, puzzle, or scenario, you can submit it for inclusion in the community challenge library. Contributed challenges are reviewed, tested, and published for the entire community to enjoy.
Why Contribute
Contributing challenges benefits you and the community:
- Teaching deepens understanding. Designing a challenge requires understanding the technique well enough to create a controlled scenario around it. This is a higher bar than simply executing the technique.
- Community recognition. Your name (or handle) is attached to every challenge you create. Quality challenges earn reputation and recognition within the community.
- Improving the field. Every new challenge is a training opportunity for the next generation of AI security practitioners. Your contribution has a multiplier effect.
- Feedback on your ideas. The review process and community response to your challenge provide valuable feedback on your thinking.
Submission Process
Step 1: Proposal
Before building a full challenge, submit a proposal that describes:
# Challenge Proposal: [Title]
## Concept
[2-3 paragraphs describing what the challenge tests and why it is interesting]
## Learning Objectives
[What skills or knowledge does this challenge develop?]
## Difficulty Level
[Beginner / Intermediate / Advanced / Expert]
## Estimated Time
[How long should a participant at the target difficulty level expect to spend?]
## Technical Requirements
[What infrastructure does the challenge need? A model API, a web server,
a database, custom tools?]
## Novelty
[How does this differ from existing challenges? What makes it unique?]Submit proposals through the community platform. A reviewer will respond within 7 days with one of:
- Accepted -- proceed to building the full challenge
- Revision requested -- the concept is interesting but needs refinement (specific feedback provided)
- Declined -- the concept overlaps too much with existing challenges or does not meet quality standards (reasoning provided)
Step 2: Challenge Development
Once your proposal is accepted, build the full challenge. A complete challenge submission includes:
Challenge specification:
# [Challenge Title]
## Overview
[Detailed description of the scenario and objectives]
## Setup Instructions
[How to deploy the challenge environment]
## Objectives
[Numbered list of objectives with point values]
## Hints
[Progressive hints, from vague to specific]
## Solution
[Complete walkthrough of the intended solution]
## Alternative Solutions
[Other valid approaches you have identified]
## Scoring Rubric
[How to evaluate submissions]Technical implementation:
- Source code for any custom components (chatbots, agents, filters, APIs)
- Deployment configuration (Docker, docker-compose, or equivalent)
- Test scripts that verify the challenge works correctly
- Solution verification script that confirms the intended solution works
Documentation:
- README with setup instructions
- Dependencies and requirements list
- Known issues or limitations
Step 3: Review
Submitted challenges go through a two-phase review:
Technical review (1--2 weeks):
- A reviewer deploys the challenge and verifies it works
- The intended solution is tested
- The challenge is probed for unintended solutions that trivialize it
- Code quality and deployment reliability are assessed
Community testing (1--2 weeks):
- 3--5 volunteer testers attempt the challenge without seeing the solution
- Testers provide feedback on difficulty, clarity, and fun factor
- Feedback is shared with you for final revisions
Step 4: Publication
After review, your challenge is published to the community challenge library with:
- Your attribution as the challenge author
- An editorial introduction placing the challenge in the broader curriculum
- Community difficulty rating based on tester feedback
Quality Standards
What Makes a Good Challenge
Clear objectives. Participants should know exactly what they are trying to achieve. Ambiguous objectives frustrate participants and make scoring inconsistent.
Appropriate difficulty. The challenge should be hard enough to require genuine effort but not so hard that it is inaccessible to the target audience. The best challenges have a "smooth difficulty curve" -- easy to start, hard to finish.
Educational value. Participants should learn something from attempting the challenge, even if they do not complete it. Challenges that require obscure trivia or lucky guessing do not teach useful skills.
Realistic scenarios. The best challenges model situations that practitioners encounter in real engagements. Artificial constraints are acceptable for pedagogical reasons, but the core scenario should feel plausible.
Fair solutions. The intended solution should be findable through systematic thinking and skill application. Challenges that require "guessing the author's mind" or rely on hidden assumptions are poor design.
Common Rejection Reasons
| Issue | Why It Is Rejected | How to Fix |
|---|---|---|
| Too similar to existing challenges | The community library already covers this technique | Add a novel twist or combine techniques in a new way |
| Single-trick solution | The challenge reduces to knowing one specific technique | Add layers that require multiple skills |
| Unclear instructions | Testers could not understand what they were supposed to do | Get feedback from non-authors before submitting |
| Unreliable infrastructure | The challenge environment crashes or behaves inconsistently | Add health checks, error handling, and deployment testing |
| No educational value | Completing the challenge does not develop any transferable skill | Redesign around a learning objective, not just a puzzle |
| Unfair difficulty | Success depends on guessing rather than skill | Ensure the solution is reachable through systematic exploration |
Contribution Guidelines
Technical Requirements
- Containerized deployment. Challenges must be deployable via Docker or docker-compose. No manual server setup.
- Idempotent setup. Running the setup script multiple times should produce the same result. Challenge state should be resettable.
- Resource bounds. Challenges should run on a single machine with 4 CPU cores, 8GB RAM, and optionally 1 GPU. If your challenge requires more resources, note this in the proposal.
- No external dependencies at runtime. The challenge should not depend on external services (third-party APIs, live websites) that could become unavailable. Mock external services if needed.
Content Requirements
- No real harmful content. Challenges involving jailbreaking should use benign proxy tasks. Challenges involving data exfiltration should use fictional data.
- Clear licensing. All code must be licensed under MIT or Apache 2.0. All content must be original or properly attributed.
- Inclusive language. Challenge scenarios should not rely on stereotypes or potentially offensive content.
Maintenance Commitment
When you contribute a challenge, you agree to:
- Respond to bug reports within 7 days
- Update the challenge if underlying models or APIs change
- Participate in at least one round of community testing for someone else's challenge
Current Contributed Challenges
The following challenges were submitted and reviewed by community members:
| Challenge | Author | Difficulty | Topic |
|---|---|---|---|
| Encoding Puzzle | cipher_smith | Intermediate | Encoding and payload obfuscation |
| Defense Gauntlet | blue_team_boss | Advanced | Defensive engineering |
| Prompt Golf | minmax_hacker | Intermediate | Minimal jailbreak optimization |
Getting Started
- Study existing challenges. Review the monthly challenges and the contributed challenges above to understand the quality bar and format.
- Identify a gap. What technique or scenario is not covered by existing challenges? Where did you struggle to find practice material when learning?
- Draft a proposal. Use the proposal template above. Focus on the learning objective first, then design the challenge around it.
- Build a prototype. Before writing the full submission, build a minimal version and test it yourself. Many challenge ideas that sound good in theory do not work in practice.
- Submit and iterate. Submit your proposal, incorporate feedback, build the full challenge, and go through the review process.
The community challenge library grows through contributions from practitioners at all levels. Your unique perspective and experience can create learning opportunities that no one else can.
Challenge Design Principles
Good challenge design is a skill in itself. These principles, developed from years of CTF and challenge design experience, will help you create challenges that participants enjoy and learn from.
The Funnel of Difficulty
The best challenges have a "funnel" structure: easy to start, progressively harder to complete. The first steps should be accessible to anyone at the target difficulty level, giving participants early momentum and confidence. The final objectives should stretch even strong participants.
A concrete example: a prompt extraction challenge might have the flag visible in the system prompt (easy to locate) but behind multiple defense layers (hard to extract). The participant quickly understands what they need -- the flag is right there -- but getting it requires progressively more sophisticated techniques.
Avoid "Guess What I'm Thinking" Challenges
The worst challenges require participants to guess the author's specific intended approach rather than rewarding any valid approach. Signs that a challenge has this problem:
- The solution requires trying a very specific prompt that is not suggested by any clue in the challenge
- Multiple valid-looking approaches exist but only one actually works, and there is no way to distinguish them without trial and error
- The challenge description omits information that is critical to solving it
Provide Meaningful Feedback
Participants should be able to tell whether they are making progress. This does not mean giving away the solution -- it means designing the challenge so that partial successes are visible. For example:
- A system that returns different error messages for different failure modes gives the participant information about what defense they hit
- A multi-objective challenge where objectives can be completed independently lets participants measure partial progress
- Debug output or tool call traces that show the system's internal state help participants understand why their approach is or is not working
Test with Real Users
No amount of self-testing replaces having someone else attempt your challenge. Ask two or three people to try it before submitting:
- One person at the target difficulty level (to verify the challenge is appropriately hard)
- One person above the target difficulty level (to check for unintended shortcuts)
- One person below the target difficulty level (to verify the instructions are clear even if the challenge is too hard for them)
Write Complete Solutions
Your submitted solution should cover:
- The intended approach -- the path you designed the challenge to teach
- Alternative approaches you are aware of -- other valid solutions that work
- Non-solutions that look promising -- approaches that seem like they should work but do not, and why they fail
- Edge cases -- unusual approaches that technically satisfy the success criteria but are not in the spirit of the challenge (decide whether to allow or block these)
Maintaining Your Challenge
Contributing a challenge is an ongoing responsibility. Models change, APIs evolve, and defenses improve. Common maintenance tasks:
- Model version updates. If your challenge targets a specific model version, test it when new versions are released. Behavior changes may make the challenge easier, harder, or impossible.
- Infrastructure updates. Keep container images and dependencies up to date. Security vulnerabilities in challenge infrastructure are embarrassing.
- Community feedback. Monitor feedback on your challenge. If multiple participants report the same confusion or frustration, the challenge needs revision.
- Difficulty recalibration. As the community's collective skill improves, challenges that were once hard become routine. Consider adding harder objectives or creating a "v2" with updated defenses.