Community Challenges Overview
How to participate in monthly AI red teaming challenges, earn points, share results, and grow your skills alongside the community.
Community Challenges
The Community Challenges program provides structured, hands-on exercises that let you practice AI red teaming skills in realistic scenarios. Each month brings a new challenge focused on a different area of the discipline -- from prompt extraction to full engagement simulations. Work independently, compare results with peers, and build a portfolio of demonstrated capability.
Why Participate
Challenges serve several purposes beyond practice:
- Skill validation. Completing challenges provides concrete evidence of your capabilities. Each challenge has a defined scoring rubric so you can measure your progress objectively.
- Technique discovery. Working on challenges forces you to develop novel approaches. Some of the most interesting techniques in the field have emerged from CTF-style exercises.
- Community learning. After each challenge closes, participants share their approaches. Reviewing how others solved the same problem is one of the most effective ways to expand your technique repertoire.
- Portfolio building. Documented challenge completions demonstrate practical skill to employers and clients in a way that certifications alone cannot.
How Challenges Work
Monthly Challenges
A new challenge launches on the first of each month. Each challenge follows this lifecycle:
| Phase | Duration | What Happens |
|---|---|---|
| Active | Days 1--21 | The challenge is open. Work independently, submit your results through the platform. No spoilers in community channels. |
| Discussion | Days 22--28 | The challenge remains open but discussion is allowed. Ask questions, share partial approaches, collaborate on stuck points. |
| Review | Days 29--end of month | Solutions and writeups are published. Community voting on best writeups. Scoring finalized. |
Difficulty Levels
Challenges are tagged with difficulty levels to help you choose appropriate targets:
| Level | Expected Background | Time Commitment |
|---|---|---|
| Beginner | Completed the Foundations section of this wiki | 2--4 hours |
| Intermediate | Comfortable with prompt injection and basic agent exploitation | 4--8 hours |
| Advanced | Experience with multi-step attacks, tool exploitation, RAG poisoning | 8--16 hours |
| Expert | Professional red teaming experience or equivalent depth | 16--40 hours |
Scoring System
Each challenge defines its own scoring rubric, but the general framework is consistent:
Point Categories
- Objective completion (0--60 points). Did you achieve the stated goals? Partial credit is awarded for partial completion.
- Technique quality (0--20 points). How clean, reliable, and well-understood is your approach? A technique that works 9 out of 10 times scores higher than one that works 1 out of 10.
- Documentation (0--10 points). Did you write up your approach clearly enough that someone else could reproduce it? Good documentation includes the reasoning behind your choices, not just the final payloads.
- Innovation (0--10 points). Did you discover something novel? This could be a new technique, an unexpected interaction, or a creative application of known methods.
Scoring Tiers
| Tier | Points | Significance |
|---|---|---|
| Platinum | 90--100 | Exceptional performance with novel contributions |
| Gold | 75--89 | Strong performance demonstrating solid methodology |
| Silver | 50--74 | Competent execution with room for improvement |
| Bronze | 25--49 | Partial completion showing foundational understanding |
| Participant | 1--24 | Attempted the challenge and submitted results |
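The rubric and tier tables above can be sketched as a small scoring helper. This is purely an illustration of how the category caps and tier boundaries combine, not official platform code:

```python
def total_score(objective, technique, documentation, innovation):
    """Sum the four rubric categories, clamping each to its cap
    (60, 20, 10, 10 per the point categories above)."""
    caps = (60, 20, 10, 10)
    parts = (objective, technique, documentation, innovation)
    return sum(min(max(p, 0), cap) for p, cap in zip(parts, caps))

def tier(points):
    """Map a total score to its tier per the scoring table."""
    if points >= 90:
        return "Platinum"
    if points >= 75:
        return "Gold"
    if points >= 50:
        return "Silver"
    if points >= 25:
        return "Bronze"
    if points >= 1:
        return "Participant"
    return "No submission"
```

For example, a submission scoring 55 on objectives, 15 on technique quality, 8 on documentation, and 5 on innovation totals 83 points, landing in the Gold tier.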
Submitting Results
What to Include in Your Submission
Every submission should contain:
- Executive summary. One paragraph describing what you achieved and your overall approach.
- Environment setup. What tools, models, and configurations you used. Others should be able to reproduce your setup.
- Attack narrative. Step-by-step description of your approach, including dead ends and pivots. The journey matters as much as the destination.
- Evidence. Screenshots, logs, transcripts, or other artifacts demonstrating your results.
- Payloads and techniques. The actual prompts, scripts, or tools you used, with explanation of why they work.
- Lessons learned. What surprised you? What would you do differently? What did you learn?
Submission Format
Submissions are accepted as Markdown documents with embedded code blocks. Use the following template structure:
# [Challenge Name] - Submission by [Your Handle]
## Executive Summary
[One paragraph overview]
## Environment
- Model(s) tested: [list]
- Tools used: [list]
- Platform/API version: [details]
## Approach
### Phase 1: [Name]
[Description, reasoning, results]
### Phase 2: [Name]
[Description, reasoning, results]
## Evidence
[Screenshots, logs, transcripts]
## Payloads
[Code blocks with your actual techniques]
## Lessons Learned
[Reflections]
Sharing Results and Writeups
After the Review phase, participants are encouraged to publish detailed writeups. The best writeups share these qualities:
- Honesty about failures. Documenting what did not work is often more instructive than documenting what did. Include your dead ends.
- Explanation of reasoning. Do not just show payloads -- explain why you thought they would work, what mental model you were using, and how you iterated.
- Comparison with alternatives. If you tried multiple approaches, explain why you chose the one you did and how the alternatives compared.
- Reproducibility. Someone reading your writeup should be able to follow your steps and get similar results.
Community Voting
During the Review phase, community members can vote on submissions in three categories:
- Most Educational -- the writeup that teaches the most
- Most Creative -- the most novel or unexpected approach
- Best Documentation -- the clearest, most reproducible writeup
Winners in each category receive bonus points and recognition on the leaderboard.
Leaderboard and Rankings
The community leaderboard tracks cumulative performance across challenges:
- Points accumulate across months. Consistency is rewarded -- participating regularly builds your score faster than occasional high scores.
- Separate leaderboards exist for each difficulty level, so beginners compete with beginners.
- Seasonal rankings reset quarterly, giving new participants a fresh start.
- All-time rankings persist and reflect sustained contribution to the community.
Code of Conduct
In addition to the community's general code of conduct, these rules apply to challenges:
- No spoilers during the Active phase. Share your excitement, not your solutions.
- Respect other participants. Constructive feedback on writeups is welcome; dismissive or elitist commentary is not.
- Attribute borrowed techniques. If your approach builds on someone else's published work, cite it.
- Report platform issues responsibly. If you find a bug in the challenge infrastructure itself (not a challenge objective), report it to organizers rather than exploiting it.
Challenge Types
Monthly Challenges
Monthly challenges are the core of the program. Each month focuses on a specific AI security topic -- prompt extraction, jailbreak research, agent exploitation, defense building, and more. Monthly challenges are designed for solo work and run for the full calendar month.
The monthly challenge series follows a deliberate curriculum arc. Early-year challenges focus on foundational techniques (prompt extraction, jailbreaking), mid-year challenges tackle intermediate and advanced topics (agent exploitation, RAG poisoning, infrastructure security), and late-year challenges build toward professional-level exercises (full engagement simulations, research reproduction).
You do not need to start in January. Each challenge is self-contained and can be attempted independently, though later challenges may reference techniques introduced in earlier ones.
Seasonal Competitions
Seasonal competitions are quarterly capture-the-flag (CTF) events that run over a 48-hour weekend. Unlike monthly challenges, CTFs feature multiple challenge categories, support team participation, and use dynamic scoring where challenge point values decrease as more teams solve them.
CTFs are higher-intensity than monthly challenges. They reward breadth (solving challenges across multiple categories) and speed (dynamic scoring favors early solvers). Team coordination becomes a factor, and time management is as important as technical skill.
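The exact decay curve is not specified here, but a common pattern in CTF platforms (similar to CTFd's dynamic scoring) decays a challenge's value quadratically from an initial maximum toward a floor as the solve count grows. A sketch under that assumption, with illustrative parameters:

```python
def dynamic_value(solves, initial=500, minimum=100, decay=30):
    """Decay a challenge's point value from `initial` toward `minimum`
    as more teams solve it. All parameters are assumptions for
    illustration; a real event publishes its own curve."""
    value = ((minimum - initial) / decay**2) * solves**2 + initial
    # Never drop below the floor, even past `decay` solves.
    return max(int(value), minimum)
```

Under these parameters, the first solver banks 500 points, the value falls to 400 after 15 solves, and it bottoms out at 100 once 30 teams have solved the challenge, which is why early solves are rewarded.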
Community-Contributed Challenges
Community members can submit their own challenges for inclusion in the library. Contributed challenges go through a review process to ensure quality, and they remain permanently available in the challenge library. Contributing a challenge is one of the most impactful ways to give back to the community -- every challenge you create becomes a training opportunity for others.
Tools and Environment Setup
Baseline Toolkit
Most challenges can be attempted with these tools:
| Tool | Purpose | Where to Get It |
|---|---|---|
| curl or httpie | API interaction | System package manager |
| Python 3.10+ | Scripting and automation | python.org |
| jq | JSON parsing | System package manager |
| A text editor or IDE | Writing submissions and scripts | Your preference |
| Burp Suite Community | HTTP traffic analysis (for infrastructure challenges) | portswigger.net |
Challenge-Specific Tools
Some challenges require additional tools. These are listed on each challenge's page. Common additions include:
- Jupyter notebooks for data analysis challenges (incident response, forensics)
- Docker for challenges that provide local environments
- PyTorch or transformers for model-level challenges
- nmap for infrastructure reconnaissance challenges
API Keys and Access
All challenges are accessed through the community challenge platform. After creating an account, you receive an API token that authenticates your requests to challenge endpoints. Each challenge has its own set of endpoints, rate limits, and resource quotas documented on its page.
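In practice, authenticating with the token typically means sending it in a request header. The endpoint URL, header scheme, and payload below are hypothetical placeholders; check each challenge's page for the real endpoints and rate limits:

```python
import json
import urllib.request

def build_submission_request(token, challenge_id, payload):
    """Construct an authenticated POST to a challenge endpoint.
    The URL and bearer-token header scheme are illustrative
    assumptions, not the platform's documented API."""
    url = (
        "https://challenges.example.org/api/v1/"
        f"challenges/{challenge_id}/submit"
    )
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Content-Type", "application/json")
    return req  # send with urllib.request.urlopen(req), honoring rate limits
```

The same pattern works from the command line with `curl -H "Authorization: Bearer <token>"`; keep your token out of shell history and scripts you share in writeups.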
Frequently Asked Questions
Can I use automated tools (scripts, fuzzers, LLMs) to solve challenges?
Yes, unless a specific challenge says otherwise. Using tools is a legitimate skill. However, you must document your approach in your submission -- a successful automated solution with no explanation of how or why it works receives reduced documentation points.
Can I collaborate with others on monthly challenges?
Monthly challenges are designed for solo work during the Active phase (days 1--21). During the Discussion phase (days 22--28), collaboration is encouraged. For CTFs, team participation is explicitly supported.
What if I find a vulnerability in the challenge platform itself?
Report it to the organizers through the responsible disclosure channel on the community platform. Genuine platform vulnerabilities earn bonus points and recognition. Do not exploit platform bugs to gain an unfair advantage.
Do past challenges expire?
No. Past challenges remain accessible indefinitely. Scores from past challenges count toward your all-time ranking but not toward the month's leaderboard.
Can I re-attempt a challenge I have already submitted?
Yes. Your highest score is kept. Re-attempting after studying other participants' writeups is an excellent learning strategy.
Getting Started
- Choose your first challenge. If you are new, start with the current month's challenge or pick a past challenge at your difficulty level.
- Set up your environment. Each challenge page lists the tools and access you need.
- Work through it. Budget the time indicated by the difficulty level. Do not rush.
- Write it up. Even if you do not complete every objective, document what you tried and learned.
- Submit and engage. Submit your results, then participate in the Discussion and Review phases.
The monthly challenges, seasonal competitions, and community-contributed challenges below offer something for every skill level and interest area. Pick a starting point and begin.