Spring 2026 CTF: Multi-Category AI Security
A multi-category AI security capture-the-flag competition with 20 flags across 5 categories: prompt injection, agent exploitation, RAG attacks, model security, and forensics.
Spring 2026 CTF: Multi-Category AI Security
The Spring 2026 CTF is a broad-spectrum AI security competition featuring 20 challenges across 5 categories. Whether your strength is prompt injection, agent exploitation, infrastructure security, or forensic analysis, there are challenges for you.
Categories and Challenges
Category 1: Prompt Injection Gauntlet (4 challenges)
A series of chatbots with escalating defenses. Extract the flag from each one.
| Challenge | Difficulty | Starting Points | Description |
|---|---|---|---|
| PI-1: Warm Up | Easy | 200 | Undefended chatbot. Extract the flag from its system prompt. A free confidence booster. |
| PI-2: The Gatekeeper | Medium | 350 | Chatbot with instruction-based defense and input keyword filtering. Find the bypass. |
| PI-3: Double Agent | Hard | 450 | Chatbot with dual-LLM architecture: a generator and a safety judge. Both must be bypassed. |
| PI-4: The Vault | Expert | 500 | Chatbot with architectural prompt isolation, I/O filtering, LLM judge, and rate limiting. The flag is in a separated context the model cannot directly access. |
Category notes: These challenges test pure prompt injection skill. No tools, no agents, just you vs. the model and its defenses. PI-4 has historically been solved by fewer than 10% of teams.
Category 2: Agent Exploitation (4 challenges)
Exploit AI agents with tool access to extract flags stored in systems the agents can access but should not reveal.
| Challenge | Difficulty | Starting Points | Description |
|---|---|---|---|
| AE-1: Tool Time | Easy | 200 | An agent with a file reading tool. The flag is in a file the agent can read but is instructed not to share. |
| AE-2: Chain Reaction | Medium | 350 | An agent with email, calendar, and database tools. The flag is constructed from data across all three tools. You must cause the agent to chain tool calls in a specific sequence. |
| AE-3: Indirect Route | Hard | 450 | An agent that reads a web page you control. Inject instructions through the web page content that cause the agent to extract and deliver the flag. |
| AE-4: The Warden | Expert | 500 | An agent with a tool call validator that checks every tool invocation against an allow-list. The flag requires a tool call that the validator should block. |
Category notes: Agent challenges require understanding how agents decide which tools to call and how to influence that decision process. The indirect injection challenge (AE-3) requires hosting content on a URL you control -- a webhook service URL is provided in the challenge description.
Category 3: RAG Attacks (4 challenges)
Exploit retrieval-augmented generation systems by manipulating the retrieval process, the knowledge base, or the generation step.
| Challenge | Difficulty | Starting Points | Description |
|---|---|---|---|
| RAG-1: Knowledge Seeker | Easy | 200 | A Q&A system with a searchable knowledge base. The flag is in a document that the system is instructed not to surface. Retrieve it anyway. |
| RAG-2: Poisoned Well | Medium | 350 | You can add one document to the knowledge base. Craft a document that causes the system to output the flag (hidden in the system prompt) when a specific query is asked. |
| RAG-3: Embedding Escape | Hard | 450 | The system uses embedding-based access control -- documents tagged as "confidential" are filtered from retrieval results based on their embedding cluster. Place a document that evades this filter. |
| RAG-4: Oracle | Expert | 500 | A RAG system where you cannot see the retrieved documents or the system prompt. You can only see the final generated response. Extract the flag through careful query crafting and response analysis. |
Category notes: RAG challenges test understanding of how retrieval, ranking, and generation interact. RAG-4 is a black-box challenge that requires inferring internal state from observable outputs.
Category 4: Model Security (4 challenges)
Exploit the model serving infrastructure, API layer, and configuration of AI deployments.
| Challenge | Difficulty | Starting Points | Description |
|---|---|---|---|
| MS-1: Verbose Error | Easy | 200 | Cause the model API to produce an error message that leaks the flag. Error handling is often the weakest link. |
| MS-2: Tenant Hop | Medium | 350 | A multi-tenant model serving platform. Access another tenant's cached response that contains the flag. |
| MS-3: Config Leak | Hard | 450 | Extract model serving configuration (which contains the flag) through side-channel analysis: timing differences, token count patterns, or response behavior variations. |
| MS-4: Weight Watchers | Expert | 500 | The flag is encoded in the model's behavior -- it was fine-tuned to output the flag in response to a specific trigger phrase. Discover the trigger through behavioral analysis. |
Category notes: Model security challenges require thinking beyond the prompt layer. These test infrastructure awareness, API exploitation skills, and understanding of model internals.
Category 5: AI Forensics (4 challenges)
Analyze logs, artifacts, and traces to find flags hidden in the evidence of AI security incidents.
| Challenge | Difficulty | Starting Points | Description |
|---|---|---|---|
| AF-1: Log Dive | Easy | 200 | Analyze API logs to find the flag hidden in a series of suspicious requests. Pattern recognition in noisy data. |
| AF-2: Conversation Autopsy | Medium | 350 | A conversation log contains evidence of a successful jailbreak. The flag is the exact payload the attacker used (you must reconstruct it from partial logs). |
| AF-3: Poisoned Pipeline | Hard | 450 | Training logs and data samples from a poisoning attack. Identify the poisoned samples -- the flag is constructed from their IDs. |
| AF-4: Ghost in the Machine | Expert | 500 | A complex multi-component system's logs contain evidence of an advanced persistent threat. The attacker left a flag as a calling card, but it is distributed across multiple log sources and encrypted with a key that must be derived from the attack pattern. |
Category notes: Forensics challenges test analytical thinking and attention to detail. Large datasets and red herrings make these time-intensive. Prioritize efficiently.
Infrastructure
Challenge Access
Each team receives a dedicated challenge portal after registration:
https://ctf.redteams.wiki/spring-2026/team/<team-id>/
From the portal, you can:
- Access each challenge's description and API endpoints
- Submit flags for scoring
- View the scoreboard
- Request challenge instance resets (limited to 3 resets per challenge)
Flag Submission
Flags follow the format FLAG\{description-value\}. Submit flags through the portal or API:
POST https://ctf.redteams.wiki/api/submit
Content-Type: application/json
Authorization: Bearer <team-token>
{
"challenge_id": "PI-1",
"flag": "FLAG{example-flag-value}"
}Hint System
Hints are released on a schedule:
| Time | Hints Released |
|---|---|
| T+12 hours | Hints for Easy and Medium challenges |
| T+24 hours | Hints for Hard challenges |
| T+36 hours | Hints for Expert challenges |
Each hint reveals one piece of information about the intended approach. Using a hint does not reduce your score.
Scoring
Dynamic Scoring
Challenge point values start at the listed maximum and decrease as more teams solve them:
| Solves | Approximate Points (from 500 max) |
|---|---|
| 1 solve | 490 |
| 5 solves | 420 |
| 10 solves | 340 |
| 20 solves | 210 |
| 50+ solves | 50 (minimum) |
First Blood Bonus
The first team to solve each challenge receives a 10% bonus on that challenge's points.
Tiebreaker
Ties are broken by submission timestamp of the team's last scoring flag. Teams that finish earlier win ties.
Prizes
| Place | Prize |
|---|---|
| 1st Place | Custom challenge coin + featured writeup on the wiki + 1,000 leaderboard bonus points |
| 2nd Place | Featured writeup on the wiki + 500 leaderboard bonus points |
| 3rd Place | Featured writeup on the wiki + 250 leaderboard bonus points |
| Category Leaders | Recognition badge for the highest scorer in each category |
| Best Writeup | Community-voted award for the most educational post-CTF writeup (500 bonus points) |
Preparation Guide
Recommended Skill Levels
| Category | Minimum Background |
|---|---|
| Prompt Injection | Completed the January and February monthly challenges or equivalent experience |
| Agent Exploitation | Completed the March monthly challenge or equivalent experience |
| RAG Attacks | Understanding of RAG architecture and embedding-based retrieval |
| Model Security | Basic API security testing experience, familiarity with model serving |
| AI Forensics | Log analysis experience, understanding of AI attack patterns |
Warm-Up Resources
- Work through past monthly challenges for the relevant categories
- Review the Prompt Injection & Jailbreaks section for PI challenges
- Review Agent & Agentic Exploitation for AE challenges
- Review RAG, Data & Training Attacks for RAG challenges
- Review Cloud AI Security for MS challenges
- Review AI Forensics & Incident Response for AF challenges
Post-CTF
After the CTF closes:
- Scoreboard freezes and final rankings are published within 1 hour
- Challenge source code is released within 48 hours
- Official writeups from challenge authors are published within 1 week
- Community writeups are encouraged and eligible for the Best Writeup award for 2 weeks after the CTF
The community writeup channel opens immediately after the CTF closes. Share your approaches, partial solutions, and lessons learned.
Strategy for the Spring CTF
Category Prioritization
With 20 challenges across 5 categories, you cannot solve everything in 48 hours. Prioritize strategically:
- Start with your strongest category. Early solves earn the most points due to dynamic scoring. Solve the easy and medium challenges in your best category first.
- Pick up easy challenges across all categories. Easy challenges (200 starting points) are worth solving even in weak categories because they take minimal time.
- Assess hard challenges before committing. Read all hard and expert challenge descriptions before attempting any of them. Some hard challenges in unfamiliar categories may be easier for you than others.
- Leave expert challenges for the second day. Expert challenges take the longest. Spend day one collecting points from easier challenges, then dedicate day two to hard and expert attempts.
Team Role Assignments
For a team of 4, a recommended role assignment:
| Role | Primary Categories | Secondary |
|---|---|---|
| Player 1 | Prompt Injection + RAG | Agent Exploitation Easy |
| Player 2 | Agent Exploitation + Model Security | Forensics Easy |
| Player 3 | AI Forensics + Model Security | RAG Easy |
| Player 4 | Flexible (supports wherever progress stalls) | Documentation and writeups |
The flexible player role is critical. Having someone who can jump into any category, provide a fresh perspective on stuck challenges, and handle support tasks (managing notes, tracking the scoreboard, coordinating hint usage) significantly improves team performance.
Time Management
A 48-hour CTF with 20 challenges means an average of 2.4 hours per challenge. But this average is misleading -- easy challenges take 15 minutes while expert challenges take 4+ hours. A better time budget:
| Difficulty | Time Budget | Challenges | Total Time |
|---|---|---|---|
| Easy (5) | 30 min each | 5 | 2.5 hours |
| Medium (6) | 1.5 hours each | 6 | 9 hours |
| Hard (5) | 3 hours each | 5 | 15 hours |
| Expert (4) | 5 hours each | 4 | 20 hours |
| Sleep and breaks | -- | -- | 16 hours |
| Total | 62.5 hours |
This exceeds 48 hours for a solo player, which is why team participation is so important. A 4-person team has 192 person-hours available (minus sleep), which comfortably covers all challenges.