# labs

Advanced AI Red Team Labs

Advanced hands-on labs covering automated attack algorithms, production red team tooling, multi-model testing, and sophisticated exploitation techniques for experienced AI security practitioners.

Lab: Advanced RAG Manipulation

Perform sophisticated RAG manipulation including embedding space attacks, metadata poisoning, and retrieval algorithm gaming.

Adversarial Robustness Evaluation

Build a comprehensive adversarial robustness evaluation framework for assessing model security posture.

Adversarial Suffix Optimization (GCG)

Implement the Greedy Coordinate Gradient attack to generate adversarial suffixes.

Agent Tool Chain Exploitation

Chain multiple tool calls in an agent system to achieve multi-step exploitation.

Lab: Probing Alignment Boundaries

Systematically probe alignment boundaries to map the exact thresholds where safety training engages and disengages.

Lab: Implementing AutoDAN

Implement the AutoDAN methodology for generating stealthy human-readable jailbreak prompts using LLM feedback.

Automated Defense Evaluation Framework

Build an automated framework to evaluate defensive measures across attack categories.

Lab: Building an Automated Red Team Pipeline

Build a complete automated red teaming pipeline with attack generation, execution, scoring, and reporting.

Lab: Backdoor Persistence Through Safety Training

Test whether fine-tuned backdoors persist through subsequent safety training rounds and RLHF alignment.

Lab: Constitutional AI Bypass Techniques

Test and bypass Constitutional AI safety mechanisms by exploiting the critique-revision training methodology.

Lab: Cross-Lingual Transfer Attacks

Develop attacks in one language that transfer to others by exploiting shared multilingual representation spaces.

Cross-Model Transfer Attacks

Develop attacks on open-source models that transfer to closed-source commercial APIs.

Cross-Plugin Exploitation Chain

Chain exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.

Lab: Training Curriculum Poisoning

Exploit training data ordering and curriculum learning to amplify the impact of small numbers of poisoned examples.

Training a Custom Safety Classifier

Train a custom input safety classifier and then develop payloads that reliably evade it to understand classifier limitations.

Building a Custom Red Team Harness

Build a complete red team testing harness with parallel execution, logging, and scoring.

War Game: Defense Versus Red Team

Alternate between attacking and defending an LLM application to develop skills in both offensive and defensive operations.

Differential Testing Across Models

Use differential testing to find behavior inconsistencies across model providers.
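
A minimal sketch of the differential-testing idea: send one prompt to several providers and flag the prompt when their coarse response labels disagree. The `providers` mapping, the `classify` heuristic, and the stub callables in the usage note are hypothetical placeholders, not any vendor's SDK.

```python
def classify(response):
    """Hypothetical coarse labeler: 'refusal' or 'answer' based on the opening words."""
    opening = response.strip().lower()
    return "refusal" if opening.startswith(("i can't", "i cannot")) else "answer"


def differential_test(prompt, providers, classify_fn=classify):
    """Run one prompt against every provider callable and compare the labels.

    `providers` maps a provider name to a function taking a prompt string
    and returning the response text (an assumption made for this sketch).
    """
    labels = {name: classify_fn(call(prompt)) for name, call in providers.items()}
    return {
        "prompt": prompt,
        "labels": labels,
        # More than one distinct label means the providers disagree.
        "inconsistent": len(set(labels.values())) > 1,
    }
```

In practice each callable would wrap a real API client, for example `differential_test("test prompt", {"a": client_a, "b": client_b})`, and a more careful classifier would replace the prefix check.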

Lab: Simulating Attacks on Distributed Training

Simulate attacks on distributed training infrastructure including gradient poisoning and aggregation manipulation.

Adversarial Embedding Perturbation

Craft adversarial inputs that produce target embeddings for retrieval manipulation.

Lab: Gaming Evaluation Frameworks

Demonstrate how to game safety evaluation frameworks to produce artificially high safety scores while retaining vulnerabilities.

Lab: Poisoning Attacks on Federated Learning

Execute model poisoning attacks in a federated learning simulation by manipulating local model updates.

Backdoor Insertion via Fine-Tuning

Insert a triggered backdoor during fine-tuning that activates on specific input patterns.

Lab: Implementing a Gradient-Based Attack

Implement gradient-based adversarial attacks on open-weight models, including PGD and FGSM adapted for text inputs.

Gradient-Guided Data Poisoning

Use gradient information from open-source models to craft optimally poisoned training examples.

Guardrail Bypass Ensemble

Build an ensemble attack system that combines multiple bypass techniques for robustness.

Lab: Inference-Time Compute Exploitation

Exploit inference-time compute scaling to manipulate reasoning depth and resource consumption in thinking models.

Lab: Jailbreak Transferability Analysis

Analyze jailbreak transferability across model families to discover universal vulnerability patterns.

Multi-Hop MCP Server Exploitation

Chain exploits across multiple MCP servers to achieve lateral movement in agent systems.

Lab: Long-Term Memory Extraction

Extract information from long-term agent memory stores through crafted queries and memory retrieval manipulation.

Lab: Model Distillation Security

Extract model capabilities through distillation techniques using only black-box API access.

Lab: Security Analysis of Model Merging

Analyze security implications of model merging techniques and test for backdoor propagation through merged model weights.

Lab: Model Steering with Activation Vectors

Use activation steering vectors to control model behavior without prompt modification for security testing.

Lab: Constructing a Multimodal Attack Chain

Construct attack chains that span text, image, and audio modalities to exploit cross-modal processing gaps.

Multi-Objective Attack Optimization

Optimize attack payloads for multiple simultaneous objectives: jailbreaking, data extraction, and defense evasion.

Multi-Turn Crescendo Attack

Implement Microsoft's Crescendo attack pattern for gradual multi-turn jailbreaking.

Multimodal Attack Chain Lab

Chain attacks across text, image, and structured data modalities to exploit multimodal system vulnerabilities.

Multimodal Image Injection

Embed adversarial text in images that triggers prompt injection in vision-language models.

Lab: Neural Network Trojan Detection

Implement and test neural network trojan detection methods including activation clustering and spectral analysis.

Output Constraint Exploitation

Exploit structured output requirements to force models into revealing restricted information.

Implementing Automated PAIR Attacks

Implement the PAIR (Prompt Automatic Iterative Refinement) attack for automated jailbreaking.

Production Environment Simulation Lab

Test attacks against a simulated production environment with realistic logging, monitoring, and alerting.

Lab: Building a Prompt Injection Compiler

Build a compiler that transforms high-level attack specifications into optimized prompt injection payloads.

Prompt Leakage via Log Probabilities

Extract system prompt information using token log probability analysis.

Lab: Quantization Security Testing

Test behavioral differences between full-precision and quantized models to discover quantization-induced vulnerabilities.

Advanced RAG Poisoning Techniques

Execute sophisticated RAG poisoning including gradient-guided document crafting.

Lab: Reasoning Trace Manipulation

Manipulate chain-of-thought reasoning traces to inject false premises and redirect model conclusions.

Lab: Advanced Red Team Orchestration

Build an advanced red team orchestration system that coordinates multiple attack agents against a defended target.

Lab: Representation Engineering for Security

Use representation engineering to analyze and manipulate internal model representations for security research.

Representation Probing for Vulnerabilities

Probe internal model representations to identify exploitable features and develop representation-level attacks.

Lab: Advanced Reward Hacking Techniques

Develop reward hacking exploits that manipulate RLHF reward models through adversarial optimization.

Reward Model Analysis Lab

Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.

Reward Model Gaming (Advanced Lab)

Craft inputs that score highly on reward models while containing adversarial content.

Lab: Exploiting RLHF Training

Exploit reward model weaknesses to craft inputs that score highly on safety metrics while achieving adversarial objectives.

Lab: Safety Classifier Model Extraction

Extract the decision boundary of safety classifiers through systematic probing to craft maximally evasive payloads.

Safety Classifier Inversion

Reverse-engineer safety classifier decision boundaries through systematic probing.

Lab: Sparse Attention Exploitation

Exploit sparse attention patterns in long-context models to hide injection payloads in low-attention regions.

Lab: Model Supply Chain Poisoning

Simulate model supply chain attacks by injecting backdoors into model weights distributed through public registries.

Implementing Tree of Attacks (TAP)

Build a TAP attack system that generates and prunes jailbreak attempts using tree search.

Lab: Token-Level Adversarial Optimization

Implement token-level adversarial optimization to discover minimal perturbations that bypass safety training.

Lab: Lateral Movement via the Tool Chain

Chain tool call exploits to achieve lateral movement across services connected to LLM agent systems.

A2A Protocol Task Injection

Inject malicious tasks into Agent-to-Agent protocol communication channels to redirect multi-agent workflows.

Agent Workflow State Tampering

Manipulate agent workflow state machines to skip validation steps and reach privileged execution paths.

Attention Pattern Manipulation

Craft inputs that manipulate transformer attention patterns to prioritize adversarial content over safety instructions.

Audio Injection via Speech-to-Text Models

Craft adversarial audio that embeds prompt injection payloads when transcribed by speech-to-text models.

Implementing and Testing AutoDAN

Implement the AutoDAN methodology for generating human-readable stealthy jailbreak prompts using gradient guidance.

Bypassing Constitutional Classifiers

Develop techniques to bypass Anthropic-style constitutional classifiers through adversarial input crafting.

Cross-Model GCG Transfer Attacks

Generate adversarial suffixes on open-source models and test their transferability to commercial APIs.

Building a Red Team Results Dashboard

Build a real-time dashboard for tracking and visualizing red team campaign results across targets and techniques.

Differential Model Behavior Analysis

Use differential testing across model versions and providers to discover inconsistent safety behaviors.

Implementing an Embedding Inversion Attack

Implement embedding inversion to recover original text from vector database embeddings.

Alignment Removal Attack via Fine-Tuning

Use fine-tuning API access to systematically remove safety alignment with minimal training examples.

Visual Injection Attack on GUI Agents

Inject adversarial content into screenshots and UI elements processed by computer-use AI agents.

Submitting a Custom Attack to HarmBench

Develop and evaluate custom attack methods against the HarmBench standardized evaluation framework.

Lab: Exploiting LangChain CVEs

Reproduce and analyze LangChain CVEs including CVE-2023-29374 and CVE-2023-36258 in a safe lab environment.

LoRA Backdoor Insertion Attack

Insert triggered backdoors through LoRA fine-tuning that activate on specific input patterns while passing safety evals.

Pivot Chain Across Multiple MCP Servers

Chain exploits across multiple MCP servers to achieve lateral movement and capability escalation in agent systems.

Agent Memory Injection for Persistent Access

Inject persistent instructions into agent memory systems that survive across conversation sessions.

Exploiting Trust Boundaries in Multi-Agent Systems

Exploit trust boundaries between cooperating agents to escalate privileges and access restricted capabilities.

Orchestrating Multimodal Attack Chains

Orchestrate attacks across text, image, and document modalities to bypass per-modality safety filters.

Developing a Red Team Test Suite with Promptfoo

Build comprehensive red team test suites in Promptfoo with custom graders and multi-model targeting.

Safety Regression Testing Under Quantization

Test how model quantization (INT8, INT4, GPTQ) degrades safety alignment and introduces exploitable gaps.

Bypassing RAG Access Controls

Bypass document-level access controls in enterprise RAG systems through query manipulation and context injection.

Exploiting Reasoning Traces in CoT Models

Exploit visible chain-of-thought reasoning traces in models like o1 and DeepSeek-R1 to manipulate outputs.

Representation Probing for Vulnerability Discovery

Probe model internal representations to discover exploitable features and latent vulnerability patterns.

Adversarial Exploitation of Reward Models

Craft inputs that exploit reward model weaknesses to achieve high safety scores while containing harmful content.

Reverse Engineering Safety Classifiers

Reverse-engineer a safety classifier's decision boundaries through systematic adversarial probing.

Sandbox Escape Techniques for Code Agents

Develop and test sandbox escape techniques against code execution environments in AI coding assistants.

Hiding Steganographic Payloads in Images

Hide prompt injection payloads in images using steganographic techniques undetectable to human observers.

Training Data Extraction from Production LLMs

Implement Carlini et al.'s techniques to extract memorized training data from production language model APIs.

LLM Watermark Detection and Removal

Detect and remove statistical watermarks from LLM-generated text while preserving content quality.

Lab: LLM Watermark Detection and Removal

Detect and analyze LLM text watermarks using statistical methods and test watermark removal through paraphrasing.

Getting Started with AI Red Teaming Labs

Introduction to the hands-on lab environment for AI red teaming. Learn what you'll need and how labs are structured, and get an overview of all beginner exercises.

Lab: API Authentication Security Testing

Test LLM API authentication mechanisms for vulnerabilities including token leakage, replay attacks, and scope bypass.

Exploring Multi-Provider APIs

Explore and compare API behaviors across OpenAI, Anthropic, and Google AI to understand provider-specific security characteristics.

API Rate Limiting and Error Handling

Test LLM API rate limits and implement proper error handling for automated testing workflows.
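
One common way to handle rate-limit errors in an automated test loop is exponential backoff with jitter. The sketch below assumes a hypothetical `RateLimitError` standing in for whatever exception a given SDK raises on HTTP 429; the `send` callable and the default delays are illustrative, not taken from any provider's documentation.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 exception."""


def call_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry on RateLimitError with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # The base delay doubles on each attempt; random jitter spreads
            # out retries so parallel workers do not retry in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without real waiting. If a provider returns a `Retry-After` header, honoring it is usually preferable to a computed delay.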

Lab: Parsing and Analyzing API Responses

Parse and analyze LLM API responses to identify anomalies, safety filter triggers, and information leakage patterns.

Lab: Setting Up Automated LLM Testing

Configure automated testing pipelines for LLM red teaming using Python scripts, test runners, and result collection.

Lab: Crafting Basic Adversarial Examples

Create basic adversarial examples that cause LLMs to misclassify or misinterpret text input, or that slip past safety checks.

Basic Automated Testing Setup

Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.

Basic Classifier Evasion

Evade basic input/output classifiers using paraphrasing, synonym substitution, and formatting tricks.

Basic Data Exfiltration Techniques

Extract sensitive information from LLM applications using social engineering and misdirection.

Basic Defense Mechanism Testing

Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.

Lab: Introduction to LLM Fuzzing

Learn the basics of fuzzing LLM inputs by mutating prompts to discover unexpected behaviors and edge cases.

Lab: Taxonomy of Jailbreak Techniques

Explore the major categories of jailbreak techniques and practice classifying attack payloads by technique type.

Lab: Basic Security Testing of a RAG System

Introduction to testing Retrieval-Augmented Generation systems for document injection and retrieval manipulation vulnerabilities.

Exercise: Documenting Security Findings

Practice documenting security findings in a professional format with reproducible steps and impact assessment.

Basic Role-Play Prompt Injection

Use role-play techniques to bypass basic LLM safety guardrails through character assumption.

Lab: Social Engineering of LLM Applications

Practice social engineering techniques adapted for LLM applications including authority escalation and urgency injection.

Lab: Mapping Content Policy Boundaries

Map the content policy boundaries of LLM applications to identify permitted and restricted topic areas systematically.

Basic Context Window Overflow

Fill the context window with padding content to push safety instructions out of the model's effective context.

Conversation History Analysis

Analyze how conversation history influences model behavior and identify opportunities for context manipulation.

Lab: Conversation History Manipulation

Test how LLM applications handle conversation history including truncation, injection, and context window management.

Identifying LLM Defenses

Map the defensive layers of an LLM application through systematic probing and error analysis.

Lab: Introduction to Defense Testing

Learn to systematically test LLM application defenses by probing input filters, output validators, and guardrails.

Delimiter Escape Techniques

Practice escaping common delimiters used to separate system prompts from user input.

Embedding Fundamentals for Security

Understand text embeddings and their security relevance by generating, comparing, and manipulating embedding vectors.

Lab: Embedding Fundamentals for Red Teamers

Learn embedding fundamentals including vector similarity, semantic search, and how embeddings enable RAG systems.
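
Vector similarity and semantic search can be illustrated with cosine similarity over small hand-written vectors. This is a sketch only: the vectors in the usage note are made up, and a real RAG system would obtain embeddings from a model and typically use a vector index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=1):
    """Return the names of the k documents most similar to the query vector."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]
```

For example, `top_k([1.0, 0.1], {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0]})` ranks `doc_a` first because its direction is closest to the query. Retrieval in a RAG pipeline follows the same principle at larger scale.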

Encoding and Obfuscation Fundamentals

Explore how Base64, ROT13, and Unicode encoding can bypass input filters in LLM applications.

Lab: Hardening the Test Environment

Secure your AI red team testing environment to prevent credential leaks, accidental production testing, and data exposure.

Lab: Analyzing Error Messages for Reconnaissance

Analyze LLM application error messages to extract information about system architecture, models, and defensive layers.

Gathering Intelligence from Error Messages

Extract system architecture information from error messages and response patterns in LLM applications.

Lab: Evidence Collection in LLM Testing

Set up evidence collection workflows including request/response logging, screenshots, and reproducibility documentation.

Few-Shot Injection Fundamentals

Craft few-shot examples that prime the model to follow attacker instructions in subsequent turns.

Your First Claude API Call

Set up the Anthropic SDK and make your first Claude API call with system prompts and messages.

Your First LLM API Call with OpenAI

Set up your Python environment and make your first LLM API call to understand request/response patterns.

Lab: Format String Injection in LLMs

Practice injecting format strings and template directives to manipulate LLM output structure and content.

Your First Garak Vulnerability Scan

Install and run NVIDIA's Garak tool to perform automated vulnerability scanning on an LLM endpoint.

Lab: Hallucination Detection Fundamentals

Learn to detect and trigger hallucinations in LLM outputs including factual errors, fabricated citations, and invented APIs.

Hello World Prompt Injection

Write and test your first prompt injection payload against a simple chatbot to understand the fundamental attack mechanism.

Lab: Log Analysis of Injection Attempts

Build a logging pipeline to capture and analyze injection attempts for pattern discovery and technique refinement.

Instruction Hierarchy Testing

Test how models prioritize conflicting instructions between system, user, and assistant roles.

Effects of Temperature and Sampling on Jailbreaks

Experiment with temperature and top-p settings to understand their impact on jailbreak success rates.

JSON Output Exploitation Fundamentals

Exploit LLM JSON mode to bypass content filters and extract information through structured output channels.

Lab: Security Testing of JSON Output Mode

Test JSON output mode for injection vulnerabilities including schema manipulation and structured output exploitation.

Lab: Testing Multilingual Prompts

Test LLM safety boundaries across multiple languages to identify inconsistencies in safety training coverage.

Lab: Setting Up a Local Model for Testing

Set up local LLM instances using Ollama for safe red team testing without API costs or rate limits.

Model Security Comparison Lab

Compare the security posture of different LLMs by running identical test suites across providers.

Lab: Basic Model Fingerprinting

Learn to identify which LLM powers an application through behavioral fingerprinting and response analysis.

Multi-Turn Conversation Probing

Use multi-turn conversations to gradually escalate requests and probe LLM safety boundaries.

Lab: Multimodal Input Testing Fundamentals

Introduction to testing multimodal LLMs with image and text inputs to understand cross-modal behavior.

Lab: Detecting Output Filters

Identify and characterize output filtering mechanisms in LLM applications through systematic response analysis.

Output Format Control Lab

Practice controlling LLM output formats to understand how format constraints affect safety mechanism effectiveness.

Output Format Manipulation

Force LLMs to output in specific formats (JSON, XML, code) to bypass output sanitization.

Setting Up Payload Logging

Build a payload logging system to track prompt injection attempts and model responses.
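
A payload log can be as simple as an append-only JSON Lines file with one record per attempt. The field names (`timestamp`, `technique`, `payload`, `response`) and the helper names below are choices made for this sketch, not a format prescribed by the lab.

```python
import json
import time
from pathlib import Path


def log_attempt(log_path, payload, response, technique="unspecified"):
    """Append one attempt to a JSON Lines log and return the stored record."""
    record = {
        "timestamp": time.time(),  # seconds since the epoch
        "technique": technique,
        "payload": payload,
        "response": response,
    }
    with Path(log_path).open("a", encoding="utf-8") as fh:
        # One JSON object per line keeps the log append-only and easy to grep.
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def load_attempts(log_path):
    """Read every logged record back into a list of dictionaries."""
    with Path(log_path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each line is an independent JSON object, a crash mid-run loses at most the record being written, and the file can be filtered by technique or loaded into a dataframe later.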

Lab: Exploring the LLM Playground

Explore LLM provider playgrounds to understand model behavior, parameter effects, and interactive testing workflows.

Security Testing of LLM Playgrounds

Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.

Lab: Testing Prompt Leaking Defenses

Test various prompt leaking defense configurations to evaluate their effectiveness against extraction techniques.

Prompt Structure Analysis Lab

Reverse-engineer the prompt structure of target applications by analyzing response patterns and behavioral cues.

Lab: Testing Prompt Template Vulnerabilities

Test common prompt template patterns for injection vulnerabilities including string interpolation and placeholder abuse.

Setting Up Promptfoo for LLM Evaluation

Configure Promptfoo to create automated test suites for evaluating LLM safety and robustness.

Lab: Rate Limit Enumeration and Bypass

Enumerate API rate limits and test common bypass techniques including header manipulation and request distribution.

Mapping Rate Limits and Quotas

Map the rate limits, quotas, and throttling behavior of LLM APIs to plan efficient testing strategies.

Lab: Analyzing Model Refusal Patterns

Systematically map model refusal patterns to understand safety training boundaries and identify potential bypasses.

Lab: Red Team Report Writing Fundamentals

Practice writing clear, actionable red team findings reports with evidence, risk ratings, and remediation guidance.

Response Analysis Fundamentals

Learn to analyze LLM responses for information leakage, inconsistencies, and indicators of safety mechanism activation.

Lab: Testing Response Consistency

Test LLM response consistency across repeated queries to identify non-deterministic safety boundary behavior.
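
Consistency across repeated queries can be summarized by labeling each response and measuring how often the majority label occurs. The refusal markers below are illustrative guesses rather than a validated classifier, and the function names are hypothetical.

```python
from collections import Counter

# Illustrative markers only; real refusals are phrased in many other ways.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry")


def is_refusal(text):
    """Crude check: does the response contain a known refusal phrase?"""
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def consistency_report(responses):
    """Label each response and report how strongly the labels agree."""
    if not responses:
        raise ValueError("need at least one response")
    labels = ["refusal" if is_refusal(r) else "answer" for r in responses]
    counts = Counter(labels)
    majority, majority_count = counts.most_common(1)[0]
    return {
        "counts": dict(counts),
        "majority": majority,
        # 1.0 means every run behaved the same way; lower values
        # indicate a non-deterministic boundary worth investigating.
        "agreement": majority_count / len(labels),
    }
```

Running the same prompt several times at a fixed temperature and passing the collected response strings to `consistency_report` gives a quick signal for which prompts sit near an unstable boundary.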

Lab: Running Safety Benchmarks

Run standardized safety benchmarks against LLMs to establish baseline safety profiles for comparison.

Mapping Safety Boundaries

Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics and documenting refusal patterns.

Lab: Introduction to Safety Testing

Learn the fundamentals of LLM safety testing including test case design, baseline measurement, and result documentation.

Probing the Boundaries of Safety Training

Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.

Lab: Vulnerability Scoring Fundamentals

Learn vulnerability scoring frameworks adapted for LLM systems including severity, exploitability, and impact assessment.

Lab: Simple Payload Encoding Techniques

Practice encoding injection payloads using Base64, hex, URL encoding, and Unicode to bypass basic input filters.

Lab: System Prompt Enumeration Techniques

Practice systematic approaches to enumerating hidden system prompt content through varied probing strategies.

System Prompt Extraction Fundamentals

Learn to extract hidden system prompts from LLM applications using direct and indirect techniques.

Lab: Security Effects of Temperature and Sampling

Explore how temperature and sampling parameters affect model susceptibility to jailbreaks and safety boundary enforcement.

Lab: Designing LLM Red Team Test Cases

Design effective red team test cases with clear objectives, success criteria, and reproducible execution procedures.

Counting and Estimating Tokens

Understand tokenization by counting and estimating tokens across different models and encoders.
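
When no tokenizer is at hand, token counts can be roughly estimated. The sketch below shows two heuristics: a characters-per-token ratio (about four characters per token is a commonly cited rule of thumb for English text, assumed here as a default) and a word-and-punctuation split. Neither matches a real model's tokenizer; exact counts require the tokenizer for the specific model.

```python
import math
import re


def estimate_tokens_by_chars(text, chars_per_token=4.0):
    """Rough estimate: total characters divided by an assumed average token length."""
    return math.ceil(len(text) / chars_per_token)


def count_word_pieces(text):
    """Count words and standalone punctuation marks as a second rough proxy."""
    return len(re.findall(r"\w+|[^\w\s]", text))
```

Comparing both estimates with a real tokenizer's count on the same text shows how far the heuristics drift, especially for code, non-English text, and long compound words.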

Lab: Token Manipulation Fundamentals

Understand how tokenization affects LLM behavior and practice basic token-level manipulation techniques.

labstool-comparisonevaluationbeginner

Lab: comparing red team testing tools

Compare Garak, PyRIT, and Promptfoo capabilities through hands-on exercises using each tool against the same target.

labbeginnerconfigtoolinstallationlabs

Installing and configuring red team tools

Install and configure essential AI red teaming tools: Garak, PyRIT, Promptfoo, and custom Python harnesses.

labsapi-keyscopepermissionsbeginner

Testing API key scope and permissions

Test API key scoping and permission boundaries to identify over-privileged access configurations.

labsapiheadersanalysisbeginner

API response header analysis

Analyze HTTP response headers from LLM APIs to fingerprint providers, versions, and middleware.

labsindirect-injectionbasicsbeginner

Basic indirect prompt injection

Plant and trigger a basic indirect prompt injection payload in content consumed by an LLM.

labsmutationpayloadsbeginner

Basic payload mutation techniques

Learn fundamental payload mutation strategies including synonym replacement and structural transforms.

labsragquery-injectionbeginner

Basic RAG query injection

Craft user queries that manipulate RAG retrieval to surface unintended documents.

labsrate-limitingabusebeginner

Basic rate limit abuse patterns

Test common rate-limit bypass patterns including header manipulation and endpoint discovery.

labsencodingbypassunicodebeginner

Character encoding bypass techniques

Bypass input filters using Unicode normalization, homoglyph substitution, and mixed-script encoding.

labspersonacapability-mappingbeginner

Mapping chatbot persona and capabilities

Map the configured persona, capabilities, and restrictions of an LLM chatbot through systematic probing.

labscompletion-hijackingbasicsbeginner

Completion hijacking fundamentals

Craft partial sentences that steer model completions toward attacker-desired outputs.

labsconversation-resetcontextbeginner

Conversation reset attacks

Exploit conversation resets and context clearing to weaken model adherence to safety instructions.

labsemojiunicodeinjectionbeginner

Emoji and Unicode injection techniques

Use emoji sequences and Unicode special characters to bypass text-based input filters.

labserror-messagesexploitationbeginner

Error message exploitation

Trigger and analyze error messages from LLM applications to reveal architecture and configuration details.

labsharmbenchevaluationbeginner

Your first HarmBench evaluation

Run a standardized safety evaluation using the HarmBench framework against a target model.

labsinspect-aievaluationbeginner

Your first Inspect AI evaluation

Set up and run a basic AI safety evaluation using the UK AISI Inspect framework.

labsjsoninjectionstructured-databeginner

JSON injection fundamentals

Inject adversarial content through JSON-formatted inputs to exploit structured data processing.

labsllm-guardscanningbeginner

Your first LLM Guard scan

Install and configure LLM Guard to scan inputs and outputs, understanding detection capabilities and gaps.

labsmarkdownexfiltrationrenderingbeginner

Exfiltration via Markdown rendering

Exploit Markdown rendering in chat UIs to exfiltrate data through image tags and links.

labsfingerprintingmodel-identificationbeginner

Model fingerprinting fundamentals

Identify which LLM powers an application through behavioral fingerprinting techniques.

labsnemo-guardrailsintrobeginner

Introduction to NeMo Guardrails

Set up NVIDIA NeMo Guardrails and create your first Colang-based guardrail for an LLM application.

labsfilenamesinjectionuploadbeginner

Prompt injection via filenames

Embed prompt injection payloads in filenames and metadata of uploaded documents.

labsprompt-leakingsummarizationbeginner

Prompt leaking via summarization requests

Use summarization requests to trick an LLM into revealing its hidden system prompt through indirect disclosure.

labsrefusal-analysisreconnaissancebeginner

Analyzing refusal messages for intel

Extract useful information about model configuration and guardrails by systematically analyzing refusal messages.

labstimingside-channelanalysisbeginner

Response timing side-channel analysis

Use response timing differences to infer information about model processing and guardrail activation.

labssafety-boundariesmappingbeginner

Safety boundary mapping exercise

Systematically map the safety boundaries of an LLM application across multiple topic categories.

labsoutput-constraintsattacksbeginner

Simple output constraint attacks

Force models to output in constrained formats that bypass output safety filters.

labssystem-promptreconstructionbeginner

System prompt reconstruction from clues

Piece together a hidden system prompt by collecting partial information from multiple probe techniques.

labssystem-prompterror-injectionbeginner

System prompt extraction via error injection

Trigger specific error conditions that cause the model to inadvertently reveal system prompt contents.

labstemperaturetop-ksafetybeginner

Effects of temperature and top-k on safety

Systematically test how temperature, top-k, and top-p parameters affect safety guardrail effectiveness.

labstranslationinjectionmultilingualbeginner

Prompt injection via translation

Exploit LLM translation capabilities to smuggle instructions through language boundaries.

labsxmlinjectionstructured-databeginner

XML injection in LLM contexts

Exploit XML tag handling in LLM applications to manipulate instruction parsing.

Agent Heist: level 2 - MCP server takeover

Take control of an MCP-enabled agent by poisoning tool descriptions and chaining exploits.

labsctfagentmcp

labsctfalignmentfine-tuned

Alignment Breaker: level 2 - safety fine-tuned model

Bypass safety fine-tuning on a model with RLHF, constitutional AI, and classifier-based defenses.

Cloud Infiltrator challenge

Navigate through cloud AI service misconfigurations to access a protected model endpoint and extract its secrets.

infiltratorlabscloudctf

labsctfsystem-promptextraction

Context Heist: system prompt extraction championship

Extract system prompts from 10 progressively harder LLM applications using diverse techniques.

heistctfchallengedatalabs

Data Heist challenge

Extract a secret database of customer records from a RAG-enabled chatbot with strict data access controls.

Defense Destroyer challenge

Bypass a state-of-the-art multi-layer defense system to extract a protected secret from the model.

destroyerlabsdefensectf

Defense Gauntlet: level 2 - layered security

Bypass a layered defense system with input classification, guardrails, and output filtering.

labsctfdefenselayered

Encoding Maestro challenge

Solve 10 encoding puzzles that require creative use of Base64, Unicode, ROT13, and custom encodings to bypass filters.

encodinglabsmaestroctf

forensicsdetectivelabsctf

Forensics Detective challenge

Analyze logs and artifacts from an AI security incident to reconstruct the attack chain and identify the attacker's technique.

Guardrail Olympics: bypassing multiple frameworks

Bypass guardrail implementations from NeMo, LLM Guard, Rebuff, and custom classifiers.

labsctfguardrailsbypass

labsctfjailbreakingadaptive

Jailbreak Gauntlet: level 2 - adaptive defenses

Break through adaptive defense systems that learn from your previous attack attempts.

Model Whisperer challenge

Extract the exact system prompt from 8 progressively harder LLM applications using creative extraction techniques.

whispererlabsmodelctf

ctfsmugglerchallengepromptlabs

Prompt Smuggler challenge

Smuggle a specific payload through 5 layers of increasingly sophisticated input filtering to capture the flag.

RAG Infiltrator: level 2 - enterprise knowledge base

Infiltrate and exfiltrate data from a multi-tier enterprise RAG system with access controls.

labsctfragenterprise

Shadow Agent challenge

Take covert control of a multi-agent system by poisoning inter-agent communication without triggering monitors.

agentlabsshadowctf

labsctfsupply-chainregistry

Supply Chain Saboteur: level 2 - model registry

Compromise a model registry to inject backdoored model weights into a deployment pipeline.

Time Traveler challenge

Exploit conversation memory and session management to manipulate the temporal context of an AI assistant.

timelabstravelerctf

labsctfadversarialoptimization

Token Wars: adversarial optimization challenge

Optimize adversarial tokens to achieve target model behaviors under strict character limits.

Tool Master challenge

Exploit the tool-use capabilities of an agent system to access unauthorized functions and capture hidden flags.

toollabsmasterctf

Agent Maze Runner: multi-tool navigation

Navigate a maze of agent tools, each with unique vulnerabilities, to reach and exfiltrate a hidden flag.

labsctfagentmulti-tool

AI wargame: attacker versus defender in real time

Compete in a real-time wargame where you attack an LLM system while an AI defender adapts to your techniques.

labsctfwargamereal-time

Classifier Gauntlet: evasion in 10 stages

Bypass 10 progressively harder input classifiers using different evasion techniques at each stage.

labsctfclassifierevasion

labsctfexfiltrationmulti-channel

Data Heist: multi-channel exfiltration

Exfiltrate a secret key through a heavily monitored LLM system using covert exfiltration channels.

labsctfdefenseconstruction

Defense Builder: build and survive

Build defensive guardrails for an LLM application and survive 100 automated attack rounds.

AI Escape Room: agent breakout challenge

Break an AI agent out of its sandboxed environment by chaining tool-use vulnerabilities and injection techniques.

labsctfagentescape-room

labsctffunction-callingfortress

Function Calling Fortress Breach

Bypass function calling restrictions to invoke unauthorized tools and extract a flag from a sandboxed agent.

labsctfguardrailsspeedrun

Guardrail Speedrun: fastest bypass challenge

Bypass 5 different guardrail implementations as fast as possible in a timed competition format.

labsctfhidden-payloadencoding

Invisible Ink: hidden payload extraction challenge

Extract a hidden flag by decoding multiple layers of obfuscation in model responses across encoded channels.

labsctfmcpserver-takeover

MCP server takeover challenge

Compromise an MCP server through tool description injection and use it to pivot to other connected services.

Multimodal cipher: cross-modal decryption

Decode a flag split across text, image, and audio inputs processed by a multimodal AI system.

labsctfmultimodalcipher

labsctfpolyglotmultilingual

Polyglot injection challenge

Craft a single injection payload that works across 5 different languages and 3 different model providers.

labsctfsystem-promptarchaeology

Prompt archaeology: deep system prompt recovery

Recover fragments of a deeply hidden, multi-layer system prompt using various extraction techniques.

RAG scavenger hunt: vector DB infiltration

Infiltrate a vector database by crafting documents that surface for specific queries to reveal flag fragments.

labsctfragvector-db

labsctfreasoningchain-hijack

Reasoning chain hijack: CoT exploitation CTF

Hijack a reasoning model's chain-of-thought to redirect its conclusions while maintaining coherent reasoning.

labsctfred-vs-bluetournament

Red vs Blue tournament challenge

Alternate between attacking and defending LLM systems in a scored tournament format.

labsctfstealthexfiltration

Stealth exfiltration challenge: zero detection

Exfiltrate data from a monitored LLM system without triggering any of the deployed anomaly detectors.

labsctfsupply-chainbackdoor-detection

Supply Chain Detective: find the backdoor

Analyze a model pipeline to find where a backdoor was inserted: data, training, or post-processing.

labsctfsleeper-agentdetection

Defusing the time bomb: sleeper agent detection

Detect and neutralize a sleeper agent behavior trigger hidden in a fine-tuned model before it activates.

labsctftoken-smugglingadversarial

Token Smuggler Extreme: crafting adversarial tokens

Craft adversarial token sequences under extreme character limits to achieve target model behaviors.

labsteeringexpertlabsactivation

Activation steering for exploitation

Use activation steering techniques to manipulate model behavior at the representation level without prompt modification.

labsbackdoordetection-evasionexpert

Evading backdoor detection

Insert backdoors into fine-tuned models that evade state-of-the-art detection methods.

architecturecrosstransferlabexpertlabs

Lab: cross-architecture transfer

Develop attacks on one model architecture that reliably transfer to fundamentally different architectures.

exploitationlabexpertbehavioremergentlabs

Exploiting emergent behavior

Identify and exploit emergent behaviors in frontier models that arise from scale and are not present in smaller models.

labexpertformalverificationlabslimits

Lab: limits of formal verification

Explore the limits of formal verification for neural network behavior and identify gaps exploitable by adversaries.

labexpertguidedattacklabsinterpretability

Interpretability-guided attack design

Use mechanistic interpretability to identify exploitable circuits and design targeted attacks.

labsmodel-extractionapiexpert

Model extraction via API access

Extract a functionally equivalent model using only API query access.

routingmoeexploitationlabexpertlabs

Lab: MoE routing exploitation

Exploit Mixture-of-Experts routing mechanisms to selectively activate or suppress expert modules in MoE models.

neurallablabsexpertbackdoorengineering

Neural backdoor engineering

Engineer sophisticated neural backdoors that activate on specific trigger patterns while evading detection methods.

labsquantizationexploitationexpert

Exploiting quantization vulnerabilities

Exploit behavioral differences between full-precision and quantized models.

labsreasoningchain-of-thoughtexpert

Chain-of-thought exploitation in reasoning models

Exploit extended thinking and chain-of-thought reasoning in models like o1 and DeepSeek-R1.

labsrepresentation-engineeringattacksexpert

Representation engineering attacks

Manipulate internal model representations to alter behavior without prompt modification.

attackstimelabexperttraininglabs

End-to-end training-time attacks

Execute a complete training-time attack from data poisoning through model deployment to triggered exploitation.

labsactivation-patchingbehavior-modificationexpert

Activation patching for targeted behavior modification

Use activation patching techniques to modify specific model behaviors without retraining or fine-tuning.

labsalignment-fakingdetectionexpert

Lab: detecting alignment faking

Implement detection methods for alignment faking behaviors where models behave differently during evaluation versus deployment.

labscontinuous-learningpipeline-poisoningexpert

Continuous learning pipeline poisoning

Exploit continuous/online learning pipelines by injecting adversarial examples that gradually shift model behavior.

labsdeceptive-alignmenttestingexpert

Deceptive alignment testing framework

Build a testing framework for detecting mesa-optimization and deceptive alignment in fine-tuned models.

labsemergent-capabilitiessecurity-probingexpert

Security probing of emergent capabilities

Design evaluations that discover security-relevant emergent capabilities in frontier language models.

labsfederated-learningpoisoningexpert

Federated learning poisoning attack

Execute model poisoning attacks in a federated learning setting through adversarial participant manipulation.

labsformal-verificationsafety-propertiesexpert

Formal verification of LLM safety properties

Apply formal verification methods to prove or disprove safety properties of LLM-based applications.

labsgradient-baseduniversal-promptexpert

Gradient-based universal prompt search

Implement gradient-based optimization to search for universal adversarial prompts that transfer across contexts.

labskv-cachemanipulationinferenceexpert

KV cache manipulation attacks

Manipulate key-value caches in inference servers to inject or alter cached context across requests.

labsmechanistic-interpretabilityexploit-discoveryexpert

Mechanistic interpretability for exploit discovery

Use mechanistic interpretability tools to discover exploitable circuits and features in transformer models.

labsmembership-inferenceproductionexpert

Membership inference against production LLMs

Implement membership inference attacks to determine whether specific data was used in training an LLM.

labsmodel-extractionknowledge-distillationexpert

Model extraction via knowledge distillation

Extract a functionally equivalent model from a commercial API using systematic distillation queries.

labsmodel-mergingbackdoorpropagationexpert

Backdoor propagation via model merging

Demonstrate how backdoors propagate through model merging techniques like TIES, DARE, and spherical interpolation.

labsmoeroutingexploitationexpert

MoE routing exploitation

Exploit Mixture-of-Experts routing mechanisms to activate specific expert networks for adversarial purposes.

labsmulti-agentconsensus-poisoningexpert

Multi-agent consensus poisoning

Poison multi-agent consensus mechanisms to manipulate collective decision-making in agent swarms.

labsrepresentation-engineeringbehavior-steeringexpert

Representation engineering for behavior steering

Use representation engineering to steer model behavior by manipulating activation vectors during inference.

labsreward-hackingrlhfexpert

Reward hacking in RLHF pipelines

Craft adversarial training examples that exploit reward model weaknesses in RLHF training pipelines.

labssafety-removalminimal-dataexpert

Safety removal via fine-tuning with minimal data

Determine the minimum number of fine-tuning examples needed to remove safety alignment from various model families.

labsspeculative-decodingside-channelexpert

Side-channel attacks on speculative decoding

Exploit speculative decoding implementations to extract information about draft and verifier model behavior.

labstoken-probabilitysystem-promptreconstructionexpert

System prompt reconstruction via token probabilities

Reconstruct complete system prompts using token-by-token probability analysis through API access.

labswatermarkremovalexpert

LLM watermark removal attacks

Develop techniques to remove or corrupt watermarks embedded in LLM-generated text.

labshands-onpracticectfsimulationsexercises

Labs and hands-on practice

Progressive hands-on lab exercises spanning beginner to expert difficulty, including CTF challenges and full engagement simulations for AI red teaming skill development.

Intermediate AI red team labs

Overview of intermediate-level hands-on labs covering agent exploitation, RAG poisoning, multi-turn attacks, automated campaigns, and advanced injection techniques for AI red teaming.

labsintermediate

encodingchainsintermediateadvancedlablabs

Advanced encoding chain attacks

Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.

labsfunction-callingadvancedintermediate

Lab: advanced function calling exploitation

Exploit advanced function calling patterns including nested calls, parallel execution, and schema manipulation.

labsadversarialprompt-generationintermediate

Lab: automated adversarial prompt generation

Build automated systems for generating adversarial prompts using mutation, crossover, and selection strategies.

labsagentmemoryintermediate

Agent memory manipulation

Exploit persistent memory in LLM agents to plant false context that persists across sessions.

labsagent-workflowtestingintermediate

Lab: agent workflow security testing

Test multi-step agent workflows for injection points, state manipulation, and unauthorized action execution.

intermediateabuselabpatternslabsapi

Lab: API abuse patterns

Discover and exploit API abuse patterns including parameter injection, header manipulation, and endpoint confusion.

labsapifuzzingintermediate

API parameter fuzzing for LLMs

Systematically fuzz LLM API parameters to discover unexpected behaviors and bypasses.

labsapi-versioningsecurity-gapsintermediate

Lab: API versioning security gaps

Exploit version differences in LLM APIs to find deprecated endpoints and backward-compatible safety bypasses.

labscanary-tokenstestingintermediate

Lab: testing canary token effectiveness

Deploy and test canary tokens in system prompts to evaluate their effectiveness at detecting prompt extraction.

labschain-of-thoughtexploitationintermediate

Lab: chain-of-thought exploitation

Exploit chain-of-thought reasoning to leak internal model reasoning, bypass safety filters, and manipulate decision processes.

labscicdpipeline-testingintermediate

Lab: CI/CD pipeline security for LLM apps

Test CI/CD pipeline security for LLM applications including prompt template injection, model artifact tampering, and deployment attacks.

labsclassifierevasion-advancedintermediate

Lab: advanced classifier evasion

Evade advanced input classifiers using adversarial perturbation, semantic paraphrasing, and gradient-free optimization.

classifierintermediateevasionlablabs

Lab: ML classifier evasion

Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.

labscloud-platformtestingintermediate

Lab: cloud AI platform security testing

Test cloud AI platform configurations including AWS Bedrock guardrails, Azure content filters, and GCP safety settings.

labscode-generationsecurity-testingintermediate

Lab: code generation security testing

Test LLM code generation for insecure patterns, injection vulnerabilities, and code execution safety issues.

labscompliancetestingintermediate

Lab: AI compliance testing fundamentals

Test LLM applications against OWASP LLM Top 10 and NIST AI RMF requirements using structured test procedures.

labscontext-windowoptimizationintermediate

Lab: context window attack optimization

Optimize prompt injection placement within the context window to maximize attack effectiveness using attention dynamics.

contextintermediatelabpoisoningwindowlabs

Lab: context window poisoning

Exploit context window management to inject persistent adversarial content that influences future model responses.

intermediatesplittingconversationlablabs

Lab: conversation splitting attack

Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.

labsconversationstate-poisoningintermediate

Conversation state poisoning

Exploit conversation history manipulation to alter model behavior across turns.

labscross-contextinjectionintermediate

Cross-context injection

Inject prompts that persist across separate conversation contexts in shared deployments.

labswordlistsjailbreakingintermediate

Custom jailbreak wordlist generation

Generate targeted jailbreak wordlists using mutation strategies and seed prompts.

labsexfiltrationside-channelsintermediate

Data exfiltration via side channels

Extract data from LLM systems using timing, token probability, and formatting side channels.

labsdefense-bypasschainsintermediate

Building defense bypass chains

Combine multiple bypass techniques into reliable chains that defeat layered defenses.

labsdefenseregression-testingintermediate

Lab: defense regression testing setup

Build a regression testing framework to continuously verify that LLM defenses remain effective against known attack patterns.

intermediateinjectionlablabsragdocument

Lab: document-based RAG injection

Inject adversarial content into documents that will be processed by a RAG system to influence model responses.

labsembeddingmanipulation-advancedintermediate

Lab: advanced embedding manipulation

Manipulate embedding vectors to achieve target similarity scores for RAG poisoning and retrieval manipulation.

intermediatelabpracticalmanipulationembeddinglabs

Practical embedding manipulation

Manipulate text to achieve target embedding similarities for poisoning retrieval-augmented generation systems.

labsembeddingssimilarityintermediate

Embedding similarity attacks

Manipulate text to achieve target embedding similarity scores for retrieval poisoning.

fineintermediatetuninglablabsapi

Fine-tuning API security probing

Probe fine-tuning APIs for security weaknesses including insufficient validation and unsafe default configurations.

labsfine-tuningsafety-testingintermediate

Lab: testing the safety impact of fine-tuning

Measure how fine-tuning affects model safety by comparing safety benchmark scores before and after fine-tuning.

labsguardrail-bypasstechniquesintermediate

Lab: guardrail bypass techniques

Practice guardrail bypass techniques against NeMo Guardrails, LLM Guard, and custom classifier-based defenses.

labsguardrailsfingerprintingintermediate

Guardrail fingerprinting

Systematically map the rules and thresholds of input/output guardrail systems.

guardrailruleintermediatelablabsextraction

Lab: guardrail rule extraction

Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.

labshallucinationexploitationintermediate

Lab: exploiting LLM hallucinations

Exploit hallucination tendencies to trigger fabricated tool calls, invented API endpoints, and false fact injection.

labsincident-responsesimulationintermediate

Lab: AI incident response simulation

Practice AI incident response procedures through a simulated prompt injection incident with escalation and containment.

labsindirect-injectionwebintermediate

Indirect injection via web content

Plant prompt injection payloads in web pages consumed by RAG-enabled LLM applications.

labsinstruction-followingmeasurementintermediate

Lab: measuring instruction following

Quantitatively measure instruction-following compliance to identify how models prioritize competing instructions.

labsllm-judgemanipulationintermediate

LLM judge manipulation

Craft responses that exploit LLM-as-judge evaluation patterns to achieve high safety scores while embedding harmful content.

labslogprobsanalysisintermediate

Lab: log probability analysis for attacks

Use token log probabilities to extract information about training data, system prompts, and model internals.

labsmcpadvanced-testingintermediate

Lab: advanced MCP protocol testing

Perform advanced security testing of MCP servers including transport analysis, capability enumeration, and tool poisoning.

labsmcptool-poisoningintermediate

MCP tool poisoning attack

Exploit MCP tool descriptions to inject instructions that redirect agent behavior.

labsmcp-fuzzingprotocolintermediate

Lab: fuzzing the Model Context Protocol

Fuzz MCP protocol messages to discover parsing errors, validation gaps, and unexpected behavior in MCP servers.

fingerprintingintermediatelabpracticallabsmodel

Practical model fingerprinting lab

Fingerprint the underlying model behind an API endpoint to identify its type, version, and configuration.

labsmonitoringsecurity-setupintermediate

Lab: LLM security monitoring setup

Deploy a comprehensive security monitoring system for LLM applications with anomaly detection and alerting.

labsmulti-agentcommunication-testingintermediate

Lab: testing inter-agent communication

Test security of inter-agent communication in multi-agent systems including message injection and trust boundary violations.

intermediatebypasslablanguagelabsmulti

Multilingual safety bypass lab

Exploit gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.

labsmultilingualinjectionintermediate

Multilingual injection attacks

Exploit language switching and low-resource language gaps to bypass safety training.

labsmulti-modelchainingintermediate

Multi-model attack chaining

Chain attacks across multiple LLM models in a pipeline to bypass per-model defenses.

labsmulti-modelcomparison-testingintermediate

Lab: multi-model comparison security testing

Compare security postures across multiple LLM providers by running identical attack suites and analyzing differential responses.

pipelineintermediatelablabsmultimodel

Multi-model pipeline attack lab

Attack a pipeline where multiple models process data sequentially, exploiting trust between pipeline stages.

labsmulti-turnstate-trackingintermediate

Lab: multi-turn state tracking attacks

Exploit conversation state tracking to build persistent attack context across multiple conversation turns.

labsmultimodalintermediate-testingintermediate

Lab: intermediate multimodal security testing

Test multimodal LLMs with crafted images containing embedded text, adversarial perturbations, and visual injection payloads.

discoverychannelintermediateoutputlablabs

Output channel discovery lab

Discover alternative output channels in LLM applications that bypass standard output filtering.

labsoutput-schemaexploitationintermediate

Lab: output schema exploitation

Exploit structured output schemas to force models into revealing information through required field population.

labsprompt-cachingside-channelintermediate

Lab: prompt caching side-channel attacks

Exploit prompt caching mechanisms to detect cached prompts and extract information through timing side channels.

intermediatetechniqueslabpromptlabsleakage

Prompt leakage techniques lab

Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.

labspyritautomatedintermediate

Automated red teaming with PyRIT

Use Microsoft's PyRIT framework to run automated multi-turn attack campaigns.

labsragadvanced-testingintermediate

Lab: advanced RAG security testing

Test RAG systems for chunking exploitation, reranking manipulation, and cross-document injection attacks.

labsragpoisoningintermediate

RAG context poisoning

Poison a vector database to inject adversarial content into RAG retrieval results.

intermediatechainlabhijackingreasoninglabs

Reasoning chain hijacking

Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.

labsrole-hierarchyexploitationintermediate

Lab: role hierarchy exploitation

Exploit multi-role LLM architectures by manipulating the instruction hierarchy between system, user, and assistant roles.

labssafety-filterbenchmarkingintermediate

Lab: safety filter benchmarking

Benchmark safety filters across providers using standardized test suites to compare detection rates and false positives.

craftingintermediatesemanticlabdeceptionlabs

Semantic deception crafting lab

Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.

labssemanticinjectionintermediate

Crafting semantic injections

Craft semantically coherent injections that evade both classifiers and human review.

labssemantic-searchmanipulationintermediate

Lab: semantic search manipulation

Manipulate semantic search results by crafting documents that achieve target similarity scores against specific queries.

labssession-isolationtestingintermediate

Lab: Session Isolation Testing

Test session isolation in multi-tenant LLM deployments to detect cross-session data leakage and context contamination.

exploitationintermediatesessionlablabsstate

Session State Exploitation Lab

Exploit session management vulnerabilities in LLM applications to access other users' data or escalate privileges.

labsstreamingresponse-manipulationintermediate

Lab: Streaming Response Manipulation

Exploit streaming response mechanisms to bypass output filters that only check complete responses.
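
The gap described here can be sketched from the defender's side: a filter that runs only on the assembled response fires after every chunk has already reached the client. The sketch below is a simplified model under stated assumptions; `stream_then_check`, `check_while_streaming`, and the `is_blocked` predicate are hypothetical names, not part of any real streaming API.

```python
def stream_then_check(chunks, is_blocked):
    """Deliver every chunk, then run the filter on the complete response."""
    delivered = []
    for chunk in chunks:
        delivered.append(chunk)  # in a real system this is already sent
    flagged = is_blocked("".join(delivered))
    return delivered, flagged


def check_while_streaming(chunks, is_blocked):
    """Run the filter on the accumulated text before releasing each chunk."""
    delivered, buffer = [], ""
    for chunk in chunks:
        buffer += chunk
        if is_blocked(buffer):
            return delivered, True  # stop before releasing this chunk
        delivered.append(chunk)
    return delivered, False


# Hypothetical filter and response chunks, for illustration only.
def contains_secret(text):
    return "secret" in text


chunks = ["the sec", "ret value"]
```

With these inputs the first function flags the response only after both chunks were delivered, while the second withholds the final chunk. Earlier chunks have still gone out, so a production filter would likely also need a holdback window.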

intermediatebypassoutputlabstructuredlabs

Structured Output Safety Bypass

Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.

labsstructured-outputexploitationintermediate

Structured Output Exploitation

Exploit JSON mode and structured output constraints to bypass safety filters.

labssystem-promptreconstructionintermediate

Lab: System Prompt Reconstruction (Advanced Lab)

Reconstruct full system prompts through iterative partial extraction and cross-referencing multiple leakage vectors.

labstoken-smugglingadvancedintermediate

Advanced Token Smuggling

Use Unicode homoglyphs, zero-width characters, and tokenizer quirks to smuggle payloads.

labstool-resultinjectionintermediate

Lab: Tool Result Injection Attacks

Inject adversarial content through tool call results to poison model reasoning and redirect subsequent actions.

labsunicodenormalization-bypassintermediate

Lab: Unicode Normalization Bypass Attacks

Exploit Unicode normalization differences between input validation and model processing to smuggle injection payloads.
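
The normalization mismatch can be illustrated with Python's standard `unicodedata` module: a check that runs on raw input misses compatibility characters that a later NFKC step folds back to ASCII. This is a minimal sketch; the blocklist phrase and both filter functions are hypothetical and stand in for whatever validation a target application performs.

```python
import unicodedata

# Hypothetical blocklist entry, for illustration only.
BLOCKLIST = ("ignore previous instructions",)


def raw_filter(text):
    """Check the input as received, without normalization."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


def normalized_filter(text):
    """Apply NFKC normalization and case folding before checking."""
    canonical = unicodedata.normalize("NFKC", text).casefold()
    return any(term in canonical for term in BLOCKLIST)


# The first word uses fullwidth Latin letters (U+FF49 and neighbours).
payload = "ｉｇｎｏｒｅ previous instructions"
```

Normalizing once, at the same point for both validation and downstream processing, closes this particular gap.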

labsvector-dbaccess-controlintermediate

Lab: Vector Database Access Control Testing

Test vector database access controls for bypass vulnerabilities including namespace traversal and filter manipulation.

labsagentgoal-hijackingintermediate

Agent Goal Hijacking

Redirect an AI agent's objectives through carefully crafted inputs that override its primary task.

labsassistant-prefillinjectionintermediate

Assistant Prefill Injection Attacks

Exploit assistant message prefilling to prime model responses and bypass safety alignment.

labschat-historymanipulationintermediate

Chat History Manipulation Attacks

Inject false messages into conversation history to alter model behavior in subsequent turns.

labsclassifierboundary-mappingintermediate

Input Classifier Boundary Mapping

Systematically probe input classifiers to map their decision boundaries and find bypass regions.

labscontext-windowboundaryintermediate

Context Window Boundary Exploitation

Exploit the boundaries of context windows to push safety instructions beyond the model's attention.

labscrescendoimplementationintermediate

Crescendo Attack Implementation

Implement Microsoft's Crescendo multi-turn escalation attack with automated conversation management.

labscross-pluginchainingintermediate

Cross-Plugin Exploit Chaining

Chain exploits across multiple plugins/tools in an agent system to achieve multi-step attacks.

labsfunction-schemamanipulationintermediate

Function Schema Manipulation

Exploit function calling schemas to inject instructions through parameter descriptions and enums.

labsgarakcustom-probesintermediate

Building Custom Garak Probes

Develop custom probe modules for NVIDIA Garak to test organization-specific vulnerability patterns.

labsguardrailslatencydetectionintermediate

Latency-Based Guardrail Detection

Use timing side channels to identify and characterize guardrail implementations in LLM applications.

labsinspect-aicustom-evaluationsintermediate

Custom Safety Evaluations with Inspect AI

Build custom safety evaluation tasks in Inspect AI to test application-specific vulnerability classes.

labsjailbreakbenchcomparativeintermediate

Comparative Testing with JailbreakBench

Run standardized jailbreak benchmarks across multiple models and compare vulnerability profiles.

labsjson-modesafety-bypassintermediate

JSON Mode Safety Bypass

Exploit structured JSON output modes to bypass safety filters that only check plain-text responses.

labsllm-guardevasionintermediate

LLM Guard Evasion Techniques

Develop evasion techniques against LLM Guard input scanners and output detectors.

labsmany-shotjailbreakingintermediate

Implementing Many-Shot Jailbreaking

Implement Anthropic's many-shot jailbreaking technique with scaling analysis across conversation lengths.

labsmcpresourceexfiltrationintermediate

MCP Resource Data Exfiltration (Advanced Lab)

Exploit MCP resource URIs to access and exfiltrate data beyond intended scope boundaries.

labsmultilingualsafety-gapsintermediate

Exploiting Multilingual Safety Gaps

Systematically identify and exploit safety training gaps across low-resource languages.

labsmulti-turnstate-trackingintermediate

Multi-Turn State Tracking Exploitation

Exploit state management in multi-turn conversations to gradually shift model behavior over many turns.

labsnemo-guardrailsbypassintermediate

NeMo Guardrails Bypass Techniques

Test and bypass NVIDIA NeMo Guardrails configurations using known evasion techniques.

labsoutput-filterbypassencodingintermediate

Output Filter Bypass via Encoding

Bypass output safety filters by instructing the model to encode responses in Base64, hex, or custom schemes.
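
Why encoding defeats a plain substring filter, and how a decoding pass restores coverage, can be sketched with the standard `base64` module. This is an illustrative sketch only: the canary string and both filter functions are hypothetical, and it handles whitespace-delimited Base64 tokens rather than hex or custom schemes.

```python
import base64
import binascii


def plaintext_filter(output, canary):
    """Flag the output only if the canary appears verbatim."""
    return canary in output


def decoding_filter(output, canary):
    """Also try to Base64-decode each whitespace-delimited token."""
    if canary in output:
        return True
    for token in output.split():
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64 text; skip this token
        if canary in decoded:
            return True
    return False


# Hypothetical canary value planted in a test system prompt.
CANARY = "ZX-1234"
encoded = base64.b64encode(f"canary: {CANARY}".encode("utf-8")).decode("ascii")
model_output = "Here you go: " + encoded
```

The plaintext check misses the encoded canary, while the decoding check finds it. Each additional encoding scheme in scope needs its own decoding pass.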

labspdfinjectionragintermediate

PDF Document Injection for RAG Systems

Craft adversarial PDF documents that inject instructions when processed by RAG document loaders.

labsprompt-templatevariable-injectionintermediate

Prompt Template Variable Injection

Inject adversarial content through template variable expansion in prompt management systems.

labspyritcustom-strategiesintermediate

Custom Attack Strategies in PyRIT

Build custom attack strategies and converters in Microsoft's PyRIT framework for targeted campaigns.

labsragdocument-injectionintermediate

RAG Document Injection Campaign

Design and execute a document injection campaign against a RAG-powered application with vector search.

labssemantic-searchpoisoningembeddingsintermediate

Semantic Search Poisoning

Craft adversarial documents that rank highly in semantic search for targeted queries in RAG systems.

labsstreamingresponseexfiltrationintermediate

Exfiltration via Streaming Responses

Exploit streaming API responses to exfiltrate data that might be caught by batch output filters.

labslogprobssystem-promptintermediate

System Prompt Extraction via Token Probabilities

Exploit token probability APIs to reconstruct hidden system prompts character by character.

labstokenizerboundary-manipulationintermediate

Token Boundary Manipulation

Exploit tokenizer-specific behavior by crafting inputs that split across token boundaries in unexpected ways.

labstool-resultinjectionintermediate

Tool Result Injection Attacks

Craft malicious tool return values that inject instructions back into the model's reasoning chain.

labsvisiontext-injectionimagesintermediate

Injecting Text into VLMs via Images

Embed adversarial text in images that vision-language models read and follow as instructions.

labswebhookexfiltrationintermediate

Lab: Data Exfiltration via Webhooks

Test data exfiltration paths through LLM-triggered webhooks, API callbacks, and external URL rendering.

vehiclesimautonomoussimulationslabs

Autonomous Vehicle AI Communication Assessment

Assess the AI communication and decision-making systems in an autonomous vehicle scenario.

labssimulationcode-reviewsecurity

Code Review Assistant Assessment

Test a code review AI for vulnerabilities in code analysis, suggestion generation, and repository access.

moderationsimcontentsimulationslabs

Content Moderation AI Assessment

Assess an AI content moderation system for bypass techniques, false positive manipulation, and adversarial content generation.

labssimulationcustomer-serviceagent

Customer Service Agent Red Team

Red team a customer service agent with tool access to order systems, refunds, and customer data.

labssimulationeducationtutor

Educational AI Tutor Assessment

Assess an AI tutoring system for content safety, answer manipulation, and student data exposure.

labssimulationfintechassessment

Fintech Chatbot Security Assessment

Conduct a full security assessment of a financial services chatbot handling sensitive transactions.

governmentservicessimsimulationslabs

Government Services AI Assessment

Assess an AI system that helps citizens navigate government services for data privacy and manipulation risks.

labssimulationhealthcarediagnostic

Healthcare Diagnostic AI Assessment

Assess a healthcare diagnostic AI for safety-critical vulnerabilities and data privacy compliance.

siminsurancesimulationslabsclaims

Insurance Claims AI Assessment

Conduct a complete red team assessment of an AI system that processes insurance claims and makes approval recommendations.

labssimulationlegalassessment

Legal Document Review AI Assessment

Assess a legal AI system that reviews contracts for vulnerabilities in document processing and privilege escalation.

simmedicaltriagesimulationslabs

Medical Triage AI Assessment

Assess an AI-powered medical triage system for safety-critical vulnerabilities and clinical decision manipulation.

labssimulationmulti-agentworkflow

Multi-Agent Workflow Assessment

Red team a multi-agent system with specialized agents communicating via A2A protocol.

simrecruitmentsimulationslabs

AI Recruitment System Assessment

Red team an AI recruitment system for bias exploitation, resume injection, and candidate data privacy issues.

smartsimhomesimulationslabs

Smart Home AI Assistant Assessment

Red team a smart home AI assistant with device control, user profile access, and purchase capabilities.

labssimulationsupply-chainpipeline

AI Supply Chain Pipeline Assessment

Assess the full ML pipeline from data ingestion through model deployment for supply chain attacks.

tradingsimsimulationslabsbot

AI Trading Bot Assessment

Red team an AI-powered trading system for market manipulation, data leakage, and adversarial input attacks.

labssimulationpentestingautomated-agent

Automated Pentesting Agent Assessment

Assess an AI-powered automated penetration testing agent for scope violations, unauthorized actions, and data handling.

labssimulationautonomous-vehiclesafety-critical

Autonomous Vehicle Planning AI Assessment

Assess the safety boundaries of an autonomous vehicle planning AI through adversarial scenario injection.

labssimulationcontent-moderationplatform

AI Content Moderation Platform Assessment

Assess an AI content moderation system for bypass techniques, false negative exploitation, and bias.

labssimulationcdpcustomer-data

Customer Data Platform AI Assessment

Red team an AI system integrated with a customer data platform handling PII, behavioral data, and segmentation.

labssimulationcyber-threat-intelsecurity

Cyber Threat Intelligence AI Assessment

Red team a cyber threat intelligence AI that processes IOCs, threat reports, and attack attribution.

labssimulationdata-analyticscopilot

Data Analytics Copilot Assessment

Red team a data analytics copilot with SQL generation capabilities and access to enterprise databases.

labssimulationdevopsinfrastructure

DevOps AI Assistant Security Assessment

Assess a DevOps AI assistant with access to CI/CD pipelines, cloud infrastructure, and deployment systems.

labssimulationenterpriseknowledge-management

Enterprise Knowledge Management AI Assessment

Assess an enterprise knowledge management system with role-based access, document permissions, and multi-tenant isolation.

labssimulationgovernmentcitizen-services

Government Services Chatbot Assessment

Red team a government citizen services chatbot that accesses tax records, benefits, and identity information.

labssimulationincident-responsesoc

AI-Driven Incident Response System Assessment

Red team an AI incident response system in a SOC environment with access to SIEM, EDR, and ticketing.

labssimulationinsuranceclaims

Insurance Claims Bot Security Assessment

Conduct a full security assessment of an insurance claims processing chatbot with access to policy and claims data.

labssimulationlegalcontract-review

Legal Contract Review AI Assessment

Red team an AI contract review system for privilege violations, document injection, and hallucinated clauses.

labssimulationmedicaltriage

Medical Triage Chatbot Assessment

Red team a medical triage chatbot for dangerous medical advice, data exposure, and safety-critical failures.

labssimulationmulti-agentresearch-team

Multi-Agent Research Team Assessment

Assess a multi-agent research team with specialized researcher, writer, and reviewer agents communicating via A2A.

labssimulationreal-estatevaluation

Real Estate Valuation AI Assessment

Assess an AI real estate valuation system for manipulation of property valuations and data exposure.

labssimulationrecruitinghr

AI Recruiting Assistant Assessment

Red team an AI recruiting assistant that screens resumes, schedules interviews, and accesses candidate data.

labssimulationsmart-homeiot

Smart Home AI Assistant Assessment (Simulation)

Assess a smart home AI assistant that controls lights, locks, cameras, and thermostat via tool integrations.

labssimulationsupply-chainoptimization

Supply Chain Optimization AI Assessment

Assess an AI supply chain optimization system for manipulation of demand forecasts and routing decisions.

labssimulationtradingfinance

AI Trading Assistant Security Assessment

Conduct a security assessment of an AI trading assistant with access to portfolio data and trade execution.

labssimulationtravelbooking-agent

Travel Booking Agent Red Team

Red team a travel booking AI agent with access to payment systems, loyalty programs, and personal data.