# lab

labtoolscomparisonmethodology

Lab: tool comparison (same target, 4 tools)

Hands-on lab comparing Garak, PyRIT, promptfoo, and Inspect AI against the same target model. Assess each tool's coverage, speed, quality of findings, and practical trade-offs.

labembeddingshands-onpythonintermediate

Lab: exploring embedding spaces

Hands-on lab using Python to visualize embedding spaces, measure semantic similarity, and demonstrate how adversarial documents can be crafted to match target queries.

labreconnaissanceattack-surfacemappingintermediate

Lab: mapping the attack surface of an AI system

Hands-on lab that walks you through reconnaissance of an AI system: identifying components, mapping data flows, inventorying tools, and documenting the attack surface.

Lab: simulated robot control exploitation

Hands-on lab exercises for exploiting LLM-controlled robots in simulation: environment setup, injection attacks, safety bypass testing, and multi-step exploitation chains with PyBullet.

labrobotsimulation

advancedlabragmanipulationlabs

Lab: advanced RAG manipulation

Perform sophisticated RAG manipulation including embedding space attacks, metadata poisoning, and retrieval algorithm gaming.

advancedlabadversarialrobustnessevallabs

Adversarial robustness evaluation

Build a comprehensive adversarial robustness evaluation framework for assessing model security posture.

Lab: generating adversarial suffixes

Implement the Greedy Coordinate Gradient (GCG) algorithm to generate adversarial suffixes that cause language models to comply with harmful requests by appending optimized token sequences.

labadversarial-suffixgcg

labattack-chainexploitmulti-stageadvanced

Lab: constructing an attack chain

Hands-on lab for chaining three or more distinct vulnerabilities into a complete exploit sequence that achieves objectives impossible with any single technique alone.

labaudioadversarialmultimodaladvancedhands-on

Lab: adversarial examples in audio

Hands-on lab for crafting adversarial audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.

autodanadvancedlabimplementationlabs

Lab: implementing AutoDAN

Implement the AutoDAN methodology for generating stealthy human-readable jailbreak prompts using LLM feedback.

labbackdoordetectionforensicsfine-tuning

Lab: backdoor detection in fine-tuned models

Analyze a fine-tuned language model to find and characterize an inserted backdoor, using behavioral probing, activation analysis, and statistical testing techniques.

Lab: continuous automated red teaming pipeline

Build an end-to-end CART pipeline that continuously generates, executes, and scores adversarial attacks against LLM applications, with alerting and trend tracking.

labcartcontinuous

labcloudassessmentend-to-endapi-securityadvancedhands-on

Lab: cloud AI security assessment

Conduct an end-to-end security assessment of a cloud-deployed AI service, covering API security, model vulnerabilities, data handling, and infrastructure configuration.

labcloudassessmentinfrastructuresecurityadvanced

Lab: cloud AI assessment

Hands-on lab for conducting an end-to-end security assessment of a cloud-deployed AI system including infrastructure review, API testing, model security evaluation, and data flow analysis.

labcontainer-securitybreakoutinfrastructure

Lab: breaking out of a containerized model

Explore techniques for escaping from containerized AI applications to the host system, testing container isolation boundaries in ML deployment environments.

labcross-modeltransfer-attacksmulti-model

Lab: cross-model transfer attacks

Test whether jailbreaks discovered on one language model transfer effectively to others, building a systematic methodology for cross-model vulnerability research.

exploitationadvancedcrosslabpluginlabs

Cross-plugin exploitation chain

Chain exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.

classifiercustomadvancedlabtraininglabs

Training a custom safety classifier

Train a custom input safety classifier and then develop payloads that reliably evade it to understand classifier limitations.

labharnessproductionframework

Lab: building a production red team harness

Build a full-featured, production-quality red team harness with multi-model support, async testing, structured result storage, and HTML reporting.

labcustom-harnesstesting-frameworkdomain-specificadvancedhands-on

Lab: custom test harness for specific applications

Build a tailored testing framework for a specific AI application, with custom attack generators, domain-specific evaluators, and application-aware reporting.

defenseadvancedlabteamredlabs

War game: defense versus red team

Alternate between attacking and defending an LLM application to develop skills in both offensive and defensive operations.

labensemble-attacksmulti-modelautomated-attacks

Lab: ensemble attacks

Use multiple language models collaboratively to discover attack strategies that bypass any single model's defenses, leveraging model diversity for more effective red teaming.

labfederated-learningpoisoningexperthands-on

Lab: poisoning attack on federated learning

Hands-on lab for understanding and simulating poisoning attacks against federated learning systems, where a malicious participant corrupts the shared model through crafted gradient updates.

Lab: inserting a backdoor via fine-tuning

Advanced lab demonstrating how fine-tuning can insert hidden backdoors into language models that activate on specific trigger phrases while maintaining normal behavior otherwise.

labfine-tuningbackdoor

labfuzzingautomated-testingboundary-testing

Lab: build an AI fuzzing harness

Create a systematic fuzzing framework for testing LLM boundaries, generating and mutating inputs to discover unexpected model behaviors and safety edge cases.

labgpuside-channelhardwareinformation-leakage

Lab: GPU side-channel attacks

Demonstrate information leakage through GPU memory residuals and timing side channels, showing how shared GPU infrastructure can expose sensitive model data.

advancedlabgradientguidedpoisoninglabs

Gradient-guided data poisoning

Use gradient information from open-source models to craft optimally poisoned training examples.

Lab: chaining guardrail bypasses

Advanced lab on identifying, isolating, and chaining multiple guardrail bypass techniques to defeat layered defense systems in production LLM applications.

labguardrailsbypass

labforensicsincident-responseinvestigation

Lab: AI incident investigation

Investigate logs and artifacts from a compromised AI system to reconstruct the attack chain, identify the vulnerability exploited, and determine the scope of the breach.

labinference-serverinfrastructurevllmtriton

Lab: inference server exploitation

Attack vLLM, TGI, and Triton inference servers to discover information disclosure vulnerabilities, denial-of-service vectors, and configuration weaknesses in model serving infrastructure.

labmcpexploitationtool-usefull-chainadvanced

Lab: full MCP exploitation

Hands-on lab for conducting a complete Model Context Protocol server compromise including tool enumeration, permission escalation, cross-tool attacks, and data exfiltration through MCP channels.

labmodel-comparisonsafety-evaluationbenchmarking

Lab: model security comparison

Systematically compare the safety posture of major language models using a standardized test suite, building quantitative security profiles for GPT-4, Claude, and Gemini.

distillationadvancedlabattacklabsmodel

Lab: model distillation security

Extract model capabilities through distillation techniques using only black-box API access.

labmodel-servinginfrastructuretensorflow-servingtorchserve

Lab: attacks on model serving frameworks

Exploit vulnerabilities in TensorFlow Serving, TorchServe, and Triton Inference Server, targeting model loading, API endpoints, and management interfaces.

labmulti-modelcomparativebenchmarking

Lab: comparative red teaming across multiple models

Test the same attack suite across GPT-4, Claude, Llama, and Gemini. Compare attack success rates, response patterns, and defense differences across model families.

advancedlablabsattackmultiobjective

Multi-objective attack optimization

Optimize attack payloads for multiple simultaneous objectives: jailbreaking, data extraction, and defense evasion.

labmulti-turnconversation-steeringcrescendoadvanced

Lab: advanced multi-turn attacks

Hands-on lab for executing 20+ turn conversation steering strategies that gradually erode AI safety boundaries to extract protected information through patient, escalating dialogue.

chainadvancedlabmultimodalattacklabs

Multimodal attack chain lab

Chain attacks across text, image, and structured data modalities to exploit multimodal system vulnerabilities.

labmultimodalpipelinevision

Lab: multimodal attack pipeline

Build an automated multimodal attack pipeline that generates adversarial images, combines them with text prompts, and tests against vision-language models (VLMs).

labpairautomatedattack-algorithm

Lab: implementing a PAIR attack

Implement the PAIR (Prompt Automatic Iterative Refinement) algorithm where an attacker LLM iteratively refines jailbreak prompts against a target LLM until a successful attack is found.

labpairautomated-attacksjailbreaking

Lab: PAIR attack implementation

Build a complete Prompt Automatic Iterative Refinement system that uses an attacker LLM to automatically generate and refine jailbreak prompts against a target model.

productionadvancedlabsimulationlabs

Production environment simulation lab

Test attacks against a simulated production environment with realistic logging, monitoring, and alerting.

labpurple-teamattack-defensecollaborativeadvancedhands-on

Lab: purple team exercise

Simultaneously attack and defend an AI application in a structured exercise where red team findings immediately inform blue team defensive improvements.

labpurple-teamattack-defensemetricsadvanced

Lab: purple team exercise (advanced lab)

Hands-on lab for conducting simultaneous attack and defense operations against an AI system with real-time metrics tracking, adaptive defense deployment, and coordinated red-blue team workflows.

labragfull-chaindocument-injectionexfiltrationadvanced

Lab: attacking the full RAG chain

Hands-on lab for executing a complete RAG attack chain from document injection through retrieval manipulation to data exfiltration, targeting every stage of the Retrieval-Augmented Generation pipeline.

labreasoningo1chain-of-thought

Lab: exploiting reasoning models

Attack reasoning models like o1, o3, and DeepSeek-R1 by exploiting chain-of-thought manipulation, reasoning budget exhaustion, and thought-injection techniques.

laborchestrationautomationred-team-ops

Lab: red team orchestration

Build an orchestration system that coordinates multiple attack strategies simultaneously, managing parallel attack campaigns and synthesizing results into comprehensive risk assessments.

labregression-testingsafetyautomationci-cd

Lab: safety regression testing at scale

Build automated pipelines that detect safety degradation across model versions, ensuring that updates and fine-tuning do not introduce new vulnerabilities or weaken existing protections.

probingadvancedlabrepresentationlabs

Representation probing for vulnerabilities

Probe internal model representations to identify exploitable features and develop representation-level attacks.

analysisadvancedlabrewardlabsmodel

Reward model analysis lab

Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.

labroutingmodel-selectionarchitecture-attacks

Lab: model routing exploitation

Attack systems that route requests to different models based on complexity or content, exploiting routing logic to reach less-defended models or bypass safety filters.

labtaptree-of-attacksalgorithm

Lab: implementing the TAP algorithm

Implement the TAP (Tree of Attacks with Pruning) algorithm that uses tree-based search over attack prompts with branch pruning to efficiently find jailbreaks.

labtraining-dataextractionprivacy

Lab: training data extraction at scale

Extract memorized training data from language models using prefix-based extraction, divergence testing, and membership inference. Measure extraction rates and assess privacy risks.

labtransfer-attackscross-modeladversarialadvanced

Lab: developing transfer attacks

Hands-on lab for crafting adversarial prompts on open-weight models like Llama that transfer to closed-source models like Claude and GPT-4, using iterative refinement and cross-model evaluation.

labtransfer-attacksadversarialcross-modeladvancedhands-on

Lab: developing transfer attacks (advanced lab)

Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.

labanthropicclaudeapibeginner

Lab: Anthropic Claude API fundamentals

Set up the Anthropic Claude API for red teaming, learn authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.

explorationlabbeginnerlabsmultiapi

Exploring multi-provider APIs

Explore and compare API behaviors across OpenAI, Anthropic, and Google AI to understand provider-specific security characteristics.

labapi-keyssecuritysecrets-managementbeginner

Lab: API key security

Learn common API key exposure vectors, secure key management with .env files, detect keys in git history, implement key rotation, and apply least-privilege principles.

Lab: model testing via the API

Learn to test language models through their APIs including OpenAI, Anthropic, and local Ollama endpoints. Build reusable API testing functions with proper error handling.

labapitestingopenai

basiclabbeginnerlabsautomatedtesting

Basic setup for automated testing

Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.

labdefenseinput-filterregexbeginner

Lab: build your first defense

Hands-on lab for building a basic AI input defense system using keyword matching, regex filters, and simple classifiers to block prompt injection attempts.
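
As a rough illustration of the keyword-and-regex approach named above, a minimal sketch might look like the following. The patterns, the keyword set, and the `is_blocked` helper are all hypothetical choices made for this example, not the lab's actual filter.

```python
import re

# Hypothetical deny-list patterns, chosen only for illustration.
BLOCK_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(r"reveal\s+(your\s+)?system\s+prompt", re.IGNORECASE),
]

# Hypothetical keywords matched as case-insensitive substrings.
BLOCK_KEYWORDS = {"jailbreak"}


def is_blocked(user_input: str) -> bool:
    """Return True if the input matches any keyword or regex pattern."""
    lowered = user_input.lower()
    if any(keyword in lowered for keyword in BLOCK_KEYWORDS):
        return True
    return any(pattern.search(user_input) for pattern in BLOCK_PATTERNS)
```

A filter like this is easy to reason about but only catches the phrasings it lists, which is why the description pairs it with simple classifiers.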

testingdefensebasiclabbeginnerlabs

Basic testing of defense mechanisms

Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.

labdefenseinput-filterbeginnerhands-on

Lab: build your first defense (beginner lab)

Create a simple input filter that blocks common prompt injection patterns, then test it against the attack techniques you have learned in previous labs.

reportingbasiclabbeginnerexerciselabs

Exercise in documenting security findings

Practice documenting security findings in a professional format with reproducible steps and impact assessment.

labmodel-comparisonsafetybenchmarkingbeginnerhands-on

Lab: model comparison

Test the same attack techniques against different language models and compare their safety behaviors, refusal patterns, and vulnerability profiles.

labcomparisonsafetymulti-modelbeginner

Lab: compare the safety of models

Hands-on lab for running identical safety tests against GPT-4, Claude, Gemini, and Llama to compare how different models handle prompt injection, jailbreaks, and safety boundary enforcement.

labcontext-manipulationprompt-injectionpersonabeginnerhands-on

Lab: context manipulation

Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.

labcontext-overflowprompt-injectionattentionbeginner

Lab: context overflow attacks

Explore context window overflow attacks that push system instructions out of the model's attention by filling the context with padding content, and measure instruction-following degradation.

historyconversationlabbeginnerbasicslabs

Conversation history analysis

Analyze how conversation history influences model behavior and identify opportunities for context manipulation.

labdefense-bypassevasionfiltersbeginnerhands-on

Lab: fundamentals of defense bypass

Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.

labdefense-evasionfiltersbypass

Lab: basic defense evasion

Encounter and bypass basic LLM defenses including keyword filters, content classifiers, and input preprocessing through encoding tricks, synonym substitution, and payload splitting.

labdelimiter-injectionprompt-injectionxmlmarkdownbeginner

Lab: delimiter injection attacks

Craft payloads that exploit XML, markdown, and special token delimiters to break instruction boundaries and manipulate how LLMs parse system prompts versus user input.

labdelimiter-escapeprompt-injectionboundariesbeginnerhands-on

Lab: delimiter escape attacks

Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.

securitylabbeginnerembeddingbasicslabs

Embedding fundamentals for security

Understand text embeddings and their security relevance by generating, comparing, and manipulating embedding vectors.
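
Comparing embedding vectors usually comes down to a similarity measure such as cosine similarity. The sketch below assumes embeddings are plain lists of floats; the `cosine_similarity` helper and the toy 3-dimensional vectors are hypothetical stand-ins for real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values).
query = [1.0, 0.0, 1.0]
doc_close = [2.0, 0.0, 2.0]   # same direction as the query
doc_far = [0.0, 1.0, 0.0]     # orthogonal to the query

print(cosine_similarity(query, doc_close))
print(cosine_similarity(query, doc_far))
```

Because cosine similarity ignores vector length, `doc_close` scores as a near-perfect match even though its magnitude differs from the query.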

Lab: encoding and obfuscation techniques

Hands-on lab exploring how text encoding, character substitution, and obfuscation methods can bypass LLM content filters and safety mechanisms.

labencodingobfuscation

Lab: setting up your red team environment

Step-by-step guide to setting up a complete AI red teaming environment with Python, API clients, scanning tools, and local models.

labenvironmentsetuptools

errorlabbeginnerintelligencelabsmessage

Gathering intelligence from error messages

Extract system architecture information from error messages and response patterns in LLM applications.

labethicsresponsibledocumentationbeginnerhands-on

Lab: ethical red teaming

Practice responsible AI red teaming with proper documentation, scope management, and ethical decision-making frameworks.

labethicsresponsible-testingdocumentationbeginner

Lab: ethical red teaming (beginner lab)

Hands-on lab for practicing responsible AI red teaming with proper documentation, scope management, ethical boundaries, and disclosure procedures.

labfew-shotin-context-learningprompt-injectionbeginner

Lab: few-shot manipulation attacks

Craft fake few-shot examples that teach the model to bypass its safety training by demonstrating the desired adversarial behavior through fabricated conversation examples.

labprompt-injectionbeginnerhands-on

Lab: your first prompt injection

Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.

labjailbreakollamabeginnerhands-on

Lab: your first jailbreak

Try basic jailbreak techniques against a local model using Ollama, learning the difference between prompt injection and jailbreaking through hands-on experimentation.

Lab: scanning with garak

Install and run the Garak LLM vulnerability scanner against local models. Learn to configure probes, generators, and evaluators for comprehensive scanning.

labgarakscanningtools

labgaraktool-setupvulnerability-scannerbeginnerhands-on

Lab: setting up garak and your first scan

Install and configure garak, the LLM vulnerability scanner, then run your first automated scan against a local model to discover common vulnerabilities.

injectionlabbeginnerlabshelloworld

Hello world prompt injection

Write and test your first prompt injection payload against a simple chatbot to understand the fundamental attack mechanism.

labinjection-detectiondefenseclassificationbeginnerhands-on

Lab: injection detection tool

Build a basic prompt injection detection tool using pattern matching, heuristics, and LLM-based classification to identify malicious inputs before they reach the target model.

labprompt-injectionsurveytechniquesbeginnerhands-on

Lab: survey of injection techniques

Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.

labinstruction-followingpriorityconflicting-instructionsbeginnerhands-on

Lab: instruction-following priority

Test how language models prioritize conflicting instructions from system prompts, user messages, and embedded directives to understand the instruction hierarchy.

labjailbreakbeginnertechniques

Lab: basic jailbreak techniques

Hands-on exploration of jailbreak techniques including role-play, DAN-style prompts, and academic framing against multiple models.

exploitationoutputlabbeginnerlabsjson

Fundamentals of exploiting JSON output

Exploit LLM JSON mode to bypass content filters and extract information through structured output channels.

labmarkdown-injectiondata-exfiltrationrenderingbeginner

Lab: Markdown injection

Inject images, links, and formatting into LLM responses that exfiltrate data or alter display rendering in chat interfaces.

comparisonsecuritylabbeginnerlabsmodel

Model security comparison lab

Compare the security posture of different LLM models by running identical test suites across providers.

labmulti-languageprompt-injectiontranslationbeginnerhands-on

Lab: multilingual injection

Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.

labmulti-turnescalationsocial-engineeringbeginner

Lab: multi-turn escalation attacks

Build trust over several conversation messages, then exploit established rapport to bypass safety mechanisms through gradual, multi-turn escalation techniques.

labollamasetuplocal-modelsbeginner

Lab: setting up Ollama for local LLM testing

Install and configure Ollama for local LLM red teaming, download models, perform basic interactions, and compare safety behavior between local and API-hosted models.

outputformatlabbeginnercontrollabs

Output format control lab

Practice controlling LLM output formats to understand how format constraints affect safety mechanism effectiveness.

laboutput-formatexploitationdata-leakagebeginnerhands-on

Lab: output format exploitation

Manipulate output formats like JSON, CSV, code blocks, and structured data to extract information that models would normally refuse to provide in natural language.

laboutputformatmanipulation

Lab: output format manipulation

Hands-on lab on forcing LLMs to output in specific formats including JSON injection, markdown manipulation, and structured output exploitation for data exfiltration.

laboutput-steeringprompt-injectionoutput-manipulationbeginnerhands-on

Lab: output steering

Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.

labpayload-craftingprompt-injectionred-teamingbeginnerhands-on

Lab: crafting payloads

Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.

labencodingbypasscontent-filtersbeginner

Lab: encoding bypasses

Test Base64, ROT13, hex, and Unicode encoding techniques to evade LLM content filters while preserving the semantic meaning of adversarial payloads.
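
To make the encodings concrete, the sketch below produces Base64, ROT13, and hex variants of a string using only the Python standard library. The `encode_variants` helper and its dictionary keys are hypothetical names introduced for this example.

```python
import base64
import codecs


def encode_variants(text: str) -> dict:
    """Return the same text under three reversible encodings."""
    raw = text.encode("utf-8")
    return {
        "base64": base64.b64encode(raw).decode("ascii"),
        "rot13": codecs.encode(text, "rot_13"),
        "hex": raw.hex(),
    }


def decode_base64(encoded: str) -> str:
    """Invert the Base64 variant to confirm the meaning is preserved."""
    return base64.b64decode(encoded).decode("utf-8")


variants = encode_variants("hello")
print(variants)
# {'base64': 'aGVsbG8=', 'rot13': 'uryyb', 'hex': '68656c6c6f'}
```

Each variant is fully reversible, which is what preserves the semantic meaning of the original text.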

labbeginnerlabsplaygroundtesting

Security testing of LLM playgrounds

Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.

labprompt-leakingsystem-promptextractionbeginnerhands-on

Lab: prompt leaking fundamentals

Learn to extract system prompts from chatbots using common leaking techniques including direct requests, indirect elicitation, and format manipulation.

analysisstructurelabbeginnerpromptlabs

Prompt structure analysis lab

Reverse-engineer the prompt structure of target applications by analyzing response patterns and behavioral cues.

labpromptfootool-setupevaluationbeginnerhands-on

Lab: setting up promptfoo and your first evaluation

Install and configure promptfoo for systematic LLM evaluation, then run your first red team evaluation to test model safety boundaries.

labpyrittool-setupred-teamingmicrosoftbeginnerhands-on

Lab: setting up PyRIT and your first attack

Install and configure Microsoft's PyRIT (Python Risk Identification Toolkit) for automated red teaming, then run your first orchestrated attack against a local model.

mappingratelabbeginnerlimitlabs

Mapping rate limits and quotas

Map the rate limits, quotas, and throttling behavior of LLM APIs to plan efficient testing strategies.

analysisresponselabbeginnerbasicslabs

Response analysis fundamentals

Learn to analyze LLM responses for information leakage, inconsistencies, and indicators of safety mechanism activation.

labresponse-analysisfingerprintingsafety-refusalsbeginner

Lab: analyzing LLM responses

Learn to identify safety refusals, partial compliance, inconsistent behavior, and response fingerprinting techniques when analyzing LLM outputs during red team testing.

labrole-playpersonajailbreakbeginnerhands-on

Lab: role-play attacks

Use persona-based approaches to bypass AI safety measures by assigning alternate identities, characters, and scenarios that override the model's trained refusal behaviors.

Lab: role-play and persona attacks

Hands-on lab exploring role-play and persona-based jailbreak techniques, where you craft scenarios that trick LLMs into adopting alternative identities to bypass safety guardrails.

labrole-playpersona

labsafetyboundariesmappingbeginnerhands-on

Lab: mapping safety boundaries

Systematically discover what a language model will and will not do by probing its safety boundaries across multiple categories and documenting the results.

safetymappinglabbeginnerboundarylabs

Mapping safety boundaries

Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics and documenting refusal patterns.

probingsafetylabbeginnertraininglabs

Probing the boundaries of safety training

Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.

labharnessautomationpython

Lab: building a simple test harness

Build a reusable Python test harness that automates sending test prompts, recording results, and calculating attack success metrics.
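
The send, record, and score loop described above can be sketched in a few lines. Everything here is an assumption for illustration: `PromptResult`, `run_suite`, and `attack_success_rate` are hypothetical names, and the `send` and `is_success` callables stand in for a real model client and a real success judge.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PromptResult:
    prompt: str
    response: str
    success: bool


def run_suite(
    prompts: List[str],
    send: Callable[[str], str],
    is_success: Callable[[str], bool],
) -> List[PromptResult]:
    """Send each prompt, record the response, and judge the outcome."""
    results = []
    for prompt in prompts:
        response = send(prompt)
        results.append(PromptResult(prompt, response, is_success(response)))
    return results


def attack_success_rate(results: List[PromptResult]) -> float:
    """Fraction of prompts judged successful; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(r.success for r in results) / len(results)


# Stub target that echoes the prompt, used only to exercise the harness.
demo = run_suite(
    ["alpha", "beta", "gamma", "delta"],
    send=lambda p: p.upper(),
    is_success=lambda r: r.startswith(("A", "B")),
)
print(attack_success_rate(demo))  # 0.5
```

Passing the client and the judge in as callables keeps the harness reusable across models and scoring rules.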

labstructured-outputjsonschema-injectionbeginner

Lab: structured output manipulation

Explore how JSON mode, schema injection, and forced output formats can bypass LLM safety filters and manipulate structured responses.

labsystem-promptextractionreconnaissance

Lab: system prompt extraction

Hands-on techniques for extracting hidden system prompts from LLM-powered applications using direct asking, role-play, instruction replay, and other methods.

labprompt-injectionsystem-promptoverridebeginnerhands-on

Lab: overriding the system prompt

Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.

labsystem-promptreconnaissanceextractionbeginner

Lab: system prompt reconstruction

Use various techniques to systematically extract and reconstruct the full system prompt of an LLM application, combining direct, indirect, and incremental extraction methods.

labbeginnerconfigtoolinstallationlabs

Installing and configuring red team tools

Install and configure essential AI red teaming tools: Garak, PyRIT, promptfoo, and custom Python harnesses.

Expert AI red team labs

Advanced labs tackling cutting-edge AI security challenges including quantization exploits, reward hacking, agent exploitation, multi-agent attacks, and watermark removal.

labexpertoverview

labsteeringexpertlabsactivation

Activation steering for exploitation

Use activation steering techniques to manipulate model behavior at the representation level without prompt modification.

labexpertadversarial-mlgradientsfrom-scratch

Lab: adversarial machine learning from scratch

Hands-on expert lab for implementing gradient-based adversarial attacks against language models from scratch without frameworks, building intuition for how adversarial perturbations exploit model gradients.

labexpertadversarial-suffixGCGoptimizationhands-on

Lab: adversarial suffix optimization

Implement GCG-style adversarial suffix attacks that automatically discover token sequences causing language models to comply with harmful requests. Covers gradient-based optimization, transferability analysis, and defense evaluation.

labexpertalignmentstress-testingsafetyhands-on

Lab: alignment stress testing

Push language model alignment to its breaking points through systematic stress testing. Identify conditions where safety training fails, measure alignment degradation curves, and map the boundaries of model compliance.

labexpertagentscannersecurityhands-on

Lab: build a security scanner for agents

Build an automated security scanner for agentic AI systems that detects vulnerabilities in tool use, permission handling, memory management, and multi-step execution flows. Cover agent-specific attack surfaces that traditional LLM testing misses.

labexpertfuzzertestingadversarialhands-on

Lab: build an AI fuzzer

Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.

labexpertbehavior-diffregressioncomparisonhands-on

Lab: build a behavior diffing tool

Build a tool that systematically compares language model behavior across versions, configurations, and providers. Detect safety regressions, capability changes, and behavioral drift with automated differential analysis.

labexpertguardrailsevaluationdefense-testinghands-on

Lab: build a guardrail evaluator

Build an automated framework for evaluating AI guardrails and safety filters. Test input filters, output classifiers, content moderation systems, and defense-in-depth architectures for coverage gaps and bypass vulnerabilities.

labexpertautomationjailbreakframeworkhands-on

Lab: build jailbreak automation

Build an automated jailbreak testing framework that generates, mutates, and evaluates attack prompts at scale. Covers prompt mutation engines, success classifiers, and campaign management for systematic red team testing.

architecturecrosstransferlabexpertlabs

Lab: cross-architecture transfer

Develop attacks on one model architecture that reliably transfer to fundamentally different architectures.

labexpertdefense-engineeringproductionmulti-layer

Lab: defense engineering

Expert-level lab for building and evaluating a production-grade multi-layer AI defense system including input classifiers, output monitors, semantic analysis, and adversarial robustness testing.

exploitationlabexpertbehavioremergentlabs

Exploiting emergent behavior

Identify and exploit emergent behaviors in frontier models that arise from scale and are not present in smaller models.

labexpertemergentcapabilityprobinghands-on

Lab: probing emergent capabilities

Systematically test large language models for undocumented capabilities including hidden knowledge, unreported skills, and behaviors that emerge only under specific conditions. Build a structured probing framework for capability discovery.

labexpertformalverificationlabslimits

Lab: Limits of formal verification

Explore the limits of formal verification for neural network behavior and identify gaps exploitable by adversaries.

labexpertfull-stackexploit-chainmulti-layerhands-on

Lab: Full-stack AI exploitation

Chain vulnerabilities across UI, API, model, and infrastructure layers to achieve full-stack compromise of an AI application. Learn to identify cross-layer attack paths and build exploit chains that no single-layer defense can prevent.

labexpertagentscomputer-useguihands-on

Lab: Exploiting computer use agents

Hands-on lab exploring attack vectors against GUI-based computer use agents that interact with screens through vision and action primitives.

labexperthoneypotdeceptiondetectionhands-on

Lab: Deploy a honeypot AI

Build and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. Learn honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.

labexpertguidedattacklabsinterpretability

Interpretability-guided attack design

Use mechanistic interpretability to identify exploitable circuits and design targeted attacks.

routingmoeexploitationlabexpertlabs

Lab: Exploiting MoE routing

Exploit Mixture-of-Experts routing mechanisms to selectively activate or suppress expert modules in MoE models.

labexpertmulti-agentcoordinationhands-on

Lab: Multi-agent attack coordination

Hands-on lab building and deploying coordinated multi-agent attack systems that divide red teaming tasks across specialized attacking agents.

neurallablabsexpertbackdoorengineering

Neural backdoor engineering

Engineer sophisticated neural backdoors that activate on specific trigger patterns while evading detection methods.

labexpertjailbreakresearchnovel-attackshands-on

Lab: Novel jailbreak research

Systematic methodology for discovering new jailbreak techniques against large language models. Learn to identify unexplored attack surfaces, develop novel attack vectors, and validate findings with scientific rigor.

labexpertpipelinepoisoningsupply-chainhands-on

Lab: ML pipeline poisoning

Compromise an end-to-end machine learning pipeline by attacking data ingestion, preprocessing, training, evaluation, and deployment stages. Learn to identify and exploit weaknesses across the full ML lifecycle.

labexpertquantizationmodel-compressionhands-on

Lab: Exploiting weaknesses in quantized models

Hands-on lab exploring how model quantization degrades safety alignment, with techniques to find and exploit precision-related vulnerabilities.

labexpertregistrysupply-chainmodel-integrityhands-on

Lab: Model registry compromise

Explore techniques for compromising model registries and substituting malicious models into production pipelines. Learn to detect model tampering, verify model provenance, and secure the model supply chain.

labexpertresearchmethodologyvulnerability-discovery

Lab: Vulnerability research methodology

Systematic methodology lab for discovering novel AI vulnerabilities including hypothesis generation, attack surface mapping, experimental design, validation protocols, and responsible disclosure.

labexpertrlhfreward-hackingalignmenthands-on

Lab: RLHF reward hacking

Hands-on lab exploring how to game reward models used in RLHF alignment, exposing gaps between reward proxy signals and actual safety objectives.

labexpertbenchmarkevaluationsafetyhands-on

Lab: Create a safety benchmark

Design, build, and validate a comprehensive AI safety evaluation suite. Learn benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.

attackstimelabexperttraininglabs

End-to-end training-time attacks

Execute a complete training-time attack from data poisoning through model deployment to triggered exploitation.

labexpertwatermarkingdetectionhands-on

Lab: AI watermark detection and removal

Hands-on lab exploring techniques for detecting and removing statistical watermarks embedded in AI-generated text, and evaluating watermark robustness.

encodingchainsintermediateadvancedlablabs

Advanced encoding chain attacks

Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.

labragpoisoningadvanceddata-attacks

Lab: Advanced RAG poisoning

Hands-on lab for crafting documents that reliably get retrieved and influence RAG responses for specific target queries.

labagentsexploitationtool-use

Lab: Exploiting AI agents

Hands-on lab for exploiting AI agents with tool access through indirect prompt injection, tool-call chaining, and privilege escalation techniques.

labagentsmemorypersistenceagentic-attacks

Lab: Agent memory manipulation

Hands-on lab for injecting persistent instructions into an agent's memory and context that affect future interactions and conversations.

labagentsprompt-leakingtool-useintermediate

Lab: Agent Prompt Leaking

Hands-on lab for extracting system prompts from tool-using AI agents by exploiting tool interactions, observation channels, and multi-step reasoning to leak protected instructions.

intermediateabuselabpatternslabsapi

Lab: API abuse patterns

Discover and exploit API abuse patterns including parameter injection, header manipulation, and endpoint confusion.

labautomationci-cdpromptfoopipelineintermediate

Lab: Automated red team pipeline

Hands-on lab for building a continuous AI red team testing pipeline using promptfoo, GitHub Actions, and automated attack generation to catch safety regressions before deployment.

labautomated-testingpromptfooci-cdintermediatehands-on

Lab: Automated red team testing

Set up promptfoo for continuous automated security testing of AI applications, with custom evaluators and CI/CD integration.

labcloudazurecontent-filterevasioncloud-ai

Lab: Azure content filter evasion

Hands-on lab for mapping and testing Azure OpenAI Service content filtering categories, severity levels, and bypass techniques.

labcloudawsbedrockguardrailscloud-ai

Lab: Testing AWS Bedrock guardrails

Hands-on lab for systematically testing and bypassing AWS Bedrock's built-in guardrails including content filters, denied topics, and word filters.

labragchunkingevasiondata-attacks

Lab: Chunking exploitation

Hands-on lab for crafting documents that split across chunks in ways that hide malicious content from chunk-level filtering while maintaining attack effectiveness.

labragcitationfabricationhallucinationdata-attacks

Lab: Citation fabrication

Hands-on lab for getting RAG systems to cite documents that don't exist or misattribute quotes to legitimate sources.

classifierintermediateevasionlablabs

Lab: ML classifier evasion

Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.

labcontext-overflowattentioncontext-window

Lab: Context window overflow attacks

Hands-on lab exploring how overflowing a model's context window with padding content can push safety instructions out of the attention window and enable injection attacks.

contextintermediatelabpoisoningwindowlabs

Lab: Context window poisoning

Exploit context window management to inject persistent adversarial content that influences future model responses.

labcontext-windowstuffing

Lab: Context window stuffing attacks

Hands-on lab showing how oversized inputs can overwhelm an LLM's context window to dilute safety instructions, push system prompts out of the attention window, or cause instruction amnesia.

intermediatesplittingconversationlablabs

Lab: Conversation splitting attack

Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.

labdata-exfiltrationcovert-channelsmarkdownintermediate

Lab: Data exfiltration channels

Hands-on lab for extracting data from AI systems through markdown image rendering, invisible links, tool call parameters, and other covert exfiltration channels.

labdata-exfiltrationchannelsside-channelintermediatehands-on

Lab: Data exfiltration channels (advanced lab)

Extract sensitive information from AI systems through various exfiltration channels including crafted links, image tags, tool calls, and side-channel leakage.

labdata-exfiltrationleakageagents

Lab: Data exfiltration techniques

Hands-on lab for extracting sensitive data from AI systems including system prompt extraction, context leakage via markdown rendering, and URL-based data exfiltration.

labdefense-bypasssystematic

Lab: Systematic defense bypass

Hands-on lab for methodically identifying, categorizing, and bypassing common LLM defense mechanisms including input filters, output filters, and system prompt hardening.

labdefensemetricsmeasurementguardrailsintermediate

Lab: Measuring defense effectiveness

Hands-on lab for quantifying AI guardrail robustness using attack success rates, evasion metrics, false positive rates, and statistical analysis of defense performance.

labdefense-testingguardrailsmetricsintermediatehands-on

Lab: Testing defense effectiveness

Systematically test and measure the robustness of AI guardrails using structured methodology, metrics, and repeatable test suites.

intermediateinjectionlablabsragdocument

Lab: Document-based RAG injection

Inject adversarial content into documents that will be processed by a RAG system to influence model responses.

intermediatelabpracticalmanipulationembeddinglabs

Practical embedding manipulation

Manipulate text to achieve target embedding similarities for poisoning retrieval-augmented generation systems.

labembeddingsmanipulationvectors

Lab: Embedding space manipulation

Hands-on lab for crafting documents optimized to be retrieved for specific queries through embedding collision attacks using sentence-transformers.

labencodingbypassobfuscationsystematic-testingintermediate

Lab: Encoding bypass

Hands-on lab for systematically testing Base64, ROT13, Unicode homoglyph, hexadecimal, and combined encoding bypasses against LLM safety filters with measured success rates.

labencodingevasionobfuscationfilters

Lab: Encoding evasion techniques

Hands-on lab using Base64, ROT13, Unicode normalization, and custom encoding schemes to evade input filters and safety classifiers in language model systems.

labfew-shotinjectionprompt-engineering

Lab: Few-shot example injection

Hands-on lab exploring how injected few-shot examples can steer language model outputs toward attacker-chosen behaviors by exploiting in-context learning.

fineintermediatetuninglablabsapi

Fine-tuning API security probing

Probe fine-tuning APIs for security weaknesses including insufficient validation and unsafe default configurations.

labfunction-callingtool-use

Lab: Function calling and tool use abuse

Hands-on lab exploring how attackers can manipulate LLM function calling and tool use to execute unauthorized actions, exfiltrate data, and chain tool calls for maximum impact.

labfunction-callinginjectionagentsagentic-attacks

Lab: Function calling injection

Hands-on lab for exploiting function calling mechanisms by crafting inputs that manipulate which functions get called and with what parameters.

guardrailruleintermediatelablabsextraction

Lab: Guardrail rule extraction

Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.

Lab: Image-based prompt injection

Hands-on lab exploring how text instructions embedded in images can be used to perform prompt injection against vision-language models (VLMs) that process visual input.

labimageinjection

labindirect-injectionexternal-datadocumentsintermediatehands-on

Lab: Indirect prompt injection

Inject instructions through external data sources including documents, web pages, and emails that a target AI system processes as context.

labindirect-injectionchainsweb

Lab: Indirect prompt injection chains

Hands-on lab for setting up indirect prompt injection scenarios through web pages, emails, and documents, testing multi-hop injection chains against AI systems.

labjsoninjectionstructured-inputparsing

Lab: JSON input injection

Hands-on lab exploring how adversarial payloads injected through structured JSON inputs can manipulate language model behavior, bypass schema validation, and exploit parsing inconsistencies.

labllm-judgeevaluationautomation

Lab: Building an LLM judge evaluator

Hands-on lab for building an LLM-based evaluator to score red team attack outputs, compare model vulnerability, and lay the foundation for automated attack campaigns.

labmarkdowninjectionrenderingexfiltration

Lab: Markdown-based injection

Hands-on lab exploring how Markdown rendering in AI-generated outputs can be exploited to inject hidden content, exfiltrate data through image tags, and manipulate displayed information.

Lab: MCP tool abuse scenarios

Hands-on lab for exploiting Model Context Protocol tool definitions through malicious tool descriptions, tool confusion attacks, and server impersonation.

labmcptool-abuseagents

labmcpexploitationtool-shadowingconsent-phishingagentic-attacks

Lab: MCP server exploitation

Hands-on lab for attacking a vulnerable MCP server, demonstrating tool shadowing, consent phishing, and server impersonation techniques.

Lab: Agent memory poisoning

Hands-on lab exploring how conversational memory in AI agents can be poisoned to alter future behavior, inject persistent instructions, and exfiltrate data across sessions.

labmemorypoisoning

labragmetadatainjectiondata-attacks

Lab: RAG metadata injection

Hands-on lab for exploiting metadata fields like titles, descriptions, and timestamps to manipulate RAG retrieval ranking and influence responses.

labmodel-extractionstealingapi

Lab: Basic model extraction

Hands-on lab for API-based model extraction attacks, querying a target model to approximate its behavior, measuring fidelity, and understanding query budgets.

fingerprintingintermediatelabpracticallabsmodel

Practical model fingerprinting lab

Fingerprint the underlying model behind an API endpoint to identify its type, version, and configuration.

intermediatebypasslablanguagelabsmulti

Multilingual safety bypass lab

Exploit gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.

pipelineintermediatelablabsmultimodel

Multi-model pipeline attack lab

Attack a pipeline where multiple models process data sequentially, exploiting trust between pipeline stages.

labmulti-turnescalationprogressivesocial-engineering

Lab: Progressive multi-turn escalation

Hands-on lab building multi-turn attack sequences that progressively escalate from benign queries to full safety bypass through gradual trust establishment and boundary pushing.

labmulti-turncampaignspersistence

Lab: Multi-turn attack campaigns

Hands-on lab for executing multi-turn crescendo attacks against LLMs, measuring safety degradation over conversation length, and building persistent attack campaigns.

labmultimodalvisionimage-injectionintermediate

Lab: Multimodal injection

Hands-on lab for embedding text instructions in images to exploit vision-enabled LLMs. Learn to craft visual prompt injections, test OCR-based attacks, and evaluate multimodal safety boundaries.

labmultimodalimage-injectionvisionintermediatehands-on

Lab: Multimodal injection (advanced lab)

Embed prompt injection instructions in images for vision-enabled models, testing how visual content can carry adversarial payloads.

discoverychannelintermediateoutputlablabs

Output channel discovery lab

Discover alternative output channels in LLM applications that bypass standard output filtering.

labagentspermission-escalationprivilege-escalationagentic-attacks

Lab: Agent permission escalation

Hands-on lab for tricking AI agents into performing actions beyond their intended scope through privilege escalation techniques.

intermediatetechniqueslabpromptlabsleakage

Prompt leakage techniques lab

Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.

Lab: Regression testing with promptfoo

Hands-on lab for setting up promptfoo to run automated regression tests against LLM applications, ensuring that safety properties hold across model updates and prompt changes.

labpromptfooregression

Lab: Running PyRIT campaigns

Hands-on lab for configuring and running automated red teaming campaigns using Microsoft's PyRIT framework against LLM applications.

labpyritcampaigns

labragpoisoningdata-attacks

Lab: RAG pipeline poisoning

Hands-on lab for setting up a RAG pipeline with LlamaIndex, injecting malicious documents, testing retrieval poisoning, and measuring injection success rates.

intermediatechainlabhijackingreasoninglabs

Reasoning chain hijacking

Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.

labrecursive-injectionchainsfeedback-loops

Lab: Recursive injection chains

Hands-on lab building self-reinforcing prompt injection chains where each model response amplifies the injection, creating feedback loops that progressively override safety controls.

labragrerankingmanipulationdata-attacks

Lab: Re-ranking attacks

Hands-on lab for manipulating the re-ranking stage of RAG pipelines to promote or suppress specific documents in retrieval results.

craftingintermediatesemanticlabdeceptionlabs

Semantic deception crafting lab

Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.

exploitationintermediatesessionlablabsstate

Session state exploitation lab

Exploit session management vulnerabilities in LLM applications to access other users' data or escalate privileges.

intermediatebypassoutputlabstructuredlabs

Safety bypass via structured output

Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.

labsupply-chainauditdependenciessecurityintermediatehands-on

Lab: Supply chain audit

Audit an ML project's dependencies for vulnerabilities, covering model files, Python packages, container images, and training data provenance.

labsupply-chainsecurityscanningdependenciesintermediate

Lab: ML supply chain scan

Hands-on lab for auditing machine learning model dependencies, detecting malicious packages in ML pipelines, and scanning model files for backdoors and supply chain threats.

labtoken-smugglingencoding

Lab: Token smuggling and encoding bypass

Hands-on lab exploring how tokenization boundaries, token-level manipulations, and encoding tricks can smuggle attack payloads past safety classifiers.

labtool-poisoningagentsindirect-injectionagentic-attacks

Lab: Tool result poisoning

Hands-on lab for poisoning tool outputs to redirect agent behavior by injecting malicious content through tool results.

labtranslationbypassmultilinguallow-resource

Lab: Translation bypass

Hands-on lab exploiting language translation capabilities to bypass safety filters by submitting prompts in low-resource languages, code-switching, and using transliteration tricks.

labprompt-injectiontokenizerunicode

Lab: Advanced token smuggling via Unicode normalization

Exploit normalization differences between input validators and LLM tokenizers to bypass content filters and inject hidden instructions.

labagentsworkflowhijackingagentic-attacks

Lab: Workflow hijacking

Hands-on lab for redirecting multi-step agent workflows to perform unintended actions by manipulating intermediate states and control flow.

labaudioadversarialhands-on

Lab: Crafting adversarial audio examples

Hands-on lab for creating adversarial audio examples with Python audio processing, targeting Whisper transcription with injected text.

labmultimodalattack-chain

Lab: Multimodal attack chain

Hands-on lab for building and executing a multi-step attack chain that combines image injection, document exploitation, and text-based techniques against a multimodal AI system.

labvideoadversarialhands-on

Lab: Adversarial attacks on video models

Hands-on lab in which you craft adversarial video frames using frame-level perturbation with OpenCV and PyTorch to exploit video models.

labimage-injectionhands-onvlm

Lab: Crafting image-based injections

Hands-on lab for crafting image-based prompt injections, testing them against VLMs, and measuring success rates across different injection techniques.

operationslabsetupred-team

Red team lab and operations

Operational foundations for AI red teaming: lab environments, evidence handling, engagement workflows, and team management for professional AI security assessments.

labfederated-learninghands-onflowerbyzantinemodel-poisoningsimulation

Lab: Federated learning attacks

Hands-on lab implementing model poisoning attacks in a simulated federated learning setup with the Flower framework: Byzantine attacks, model replacement, and measuring attack impact.

labquantizationhands-onjailbreaksafety-testingtransformers

Lab: Exploiting quantized models

Hands-on lab comparing attack success rates across quantization levels: testing jailbreaks on FP16 vs INT8 vs INT4, measuring safety degradation, and designing quantization-aware exploits.

Lab: Inserting a fine-tuning backdoor (training pipeline)

Hands-on lab for creating, inserting, and detecting a trigger-based backdoor in a language model via fine-tuning, using LoRA adapters on a local model.

labfine-tuningbackdoor

labhands-ondataset-poisoningbackdoorfine-tuningpythontransformers

Lab: Poisoning a training dataset

Hands-on lab demonstrating dataset poisoning and fine-tuning to show behavior change, with step-by-step Python code, measurement of the backdoor trigger, and troubleshooting guidance.