# lab
231 articles tagged with “lab”
Lab: systematically bypassing guardrails
Hands-on lab for methodically probing, classifying, and bypassing input/output guardrails in production AI systems using a structured red team workflow.
Lab: tool comparison (same target, 4 tools)
Hands-on lab comparing Garak, PyRIT, promptfoo, and Inspect AI against the same target model. Assess each tool's coverage, speed, quality of findings, and practical trade-offs.
Lab: exploring embedding spaces
Hands-on lab using Python to visualize embedding spaces, measure semantic similarity, and demonstrate how adversarial documents can be crafted to match target queries.
Lab: mapping the attack surface of an AI system
Hands-on lab that guides you through reconnaissance of an AI system: identifying components, mapping data flows, inventorying tools, and documenting the attack surface.
Lab: simulated robot control exploitation
Hands-on lab exercises for exploiting LLM-controlled robots in simulation: environment setup, injection attacks, safety bypass testing, and multi-step exploitation chains with PyBullet.
Lab: advanced RAG manipulation
Perform sophisticated RAG manipulation including embedding space attacks, metadata poisoning, and retrieval algorithm gaming.
Adversarial robustness evaluation
Build a comprehensive adversarial robustness evaluation framework for assessing model security posture.
Lab: generating adversarial suffixes
Implement the Greedy Coordinate Gradient (GCG) algorithm to generate adversarial suffixes that cause language models to comply with harmful requests by appending optimized token sequences.
Lab: constructing an attack chain
Hands-on lab for chaining three or more distinct vulnerabilities into a complete exploit sequence that achieves objectives impossible with any single technique alone.
Lab: adversarial examples in audio
Hands-on lab for crafting adversarial audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.
Lab: implementing AutoDAN
Implement the AutoDAN methodology for generating stealthy, human-readable jailbreak prompts using LLM feedback.
Lab: backdoor detection in fine-tuned models
Analyze a fine-tuned language model to find and characterize an inserted backdoor, using behavioral probing, activation analysis, and statistical testing techniques.
Lab: continuous automated red teaming pipeline
Build an end-to-end CART pipeline that continuously generates, executes, and scores adversarial attacks against LLM applications, with alerting and trend tracking.
Lab: cloud AI security assessment
Conduct an end-to-end security assessment of a cloud-deployed AI service, covering API security, model vulnerabilities, data handling, and infrastructure configuration.
Lab: cloud AI assessment
Hands-on lab for conducting an end-to-end security assessment of a cloud-deployed AI system including infrastructure review, API testing, model security evaluation, and data flow analysis.
Lab: containerized model breakout
Explore techniques for escaping from containerized AI applications to the host system, testing container isolation boundaries in ML deployment environments.
Lab: cross-model transfer attacks
Test whether jailbreaks discovered on one language model transfer effectively to others, building a systematic methodology for cross-model vulnerability research.
Cross-plugin exploit chain
Chain exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.
Training a custom safety classifier
Train a custom input safety classifier and then develop payloads that reliably evade it to understand classifier limitations.
Lab: building a production red team harness
Build a full-featured, production-quality red team harness with multi-model support, async testing, structured result storage, and HTML reporting.
Lab: custom test harness for specific applications
Build a tailored testing framework for a specific AI application, with custom attack generators, domain-specific evaluators, and application-aware reporting.
War game: defense versus red team
Alternate between attacking and defending an LLM application to develop skills in both offensive and defensive operations.
Lab: ensemble attacks
Use multiple language models collaboratively to discover attack strategies that bypass any single model's defenses, leveraging model diversity for more effective red teaming.
Lab: federated learning poisoning attack
Hands-on lab for understanding and simulating poisoning attacks against federated learning systems, where a malicious participant corrupts the shared model through crafted gradient updates.
Lab: inserting a backdoor via fine-tuning
Advanced lab demonstrating how fine-tuning can insert hidden backdoors into language models that activate on specific trigger phrases while maintaining normal behavior otherwise.
Lab: build an AI fuzzing harness
Create a systematic fuzzing framework for testing LLM boundaries, generating and mutating inputs to discover unexpected model behaviors and safety edge cases.
Lab: GPU side-channel attacks
Demonstrate information leakage through GPU memory residuals and timing side channels, showing how shared GPU infrastructure can expose sensitive model data.
Gradient-guided data poisoning
Use gradient information from open-source models to craft optimally poisoned training examples.
Lab: chaining guardrail bypasses
Advanced lab on identifying, isolating, and chaining multiple guardrail bypass techniques to defeat layered defense systems in production LLM applications.
Lab: AI incident investigation
Investigate logs and artifacts from a compromised AI system to reconstruct the attack chain, identify the vulnerability exploited, and determine the scope of the breach.
Lab: inference server exploitation
Attack vLLM, TGI, and Triton inference servers to discover information disclosure vulnerabilities, denial-of-service vectors, and configuration weaknesses in model serving infrastructure.
Lab: full MCP exploitation
Hands-on lab for conducting a complete Model Context Protocol server compromise including tool enumeration, permission escalation, cross-tool attacks, and data exfiltration through MCP channels.
Lab: model security comparison
Systematically compare the safety posture of major language models using a standardized test suite, building quantitative security profiles for GPT-4, Claude, and Gemini.
Lab: model distillation security
Extract model capabilities through distillation techniques using only black-box API access.
Lab: attacking model serving frameworks
Exploit vulnerabilities in TensorFlow Serving, TorchServe, and Triton Inference Server, targeting model loading, API endpoints, and management interfaces.
Lab: comparative red teaming across multiple models
Test the same attack suite across GPT-4, Claude, Llama, and Gemini. Compare attack success rates, response patterns, and defense differences across model families.
Multi-objective attack optimization
Optimize attack payloads for multiple simultaneous objectives: jailbreaking, data extraction, and defense evasion.
Lab: advanced multi-turn attacks
Hands-on lab for executing 20+ turn conversation steering strategies that gradually erode AI safety boundaries to extract protected information through patient, escalating dialogue.
Multimodal attack chain lab
Chain attacks across text, image, and structured data modalities to exploit multimodal system vulnerabilities.
Lab: multimodal attack pipeline
Build an automated multimodal attack pipeline that generates adversarial images, combines them with text prompts, and tests against vision-language models (VLMs).
Lab: implementing a PAIR attack
Implement the PAIR (Prompt Automatic Iterative Refinement) algorithm where an attacker LLM iteratively refines jailbreak prompts against a target LLM until a successful attack is found.
Lab: PAIR attack implementation
Build a complete Prompt Automatic Iterative Refinement system that uses an attacker LLM to automatically generate and refine jailbreak prompts against a target model.
Production environment simulation lab
Test attacks against a simulated production environment with realistic logging, monitoring, and alerting.
Lab: purple team exercise
Simultaneously attack and defend an AI application in a structured exercise where red team findings immediately inform blue team defensive improvements.
Lab: purple team exercise (advanced lab)
Hands-on lab for conducting simultaneous attack and defense operations against an AI system with real-time metrics tracking, adaptive defense deployment, and coordinated red-blue team workflows.
Lab: attacking the full RAG chain
Hands-on lab for executing a complete RAG attack chain from document injection through retrieval manipulation to data exfiltration, targeting every stage of the Retrieval-Augmented Generation pipeline.
Lab: reasoning model exploitation
Attack reasoning models like o1, o3, and DeepSeek-R1 by exploiting chain-of-thought manipulation, reasoning budget exhaustion, and thought-injection techniques.
Lab: red team orchestration
Build an orchestration system that coordinates multiple attack strategies simultaneously, managing parallel attack campaigns and synthesizing results into comprehensive risk assessments.
Lab: safety regression testing at scale
Build automated pipelines that detect safety degradation across model versions, ensuring that updates and fine-tuning do not introduce new vulnerabilities or weaken existing protections.
Representation probing for vulnerabilities
Probe internal model representations to identify exploitable features and develop representation-level attacks.
Reward model analysis lab
Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.
Lab: model routing exploitation
Attack systems that route requests to different models based on complexity or content, exploiting routing logic to reach less-defended models or bypass safety filters.
Lab: TAP algorithm implementation
Implement the TAP (Tree of Attacks with Pruning) algorithm that uses tree-based search over attack prompts with branch pruning to efficiently find jailbreaks.
Lab: training data extraction at scale
Extract memorized training data from language models using prefix-based extraction, divergence testing, and membership inference. Measure extraction rates and assess privacy risks.
Lab: transfer attack development
Hands-on lab for crafting adversarial prompts on open-weight models like Llama that transfer to closed-source models like Claude and GPT-4, using iterative refinement and cross-model evaluation.
Lab: transfer attack development (advanced lab)
Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.
Lab: Anthropic Claude API fundamentals
Set up the Anthropic Claude API for red teaming and learn about authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.
Multi-provider API exploration
Explore and compare API behaviors across OpenAI, Anthropic, and Google AI to understand provider-specific security characteristics.
Lab: API key security
Learn common API key exposure vectors, manage keys securely with .env files, detect keys in git history, implement key rotation, and apply least-privilege principles.
Lab: model testing via the API
Learn to test language models through their APIs including OpenAI, Anthropic, and local Ollama endpoints. Build reusable API testing functions with proper error handling.
Basic automated testing setup
Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.
Lab: build your first defense
Hands-on lab for building a basic AI input defense system using keyword matching, regex filters, and simple classifiers to block prompt injection attempts.
Basic defense mechanism testing
Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.
Lab: build your first defense (beginner lab)
Create a simple input filter that blocks common prompt injection patterns, then test it against the attack techniques you have learned in previous labs.
Practice in documenting security findings
Practice documenting security findings in a professional format with reproducible steps and impact assessment.
Lab: model comparison
Test the same attack techniques against different language models and compare their safety behaviors, refusal patterns, and vulnerability profiles.
Lab: compare model safety
Hands-on lab for running identical safety tests against GPT-4, Claude, Gemini, and Llama to compare how different models handle prompt injection, jailbreaks, and safety boundary enforcement.
Lab: context manipulation
Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.
Lab: context overflow attacks
Explore context window overflow attacks that push system instructions out of the model's attention by filling the context with padding content, and measure instruction-following degradation.
Conversation history analysis
Analyze how conversation history influences model behavior and identify opportunities for context manipulation.
Lab: defense bypass fundamentals
Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.
Lab: basic defense evasion
Encounter and bypass basic LLM defenses including keyword filters, content classifiers, and input preprocessing through encoding tricks, synonym substitution, and payload splitting.
Lab: delimiter injection attacks
Craft payloads that exploit XML, markdown, and special token delimiters to break instruction boundaries and manipulate how LLMs parse system prompts versus user input.
Lab: delimiter escape attacks
Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.
Embedding fundamentals for security
Understand text embeddings and their security relevance by generating, comparing, and manipulating embedding vectors.
Lab: encoding and obfuscation techniques
Hands-on lab exploring how text encoding, character substitution, and obfuscation methods can bypass LLM content filters and safety mechanisms.
Lab: setting up your red team environment
Step-by-step guide to setting up a complete AI red teaming environment with Python, API clients, scanning tools, and local models.
Gathering intelligence from error messages
Extract system architecture information from error messages and response patterns in LLM applications.
Lab: ethical red teaming
Practice responsible AI red teaming with proper documentation, scope management, and ethical decision-making frameworks.
Lab: ethical red teaming (beginner lab)
Hands-on lab for practicing responsible AI red teaming with proper documentation, scope management, ethical boundaries, and disclosure procedures.
Lab: few-shot manipulation attacks
Craft fake few-shot examples that teach the model to bypass its safety training by demonstrating the desired adversarial behavior through fabricated conversation examples.
Lab: your first prompt injection
Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.
Lab: your first jailbreak
Try basic jailbreak techniques against a local model using Ollama, learning the difference between prompt injection and jailbreaking through hands-on experimentation.
Lab: scanning with garak
Install and run the Garak LLM vulnerability scanner against local models. Learn to configure probes, generators, and evaluators for comprehensive scanning.
Lab: setting up garak and running your first scan
Install and configure garak, the LLM vulnerability scanner, then run your first automated scan against a local model to discover common vulnerabilities.
Hello world prompt injection
Write and test your first prompt injection payload against a simple chatbot to understand the fundamental attack mechanism.
Lab: injection detection tool
Build a basic prompt injection detection tool using pattern matching, heuristics, and LLM-based classification to identify malicious inputs before they reach the target model.
Lab: survey of injection techniques
Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.
Lab: instruction-following priority
Test how language models prioritize conflicting instructions from system prompts, user messages, and embedded directives to understand the instruction hierarchy.
Lab: basic jailbreak techniques
Hands-on exploration of jailbreak techniques including role-play, DAN-style prompts, and academic framing against multiple models.
JSON output exploitation fundamentals
Exploit LLM JSON mode to bypass content filters and extract information through structured output channels.
Lab: Markdown injection
Inject images, links, and formatting into LLM responses that exfiltrate data or alter display rendering in chat interfaces.
Model security comparison lab
Compare the security posture of different LLM models by running identical test suites across providers.
Lab: multilingual injection
Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.
Lab: multi-turn escalation attacks
Build trust over several conversation messages, then exploit established rapport to bypass safety mechanisms through gradual, multi-turn escalation techniques.
Lab: setting up Ollama for local LLM testing
Install and configure Ollama for local LLM red teaming, download models, perform basic interactions, and compare safety behavior between local and API-hosted models.
Output format control lab
Practice controlling LLM output formats to understand how format constraints affect safety mechanism effectiveness.
Lab: output format exploitation
Manipulate output formats like JSON, CSV, code blocks, and structured data to extract information that models would normally refuse to provide in natural language.
Lab: output format manipulation
Hands-on lab on forcing LLMs to output in specific formats including JSON injection, markdown manipulation, and structured output exploitation for data exfiltration.
Lab: output steering
Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.
Lab: crafting payloads
Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.
Lab: encoding bypasses
Test Base64, ROT13, hex, and Unicode encoding techniques to evade LLM content filters while preserving the semantic meaning of adversarial payloads.
LLM playground security testing
Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.
Lab: prompt leaking fundamentals
Learn to extract system prompts from chatbots using common leaking techniques including direct requests, indirect elicitation, and format manipulation.
Prompt structure analysis lab
Reverse-engineer the prompt structure of target applications by analyzing response patterns and behavioral cues.
Lab: setting up promptfoo and running your first evaluation
Install and configure promptfoo for systematic LLM evaluation, then run your first red team evaluation to test model safety boundaries.
Lab: setting up PyRIT and running your first attack
Install and configure Microsoft's PyRIT (Python Risk Identification Toolkit) for automated red teaming, then run your first orchestrated attack against a local model.
Mapping rate limits and quotas
Map the rate limits, quotas, and throttling behavior of LLM APIs to plan efficient testing strategies.
Response analysis fundamentals
Learn to analyze LLM responses for information leakage, inconsistencies, and indicators of safety mechanism activation.
Lab: analyzing LLM responses
Learn to identify safety refusals, partial compliance, inconsistent behavior, and response fingerprinting techniques when analyzing LLM outputs during red team testing.
Lab: role-play attacks
Use persona-based approaches to bypass AI safety measures by assigning alternate identities, characters, and scenarios that override the model's trained refusal behaviors.
Lab: role-play and persona attacks
Hands-on lab exploring role-play and persona-based jailbreak techniques, where you craft scenarios that trick LLMs into adopting alternative identities to bypass safety guardrails.
Lab: mapping safety boundaries
Systematically discover what a language model will and will not do by probing its safety boundaries across multiple categories and documenting the results.
Safety boundary mapping
Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics and documenting refusal patterns.
Probing the boundaries of safety training
Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.
Lab: building a simple test harness
Build a reusable Python test harness that automates sending test prompts, recording results, and calculating attack success metrics.
Lab: structured output manipulation
Explore how JSON mode, schema injection, and forced output formats can bypass LLM safety filters and manipulate structured responses.
Lab: system prompt extraction
Hands-on techniques for extracting hidden system prompts from LLM-powered applications using direct asking, role-play, instruction replay, and other methods.
Lab: overriding the system prompt
Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.
Lab: system prompt reconstruction
Use various techniques to systematically extract and reconstruct the full system prompt of an LLM application, combining direct, indirect, and incremental extraction methods.
Red team tool installation and configuration
Install and configure essential AI red teaming tools: Garak, PyRIT, Promptfoo, and custom Python harnesses.
Expert AI red team labs
Advanced labs tackling cutting-edge AI security challenges including quantization exploits, reward hacking, agent exploitation, multi-agent attacks, and watermark removal.
Activation steering for exploitation
Use activation steering techniques to manipulate model behavior at the representation level without prompt modification.
Lab: adversarial machine learning from scratch
Hands-on expert lab for implementing gradient-based adversarial attacks against language models from scratch without frameworks, building intuition for how adversarial perturbations exploit model gradients.
Lab: adversarial suffix optimization
Implement GCG-style adversarial suffix attacks that automatically discover token sequences causing language models to comply with harmful requests. Covers gradient-based optimization, transferability analysis, and defense evaluation.
Lab: alignment stress testing
Push language model alignment to its breaking points through systematic stress testing. Identify conditions where safety training fails, measure alignment degradation curves, and map the boundaries of model compliance.
Lab: build an agent security scanner
Build an automated security scanner for agentic AI systems that detects vulnerabilities in tool use, permission handling, memory management, and multi-step execution flows. Cover agent-specific attack surfaces that traditional LLM testing misses.
Lab: build an AI fuzzer
Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.
Lab: build a behavioral diffing tool
Build a tool that systematically compares language model behavior across versions, configurations, and providers. Detect safety regressions, capability changes, and behavioral drift with automated differential analysis.
Lab: build a guardrail evaluator
Build an automated framework for evaluating AI guardrails and safety filters. Test input filters, output classifiers, content moderation systems, and defense-in-depth architectures for coverage gaps and bypass vulnerabilities.
Lab: build jailbreak automation
Build an automated jailbreak testing framework that generates, mutates, and evaluates attack prompts at scale. Covers prompt mutation engines, success classifiers, and campaign management for systematic red team testing.
Lab: cross-architecture transfer
Develop attacks on one model architecture that reliably transfer to fundamentally different architectures.
Lab: defense engineering
Expert-level lab for building and evaluating a production-grade multi-layer AI defense system including input classifiers, output monitors, semantic analysis, and adversarial robustness testing.
Exploiting emergent behavior
Identify and exploit emergent behaviors in frontier models that arise from scale and are not present in smaller models.
Lab: emergent capability research
Systematically test large language models for undocumented capabilities including hidden knowledge, unreported skills, and behaviors that emerge only under specific conditions. Build a structured probing framework for capability discovery.
Lab: limits of formal verification
Explore the limits of formal verification for neural network behavior and identify gaps exploitable by adversaries.
Lab: full-stack AI exploitation
Chain vulnerabilities across UI, API, model, and infrastructure layers to achieve full-stack compromise of an AI application. Learn to identify cross-layer attack paths and build exploit chains that no single-layer defense can prevent.
Lab: computer use agent exploitation
Hands-on lab exploring attack vectors against GUI-based computer use agents that interact with screens through vision and action primitives.
Lab: deploy a honeypot AI
Build and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. Learn honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.
Interpretability-guided attack design
Use mechanistic interpretability to identify exploitable circuits and design targeted attacks.
Lab: MoE routing exploitation
Exploit Mixture-of-Experts routing mechanisms to selectively activate or suppress expert modules in MoE models.
Lab: multi-agent attack coordination
Hands-on lab building and deploying coordinated multi-agent attack systems that divide red teaming tasks across specialized attacking agents.
Neural backdoor engineering
Engineer sophisticated neural backdoors that activate on specific trigger patterns while evading detection methods.
Lab: novel jailbreak research
Systematic methodology for discovering new jailbreak techniques against large language models. Learn to identify unexplored attack surfaces, develop novel attack vectors, and validate findings with scientific rigor.
Lab: ML pipeline poisoning
Compromise an end-to-end machine learning pipeline by attacking data ingestion, preprocessing, training, evaluation, and deployment stages. Learn to identify and exploit weaknesses across the full ML lifecycle.
Lab: exploiting quantized model weaknesses
Hands-on lab exploring how model quantization degrades safety alignment, with techniques to find and exploit precision-related vulnerabilities.
Lab: model registry compromise
Explore techniques for compromising model registries and substituting malicious models into production pipelines. Learn to detect model tampering, verify model provenance, and secure the model supply chain.
Lab: vulnerability research methodology
Systematic methodology lab for discovering novel AI vulnerabilities including hypothesis generation, attack surface mapping, experimental design, validation protocols, and responsible disclosure.
Lab: reward hacking in RLHF
Hands-on lab exploring how to game reward models used in RLHF alignment, exposing gaps between reward proxy signals and actual safety objectives.
Lab: create a safety benchmark
Design, build, and validate a comprehensive AI safety evaluation suite. Learn benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.
End-to-end training-time attacks
Execute a complete training-time attack from data poisoning through model deployment to triggered exploitation.
Lab: AI watermark detection and removal
Hands-on lab exploring techniques for detecting and removing statistical watermarks embedded in AI-generated text, and evaluating watermark robustness.
Advanced encoding chain attacks
Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.
Lab: advanced RAG poisoning
Hands-on lab for crafting documents that reliably get retrieved and influence RAG responses for specific target queries.
Lab: exploiting AI agents
Hands-on lab for exploiting AI agents with tool access through indirect prompt injection, tool-call chaining, and privilege escalation techniques.
Lab: agent memory manipulation
Hands-on lab for injecting persistent instructions into an agent's memory and context that affect future interactions and conversations.
Lab: agent prompt leaking
Hands-on lab for extracting system prompts from tool-using AI agents by exploiting tool interactions, observation channels, and multi-step reasoning to leak protected instructions.
Lab: API abuse patterns
Discover and exploit API abuse patterns including parameter injection, header manipulation, and endpoint confusion.
Lab: automated red team pipeline
Hands-on lab for building a continuous AI red team testing pipeline using promptfoo, GitHub Actions, and automated attack generation to catch safety regressions before deployment.
Lab: automated red team testing
Set up promptfoo for continuous automated security testing of AI applications, with custom evaluators and CI/CD integration.
Lab: Azure content filter evasion
Hands-on lab for mapping and testing Azure OpenAI Service content filtering categories, severity levels, and bypass techniques.
Lab: testing AWS Bedrock guardrails
Hands-on lab for systematically testing and bypassing AWS Bedrock's built-in guardrails including content filters, denied topics, and word filters.
Lab: chunking exploitation
Hands-on lab for crafting documents that split across chunks in ways that hide malicious content from chunk-level filtering while maintaining attack effectiveness.
Lab: citation fabrication
Hands-on lab for getting RAG systems to cite documents that don't exist or misattribute quotes to legitimate sources.
Lab: ML classifier evasion
Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.
Lab: Context Window Overflow Attacks
Hands-on lab exploring how overflowing a model's context window with padding content can push safety instructions out of the attention window and enable injection attacks.
Lab: Context Window Poisoning
Exploit context window management to inject persistent adversarial content that influences future model responses.
Lab: Context Window Stuffing Attacks
Hands-on lab demonstrating how oversized inputs can overwhelm an LLM's context window to dilute safety instructions, push system prompts out of the attention window, or cause instruction amnesia.
Lab: Conversation Splitting Attack
Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.
Lab: Data Exfiltration Channels
Hands-on lab for extracting data from AI systems through markdown image rendering, invisible links, tool call parameters, and other covert exfiltration channels.
Lab: Data Exfiltration Channels (Advanced Lab)
Extract sensitive information from AI systems through various exfiltration channels including crafted links, image tags, tool calls, and side-channel leakage.
Lab: Data Exfiltration Techniques
Hands-on lab for extracting sensitive data from AI systems including system prompt extraction, context leakage via markdown rendering, and URL-based data exfiltration.
Lab: Systematic Defense Bypass
Hands-on lab for methodically identifying, categorizing, and bypassing common LLM defense mechanisms including input filters, output filters, and system prompt hardening.
Lab: Measuring Defense Effectiveness
Hands-on lab for quantifying AI guardrail robustness using attack success rates, evasion metrics, false positive rates, and statistical analysis of defense performance.
Lab: Testing Defense Effectiveness
Systematically test and measure the robustness of AI guardrails using structured methodology, metrics, and repeatable test suites.
Lab: Document-Based RAG Injection
Inject adversarial content into documents that will be processed by a RAG system to influence model responses.
Practical Embedding Manipulation
Manipulate text to achieve target embedding similarities for poisoning retrieval-augmented generation systems.
Lab: Embedding Space Manipulation
Hands-on lab for crafting documents optimized to be retrieved for specific queries through embedding collision attacks using sentence-transformers.
Lab: Encoding Bypass
Hands-on lab for systematically testing Base64, ROT13, Unicode homoglyph, hexadecimal, and combined encoding bypasses against LLM safety filters with measured success rates.
Lab: Encoding Evasion Techniques
Hands-on lab using Base64, ROT13, Unicode normalization, and custom encoding schemes to evade input filters and safety classifiers in language model systems.
Lab: Few-Shot Example Injection
Hands-on lab exploring how injected few-shot examples can steer language model outputs toward attacker-chosen behaviors by exploiting in-context learning.
Fine-Tuning API Security Probing
Probe fine-tuning APIs for security weaknesses including insufficient validation and unsafe default configurations.
Lab: Function Calling and Tool Use Abuse
Hands-on lab exploring how attackers can manipulate LLM function calling and tool use to execute unauthorized actions, exfiltrate data, and chain tool calls for maximum impact.
Lab: Function Calling Injection
Hands-on lab for exploiting function calling mechanisms by crafting inputs that manipulate which functions get called and with what parameters.
Lab: Guardrail Rule Extraction
Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.
Lab: Image-Based Prompt Injection
Hands-on lab exploring how text instructions embedded in images can be used to perform prompt injection against vision-language models (VLMs) that process visual input.
Lab: Indirect Prompt Injection
Inject instructions through external data sources including documents, web pages, and emails that a target AI system processes as context.
Lab: Indirect Prompt Injection Chains
Hands-on lab for setting up indirect prompt injection scenarios through web pages, emails, and documents, testing multi-hop injection chains against AI systems.
Lab: JSON Input Injection
Hands-on lab exploring how adversarial payloads injected through structured JSON inputs can manipulate language model behavior, bypass schema validation, and exploit parsing inconsistencies.
Lab: Building an LLM Judge Evaluator
Hands-on lab for building an LLM-based evaluator to score red team attack outputs, compare model vulnerability, and lay the foundation for automated attack campaigns.
Lab: Markdown-Based Injection
Hands-on lab exploring how Markdown rendering in AI-generated outputs can be exploited to inject hidden content, exfiltrate data through image tags, and manipulate displayed information.
Lab: MCP Tool Abuse Scenarios
Hands-on lab for exploiting Model Context Protocol tool definitions through malicious tool descriptions, tool confusion attacks, and server impersonation.
Lab: MCP Server Exploitation
Hands-on lab for attacking a vulnerable MCP server, demonstrating tool shadowing, consent phishing, and server impersonation techniques.
Lab: Agent Memory Poisoning
Hands-on lab exploring how conversational memory in AI agents can be poisoned to alter future behavior, inject persistent instructions, and exfiltrate data across sessions.
Lab: RAG Metadata Injection
Hands-on lab for exploiting metadata fields like titles, descriptions, and timestamps to manipulate RAG retrieval ranking and influence responses.
Lab: Basic Model Extraction
Hands-on lab for API-based model extraction attacks, querying a target model to approximate its behavior, measuring fidelity, and understanding query budgets.
Model Fingerprinting Lab
Fingerprint the underlying model behind an API endpoint to identify its type, version, and configuration.
Multilingual Safety Bypass Lab
Exploit gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.
Multi-Model Pipeline Attack Lab
Attack a pipeline where multiple models process data sequentially, exploiting trust between pipeline stages.
Lab: Multi-Turn Progressive Escalation
Hands-on lab building multi-turn attack sequences that progressively escalate from benign queries to full safety bypass through gradual trust establishment and boundary pushing.
Lab: Multi-Turn Attack Campaigns
Hands-on lab for executing multi-turn crescendo attacks against LLMs, measuring safety degradation over conversation length, and building persistent attack campaigns.
Lab: Multimodal Injection
Hands-on lab for embedding text instructions in images to exploit vision-enabled LLMs. Learn to craft visual prompt injections, test OCR-based attacks, and evaluate multimodal safety boundaries.
Lab: Multimodal Injection (Advanced Lab)
Embed prompt injection instructions in images for vision-enabled models, testing how visual content can carry adversarial payloads.
Output Channel Discovery Lab
Discover alternative output channels in LLM applications that bypass standard output filtering.
Lab: Agent Privilege Escalation
Hands-on lab for tricking AI agents into performing actions beyond their intended scope through privilege escalation techniques.
Prompt Leaking Techniques Lab
Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.
Lab: Regression Testing with promptfoo
Hands-on lab for setting up promptfoo to run automated regression tests against LLM applications, ensuring that safety properties hold across model updates and prompt changes.
Lab: Running PyRIT Campaigns
Hands-on lab for configuring and running automated red teaming campaigns using Microsoft's PyRIT framework against LLM applications.
Lab: RAG Pipeline Poisoning
Hands-on lab for setting up a RAG pipeline with LlamaIndex, injecting malicious documents, testing retrieval poisoning, and measuring injection success rates.
Reasoning Chain Hijacking
Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.
Lab: Recursive Injection Chains
Hands-on lab building self-reinforcing prompt injection chains where each model response amplifies the injection, creating feedback loops that progressively override safety controls.
Lab: Re-Ranking Attacks
Hands-on lab for manipulating the re-ranking stage of RAG pipelines to promote or suppress specific documents in retrieval results.
Semantic Deception Crafting Lab
Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.
Session State Exploitation Lab
Exploit session management vulnerabilities in LLM applications to access other users' data or escalate privileges.
Structured Output Safety Bypass
Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.
Lab: Supply Chain Audit
Audit an ML project's dependencies for vulnerabilities, covering model files, Python packages, container images, and training data provenance.
Lab: ML Supply Chain Scanning
Hands-on lab for auditing machine learning model dependencies, detecting malicious packages in ML pipelines, and scanning model files for backdoors and supply chain threats.
Lab: Token Smuggling and Encoding Bypass
Hands-on lab exploring how tokenization boundaries, token-level manipulations, and encoding tricks can smuggle attack payloads past safety classifiers.
Lab: Tool Result Poisoning
Hands-on lab for poisoning tool outputs to redirect agent behavior by injecting malicious content through tool results.
Lab: Translation-Based Bypass
Hands-on lab exploiting language translation capabilities to bypass safety filters by submitting prompts in low-resource languages, code-switching, and using transliteration tricks.
Lab: Advanced Token Smuggling via Unicode Normalization
Exploit normalization differences between input validators and LLM tokenizers to bypass content filters and inject hidden instructions.
Lab: Workflow Hijacking
Hands-on lab for redirecting multi-step agent workflows to perform unintended actions by manipulating intermediate states and control flow.
Lab: Crafting Adversarial Audio Examples
Hands-on lab for creating adversarial audio examples with Python audio processing, targeting Whisper transcription with injected text.
Lab: Multimodal Attack Chain
Hands-on lab for building and executing a multi-step attack chain that combines image injection, document exploitation, and text-based techniques against a multimodal AI system.
Lab: Adversarial Attacks on Video Models
Hands-on lab for crafting adversarial video frames with frame-level perturbations using OpenCV and PyTorch to exploit video models.
Lab: Crafting Image-Based Injections
Hands-on lab for crafting image-based prompt injections, testing them against VLMs, and measuring success rates across different injection techniques.
Red Team Lab and Operations
Operational foundations for AI red teaming: lab environments, evidence handling, engagement workflows, and team management for professional AI security assessments.
Lab: Federated Learning Attacks
Hands-on lab implementing model poisoning attacks in a simulated federated learning setup with the Flower framework: Byzantine attacks, model replacement, and measuring attack impact.
Lab: Exploiting Quantized Models
Hands-on lab comparing attack success rates across quantization levels: testing jailbreaks on FP16 vs. INT8 vs. INT4, measuring safety degradation, and designing quantization-aware exploits.
Lab: Inserting a Fine-Tuning Backdoor (Training Pipeline)
Hands-on lab for creating, inserting, and detecting a trigger-based backdoor in a language model via fine-tuning, using LoRA adapters on a local model.
Lab: Poisoning a Training Dataset
Hands-on lab demonstrating dataset poisoning and fine-tuning to show behavioral change, with step-by-step Python code, backdoor-trigger measurement, and troubleshooting guidance.