# labs
421 articles tagged with “labs”
Skill verification overview
Overview of timed skill verification labs for AI red teaming, including format, pass/fail criteria, and preparation guidance.
Advanced AI red team labs
Advanced hands-on labs covering automated attack algorithms, production red team tooling, multi-model testing, and sophisticated exploitation techniques for experienced AI security practitioners.
Lab: advanced RAG manipulation
Perform sophisticated RAG manipulation including embedding space attacks, metadata poisoning, and retrieval algorithm gaming.
Adversarial robustness evaluation
Build a comprehensive adversarial robustness evaluation framework for assessing model security posture.
Adversarial suffix optimization (GCG)
Implement the Greedy Coordinate Gradient attack to generate adversarial suffixes.
Agent toolchain exploitation
Chain multiple tool calls in an agent system to achieve multi-step exploitation.
Lab: probing alignment boundaries
Systematically probe alignment boundaries to map the exact thresholds where safety training engages and disengages.
Lab: implementing AutoDAN
Implement the AutoDAN methodology for generating stealthy human-readable jailbreak prompts using LLM feedback.
Automated defense evaluation framework
Build an automated framework to evaluate defensive measures across attack categories.
Lab: building an automated red team pipeline
Build a complete automated red teaming pipeline with attack generation, execution, scoring, and reporting.
Lab: backdoor persistence through safety training
Test whether fine-tuned backdoors persist through subsequent safety training rounds and RLHF alignment.
Lab: Constitutional AI bypass techniques
Test and bypass Constitutional AI safety mechanisms by exploiting the critique-revision training methodology.
Lab: cross-lingual transfer attacks
Develop attacks in one language that transfer to others by exploiting shared multilingual representation spaces.
Cross-model transfer attacks
Develop attacks on open-source models that transfer to closed-source commercial APIs.
Cross-plugin exploit chain
Chain exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.
Lab: training curriculum poisoning
Exploit training data ordering and curriculum learning to amplify the impact of small numbers of poisoned examples.
Training a custom safety classifier
Train a custom input safety classifier and then develop payloads that reliably evade it to understand classifier limitations.
Building a custom red team harness
Build a complete red team testing harness with parallel execution, logging, and scoring.
War game: defense versus red team
Alternate between attacking and defending an LLM application to develop skills in both offensive and defensive operations.
Differential testing across models
Use differential testing to find behavior inconsistencies across model providers.
Lab: simulating attacks on distributed training
Simulate attacks on distributed training infrastructure including gradient poisoning and aggregation manipulation.
Adversarial embedding perturbation
Craft adversarial inputs that produce target embeddings for retrieval manipulation.
Lab: manipulating evaluation frameworks
Demonstrate how to game safety evaluation frameworks to produce artificially high safety scores while retaining vulnerabilities.
Lab: poisoning attacks on federated learning
Execute model poisoning attacks in a federated learning simulation by manipulating local model updates.
Backdoor insertion via fine-tuning
Insert a triggered backdoor during fine-tuning that activates on specific input patterns.
Lab: implementing a gradient-based attack
Implement gradient-based adversarial attacks on open-weight models, including PGD and FGSM adapted for text inputs.
Gradient-guided data poisoning
Use gradient information from open-source models to craft optimally poisoned training examples.
Guardrail bypass ensemble
Build an ensemble attack system that combines multiple bypass techniques for robustness.
Lab: exploiting inference-time compute
Exploit inference-time compute scaling to manipulate reasoning depth and resource consumption in thinking models.
Lab: jailbreak transferability analysis
Analyze jailbreak transferability across model families to discover universal vulnerability patterns.
Multi-hop exploitation of MCP servers
Chain exploits across multiple MCP servers to achieve lateral movement in agent systems.
Lab: long-term memory extraction
Extract information from long-term agent memory stores through crafted queries and memory retrieval manipulation.
Lab: model distillation security
Extract model capabilities through distillation techniques using only black-box API access.
Lab: security analysis of model merging
Analyze security implications of model merging techniques and test for backdoor propagation through merged model weights.
Lab: model steering with activation vectors
Use activation steering vectors to control model behavior without prompt modification for security testing.
Lab: constructing a multimodal attack chain
Construct attack chains that span text, image, and audio modalities to exploit cross-modal processing gaps.
Multi-objective attack optimization
Optimize attack payloads for multiple simultaneous objectives: jailbreaking, data extraction, and defense evasion.
Multi-turn Crescendo attack
Implement Microsoft's Crescendo attack pattern for gradual multi-turn jailbreaking.
Multimodal attack chain lab
Chain attacks across text, image, and structured data modalities to exploit multimodal system vulnerabilities.
Multimodal image injection
Embed adversarial text in images that triggers prompt injection in vision-language models.
Lab: trojan detection in neural networks
Implement and test neural network trojan detection methods including activation clustering and spectral analysis.
Output constraint exploitation
Exploit structured output requirements to force models into revealing restricted information.
Implementing automated PAIR attacks
Implement the PAIR (Prompt Automatic Iterative Refinement) attack for automated jailbreaking.
Production environment simulation lab
Test attacks against a simulated production environment with realistic logging, monitoring, and alerting.
Lab: building a prompt injection compiler
Build a compiler that transforms high-level attack specifications into optimized prompt injection payloads.
Prompt leakage via log probabilities
Extract system prompt information using token log probability analysis.
Lab: quantization security testing
Test behavioral differences between full-precision and quantized models to discover quantization-induced vulnerabilities.
Advanced RAG poisoning techniques
Execute sophisticated RAG poisoning including gradient-guided document crafting.
Lab: reasoning trace manipulation
Manipulate chain-of-thought reasoning traces to inject false premises and redirect model conclusions.
Lab: advanced red team orchestration
Build an advanced red team orchestration system that coordinates multiple attack agents against a defended target.
Lab: representation engineering for security
Use representation engineering to analyze and manipulate internal model representations for security research.
Representation probing for vulnerabilities
Probe internal model representations to identify exploitable features and develop representation-level attacks.
Lab: advanced reward hacking techniques
Develop reward hacking exploits that manipulate RLHF reward models through adversarial optimization.
Reward model analysis lab
Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.
Reward model gaming (advanced lab)
Craft inputs that score highly on reward models while containing adversarial content.
Lab: exploiting RLHF training
Exploit reward model weaknesses to craft inputs that score highly on safety metrics while achieving adversarial objectives.
Lab: safety classifier model extraction
Extract the decision boundary of safety classifiers through systematic probing to craft maximally evasive payloads.
Safety classifier inversion
Reverse-engineer safety classifier decision boundaries through systematic probing.
Lab: sparse attention exploitation
Exploit sparse attention patterns in long-context models to hide injection payloads in low-attention regions.
Lab: model supply chain poisoning
Simulate model supply chain attacks by injecting backdoors into model weights distributed through public registries.
Tree of Attacks (TAP) implementation
Build a TAP attack system that generates and prunes jailbreak attempts using tree search.
Lab: token-level adversarial optimization
Implement token-level adversarial optimization to discover minimal perturbations that bypass safety training.
Lab: lateral movement via the tool chain
Chain tool call exploits to achieve lateral movement across services connected to LLM agent systems.
A2A protocol task injection
Inject malicious tasks into Agent-to-Agent protocol communication channels to redirect multi-agent workflows.
Agent workflow state tampering
Manipulate agent workflow state machines to skip validation steps and reach privileged execution paths.
Attention pattern manipulation
Craft inputs that manipulate transformer attention patterns to prioritize adversarial content over safety instructions.
Audio injection via speech-to-text models
Craft adversarial audio that embeds prompt injection payloads when transcribed by speech-to-text models.
Implementing and testing AutoDAN
Implement the AutoDAN methodology for generating human-readable stealthy jailbreak prompts using gradient guidance.
Bypassing constitutional classifiers
Develop techniques to bypass Anthropic-style constitutional classifiers through adversarial input crafting.
Cross-model GCG transfer attacks
Generate adversarial suffixes on open-source models and test their transferability to commercial APIs.
Building a red team results dashboard
Build a real-time dashboard for tracking and visualizing red team campaign results across targets and techniques.
Differential model behavior analysis
Use differential testing across model versions and providers to discover inconsistent safety behaviors.
Embedding inversion attack implementation
Implement embedding inversion to recover original text from vector database embeddings.
Fine-tuning attack to remove alignment
Use fine-tuning API access to systematically remove safety alignment with minimal training examples.
Visual injection attack on GUI agents
Inject adversarial content into screenshots and UI elements processed by computer-use AI agents.
Submitting a custom attack to HarmBench
Develop and evaluate custom attack methods against the HarmBench standardized evaluation framework.
Lab: exploiting LangChain CVEs
Reproduce and analyze LangChain CVEs including CVE-2023-29374 and CVE-2023-36258 in a safe lab environment.
LoRA backdoor insertion attack
Insert triggered backdoors through LoRA fine-tuning that activate on specific input patterns while passing safety evals.
Pivot chain across multiple MCP servers
Chain exploits across multiple MCP servers to achieve lateral movement and capability escalation in agent systems.
Agent memory injection for persistent access
Inject persistent instructions into agent memory systems that survive across conversation sessions.
Exploiting trust boundaries in multi-agent systems
Exploit trust boundaries between cooperating agents to escalate privileges and access restricted capabilities.
Orchestrating multimodal attack chains
Orchestrate attacks across text, image, and document modalities to bypass per-modality safety filters.
Developing a red team test suite with Promptfoo
Build comprehensive red team test suites in Promptfoo with custom graders and multi-model targeting.
Safety regression testing under quantization
Test how model quantization (INT8, INT4, GPTQ) degrades safety alignment and introduces exploitable gaps.
Bypassing RAG access control
Bypass document-level access controls in enterprise RAG systems through query manipulation and context injection.
Exploiting reasoning traces in CoT models
Exploit visible chain-of-thought reasoning traces in models like o1 and DeepSeek-R1 to manipulate outputs.
Representation probing for vulnerability discovery
Probe model internal representations to discover exploitable features and latent vulnerability patterns.
Adversarial exploitation of reward models
Craft inputs that exploit reward model weaknesses to achieve high safety scores while containing harmful content.
Reverse engineering safety classifiers
Reverse-engineer a safety classifier's decision boundaries through systematic adversarial probing.
Sandbox escape techniques for code agents
Develop and test sandbox escape techniques against code execution environments in AI coding assistants.
Hiding steganographic payloads in images
Hide prompt injection payloads in images using steganographic techniques undetectable to human observers.
Training data extraction from production LLMs
Implement Carlini et al.'s techniques to extract memorized training data from production language model APIs.
Detecting and removing LLM watermarks
Detect and remove statistical watermarks from LLM-generated text while preserving content quality.
Lab: detecting and removing LLM watermarks
Detect and analyze LLM text watermarks using statistical methods and test watermark removal through paraphrasing.
Getting started with AI red teaming labs
Introduction to the hands-on lab environment for AI red teaming. Learn what you'll need, how labs are structured, and get an overview of all beginner exercises.
Lab: API authentication security testing
Test LLM API authentication mechanisms for vulnerabilities including token leakage, replay attacks, and scope bypass.
Exploring multi-provider APIs
Explore and compare API behaviors across OpenAI, Anthropic, and Google AI to understand provider-specific security characteristics.
API rate limiting and error handling
Test LLM API rate limits and implement proper error handling for automated testing workflows.
Lab: parsing and analyzing API responses
Parse and analyze LLM API responses to identify anomalies, safety filter triggers, and information leakage patterns.
Lab: setting up automated LLM testing
Configure automated testing pipelines for LLM red teaming using Python scripts, test runners, and result collection.
Lab: creating basic adversarial examples
Create basic adversarial examples that cause LLMs to misclassify, misinterpret, or bypass safety checks on text input.
Basic automated testing setup
Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.
Basic classifier evasion
Evade basic input/output classifiers using paraphrasing, synonym substitution, and formatting tricks.
Basic data exfiltration techniques
Extract sensitive information from LLM applications using social engineering and misdirection.
Basic testing of defense mechanisms
Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.
Lab: introduction to LLM fuzzing
Learn the basics of fuzzing LLM inputs by mutating prompts to discover unexpected behaviors and edge cases.
Lab: taxonomy of jailbreak techniques
Explore the major categories of jailbreak techniques and practice classifying attack payloads by technique type.
Lab: basic security testing of a RAG system
Introduction to testing Retrieval-Augmented Generation systems for document injection and retrieval manipulation vulnerabilities.
Exercise in documenting security findings
Practice documenting security findings in a professional format with reproducible steps and impact assessment.
Basic role-play prompt injection
Use role-play techniques to bypass basic LLM safety guardrails through character assumption.
Lab: social engineering of LLM applications
Practice social engineering techniques adapted for LLM applications including authority escalation and urgency injection.
Lab: mapping content policy boundaries
Map the content policy boundaries of LLM applications to identify permitted and restricted topic areas systematically.
Basic context window overflow
Fill the context window with padding content to push safety instructions out of the attention window.
Conversation history analysis
Analyze how conversation history influences model behavior and identify opportunities for context manipulation.
Lab: conversation history manipulation
Test how LLM applications handle conversation history including truncation, injection, and context window management.
Identifying LLM defenses
Map the defensive layers of an LLM application through systematic probing and error analysis.
Lab: introduction to defense testing
Learn to systematically test LLM application defenses by probing input filters, output validators, and guardrails.
Delimiter escape techniques
Practice escaping common delimiters used to separate system prompts from user input.
Embedding fundamentals for security
Understand text embeddings and their security relevance by generating, comparing, and manipulating embedding vectors.
Lab: embedding fundamentals for red teamers
Learn embedding fundamentals including vector similarity, semantic search, and how embeddings enable RAG systems.
Encoding and obfuscation fundamentals
Explore how Base64, ROT13, and Unicode encoding can bypass input filters in LLM applications.
Lab: hardening the test environment
Secure your AI red team testing environment to prevent credential leaks, accidental production testing, and data exposure.
Lab: analyzing error messages for reconnaissance
Analyze LLM application error messages to extract information about system architecture, models, and defensive layers.
Gathering intelligence from error messages
Extract system architecture information from error messages and response patterns in LLM applications.
Lab: evidence collection in LLM testing
Set up evidence collection workflows including request/response logging, screenshots, and reproducibility documentation.
Few-shot injection fundamentals
Craft few-shot examples that prime the model to follow attacker instructions in subsequent turns.
Your first Claude API call
Set up the Anthropic SDK and make your first Claude API call with system prompts and messages.
Your first LLM API call with OpenAI
Set up your Python environment and make your first LLM API call to understand request/response patterns.
Lab: format string injection in LLMs
Practice injecting format strings and template directives to manipulate LLM output structure and content.
Your first Garak vulnerability scan
Install and run NVIDIA's Garak tool to perform automated vulnerability scanning on an LLM endpoint.
Lab: hallucination detection fundamentals
Learn to detect and trigger hallucinations in LLM outputs including factual errors, fabricated citations, and invented APIs.
Hello world prompt injection
Write and test your first prompt injection payload against a simple chatbot to understand the fundamental attack mechanism.
Lab: log analysis of injection attempts
Build a logging pipeline to capture and analyze injection attempts for pattern discovery and technique refinement.
Testing the instruction hierarchy
Test how models prioritize conflicting instructions between system, user, and assistant roles.
Effects of temperature and sampling on jailbreaks
Experiment with temperature and top-p settings to understand their impact on jailbreak success rates.
JSON output exploitation fundamentals
Exploit LLM JSON mode to bypass content filters and extract information through structured output channels.
Lab: security testing of JSON output mode
Test JSON output mode for injection vulnerabilities including schema manipulation and structured output exploitation.
Lab: testing multilingual prompts
Test LLM safety boundaries across multiple languages to identify inconsistencies in safety training coverage.
Lab: setting up a local model for testing
Set up local LLM instances using Ollama for safe, cost-free red team testing without API costs or rate limits.
Model security comparison lab
Compare the security posture of different LLMs by running identical test suites across providers.
Lab: basic model fingerprinting
Learn to identify which LLM model powers an application through behavioral fingerprinting and response analysis.
Probing multi-turn conversations
Use multi-turn conversations to gradually escalate requests and probe LLM safety boundaries.
Lab: fundamentals of multimodal input testing
Introduction to testing multimodal LLMs with image and text inputs to understand cross-modal behavior.
Lab: detecting output filters
Identify and characterize output filtering mechanisms in LLM applications through systematic response analysis.
Output format control lab
Practice controlling LLM output formats to understand how format constraints affect safety mechanism effectiveness.
Output format manipulation
Force LLMs to output in specific formats (JSON, XML, code) to bypass output sanitization.
Setting up payload logging
Build a payload logging system to track prompt injection attempts and model responses.
Lab: exploring the LLM playground
Explore LLM provider playgrounds to understand model behavior, parameter effects, and interactive testing workflows.
Security testing of LLM playgrounds
Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.
Lab: testing defenses against prompt leaking
Test various prompt leaking defense configurations to evaluate their effectiveness against extraction techniques.
Prompt structure analysis lab
Reverse-engineer the prompt structure of target applications by analyzing response patterns and behavioral cues.
Lab: testing prompt template vulnerabilities
Test common prompt template patterns for injection vulnerabilities including string interpolation and placeholder abuse.
Setting up Promptfoo for LLM evaluation
Configure Promptfoo to create automated test suites for evaluating LLM safety and robustness.
Lab: enumerating and bypassing rate limits
Enumerate API rate limits and test common bypass techniques including header manipulation and request distribution.
Mapping rate limits and quotas
Map the rate limits, quotas, and throttling behavior of LLM APIs to plan efficient testing strategies.
Lab: analyzing model refusal patterns
Systematically map model refusal patterns to understand safety training boundaries and identify potential bypasses.
Lab: fundamentals of writing red team reports
Practice writing clear, actionable red team findings reports with evidence, risk ratings, and remediation guidance.
Response analysis fundamentals
Learn to analyze LLM responses for information leakage, inconsistencies, and indicators of safety mechanism activation.
Lab: testing response consistency
Test LLM response consistency across repeated queries to identify non-deterministic safety boundary behavior.
Lab: running safety benchmarks
Run standardized safety benchmarks against LLMs to establish baseline safety profiles for comparison.
Mapping safety boundaries
Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics and documenting refusal patterns.
Lab: introduction to safety testing
Learn the fundamentals of LLM safety testing including test case design, baseline measurement, and result documentation.
Probing the boundaries of safety training
Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.
Lab: vulnerability scoring fundamentals
Learn vulnerability scoring frameworks adapted for LLM systems including severity, exploitability, and impact assessment.
Lab: simple payload encoding techniques
Practice encoding injection payloads using Base64, hex, URL encoding, and Unicode to bypass basic input filters.
Lab: system prompt enumeration techniques
Practice systematic approaches to enumerating hidden system prompt content through varied probing strategies.
System prompt extraction fundamentals
Learn to extract hidden system prompts from LLM applications using direct and indirect techniques.
Lab: security effects of temperature and sampling
Explore how temperature and sampling parameters affect model susceptibility to jailbreaks and safety boundary enforcement.
Lab: designing LLM red team test cases
Design effective red team test cases with clear objectives, success criteria, and reproducible execution procedures.
Counting and estimating tokens
Understand tokenization by counting and estimating tokens across different models and encoders.
Lab: token manipulation fundamentals
Understand how tokenization affects LLM behavior and practice basic token-level manipulation techniques.
Lab: comparing red team testing tools
Compare Garak, PyRIT, and Promptfoo capabilities through hands-on exercises using each tool against the same target.
Installing and configuring red team tools
Install and configure essential AI red teaming tools: Garak, PyRIT, Promptfoo, and custom Python harnesses.
Testing API key scope and permissions
Test API key scoping and permission boundaries to identify over-privileged access configurations.
API response header analysis
Analyze HTTP response headers from LLM APIs to fingerprint providers, versions, and middleware.
Basic indirect prompt injection
Plant and trigger a basic indirect prompt injection payload in content consumed by an LLM.
Basic payload mutation techniques
Learn fundamental payload mutation strategies including synonym replacement and structural transforms.
Basis RAG query injection
Craft user queries that manipulate RAG retrieval to surface unintended documents.
Basic rate limit abuse patterns
Test common rate-limit bypass patterns including header manipulation and endpoint discovery.
Character encoding bypass techniques
Bypass input filters using Unicode normalization, homoglyph substitution, and mixed-script encoding.
Mapping chatbot persona and capabilities
Map the configured persona, capabilities, and restrictions of an LLM chatbot through systematic probing.
Completion hijacking fundamentals
Craft partial sentences that steer model completions toward attacker-desired outputs.
Conversation reset attacks
Exploit conversation resets and context clearing to weaken model adherence to safety instructions.
Injection techniques with emoji and Unicode
Use emoji sequences and Unicode special characters to bypass text-based input filters.
Error message exploitation
Trigger and analyze error messages from LLM applications to reveal architecture and configuration details.
Your first HarmBench evaluation
Run a standardized safety evaluation using the HarmBench framework against a target model.
Your first Inspect AI evaluation
Set up and run a basic AI safety evaluation using the UK AISI Inspect framework.
JSON injection fundamentals
Inject adversarial content through JSON-formatted inputs to exploit structured data processing.
Your first LLM Guard scan
Install and configure LLM Guard to scan inputs and outputs, understanding detection capabilities and gaps.
Exfiltration via Markdown rendering
Exploit Markdown rendering in chat UIs to exfiltrate data through image tags and links.
Model fingerprinting fundamentals
Identify which LLM model powers an application through behavioral fingerprinting techniques.
Introduction to NeMo Guardrails
Set up NVIDIA NeMo Guardrails and create your first Colang-based guardrail for an LLM application.
Prompt injection via filenames
Embed prompt injection payloads in filenames and metadata of uploaded documents.
Prompt leaking via summarization requests
Use summarization requests to trick an LLM into revealing its hidden system prompt through indirect disclosure.
Analyzing refusal messages for intel
Extract useful information about model configuration and guardrails by systematically analyzing refusal messages.
Side-channel analysis of response timing
Use response timing differences to infer information about model processing and guardrail activation.
Safety boundary mapping exercise
Systematically map the safety boundaries of an LLM application across multiple topic categories.
Simple attacks on output constraints
Force models to output in constrained formats that bypass output safety filters.
System prompt reconstruction from clues
Piece together a hidden system prompt by collecting partial information from multiple probe techniques.
System prompt extraction via error injection
Trigger specific error conditions that cause the model to inadvertently reveal system prompt contents.
Effects of temperature and top-k on safety
Systematically test how temperature, top-k, and top-p parameters affect safety guardrail effectiveness.
Prompt injection via translation
Exploit LLM translation capabilities to smuggle instructions through language boundaries.
XML injection in LLM contexts
Exploit XML tag handling in LLM applications to manipulate instruction parsing.
Agent Heist: level 2 - MCP server takeover
Take control of an MCP-enabled agent by poisoning tool descriptions and chaining exploits.
Alignment Breaker: level 2 - safety fine-tuned model
Bypass safety fine-tuning on a model with RLHF, constitutional AI, and classifier-based defenses.
Cloud Infiltrator challenge
Navigate through cloud AI service misconfigurations to access a protected model endpoint and extract its secrets.
Context Heist: system prompt extraction championship
Extract system prompts from 10 progressively harder LLM applications using diverse techniques.
Data Heist challenge
Extract a secret database of customer records from a RAG-enabled chatbot with strict data access controls.
Defense Destroyer challenge
Bypass a state-of-the-art multi-layer defense system to extract a protected secret from the model.
Defense Gauntlet: level 2 - layered security
Bypass a layered defense system with input classification, guardrails, and output filtering.
Encoding Maestro challenge
Solve 10 encoding puzzles that require creative use of Base64, Unicode, ROT13, and custom encodings to bypass filters.
Forensics Detective challenge
Analyze logs and artifacts from an AI security incident to reconstruct the attack chain and identify the attacker's technique.
Guardrail Olympics: bypassing multiple frameworks
Bypass guardrail implementations from NeMo, LLM Guard, Rebuff, and custom classifiers.
Jailbreak Gauntlet: level 2 - adaptive defenses
Break through adaptive defense systems that learn from your previous attack attempts.
Model Whisperer challenge
Extract the exact system prompt from 8 progressively harder LLM applications using creative extraction techniques.
Prompt Smuggler challenge
Smuggle a specific payload through 5 layers of increasingly sophisticated input filtering to capture the flag.
RAG Infiltrator: level 2 - enterprise knowledge base
Infiltrate and exfiltrate data from a multi-tier enterprise RAG system with access controls.
Shadow Agent challenge
Take covert control of a multi-agent system by poisoning inter-agent communication without triggering monitors.
Supply Chain Saboteur: level 2 - model registry
Compromise a model registry to inject backdoored model weights into a deployment pipeline.
Time Traveler challenge
Exploit conversation memory and session management to manipulate the temporal context of an AI assistant.
Token Wars: adversarial optimization challenge
Optimize adversarial tokens to achieve target model behaviors under strict character limits.
Tool Master challenge
Exploit the tool-use capabilities of an agent system to access unauthorized functions and capture hidden flags.
Agent Maze Runner: multi-tool navigation
Navigate a maze of agent tools, each with unique vulnerabilities, to reach and exfiltrate a hidden flag.
AI wargame: attacker versus defender in real time
Compete in a real-time wargame where you attack an LLM system while an AI defender adapts to your techniques.
Classifier Gauntlet: 10-stage evasion
Bypass 10 progressively harder input classifiers using different evasion techniques at each stage.
Data Heist: multi-channel exfiltration
Exfiltrate a secret key through a heavily monitored LLM system using covert exfiltration channels.
Defense Builder: build and survive
Build defensive guardrails for an LLM application and survive 100 automated attack rounds.
AI Escape Room: agent breakout challenge
Break an AI agent out of its sandboxed environment by chaining tool-use vulnerabilities and injection techniques.
Function Calling Fortress Breach
Bypass function calling restrictions to invoke unauthorized tools and extract a flag from a sandboxed agent.
Guardrail Speedrun: fastest bypass challenge
Bypass 5 different guardrail implementations as fast as possible in a timed competition format.
Invisible Ink: hidden payload extraction challenge
Extract a hidden flag by decoding multiple layers of obfuscation in model responses across encoded channels.
MCP server takeover challenge
Compromise an MCP server through tool description injection and use it to pivot to other connected services.
Multimodal cipher: cross-modal decryption
Decode a flag split across text, image, and audio inputs processed by a multimodal AI system.
Polyglot injection challenge
Craft a single injection payload that works across 5 different languages and 3 different model providers.
Prompt archaeology: deep system prompt recovery
Recover fragments of a deeply hidden, multi-layer system prompt using various extraction techniques.
RAG scavenger hunt: vector DB infiltration
Infiltrate a vector database by crafting documents that surface for specific queries to reveal flag fragments.
Reasoning chain hijack: CoT exploitation CTF
Hijack a reasoning model's chain-of-thought to redirect its conclusions while maintaining coherent reasoning.
Red vs Blue tournament challenge
Alternate between attacking and defending LLM systems in a scored tournament format.
Stealth exfiltration challenge: zero detection
Exfiltrate data from a monitored LLM system without triggering any of the deployed anomaly detectors.
Supply Chain Detective: find the backdoor
Analyze a model pipeline to find where a backdoor was inserted — data, training, or post-processing.
Defusing the time bomb: sleeper agent detection
Detect and neutralize a sleeper agent behavior trigger hidden in a fine-tuned model before it activates.
Token Smuggler Extreme: crafting adversarial tokens
Craft adversarial token sequences under extreme character limits to achieve target model behaviors.
Activation steering for exploitation
Use activation steering techniques to manipulate model behavior at the representation level without prompt modification.
Evading backdoor detection
Insert backdoors into fine-tuned models that evade state-of-the-art detection methods.
Lab: cross-architecture transfer
Develop attacks on one model architecture that reliably transfer to fundamentally different architectures.
Exploiting emergent behavior
Identify and exploit emergent behaviors in frontier models that arise from scale and are not present in smaller models.
Lab: limits of formal verification
Explore the limits of formal verification for neural network behavior and identify gaps exploitable by adversaries.
Interpretability-guided attack design
Use mechanistic interpretability to identify exploitable circuits and design targeted attacks.
Model extraction via API access
Extract a functionally equivalent model using only API query access.
Lab: MoE routing exploitation
Exploit Mixture-of-Experts routing mechanisms to selectively activate or suppress expert modules in MoE models.
Neural backdoor engineering
Engineer sophisticated neural backdoors that activate on specific trigger patterns while evading detection methods.
Exploiting quantization vulnerabilities
Exploit behavioral differences between full-precision and quantized models.
Chain-of-thought exploitation in reasoning models
Exploit extended thinking and chain-of-thought reasoning in models like o1 and DeepSeek-R1.
Representation engineering attacks
Manipulate internal model representations to alter behavior without prompt modification.
End-to-end training-time attacks
Execute a complete training-time attack from data poisoning through model deployment to triggered exploitation.
Activation patching for targeted behavior modification
Use activation patching techniques to modify specific model behaviors without retraining or fine-tuning.
Lab: detecting alignment faking
Implement detection methods for alignment faking behaviors where models behave differently during evaluation versus deployment.
Continuous learning pipeline poisoning
Exploit continuous/online learning pipelines by injecting adversarial examples that gradually shift model behavior.
Deceptive alignment testing framework
Build a testing framework for detecting mesa-optimization and deceptive alignment in fine-tuned models.
Security probing of emergent capabilities
Design evaluations that discover security-relevant emergent capabilities in frontier language models.
Federated learning poisoning attack
Execute model poisoning attacks in a federated learning setting through adversarial participant manipulation.
Formal verification of LLM safety properties
Apply formal verification methods to prove or disprove safety properties of LLM-based applications.
Gradient-based universal prompt search
Implement gradient-based optimization to search for universal adversarial prompts that transfer across contexts.
KV cache manipulation attacks
Manipulate key-value caches in inference servers to inject or alter cached context across requests.
Mechanistic interpretability for exploit discovery
Use mechanistic interpretability tools to discover exploitable circuits and features in transformer models.
Membership inference against production LLMs
Implement membership inference attacks to determine whether specific data was used in training an LLM.
Model extraction via knowledge distillation
Extract a functionally equivalent model from a commercial API using systematic distillation queries.
Backdoor propagation through model merging
Demonstrate how backdoors propagate through model merging techniques like TIES, DARE, and spherical interpolation.
MoE routing exploitation
Exploit Mixture-of-Experts routing mechanisms to activate specific expert networks for adversarial purposes.
Multi-agent consensus poisoning
Poison multi-agent consensus mechanisms to manipulate collective decision-making in agent swarms.
Representation engineering for behavior steering
Use representation engineering to steer model behavior by manipulating activation vectors during inference.
Reward hacking in RLHF pipelines
Craft adversarial training examples that exploit reward model weaknesses in RLHF training pipelines.
Safety removal via fine-tuning with minimal data
Determine the minimum number of fine-tuning examples needed to remove safety alignment from various model families.
Side-channel attacks on speculative decoding
Exploit speculative decoding implementations to extract information about draft and verifier model behavior.
System prompt reconstruction via token probabilities
Reconstruct complete system prompts using token-by-token probability analysis through API access.
LLM watermark removal attacks
Develop techniques to remove or corrupt watermarks embedded in LLM-generated text.
Labs and hands-on practice
Progressive hands-on lab exercises spanning beginner to expert difficulty, including CTF challenges and full engagement simulations for AI red teaming skill development.
Intermediate AI red team labs
Overview of intermediate-level hands-on labs covering agent exploitation, RAG poisoning, multi-turn attacks, automated campaigns, and advanced injection techniques for AI red teaming.
Advanced encoding chain attacks
Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.
Lab: advanced function calling exploitation
Exploit advanced function calling patterns including nested calls, parallel execution, and schema manipulation.
Lab: automated adversarial prompt generation
Build automated systems for generating adversarial prompts using mutation, crossover, and selection strategies.
Agent memory manipulation
Exploit persistent memory in LLM agents to plant false context that persists across sessions.
Lab: agent workflow security testing
Test multi-step agent workflows for injection points, state manipulation, and unauthorized action execution.
Lab: API abuse patterns
Discover and exploit API abuse patterns including parameter injection, header manipulation, and endpoint confusion.
API parameter fuzzing for LLMs
Systematically fuzz LLM API parameters to discover unexpected behaviors and bypasses.
Lab: API versioning security gaps
Exploit version differences in LLM APIs to find deprecated endpoints and backward-compatible safety bypasses.
Lab: testing canary token effectiveness
Deploy and test canary tokens in system prompts to evaluate their effectiveness at detecting prompt extraction.
Lab: chain-of-thought exploitation
Exploit chain-of-thought reasoning to leak internal model reasoning, bypass safety filters, and manipulate decision processes.
Lab: CI/CD pipeline security for LLM apps
Test CI/CD pipeline security for LLM applications including prompt template injection, model artifact tampering, and deployment attacks.
Lab: advanced classifier evasion
Evade advanced input classifiers using adversarial perturbation, semantic paraphrasing, and gradient-free optimization.
Lab: ML classifier evasion
Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.
Lab: cloud AI platform security testing
Test cloud AI platform configurations including AWS Bedrock guardrails, Azure content filters, and GCP safety settings.
Lab: code generation security testing
Test LLM code generation for insecure patterns, injection vulnerabilities, and code execution safety issues.
Lab: AI compliance testing fundamentals
Test LLM applications against OWASP LLM Top 10 and NIST AI RMF requirements using structured test procedures.
Lab: context window attack optimization
Optimize prompt injection placement within the context window to maximize attack effectiveness using attention dynamics.
Lab: context window poisoning
Exploit context window management to inject persistent adversarial content that influences future model responses.
Lab: conversation splitting attack
Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.
Conversation state poisoning
Exploit conversation history manipulation to alter model behavior across turns.
Cross-context injection
Inject prompts that persist across separate conversation contexts in shared deployments.
Custom jailbreak wordlist generation
Generate targeted jailbreak wordlists using mutation strategies and seed prompts.
Data exfiltration via side channels
Extract data from LLM systems using timing, token probability, and formatting side channels.
Building defense bypass chains
Combine multiple bypass techniques into reliable chains that defeat layered defenses.
Lab: defense regression testing setup
Build a regression testing framework to continuously verify that LLM defenses remain effective against known attack patterns.
Lab: document-based RAG injection
Inject adversarial content into documents that will be processed by a RAG system to influence model responses.
Lab: advanced embedding manipulation
Manipulate embedding vectors to achieve target similarity scores for RAG poisoning and retrieval manipulation.
Hands-on embedding manipulation
Manipulate text to achieve target embedding similarities for poisoning retrieval augmented generation systems.
Embedding similarity attacks
Manipulate text to achieve target embedding similarity scores for retrieval poisoning.
Fine-tuning API security probing
Probe fine-tuning APIs for security weaknesses including insufficient validation and unsafe default configurations.
Lab: fine-tuning safety impact testing
Measure how fine-tuning affects model safety by comparing pre- and post-fine-tuning safety benchmark scores.
Lab: guardrail bypass techniques
Practice guardrail bypass techniques against NeMo Guardrails, LLM Guard, and custom classifier-based defenses.
Guardrail fingerprinting
Systematically map the rules and thresholds of input/output guardrail systems.
Lab: guardrail rule extraction
Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.
Lab: exploiting LLM hallucinations
Exploit hallucination tendencies to trigger fabricated tool calls, invented API endpoints, and false fact injection.
Lab: AI incident response simulation
Practice AI incident response procedures through a simulated prompt injection incident with escalation and containment.
Indirect injection via web content
Plant prompt injection payloads in web pages consumed by RAG-enabled LLM applications.
Lab: measuring instruction following
Quantitatively measure instruction following compliance to identify where models prioritize competing instructions.
LLM judge manipulation
Craft responses that exploit LLM-as-judge evaluation patterns to achieve high safety scores while embedding harmful content.
Lab: log probability analysis for attacks
Use token log probabilities to extract information about training data, system prompts, and model internals.
Lab: advanced MCP protocol testing
Perform advanced security testing of MCP servers including transport analysis, capability enumeration, and tool poisoning.
MCP tool poisoning attack
Exploit MCP tool descriptions to inject instructions that redirect agent behavior.
Lab: fuzzing the Model Context Protocol
Fuzz MCP protocol messages to discover parsing errors, validation gaps, and unexpected behavior in MCP servers.
Hands-on model fingerprinting lab
Fingerprint the underlying model behind an API endpoint to identify its type, version, and configuration.
Lab: LLM security monitoring setup
Deploy a comprehensive security monitoring system for LLM applications with anomaly detection and alerting.
Lab: testing inter-agent communication
Test security of inter-agent communication in multi-agent systems including message injection and trust boundary violations.
Multilingual safety bypass lab
Exploit gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.
Multilingual injection attacks
Exploit language switching and low-resource language gaps to bypass safety training.
Multi-model attack chaining
Chain attacks across multiple LLM models in a pipeline to bypass per-model defenses.
Lab: multi-model comparative security testing
Compare security postures across multiple LLM providers by running identical attack suites and analyzing differential responses.
Multi-model pipeline attack lab
Attack a pipeline where multiple models process data sequentially, exploiting trust between pipeline stages.
Lab: multi-turn state tracking attacks
Exploit conversation state tracking to build persistent attack context across multiple conversation turns.
Lab: intermediate multimodal security testing
Test multimodal LLMs with crafted images containing embedded text, adversarial perturbations, and visual injection payloads.
Output channel discovery lab
Discover alternative output channels in LLM applications that bypass standard output filtering.
Lab: output schema exploitation
Exploit structured output schemas to force models into revealing information through required field population.
Lab: prompt caching side-channel attacks
Exploit prompt caching mechanisms to detect cached prompts and extract information through timing side channels.
Prompt leakage techniques lab
Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.
Automated red teaming with PyRIT
Use Microsoft's PyRIT framework to run automated multi-turn attack campaigns.
Lab: advanced RAG security testing
Test RAG systems for chunking exploitation, reranking manipulation, and cross-document injection attacks.
RAG context poisoning
Poison a vector database to inject adversarial content into RAG retrieval results.
Reasoning chain hijacking
Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.
Lab: role hierarchy exploitation
Exploit multi-role LLM architectures by manipulating the instruction hierarchy between system, user, and assistant roles.
Lab: safety filter benchmarking
Benchmark safety filters across providers using standardized test suites to compare detection rates and false positives.
Semantic deception crafting lab
Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.
Semantic injection crafting
Craft semantically coherent injections that evade both classifiers and human review.
Lab: semantic search manipulation
Manipulate semantic search results by crafting documents that achieve target similarity scores against specific queries.
Lab: session isolation testing
Test session isolation in multi-tenant LLM deployments to detect cross-session data leakage and context contamination.
Session state exploitation lab
Exploit session management vulnerabilities in LLM applications to access other users' data or escalate privileges.
Lab: streaming response manipulation
Exploit streaming response mechanisms to bypass output filters that only check complete responses.
Safety bypass via structured output
Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.
Structured output exploitation
Exploit JSON mode and structured output constraints to bypass safety filters.
Lab: system prompt reconstruction (intermediate lab)
Reconstruct full system prompts through iterative partial extraction and cross-referencing multiple leakage vectors.
Advanced token smuggling
Use Unicode homoglyphs, zero-width characters, and tokenizer quirks to smuggle payloads.
Lab: tool result injection attacks
Inject adversarial content through tool call results to poison model reasoning and redirect subsequent actions.
Lab: Unicode normalization bypass attacks
Exploit Unicode normalization differences between input validation and model processing to smuggle injection payloads.
Lab: vector database access control testing
Test vector database access controls for bypass vulnerabilities including namespace traversal and filter manipulation.
Agent goal-hijacking
Redirect an AI agent's objectives through carefully crafted inputs that override its primary task.
Assistant prefill injection attacks
Exploit assistant message prefilling to prime model responses and bypass safety alignment.
Chat history manipulation attacks
Inject false messages into conversation history to alter model behavior in subsequent turns.
Mapping input classifier boundaries
Systematically probe input classifiers to map their decision boundaries and find bypass regions.
Context window boundary exploitation
Exploit the boundaries of context windows to push safety instructions beyond the model's attention.
Crescendo attack implementation
Implement Microsoft's Crescendo multi-turn escalation attack with automated conversation management.
Cross-plugin exploit chaining
Chain exploits across multiple plugins/tools in an agent system to achieve multi-step attacks.
Function schema manipulation
Exploit function calling schemas to inject instructions through parameter descriptions and enums.
Building custom garak probes
Develop custom probe modules for NVIDIA Garak to test organization-specific vulnerability patterns.
Latency-based guardrail detection
Use timing side channels to identify and characterize guardrail implementations in LLM applications.
Custom safety evaluations with Inspect AI
Build custom safety evaluation tasks in Inspect AI to test application-specific vulnerability classes.
Comparative testing with JailbreakBench
Run standardized jailbreak benchmarks across multiple models and compare vulnerability profiles.
Bypassing safety in JSON mode
Exploit structured JSON output modes to bypass safety filters that only check plain-text responses.
LLM Guard evasion techniques
Develop evasion techniques against LLM Guard input scanners and output detectors.
Implementing many-shot jailbreaking
Implement Anthropic's many-shot jailbreaking technique with scaling analysis across conversation lengths.
Data exfiltration via MCP resources (intermediate lab)
Exploit MCP resource URIs to access and exfiltrate data beyond intended scope boundaries.
Exploiting multilingual safety gaps
Systematically identify and exploit safety training gaps across low-resource languages.
Multi-turn state tracking exploitation
Exploit state management in multi-turn conversations to gradually shift model behavior over many turns.
NeMo Guardrails bypass techniques
Test and bypass NVIDIA NeMo Guardrails configurations using known evasion techniques.
Output filter bypass via encoding
Bypass output safety filters by instructing the model to encode responses in Base64, hex, or custom schemes.
PDF document injection for RAG systems
Craft adversarial PDF documents that inject instructions when processed by RAG document loaders.
Prompt template variable injection
Inject adversarial content through template variable expansion in prompt management systems.
Custom attack strategies in PyRIT
Build custom attack strategies and converters in Microsoft's PyRIT framework for targeted campaigns.
RAG document injection campaign
Design and execute a document injection campaign against a RAG-powered application with vector search.
Semantic search poisoning
Craft adversarial documents that rank highly in semantic search for targeted queries in RAG systems.
Exfiltration via streaming responses
Exploit streaming API responses to exfiltrate data that might be caught by batch output filters.
System prompt extraction via token probabilities
Exploit token probability APIs to reconstruct hidden system prompts character by character.
Token boundary manipulation
Exploit tokenizer-specific behavior by crafting inputs that split across token boundaries in unexpected ways.
Tool result injection attacks
Craft malicious tool return values that inject instructions back into the model's reasoning chain.
Injecting text into VLMs via images
Embed adversarial text in images that vision-language models read and follow as instructions.
Lab: data exfiltration via webhooks
Test data exfiltration paths through LLM-triggered webhooks, API callbacks, and external URL rendering.
Autonomous vehicle AI communication assessment
Assess the AI communication and decision-making systems in an autonomous vehicle scenario.
Code review assistant assessment
Test a code review AI for vulnerabilities in code analysis, suggestion generation, and repository access.
Content moderation AI assessment
Assess an AI content moderation system for bypass techniques, false positive manipulation, and adversarial content generation.
Customer service agent red team
Red team a customer service agent with tool access to order systems, refunds, and customer data.
Educational AI tutor assessment
Assess an AI tutoring system for content safety, answer manipulation, and student data exposure.
Fintech chatbot security assessment
Conduct a full security assessment of a financial services chatbot handling sensitive transactions.
Government services AI assessment
Assess an AI system that helps citizens navigate government services for data privacy and manipulation risks.
Healthcare diagnostic AI assessment
Assess a healthcare diagnostic AI for safety-critical vulnerabilities and data privacy compliance.
Insurance claims AI assessment
Conduct a complete red team assessment of an AI system that processes insurance claims and makes approval recommendations.
Legal document review AI assessment
Assess a legal AI system that reviews contracts for vulnerabilities in document processing and privilege escalation.
Medical triage AI assessment
Assess an AI-powered medical triage system for safety-critical vulnerabilities and clinical decision manipulation.
Multi-agent workflow assessment
Red team a multi-agent system with specialized agents communicating via A2A protocol.
AI recruitment system assessment
Red team an AI recruitment system for bias exploitation, resume injection, and candidate data privacy issues.
Smart home AI assistant assessment
Red team a smart home AI assistant with device control, user profile access, and purchase capabilities.
AI supply chain pipeline assessment
Assess the full ML pipeline from data ingestion through model deployment for supply chain attacks.
AI trading bot assessment
Red team an AI-powered trading system for market manipulation, data leakage, and adversarial input attacks.
Automated pentesting agent assessment
Assess an AI-powered automated penetration testing agent for scope violations, unauthorized actions, and data handling.
Autonomous vehicle planning AI assessment
Assess the safety boundaries of an autonomous vehicle planning AI through adversarial scenario injection.
Content moderation AI platform assessment
Assess an AI content moderation system for bypass techniques, false negative exploitation, and bias.
Customer data platform AI assessment
Red team an AI system integrated with a customer data platform handling PII, behavioral data, and segmentation.
Cyber threat intelligence AI assessment
Red team a cyber threat intelligence AI that processes IOCs, threat reports, and attack attribution.
Data analytics copilot assessment
Red team a data analytics copilot with SQL generation capabilities and access to enterprise databases.
DevOps AI assistant security assessment
Assess a DevOps AI assistant with access to CI/CD pipelines, cloud infrastructure, and deployment systems.
Enterprise knowledge management AI assessment
Assess an enterprise knowledge management system with role-based access, document permissions, and multi-tenant isolation.
Government services chatbot assessment
Red team a government citizen services chatbot that accesses tax records, benefits, and identity information.
AI-driven incident response system assessment
Red team an AI incident response system in a SOC environment with access to SIEM, EDR, and ticketing.
Insurance claims bot security assessment
Conduct a full security assessment of an insurance claims processing chatbot with access to policy and claims data.
Legal contract review AI assessment
Red team an AI contract review system for privilege violations, document injection, and hallucinated clauses.
Medical triage chatbot assessment
Red team a medical triage chatbot for dangerous medical advice, data exposure, and safety-critical failures.
Multi-agent research team assessment
Assess a multi-agent research team with specialized researcher, writer, and reviewer agents communicating via A2A.
Real estate valuation AI assessment
Assess an AI real estate valuation system for manipulation of property valuations and data exposure.
AI recruiting assistant assessment
Red team an AI recruiting assistant that screens resumes, schedules interviews, and accesses candidate data.
Smart home AI assistant assessment (simulation)
Assess a smart home AI assistant that controls lights, locks, cameras, and thermostat via tool integrations.
Supply chain optimization AI assessment
Assess an AI supply chain optimization system for manipulation of demand forecasts and routing decisions.
AI trading assistant security assessment
Conduct a security assessment of an AI trading assistant with access to portfolio data and trade execution.
Travel booking agent red team
Red team a travel booking AI agent with access to payment systems, loyalty programs, and personal data.