# walkthroughs

Walkthrough: A2A protocol injection

Walkthrough of exploiting Google's Agent-to-Agent protocol for inter-agent prompt injection.

Walkthrough: crafting adversarial suffixes

Craft adversarial suffixes using gradient-based and gradient-free optimization methods for black-box models.

Walkthrough: API abuse chain attack

Chain multiple API calls to achieve unauthorized actions that no single call would permit.

Walkthrough: API chaining abuse

Walkthrough of chaining multiple API calls in agent systems to achieve multi-step unauthorized actions.

Walkthrough: attention hijacking attack

Hijack transformer attention mechanisms to redirect model focus toward adversarial instructions in the context.

Walkthrough: stealthy jailbreak with AutoDAN

Walkthrough of generating stealthy, human-readable jailbreak prompts using the AutoDAN methodology.

Walkthrough: batch processing injection

Inject payloads through batch processing pipelines where individual items are processed without isolation.

Walkthrough: LLM cache poisoning

Poison LLM response caches to serve adversarial content to other users without direct injection.

Competition-style jailbreak techniques

Walkthrough of jailbreak techniques used in AI security competitions and CTF events.

Walkthrough: injection against computer-use agents

Walkthrough of injecting prompts through UI elements and screenshots processed by computer-use agents.

Walkthrough: advanced context manipulation

Manipulate conversation context through history injection, context overflow, and attention weight exploitation.

Context manipulation deep dive

Advanced walkthrough of context window manipulation techniques including attention dilution and instruction priority shifting.

Walkthrough: Crescendo multi-turn jailbreak

Step-by-step walkthrough of the Crescendo technique for gradually escalating multi-turn jailbreaks.

Walkthrough: cross-plugin data exfiltration

Walkthrough of chaining multiple plugins/tools to exfiltrate data from LLM agent systems.

Data harvesting via LLM apps

Complete walkthrough of systematic data extraction from LLM applications using various exfiltration channels.

Walkthrough: document-based injection

Inject prompts through documents processed by LLM applications including PDFs, spreadsheets, and presentations.

Walkthrough: embedding collision attack

Craft documents that collide in embedding space with target queries to hijack RAG retrieval results.

Walkthrough: embedding inversion attack

Walkthrough of inverting text embeddings to recover original documents from vector databases.

Walkthrough: encoding chain attack

Chain multiple encoding transformations to bypass input filters that only decode one layer of encoding.
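
Seen from the defender's side, the weakness described here is a filter that decodes once and then inspects. The following is a minimal sketch, assuming percent-encoding is the only layer in play; the helper name `fully_unquote` and the round limit are illustrative and not taken from the walkthrough. It decodes repeatedly until the text stops changing, so the filter can run on the final form.

```python
from urllib.parse import unquote


def fully_unquote(text: str, max_rounds: int = 5) -> str:
    """Percent-decode repeatedly until the text stops changing.

    max_rounds is an assumed safety cap so hostile input cannot
    force an unbounded number of passes.
    """
    for _ in range(max_rounds):
        decoded = unquote(text)
        if decoded == text:
            break
        text = decoded
    return text


# A doubly encoded "i": "%25" decodes to "%", leaving "%69" for the next pass.
single_pass = unquote("%2569gnore")
all_passes = fully_unquote("%2569gnore")
```

A real pipeline would likely need the same fixed-point approach for every encoding it accepts, not only percent-encoding.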

Walkthrough: encoding chain bypass

Walkthrough of chaining Base64, URL encoding, and Unicode tricks to bypass multi-layer input filters.

Walkthrough: few-shot example poisoning

Poison few-shot examples in prompts to establish behavioral patterns that override system instructions.

Walkthrough: safety bypass via fine-tuning

Walkthrough of using fine-tuning API access to remove safety behaviors from aligned models.

Function calling exploitation guide

Complete walkthrough of exploiting function calling in OpenAI, Anthropic, and Google AI APIs.

Walkthrough: function schema poisoning

Poison function schemas to inject hidden instructions that redirect model tool selection and parameter filling.

Walkthrough: GCG adversarial suffix attack

Complete walkthrough of implementing the Greedy Coordinate Gradient adversarial suffix attack.

Indirect prompt injection in practice

Walkthrough of planting and triggering indirect prompt injection in web-browsing AI assistants.

Walkthrough: JSON injection attack

Exploit JSON parsing and generation in LLM applications to inject payloads through structured data boundaries.

Walkthrough: logit bias exploitation

Exploit logit bias API parameters to force models toward specific token sequences and bypass safety filters.

Walkthrough: many-shot jailbreaking

Walkthrough implementing Anthropic's many-shot jailbreaking technique with analysis of scaling behavior.

Walkthrough: advanced Markdown injection

Inject Markdown that triggers data exfiltration through image rendering, link generation, and code block escape.

Walkthrough: MCP tool poisoning attack

Walkthrough of exploiting MCP tool descriptions to redirect agent behavior via hidden instructions.

Walkthrough: memory persistence attack

Walkthrough of achieving persistent memory manipulation in agent systems for cross-session influence.

Walkthrough: model context window overflow

Overflow the context window to push safety instructions outside the effective attention range.

Walkthrough: model extraction attack

Walkthrough of extracting model weights/behavior through systematic API querying.

Walkthrough: multimodal document attacks

Combine visual and textual injection in documents processed by multimodal LLM applications.

Walkthrough: multi-turn persona escalation

Gradually escalate through persona manipulation across multiple turns to achieve full safety bypass.

Walkthrough: multimodal image injection

Step-by-step walkthrough of embedding adversarial prompts in images for vision model exploitation.

Walkthrough: automated jailbreaking with PAIR

End-to-end walkthrough of the PAIR attack methodology for automated black-box jailbreaking.

Walkthrough: agent privilege escalation

Walkthrough of escalating privileges in multi-agent systems through trust chain exploitation.

Walkthrough: advanced prompt leaking

Advanced techniques for extracting system prompts including iterative reconstruction and side-channel methods.

End-to-end walkthrough: RAG poisoning

Complete walkthrough of poisoning a RAG system from document injection through information extraction.

Walkthrough: reasoning model exploitation

Exploit extended thinking in reasoning models to inject false premises and manipulate conclusion generation.

Walkthrough: regex filter bypass

Systematically bypass regex-based input filters using Unicode tricks, encoding, and pattern-specific evasion.

Walkthrough: role confusion attack

Exploit role confusion between system, user, and assistant messages to override safety instructions.

Walkthrough: role confusion exploitation

Exploit role boundaries in chat APIs by injecting assistant-role messages and system-level instructions.

Walkthrough: semantic camouflage

Walkthrough of crafting semantically camouflaged injections that evade both classifiers and human review.

Walkthrough: semantic deception attack

Craft semantically deceptive prompts that appear benign to classifiers while achieving adversarial objectives.

Walkthrough: sleeper agent detection

Walkthrough of detecting deceptive sleeper agent behaviors in fine-tuned language models.

Walkthrough: supply chain prompt injection

Plant injection payloads in upstream data sources consumed by LLM applications including packages and documentation.

Model supply chain poisoning

Walkthrough of poisoning ML supply chains through dependency confusion, model weight manipulation, and hub attacks.

Walkthrough: synthetic identity injection

Create synthetic identities that exploit LLM trust mechanisms to achieve elevated instruction priority.

Walkthrough: TAP Tree of Attacks

Detailed walkthrough of building and running Tree of Attacks for systematic jailbreak generation.

Walkthrough: thought injection in reasoning models

Inject adversarial thoughts into the reasoning chain of thinking models to manipulate final outputs.

Walkthrough: timing side-channel attack

Extract information from LLM applications through timing differences in response generation and safety filter processing.

Walkthrough: token-level attack optimization

Walkthrough of optimizing adversarial token sequences for maximum jailbreak effectiveness.

Walkthrough: tool shadowing attack

Walkthrough: vision model attack

Attack vision-language models through adversarial images with embedded text, perturbations, and visual trojans.

Walkthrough: vision model attack (attack walkthrough)

Step-by-step walkthrough of visual prompt injection, adversarial images, and OCR exploitation in vision-language models.

Walkthrough: A2A agent impersonation

Walkthrough of impersonating a trusted agent in an A2A network to intercept and redirect delegated tasks.

Walkthrough: API parameter abuse

Walkthrough of exploiting undocumented and misconfigured API parameters in LLM service endpoints.

Walkthrough: Batch API abuse

Walkthrough of exploiting batch processing APIs for large-scale prompt injection and output analysis.

Walkthrough: chain-of-thought hijacking

Walkthrough of hijacking visible reasoning traces in CoT models to redirect conclusions and bypass safety checks.

Walkthrough: cognitive overload jailbreak

Step-by-step walkthrough of the cognitive overload technique that overwhelms safety training with complex nested instructions.

Walkthrough: context virtualization attack

Walkthrough of creating virtualized contexts that make the model believe it is operating in a different environment.

Walkthrough: conversation poisoning attack

Step-by-step walkthrough of poisoning conversation history to alter model behavior in future turns.

Walkthrough: document parsing exploit chain

End-to-end walkthrough of exploiting document parsers to inject instructions through PDFs, DOCX, and HTML files.

Few-shot attack scaling analysis

Detailed analysis of how few-shot examples scale to influence model behavior, from 2-shot to many-shot regime.

Walkthrough: injection via image steganography

Walkthrough of hiding prompt injection payloads in images using steganographic techniques for vision model attacks.

Walkthrough: instruction hierarchy exploitation

Walkthrough of exploiting how models prioritize system, user, and tool instructions to override safety directives.

Walkthrough: knowledge graph injection attack

Walkthrough of injecting adversarial facts into knowledge graphs consumed by LLM-based reasoning systems.

Walkthrough: LangChain exploit chain

Walkthrough of chaining LangChain CVEs for remote code execution from prompt injection through to shell access.

Walkthrough: MCP sampling API abuse

Walkthrough of exploiting the MCP sampling API to manipulate how servers request and process LLM completions.

Walkthrough: memory injection and persistence

Walkthrough of injecting persistent instructions into agent memory systems that survive across sessions.

Walkthrough: multi-provider attack campaign

End-to-end walkthrough of conducting an attack campaign across OpenAI, Anthropic, Google, and open-source models.

Walkthrough: output format exfiltration techniques

Walkthrough of using output format manipulation to exfiltrate data through code blocks, markdown, and structured outputs.

Walkthrough: RAG hybrid search poisoning

Walkthrough of poisoning both vector and keyword search in hybrid RAG architectures for maximum retrieval influence.

Walkthrough: recursive prompt injection

Walkthrough of creating self-replicating injection payloads that persist through model output-to-input loops.

Walkthrough: refusal suppression technique

Walkthrough of refusal suppression techniques that prime models to avoid safety refusal patterns.

Walkthrough: semantic obfuscation

Walkthrough of semantically obfuscating adversarial payloads so they appear benign to both classifiers and humans.

Walkthrough: data leakage via structured output

Walkthrough of using structured output requirements to extract sensitive data embedded in model responses.

Walkthrough: tool-use confusion attack

Walkthrough of confusing model tool-use decisions to invoke unintended functions or skip safety-critical tools.

Walkthrough: Unicode normalization bypass

Step-by-step guide to exploiting Unicode normalization differences between input filters and model tokenizers.
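
The mismatch this entry refers to can be illustrated with the standard library. The sketch below assumes a naive substring blocklist (the `BLOCKLIST` contents and the `is_blocked` helper are hypothetical) and shows that the same text passes the filter in raw form but is caught once it is NFKC-normalized first.

```python
import unicodedata

# Hypothetical blocklist used by a naive input filter.
BLOCKLIST = ("ignore previous instructions",)


def is_blocked(text: str, normalize: bool) -> bool:
    """Substring check, optionally applied after NFKC normalization."""
    if normalize:
        text = unicodedata.normalize("NFKC", text)
    return any(phrase in text.casefold() for phrase in BLOCKLIST)


# "ignore" written with fullwidth Latin letters (U+FF41 block).
fullwidth = "\uff49\uff47\uff4e\uff4f\uff52\uff45 previous instructions"

raw_result = is_blocked(fullwidth, normalize=False)
normalized_result = is_blocked(fullwidth, normalize=True)
```

NFKC folds compatibility characters such as fullwidth letters to their ASCII forms; it does not cover every confusable character, so this is only one layer of a filter.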

Walkthrough: prompt injection in voice AI

Walkthrough of injecting prompts into voice-based AI assistants through adversarial audio and ultrasonic signals.

Walkthrough: LLM watermark analysis

Walkthrough of detecting and analyzing watermarks in LLM-generated text using statistical methods.
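
One statistical method that fits this description is a one-sided z-test on the share of "green-list" tokens. This is an assumption about the kind of watermark involved, since the entry does not name a scheme. The sketch assumes the caller already knows how many of the scored tokens fall in the green list; `green_fraction_z` and `gamma` (the expected green fraction for unwatermarked text) are illustrative names.

```python
import math


def green_fraction_z(green_count: int, total: int, gamma: float = 0.5) -> float:
    """z-score of the observed green-token count against the null
    hypothesis that each token is green with probability gamma."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be strictly between 0 and 1")
    expected = gamma * total
    std_dev = math.sqrt(total * gamma * (1.0 - gamma))
    return (green_count - expected) / std_dev


# 75 green tokens out of 100 with gamma = 0.5: (75 - 50) / 5 = 5.0
z = green_fraction_z(75, 100, gamma=0.5)
```

A large positive z-score suggests the text is unlikely to be unwatermarked under this model; the decision threshold is a policy choice.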

Walkthrough: XML injection in LLM systems

Exploit XML parsing in LLM application pipelines to inject instructions through entity expansion and CDATA sections.

XML and JSON injection in LLM apps

Walkthrough of exploiting XML and JSON parsing in LLM applications for injection and data manipulation.

Adversarial robustness testing framework

Build a framework for continuously testing adversarial robustness of deployed LLM defense mechanisms.

Adversarial training for LLM defense (defense walkthrough)

Implement adversarial training techniques to improve LLM robustness against prompt injection and jailbreaking.

Enforcing permission boundaries for agents

Implement fine-grained permission boundaries for LLM agents that limit tool access based on context and user role.

Automated defense regression testing

Build automated regression tests for LLM security defenses using attack replay and continuous monitoring.

Automated red team defense loop

Build an automated continuous red team testing loop that discovers and patches vulnerabilities iteratively.

Behavioral anomaly detection for LLMs

Implement behavioral anomaly detection that identifies when model outputs deviate from expected safety profiles.

Building an input safety classifier

Build a production-quality input classifier that detects prompt injection attempts.

Constitutional AI implementation guide

Implement constitutional AI principles in a custom fine-tuning and RLHF pipeline.

Implementing the context isolation pattern

Implement context isolation patterns that prevent instruction leakage between system prompts and user data.

Data loss prevention for LLM apps

Build a data loss prevention layer for LLM applications that prevents sensitive data exposure in outputs.

Defense benchmarking system

Build a benchmarking system to continuously evaluate defense effectiveness against known attack classes.

Defense-in-depth architecture for LLM apps

Design and implement a complete defense-in-depth architecture for production LLM applications.

Setting up an AI incident response system

Set up comprehensive incident response capabilities for AI-specific security incidents.

Deploying an input embedding firewall

Deploy an embedding-based firewall that detects injection attempts through semantic similarity to known attack patterns.
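
The core check behind such a firewall can be sketched in a few lines. The example assumes embeddings are already computed elsewhere and passed in as plain float sequences; `looks_like_known_attack`, the toy vectors, and the 0.9 threshold are illustrative choices, not values from the walkthrough.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def looks_like_known_attack(
    query_vec: Sequence[float],
    attack_vecs: Sequence[Sequence[float]],
    threshold: float = 0.9,  # assumed cut-off; tune on labeled data
) -> bool:
    """Flag the input if it is close to any stored attack embedding."""
    return any(cosine(query_vec, v) >= threshold for v in attack_vecs)


# Toy 3-dimensional vectors standing in for real embeddings.
known_attacks = [[1.0, 0.0, 0.0]]
near = looks_like_known_attack([0.9, 0.1, 0.0], known_attacks)
far = looks_like_known_attack([0.0, 1.0, 0.0], known_attacks)
```

In production the linear scan would usually be replaced by a vector index, and the threshold chosen from measured false-positive rates.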

Secure input/output logging for defense

Implement secure logging for LLM input/output pairs that supports forensic analysis without exposing sensitive data.
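
One way to support forensics without storing raw text is to log keyed digests and sizes instead of content. This is a minimal sketch under that assumption; `log_record` and its field names are hypothetical, and the key is expected to come from a secrets manager rather than source code.

```python
import hashlib
import hmac
import json


def log_record(prompt: str, response: str, key: bytes) -> str:
    """Build a JSON log line holding HMAC-SHA256 digests and lengths,
    never the raw prompt or response."""

    def digest(text: str) -> str:
        return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()

    record = {
        "prompt_hmac": digest(prompt),
        "prompt_len": len(prompt),
        "response_hmac": digest(response),
        "response_len": len(response),
    }
    return json.dumps(record, sort_keys=True)


# Demo key only; a real deployment would load it from secure storage.
line = log_record("my secret prompt", "some reply", key=b"demo-key")
```

Identical inputs produce identical digests under the same key, so investigators can correlate repeated payloads without reading them.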

Designing an LLM firewall architecture (defense walkthrough)

Design and implement a comprehensive LLM firewall with input analysis, output filtering, and behavioral monitoring.

Deploying an LLM honeypot

Deploy LLM honeypots to detect and study attacker behavior patterns and techniques.

Model behavior monitoring setup

Set up comprehensive model behavior monitoring to detect drift, anomalies, and potential compromise.

Monitoring LLM applications for abuse

Build a monitoring and alerting system to detect ongoing attacks against LLM applications.

Multi-model safety consensus

Implement safety consensus mechanisms where multiple models must agree before executing sensitive actions.
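
The agreement rule itself is small. The sketch below assumes each model is wrapped as a callable that returns `True` when it judges an action safe; the `Judge` alias, `consensus_allows`, and the stub judges are illustrative stand-ins for real model calls.

```python
from typing import Callable, Sequence

# A judge returns True when it considers the proposed action safe.
Judge = Callable[[str], bool]


def consensus_allows(action: str, judges: Sequence[Judge], required: int) -> bool:
    """Allow the action only if at least `required` judges approve it."""
    approvals = sum(1 for judge in judges if judge(action))
    return approvals >= required


# Stub judges standing in for independent model calls.
stub_judges: list[Judge] = [
    lambda action: True,
    lambda action: "delete" not in action,
    lambda action: len(action) < 50,
]

unanimous_ok = consensus_allows("read file", stub_judges, required=3)
unanimous_blocked = consensus_allows("delete file", stub_judges, required=3)
```

Requiring unanimity favors safety over availability; a lower `required` value trades the other way.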

Designing an output filtering pipeline

Design and implement a multi-stage output filtering pipeline for LLM applications.

Output grounding and verification

Implement output grounding verification to ensure LLM responses are factually supported by provided context.

Prompt Armor implementation guide

Implement a comprehensive prompt armoring system with instruction isolation, delimiter hardening, and priority enforcement.

Setting up a prompt injection honeypot

Deploy honeypot prompts and canary data that detect and characterize prompt injection attempts.
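
A canary check can be as simple as planting a unique random marker and scanning outputs for it. The sketch assumes the marker is embedded in the system prompt or in a decoy document; `make_canary` and `leaked` are hypothetical helper names.

```python
import secrets


def make_canary(prefix: str = "CANARY") -> str:
    """Generate a unique marker that should never appear in normal output."""
    return f"{prefix}-{secrets.token_hex(8)}"


def leaked(output: str, canary: str) -> bool:
    """True if the model output reproduces the planted marker."""
    return canary in output


canary = make_canary()
clean_output = "Here is the weather forecast for tomorrow."
leaky_output = f"My instructions say: {canary}"
```

Because each marker is random, a hit identifies which prompt or document was exposed as well as the fact of the leak.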

Walkthrough: input sanitization for RAG

Implement input sanitization for RAG systems to prevent document-based injection attacks.

Real-time attack detection system

Build a real-time attack detection system that monitors LLM interactions for adversarial patterns.

Red team and defense feedback loop

Build a continuous red team-defense improvement loop with automated testing and metric tracking.

Implementing response watermarking

Implement response watermarking to enable traceability and detect unauthorized reproduction of LLM outputs.

Implementing a runtime safety monitor

Implement a runtime safety monitor that detects and blocks unsafe model outputs in real-time.

Training custom safety classifiers

Train custom safety classifiers tuned to your application's specific threat model and content policy.

Designing a secure agent architecture

Design a secure architecture for LLM agent systems with sandboxing, capability controls, and audit trails.

Secure function calling patterns

Implement secure function calling with input validation, output sanitization, and capability restrictions.

Walkthrough: secure RAG architecture

Design and implement a secure RAG architecture with document sanitization, access controls, and output validation.

Secure RAG pipeline architecture

Build a secure RAG pipeline with document sanitization, retrieval validation, and injection-resistant augmentation.

System prompt protection techniques

Implement multiple layers of system prompt protection against extraction, override, and leakage attacks.

Threat intelligence for AI security

Build a threat intelligence pipeline for staying current with AI security threats and attack techniques.

Token-level input filtering

Implement token-level input filtering that detects injection patterns at the tokenization stage before model processing.

Tool call authorization framework

Implement a tool call authorization framework that validates tool invocations against policy before execution.
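
Validating an invocation against policy before execution can be sketched as a default-deny lookup. Everything named here is hypothetical: the `POLICY` table, the tool names, the roles, and the `max_amount` limit are placeholders for whatever the real application defines.

```python
from typing import Any

# Hypothetical policy table: tool name -> allowed roles and argument limits.
POLICY: dict[str, dict[str, Any]] = {
    "search_docs": {"roles": {"user", "admin"}},
    "issue_refund": {"roles": {"admin"}, "max_amount": 100.0},
}


def authorize(tool: str, role: str, args: dict[str, Any]) -> tuple[bool, str]:
    """Return (allowed, reason). Anything not explicitly permitted is denied."""
    rule = POLICY.get(tool)
    if rule is None:
        return False, "unknown tool"
    if role not in rule["roles"]:
        return False, "role not permitted"
    limit = rule.get("max_amount")
    if limit is not None and float(args.get("amount", 0)) > limit:
        return False, "amount exceeds limit"
    return True, "ok"


decision = authorize("issue_refund", "admin", {"amount": 500})
```

The check runs outside the model, so a prompt-injected tool call still has to pass the same policy as a legitimate one.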

Implementing access control for agent tools

Implement fine-grained tool access control for LLM agents with capability-based security and approval workflows.

Anomaly detection for LLM traffic

Build anomaly detection for LLM API traffic to identify attack patterns, abuse, and prompt injection attempts.

Conversation integrity monitoring

Build a conversation integrity monitoring system that detects manipulation across multi-turn interactions.
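
One building block for detecting tampering with earlier turns is a hash chain over the conversation history. This is a sketch of that single idea, not the monitoring system itself; `chain_digest` and the `(role, content)` tuple format are assumptions made for the example.

```python
import hashlib


def chain_digest(turns: list[tuple[str, str]]) -> str:
    """Fold each (role, content) turn into a running SHA-256 digest, so a
    change to any earlier turn changes the final value."""
    digest = b""
    for role, content in turns:
        payload = digest + role.encode("utf-8") + b"\x00" + content.encode("utf-8")
        digest = hashlib.sha256(payload).digest()
    return digest.hex()


history = [("user", "What is our refund policy?"), ("assistant", "30 days.")]
tampered = [("user", "What is our refund policy?"), ("assistant", "Unlimited.")]

original_digest = chain_digest(history)
tampered_digest = chain_digest(tampered)
```

The server would store the digest after each turn and compare it when the client sends the history back; a mismatch means the history was edited in between.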

Defense effectiveness metrics dashboard

Build a dashboard to measure and visualize the effectiveness of defensive measures against attack categories.

Automated defense testing pipeline

Build an automated pipeline that continuously tests defensive measures against evolving attack techniques.

Embedding poisoning detection system

Build a detection system for identifying poisoned documents in vector databases using statistical analysis.

Implementing guardrails for function calling

Implement guardrails for function calling that validate tool selection, parameters, and execution scope.

LLM honeypot deployment guide

Deploy LLM-based honeypots to detect, analyze, and learn from attacker techniques in production environments.

Incident response playbook for LLM applications

Design and implement an incident response playbook specific to LLM application security incidents.

LLM Guard production deployment guide

Deploy LLM Guard in a production environment with custom scanners, performance optimization, and monitoring.

MCP server security hardening guide

Harden MCP server implementations against tool poisoning, transport attacks, and capability escalation.

Multi-model defense ensemble

Build an ensemble defense system using multiple models to cross-validate inputs and outputs for safety.

Advanced NeMo Guardrails configuration

Advanced NeMo Guardrails configuration including custom actions, knowledge base integration, and multi-model pipelines.

Building an output scanning pipeline

Build a production output scanning pipeline that detects data leakage, harmful content, and injection artifacts.

PII detection and redaction for LLMs

Build a PII detection and redaction system for LLM inputs and outputs to prevent data exposure.
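
A first-pass redactor can be built from regular expressions. The patterns below are deliberately simple assumptions (email addresses and one phone number layout) and will miss many real-world formats; `redact` and the placeholder tokens are illustrative names.

```python
import re

# Simplified patterns for illustration; real PII detection needs broader coverage.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact(text: str) -> str:
    """Replace matched emails and phone numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


sample = "Mail jane.doe@example.com or call 555-123-4567."
redacted = redact(sample)
```

The same function can run on both inputs and outputs; regexes are usually paired with a learned entity recognizer to catch names and addresses.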

Training a prompt injection classifier

Train a custom prompt injection detection classifier using labeled datasets and modern NLP techniques.

Implementing sandboxing for RAG documents

Implement document-level sandboxing for RAG systems to prevent cross-document injection and privilege escalation.

Adaptive rate limiting for LLM APIs

Implement adaptive rate limiting that adjusts thresholds based on detected attack patterns and user behavior.
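
The adjustment step can be sketched as a limit that shrinks as a client's suspicion score rises. The sketch assumes some upstream detector supplies a score between 0 and 1; `AdaptiveLimiter`, its defaults, and the linear scaling are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveLimiter:
    base_limit: int = 60  # requests per window for a client with no flags
    min_limit: int = 5    # floor so flagged clients are slowed, not cut off

    def limit_for(self, suspicion: float) -> int:
        """Scale the per-window limit down linearly with the suspicion score."""
        score = min(max(suspicion, 0.0), 1.0)
        return max(self.min_limit, round(self.base_limit * (1.0 - score)))

    def allow(self, requests_in_window: int, suspicion: float) -> bool:
        """True if one more request fits under the client's current limit."""
        return requests_in_window < self.limit_for(suspicion)


limiter = AdaptiveLimiter()
clean_client_ok = limiter.allow(requests_in_window=10, suspicion=0.0)
flagged_client_ok = limiter.allow(requests_in_window=10, suspicion=0.9)
```

Counting requests per window is left to the caller here; in practice it would sit on top of a sliding-window or token-bucket counter.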

Implementing a secure RAG architecture

Implement a security-hardened RAG architecture with input sanitization, access control, and output validation.

System prompt protection layers

Implement multi-layer system prompt protection including injection-resistant formatting and runtime validation.

Zero trust architecture for LLM apps

Implement zero trust principles in LLM application architecture with continuous verification and least privilege.

Full engagement: AI code assistant

End-to-end engagement for assessing an AI-powered code assistant with repository access.

Full engagement: content generation platform

Full engagement walkthrough for a content generation AI platform with brand safety and copyright concerns.

Full engagement: customer analytics AI

Red team engagement for an AI-powered customer analytics system processing sensitive behavioral data.

Full engagement: AI developer tools

End-to-end engagement for AI-powered developer tools with repository access and code execution capabilities.

Full engagement: educational AI platform

End-to-end engagement walkthrough for an AI-powered educational platform with student safety requirements.

Full engagement: AI financial advisor

Full engagement for assessing an AI financial advisor chatbot handling investment recommendations.

Full engagement: government AI system

Full red team engagement for a government-deployed AI system with classification and compliance requirements.

Full engagement: healthcare AI system

End-to-end engagement walkthrough for a healthcare AI system with HIPAA compliance requirements.

Full engagement: internal knowledge bot

Full engagement walkthrough for an internal enterprise knowledge bot with RAG and document access.

Full engagement: multi-agent platform

Full red team engagement of a multi-agent platform with MCP servers and A2A communication.

Full engagement: SaaS customer chatbot

End-to-end engagement walkthrough for assessing a SaaS customer-facing chatbot with tool access.

Full engagement: ML supply chain audit

End-to-end engagement walkthrough for auditing an organization's ML model supply chain security.

Full engagement: autonomous coding agent

End-to-end engagement for an autonomous AI coding agent with repo access, CI/CD integration, and deployment.

Full engagement: AI content generation platform

Red team engagement of an AI content generation platform used for marketing, SEO, and social media.

Full engagement: multi-tool customer support agent

End-to-end engagement for a customer support agent with order management, refund processing, and CRM access.

Full engagement: DevOps AI assistant

End-to-end engagement for a DevOps AI assistant with CI/CD, cloud infrastructure, and monitoring access.

Full engagement: document processing pipeline

Full engagement of an AI document processing pipeline handling invoices, contracts, and compliance documents.

Full engagement: AI education platform

End-to-end engagement walkthrough for assessing an AI-powered education platform with tutoring and grading.

Full engagement: AI for government citizen services

Red team engagement of a government citizen services AI handling tax, benefits, and identity verification.

walkthroughsengagementhrbenefits

Full engagement: HR benefits chatbot

Full engagement walkthrough for an HR benefits chatbot with access to employee records and benefits administration.

walkthroughsengagementinsuranceunderwriting

Full engagement: AI insurance underwriting

Full engagement for an AI underwriting system with risk scoring, policy generation, and claims data access.

walkthroughsengagementlegalresearch

Full engagement: legal research AI system

Full red team engagement of a legal research AI with case law access, privilege management, and citation generation.

walkthroughsengagementmulti-agententerprise

Full engagement: enterprise multi-agent system

Full engagement of an enterprise multi-agent system with specialized agents for HR, IT, Finance, and Legal.

walkthroughsengagementretailpersonalization

Full engagement: AI for retail personalization

Red team engagement of an AI personalization engine with access to customer profiles and purchase history.

walkthroughsengagementsecuritycopilot

Full engagement: AI security copilot

Red team engagement of an AI security copilot with access to SIEM, vulnerability scanners, and threat intelligence.

walkthroughsengagementsupply-chainoptimizer

Full engagement: AI supply chain optimizer

End-to-end engagement for a supply chain AI with access to logistics, inventory, and supplier management systems.

walkthroughsengagementtelehealthhealthcare

Full engagement: telehealth AI assistant

End-to-end engagement for a telehealth AI assistant with appointment scheduling, symptom assessment, and EHR access.

walkthroughstutorialstoolsmethodologydefenseplatformsengagements

Walkthroughs

Step-by-step guided walkthroughs covering red team tools, engagement methodology, defense implementation, platform-specific testing, and full engagement workflows.

Beginner

Adversarial simulation design

Design realistic adversarial simulations that model real-world threat actors and attack scenarios for AI systems.

walkthroughsmethodologyadversarial-simulationdesign

walkthroughsmethodologyagenticassessment

Assessment methodology for agentic systems

Comprehensive methodology for assessing agentic AI systems including tool use, memory, and multi-agent interactions.

pentestmethodologyplanningwalkthroughs

Planning an AI penetration test

Complete methodology for planning AI-specific penetration tests including scope definition, resource allocation, and timeline.

methodologymaturityteamredwalkthroughs

AI red team maturity model (methodology walkthrough)

Maturity model for assessing and improving an organization's AI red teaming capabilities.

metricssecurityframeworkmethodologywalkthroughs

AI security metrics framework

Framework for measuring and reporting on AI security posture using quantitative metrics.

vulnerabilityclassificationmethodologywalkthroughs

AI vulnerability classification system

Structured system for classifying AI-specific vulnerabilities by type, impact, and exploitability.

walkthroughsmethodologyattack-prioritizationframework

Attack prioritization framework

Prioritize attack techniques based on target architecture, time constraints, and likelihood of success.

walkthroughsmethodologyattack-treesplanning

Building attack trees for LLM systems

Build systematic attack trees for LLM system assessments using MITRE ATLAS and OWASP mappings.

walkthroughsmethodologyautomated-reconworkflow

Automated AI reconnaissance workflow

Build an automated reconnaissance workflow that maps AI application architecture, models, and defense configurations.

walkthroughsmethodologycollaborativeteam

Collaborative AI red team assessment

Coordinate multi-person red team assessments with role assignments, communication protocols, and finding deconfliction.

securityanalysismethodologycompetitivewalkthroughs

Competitive analysis of AI security tools

Methodology for evaluating and comparing AI security tools for red team operations.

walkthroughsmethodologycompliancetesting

Compliance-driven testing methodology

Map regulatory requirements to specific test cases for compliance-driven AI red team assessments.

walkthroughsmethodologycontinuous-monitoringintegration

Continuous monitoring integration methodology

Integrate red team findings into continuous monitoring systems for ongoing threat detection and defense validation.

walkthroughsmethodologydeduplicationtriage

Deduplication and triage of findings

Deduplicate and triage findings from automated and manual testing into actionable, prioritized vulnerability reports.

walkthroughsmethodologymulti-modeltesting

Multi-model testing methodology

Structured methodology for testing applications that use multiple LLMs in their processing pipeline.

walkthroughsmethodologypost-engagementanalysis

Post-engagement analysis methodology

Conduct thorough post-engagement analysis including lessons learned, technique effectiveness, and methodology refinement.

walkthroughsmethodologypurple-teamcollaborative

Purple team AI assessment methodology

Conduct collaborative purple team AI assessments with real-time feedback between red and blue team operations.

walkthroughsmethodologyregression-testingcontinuous

AI security regression testing methodology

Design regression testing suites that verify security fixes remain effective across model updates and deployments.

walkthroughsmethodologyrisk-basedprioritization

Risk-based approach to AI testing

Apply risk-based testing approaches to focus assessment effort on the highest-impact vulnerability categories.

managementstakeholderredteammethodologywalkthroughs

Stakeholder management in AI red teaming

Managing stakeholder expectations and communication throughout AI red team engagements.

walkthroughsmethodologystakeholder-reportingcommunication

Stakeholder-specific reporting methodology

Tailor red team reports for different stakeholders including executives, developers, security teams, and compliance officers.

securitytabletopmethodologyexercisewalkthroughs

Tabletop exercises for AI security

Designing and facilitating tabletop exercises focused on AI security incident scenarios.

walkthroughsmethodologytime-boxedefficient

Time-boxed AI assessment methodology

Maximize testing coverage within strict time constraints using prioritized attack trees and parallel testing.

walkthroughsmethodologyattack-surfaceenumeration

AI attack surface enumeration methodology

Systematic methodology for enumerating the complete attack surface of an AI-powered application.

walkthroughsmethodologycompliancetesting

AI compliance testing methodology

Methodology for testing AI systems against regulatory compliance requirements including EU AI Act and NIST.

walkthroughsmethodologymitre-atlasmapping

Mapping findings to MITRE ATLAS

Methodology for mapping AI red team findings to MITRE ATLAS tactics, techniques, and procedures.

walkthroughsmethodologyreport-structurepentest

Structure of an AI penetration test report

Detailed report structure for AI penetration tests with finding templates and severity scoring.

walkthroughsmethodologyscopingtemplates

Scoping templates for the AI red team

Templates and procedures for scoping AI red team engagements across different application types.

walkthroughsmethodologyrisk-quantificationmetrics

AI risk quantification methodology

Quantitative risk assessment methodology for AI vulnerabilities with probability and impact scoring.

walkthroughsmethodologytabletopexercise

Designing tabletop exercises for AI security

Design and facilitate AI security tabletop exercises for organizational preparedness assessment.

walkthroughsmethodologyevidencecollection

Evidence collection during AI testing

Best practices for collecting, organizing, and preserving evidence during AI red team assessments.

walkthroughsmethodologymulti-modelassessment

Multi-model assessment methodology

Methodology for assessing applications that use multiple AI models in pipelines or ensemble configurations.

walkthroughsmethodologyowasptesting-guide

Testing methodology for the OWASP LLM Top 10

Comprehensive testing methodology for each vulnerability in the OWASP LLM Top 10 2025.

walkthroughsmethodologypurple-teamoperations

Purple team operations for AI security

Methodology for conducting purple team operations that combine red team attacks with blue team defense improvement.

walkthroughsmethodologyregression-testingcontinuous

AI security regression testing methodology (methodology walkthrough)

Methodology for continuous regression testing of AI application security after updates and model changes.

walkthroughsmethodologystakeholder-communicationrisks

Communicating AI risks to stakeholders

Guide for communicating AI security risks to technical and non-technical stakeholders effectively.

walkthroughsmethodologythreat-intelligenceai-systems

Threat intelligence for AI systems

Methodology for gathering and applying threat intelligence specific to AI system attacks and defenses.

walkthroughsmethodologyvulnerability-prioritizationframework

AI vulnerability prioritization framework

Framework for prioritizing AI vulnerabilities by exploitability, impact, and remediation cost.

walkthroughsmethodologyreportingprofessional

Writing AI red team reports

Guide to writing clear, actionable AI red team assessment reports with findings and recommendations.

ai21platformstestingmodelswalkthroughs

Testing AI21 Labs models

Red team testing guide for AI21 Labs Jamba models including long context and efficiency features.

walkthroughsplatformsanthropicclaude

Testing Anthropic Claude: complete guide

Complete red team testing guide for Anthropic's Claude including tool use, extended thinking, and computer use.

walkthroughsplatformsawsbedrock

Testing AWS Bedrock deployments

Red team testing guide for models deployed via AWS Bedrock including guardrails and access controls.

walkthroughsplatformsazureopenai

Testing the Azure OpenAI Service

Red team testing guide for Azure OpenAI including content filtering, managed identity, and network controls.

cohereplatformstestingmodelswalkthroughs

Testing Cohere models

Red team testing guide for Cohere's Command-R models including RAG and tool use features.

walkthroughsplatformsdeepseekreasoning

Testing DeepSeek models

Red team testing guide for DeepSeek models including R1 reasoning exploitation and MoE architecture.

fireworksplatformstestingwalkthroughs

Testing the Fireworks AI platform

Red team testing guide for Fireworks AI including function calling and compound AI systems.

walkthroughsplatformsgooglegemini

Testing Google Gemini: complete guide

Complete red team testing guide for Google's Gemini including multimodal inputs and long context.

inferencegroqplatformstestingwalkthroughs

Testing the Groq inference platform

Red team testing guide for Groq's high-speed inference platform and its security characteristics.

walkthroughsplatformshuggingfacehosted

Testing models hosted on Hugging Face

Red team testing guide for models hosted on Hugging Face including Inference API and Spaces.

walkthroughsplatformsmetallama

Testing Meta Llama: complete guide

Complete red team testing guide for Meta's Llama models including fine-tuning and self-hosted deployment.

mistralplatformstestingmodelswalkthroughs

Testing Mistral AI models

Complete red team testing guide for Mistral AI models including Mixtral MoE architecture and chat endpoints.

localollamaplatformstestingwalkthroughs

Testing local Ollama deployments

Security testing guide for locally deployed models via Ollama including network exposure and API security.

walkthroughsplatformsopenaigpt-4o

Testing OpenAI GPT-4o: complete guide

Complete red team testing guide for OpenAI's GPT-4o including API, function calling, and structured outputs.

replicateplatformstestingwalkthroughshosted

Testing models hosted on Replicate

Red team testing guide for models hosted on Replicate including open-source model deployments.

togetherplatformstestingwalkthroughs

Testing the Together AI platform

Red team testing guide for Together AI including fine-tuned model endpoints and custom deployments.

walkthroughsplatformsai21jamba

Testing AI21 Jamba models

Red team testing guide for AI21's Jamba models with SSM-Transformer hybrid architecture analysis.

walkthroughsplatformsanyscaleray-serve

Testing Anyscale/Ray Serve endpoints

Red team testing guide for models served via Anyscale and Ray Serve including distributed inference security.

walkthroughsplatformscoherecommand

Testing Cohere models: complete guide

Complete red team testing guide for Cohere's Command models including RAG integration and tool use.

walkthroughsplatformsfireworks-aiinference

Testing Fireworks AI deployments

Red team testing guide for models deployed via Fireworks AI including function calling and fine-tuned models.

walkthroughsplatformsgcpvertex-ai

Testing GCP Vertex AI deployments

Red team testing guide for models deployed via GCP Vertex AI including Model Garden and custom endpoints.

walkthroughsplatformsmistralmixtral

Testing Mistral/Mixtral: complete guide

Complete red team testing guide for Mistral and Mixtral models including MoE routing and API vulnerabilities.

walkthroughsplatformsollamalocal

Testing local Ollama deployments (platform walkthrough)

Red team testing guide for models deployed locally via Ollama including API endpoints and model management.

walkthroughsplatformsreplicatehosted

Testing models hosted on Replicate

Red team testing guide for models hosted on Replicate including API exploitation and model versioning.

walkthroughsplatformstogether-aiinference

Testing models hosted on Together AI

Red team testing guide for models hosted on Together AI's inference platform.

walkthroughsplatformsvllminference

Testing vLLM inference deployments

Red team testing guide for models served via vLLM including batching, KV cache, and speculative decoding.

benchmarkrunnertoolsdevelopmentwalkthroughs

Developing a security benchmark runner

Build a benchmark runner for standardized evaluation of LLM security across models and configurations.

replaytoolsbuildingtoolattackwalkthroughs

Building an attack replay tool

Build a tool that records and replays attack sequences for regression testing and defense validation.

walkthroughscounterfitml-securitytesting

ML security testing with Counterfit

Use Microsoft's Counterfit for adversarial ML testing of deployed model endpoints.

walkthroughsfuzzercustomdevelopment

Building a custom LLM fuzzer

Build a custom fuzzing tool for LLM applications with mutation engines and coverage tracking.

customtoolsmutationwalkthroughsengine

Building a custom payload mutation engine

Develop a mutation engine that evolves prompt injection payloads through genetic algorithms.

walkthroughsgarakconfigurationadvanced

Advanced configuration guide for garak

Deep dive into Garak's configuration system including custom probes, detectors, and generators.

garakcustomtoolsprobeswalkthroughs

Building custom garak probes (tool walkthrough)

Develop custom probe modules for NVIDIA's Garak to test domain-specific vulnerabilities.

customtoolsbehaviorsharmbenchwalkthroughs

Custom behavior sets for HarmBench

Extend HarmBench with custom behavior specifications for domain-specific vulnerability testing.

walkthroughsharmbenchevaluationbenchmark

Walkthrough: HarmBench evaluation

Run standardized attack evaluations using the HarmBench framework and interpret results.

toolswalkthroughsgarakpyritpromptfooburp-suiteinspect-aiollamapython

Tool walkthroughs

End-to-end practical walkthroughs for essential AI red teaming tools, covering installation, configuration, execution, and result interpretation.

Beginner

Safety evaluations with Inspect AI

Build and run AI safety evaluations using the UK AISI Inspect framework.

walkthroughsinspect-aisafetyevaluations

walkthroughsjailbreakbenchbenchmarktesting

Using and submitting to JailbreakBench

Use JailbreakBench to evaluate jailbreak techniques and submit results to the benchmark.

walkthroughsllm-guarddeploymentdefense

Deploying and testing LLM Guard

Deploy LLM Guard for input/output scanning and test its effectiveness against common attacks.

llmtoolstrafficanalyzerwalkthroughs

Building an LLM traffic analyzer

Build a proxy-based LLM traffic analyzer for intercepting and analyzing API communications.

integrationtoolscicdpromptfoowalkthroughs

Integrating promptfoo into the CI/CD pipeline

Integrate Promptfoo into CI/CD pipelines for continuous LLM security regression testing.

walkthroughspromptfoored-teamevaluation

Promptfoo for red team evaluation

Configure Promptfoo for comprehensive red team evaluation with custom assertions and graders.

walkthroughspyritmulti-turncampaigns

Orchestrating multi-turn campaigns with PyRIT

Build and run sophisticated multi-turn attack campaigns using PyRIT's orchestrator system.

integrationtoolspyritscoringwalkthroughs

Integrating custom scoring into PyRIT

Integrate custom scoring metrics into PyRIT for organization-specific red team evaluation criteria.

walkthroughstoolsagent-scannerdevelopment

Developing an agent security scanner

Build an automated security scanner for AI agent systems that tests tool use, memory, and workflow vulnerabilities.

walkthroughstoolsreport-generationautomation

Automated generation of red team reports

Build an automated system for generating structured red team reports from testing data and findings.

walkthroughstoolsattack-proxydevelopment

Building an LLM attack proxy

Build an intercepting proxy for LLM API traffic that enables payload injection, response analysis, and logging.

walkthroughstoolsbenchmarkingdefense

Developing a defense benchmarking tool

Build a tool for benchmarking the effectiveness of defensive measures against standardized attack suites.

walkthroughstoolsembedding-attackstoolkit

Building an embedding attack toolkit

Build a toolkit for conducting embedding-level attacks including inversion, poisoning, and manipulation.

walkthroughsgarakplugindevelopment

Developing custom garak plugins

Develop custom Garak plugins including probes, detectors, and generators for organization-specific testing.

walkthroughsinspect-aicustom-solversdevelopment

Developing custom solvers for Inspect AI

Build custom solvers in Inspect AI for specialized safety evaluation scenarios and attack simulations.

walkthroughstoolsdataset-curationjailbreaks

Jailbreak dataset curation tool

Build a tool for curating, organizing, and testing jailbreak prompt datasets with effectiveness tracking.

walkthroughstoolstraffic-analysismonitoring

LLM traffic analysis tool

Build a tool for analyzing and visualizing LLM API traffic patterns to identify attack indicators.

walkthroughstoolsmcp-auditsecurity

MCP security audit tool

Build a tool for auditing MCP server implementations for common security vulnerabilities and misconfigurations.

walkthroughstoolstest-harnessmulti-model

Building a multi-model test harness

Build a unified test harness for running attacks across OpenAI, Anthropic, Google, and local model endpoints.

walkthroughstoolsmutation-frameworkpayloads

Developing a payload mutation framework

Build a payload mutation framework with grammar-based, semantic, and adversarial mutation strategies.

walkthroughspromptfooci-cdintegration

Integrating promptfoo into the CI/CD pipeline (tool walkthrough)

Integrate Promptfoo red team evaluations into CI/CD pipelines for automated security regression testing.

walkthroughspyritconverterdevelopment

Developing custom converters in PyRIT

Build custom PyRIT converters for payload transformation, encoding, and multi-modal attack generation.

walkthroughstoolsrag-testingframework

Framework for RAG security testing

Build a framework for systematic security testing of RAG applications including poisoning and exfiltration.