# intermediate

155 articles tagged “intermediate”

Intermediate-level certification preparation guide

Comprehensive study guide for intermediate-level certification preparation.

assessments, study-guide, intermediate, certification

Capstone: simulated compliance assessment

Conduct a simulated compliance assessment against the EU AI Act, NIST AI RMF, and ISO 42001, producing a comprehensive gap analysis report.

capstone, compliance, eu-ai-act, nist, iso-42001, intermediate

Intermediate

Capstone: red team program design

Design a complete AI red team program for a fictional enterprise, producing a comprehensive program charter document.

capstone, program-design, governance, charter, intermediate

Intermediate

Capstone: industry vertical deep dive

Select an industry vertical, threat model the AI systems within it, and produce a sector-specific AI security testing guide.

capstone, vertical, healthcare, finance, legal, government, intermediate

Intermediate

LLM Guard and Protect AI Guardian

Input/output scanning, PII detection, toxicity filtering, integration patterns, and bypass techniques for LLM Guard and the Protect AI Guardian ecosystem.

llm-guard, protect-ai, pii-detection, toxicity, bypass, intermediate

Intermediate

NVIDIA NeMo Guardrails

Architecture, configuration, Colang programming, integration patterns, and bypass techniques for NVIDIA's open-source NeMo Guardrails framework.

nemo-guardrails, nvidia, colang, guardrails, bypass, intermediate

Intermediate

AI anomaly detection

Detecting jailbreak attempts, unusual usage patterns, output drift, and embedding-space anomalies in AI systems using statistical and ML-based methods.

anomaly-detection, jailbreak-detection, drift, embedding, intermediate

Intermediate

AI monitoring and observability

What to monitor in AI systems, key metrics for detecting abuse and drift, alerting strategies, and observability architecture for LLM applications.

monitoring, observability, metrics, alerting, intermediate

Intermediate

AI logging architecture

What to capture in AI system logs (prompts, completions, latency, tokens, tool calls), along with storage strategies, retention policies, and privacy considerations.

logging, architecture, privacy, retention, compliance, intermediate

Intermediate

Adversarial ML: core concepts

History and fundamentals of adversarial machine learning (perturbation attacks, evasion vs. poisoning, robustness) as a bridge from classical adversarial ML to LLM-specific attacks.

adversarial-ml, fundamentals, evasion, poisoning, intermediate

Intermediate

AI threat models: white-box, black-box, and grey-box

Access levels in AI security testing: what is possible at each level, realistic scenarios, and a comparison with traditional security threat modeling.

threat-model, white-box, black-box, grey-box, intermediate

Intermediate

Embeddings and vector spaces for red teamers

Understand how embeddings capture semantic meaning, how vector operations work, and why red teamers need to understand embedding spaces for RAG attacks and similarity exploitation.

embeddings, vectors, semantics, intermediate

Intermediate

Lab: exploring embedding spaces

Hands-on Python lab for visualizing embedding spaces, measuring semantic similarity, and demonstrating how adversarial documents can be crafted to match target queries.

lab, embeddings, hands-on, python, intermediate

Intermediate

RAG architecture: how retrieval systems work

End-to-end anatomy of a retrieval-augmented generation pipeline (document ingestion, chunking, embedding, indexing, retrieval, context assembly, and generation), with an analysis of the attack surface at each stage.

rag, retrieval, architecture, intermediate

Intermediate

Attacks on semantic similarity and vector search

How cosine similarity and nearest neighbor search work, and how attackers abuse vector databases through adversarial documents, embedding collisions, and threshold bypasses.

Intermediate

Inference: sampling, temperature, and generation

How LLMs generate text during inference (greedy decoding, top-k, top-p, temperature) and how these parameters affect attack success rates.

inference, sampling, temperature, decoding, intermediate

Intermediate

Tokenization and its security implications

How BPE and SentencePiece tokenizers work, and how tokenizer behavior creates exploitable attack surfaces, including boundary attacks, homoglyphs, and encoding tricks.

tokenization, bpe, security, encoding, intermediate

Intermediate

Pre-training → fine-tuning → RLHF pipeline

Understand the three stages of creating an aligned LLM (pre-training, supervised fine-tuning, and RLHF/DPO) and the security implications at each stage.

training, rlhf, fine-tuning, alignment, intermediate

Intermediate

The transformer architecture for attackers

Deep dive into the transformer architecture (attention, feed-forward layers, and residual connections) from the perspective of which components are exploitable.

transformer, attention, architecture, intermediate

Intermediate

Agent architectures and tool use patterns

How the ReAct, Plan-and-Execute, and LangGraph agent patterns work (tool definition, invocation, and result handling) and where injection occurs in each architecture.

agents, tools, react, langchain, intermediate

Intermediate

Common AI deployment patterns and attack surfaces

Analysis of deployment patterns (chatbots, copilots, autonomous agents, batch processing, and fine-tuned models) with their unique attack surfaces and security considerations.

deployment, patterns, architecture, intermediate

Intermediate

Lab: mapping an AI system's attack surface

Hands-on lab that walks you through reconnaissance of an AI system: identifying components, mapping data flows, inventorying tools, and documenting the attack surface.

lab, reconnaissance, attack-surface, mapping, intermediate

Intermediate

CTF: Code Gen Exploit

Manipulate AI code generation to produce vulnerable, backdoored, or malicious code. Explore how prompt manipulation influences code security, from subtle vulnerability injection to full backdoor insertion.

ctf, code-generation, vulnerability, intermediate

Intermediate

CTF: RAG Heist

Extract sensitive information from a Retrieval-Augmented Generation system by exploiting retrieval mechanisms, document parsing, embedding manipulation, and context window management vulnerabilities.

ctf, rag, data-extraction, retrieval, intermediate

Intermediate

Intermediate AI red team labs

Overview of intermediate-level hands-on labs covering agent exploitation, RAG poisoning, multi-turn attacks, automated campaigns, and advanced injection techniques for AI red teaming.

labs, intermediate

Intermediate

Advanced encoding chain attacks

Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.

encoding, chains, intermediate, advanced, lab, labs

Intermediate

Lab: advanced function calling exploitation

Exploit advanced function calling patterns including nested calls, parallel execution, and schema manipulation.

labs, function-calling, advanced, intermediate

Intermediate

Lab: automated adversarial prompt generation

Build automated systems for generating adversarial prompts using mutation, crossover, and selection strategies.

labs, adversarial, prompt-generation, intermediate

Intermediate

Agent memory manipulation

Exploit persistent memory in LLM agents to plant false context that persists across sessions.

labs, agent, memory, intermediate

Intermediate

Lab: Agent Prompt Leaking

Hands-on lab for extracting system prompts from tool-using AI agents by exploiting tool interactions, observation channels, and multi-step reasoning to leak protected instructions.

lab, agents, prompt-leaking, tool-use, intermediate

Intermediate

Lab: security testing of agent workflows

Test multi-step agent workflows for injection points, state manipulation, and unauthorized action execution.

labs, agent-workflow, testing, intermediate

Intermediate

Lab: API abuse patterns

Discover and exploit API abuse patterns including parameter injection, header manipulation, and endpoint confusion.

intermediate, abuse, lab, patterns, labs, api

Intermediate

API parameter fuzzing for LLMs

Systematically fuzz LLM API parameters to discover unexpected behaviors and bypasses.

labs, api, fuzzing, intermediate

Intermediate

Lab: API versioning security gaps

Exploit version differences in LLM APIs to find deprecated endpoints and backward-compatible safety bypasses.

labs, api-versioning, security-gaps, intermediate

Intermediate

Lab: automated red team pipeline

Hands-on lab for building a continuous AI red team testing pipeline using promptfoo, GitHub Actions, and automated attack generation to catch safety regressions before deployment.

lab, automation, ci-cd, promptfoo, pipeline, intermediate

Intermediate

Lab: automated red team testing

Set up promptfoo for continuous automated security testing of AI applications, with custom evaluators and CI/CD integration.

lab, automated-testing, promptfoo, ci-cd, intermediate, hands-on

Intermediate

Lab: testing canary token effectiveness

Deploy and test canary tokens in system prompts to evaluate their effectiveness at detecting prompt extraction.

labs, canary-tokens, testing, intermediate

Intermediate

Lab: chain-of-thought exploitation

Exploit chain-of-thought reasoning to leak internal model reasoning, bypass safety filters, and manipulate decision processes.

labs, chain-of-thought, exploitation, intermediate

Intermediate

Lab: CI/CD pipeline security for LLM apps

Test CI/CD pipeline security for LLM applications including prompt template injection, model artifact tampering, and deployment attacks.

labs, cicd, pipeline-testing, intermediate

Intermediate

Lab: advanced classifier evasion

Evade advanced input classifiers using adversarial perturbation, semantic paraphrasing, and gradient-free optimization.

labs, classifier, evasion-advanced, intermediate

Intermediate

Lab: ML classifier evasion

Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.

classifier, intermediate, evasion, lab, labs

Intermediate

Lab: security testing of cloud AI platforms

Test cloud AI platform configurations including AWS Bedrock guardrails, Azure content filters, and GCP safety settings.

labs, cloud-platform, testing, intermediate

Intermediate

Lab: security testing of code generation

Test LLM code generation for insecure patterns, injection vulnerabilities, and code execution safety issues.

labs, code-generation, security-testing, intermediate

Intermediate

Lab: fundamentals of AI compliance testing

Test LLM applications against OWASP LLM Top 10 and NIST AI RMF requirements using structured test procedures.

labs, compliance, testing, intermediate

Intermediate

Lab: context window attack optimization

Optimize prompt injection placement within the context window to maximize attack effectiveness using attention dynamics.

labs, context-window, optimization, intermediate

Intermediate

Lab: context window poisoning

Exploit context window management to inject persistent adversarial content that influences future model responses.

context, intermediate, lab, poisoning, window, labs

Intermediate

Lab: conversation splitting attack

Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.

intermediate, splitting, conversation, lab, labs

Intermediate

Conversation state poisoning

Exploit conversation history manipulation to alter model behavior across turns.

labs, conversation, state-poisoning, intermediate

Intermediate

Cross-context injection

Inject prompts that persist across separate conversation contexts in shared deployments.

labs, cross-context, injection, intermediate

Intermediate

Custom jailbreak wordlist generation

Generate targeted jailbreak wordlists using mutation strategies and seed prompts.

labs, wordlists, jailbreaking, intermediate

Intermediate

Lab: data exfiltration channels

Hands-on lab for extracting data from AI systems through markdown image rendering, invisible links, tool call parameters, and other covert exfiltration channels.

lab, data-exfiltration, covert-channels, markdown, intermediate

Intermediate

Lab: data exfiltration channels (intermediate lab)

Extract sensitive information from AI systems through various exfiltration channels including crafted links, image tags, tool calls, and side-channel leakage.

lab, data-exfiltration, channels, side-channel, intermediate, hands-on

Intermediate

Data exfiltration via side channels

Extract data from LLM systems using timing, token probability, and formatting side channels.

labs, exfiltration, side-channels, intermediate

Intermediate

Building defense bypass chains

Combine multiple bypass techniques into reliable chains that defeat layered defenses.

labs, defense-bypass, chains, intermediate

Intermediate

Lab: measuring defense effectiveness

Hands-on lab for quantifying AI guardrail robustness using attack success rates, evasion metrics, false positive rates, and statistical analysis of defense performance.

lab, defense, metrics, measurement, guardrails, intermediate

Intermediate

Lab: defense regression testing setup

Build a regression testing framework to continuously verify that LLM defenses remain effective against known attack patterns.

labs, defense, regression-testing, intermediate

Intermediate

Lab: testing defense effectiveness

Systematically test and measure the robustness of AI guardrails using structured methodology, metrics, and repeatable test suites.

lab, defense-testing, guardrails, metrics, intermediate, hands-on

Intermediate

Lab: document-based RAG injection

Inject adversarial content into documents that will be processed by a RAG system to influence model responses.

intermediate, injection, lab, labs, rag, document

Intermediate

Lab: advanced embedding manipulation

Manipulate embedding vectors to achieve target similarity scores for RAG poisoning and retrieval manipulation.

labs, embedding, manipulation-advanced, intermediate

Intermediate

Practical embedding manipulation

Manipulate text to achieve target embedding similarities for poisoning retrieval-augmented generation systems.

intermediate, lab, practical, manipulation, embedding, labs

Intermediate

Embedding similarity attacks

Manipulate text to achieve target embedding similarity scores for retrieval poisoning.

labs, embeddings, similarity, intermediate

Intermediate

Lab: encoding bypass

Hands-on lab for systematically testing Base64, ROT13, Unicode homoglyph, hexadecimal, and combined encoding bypasses against LLM safety filters with measured success rates.

lab, encoding, bypass, obfuscation, systematic-testing, intermediate

Intermediate

Fine-tuning API security probing

Probe fine-tuning APIs for security weaknesses including insufficient validation and unsafe default configurations.

fine, intermediate, tuning, lab, labs, api

Intermediate

Lab: testing the safety impact of fine-tuning

Measure how fine-tuning affects model safety by comparing pre- and post-fine-tuning safety benchmark scores.

labs, fine-tuning, safety-testing, intermediate

Intermediate

Lab: guardrail bypass techniques

Practice guardrail bypass techniques against NeMo Guardrails, LLM Guard, and custom classifier-based defenses.

labs, guardrail-bypass, techniques, intermediate

Intermediate

Guardrail fingerprinting

Systematically map the rules and thresholds of input/output guardrail systems.

labs, guardrails, fingerprinting, intermediate

Intermediate

Lab: guardrail rule extraction

Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.

guardrail, rule, intermediate, lab, labs, extraction

Intermediate

Lab: exploiting LLM hallucinations

Exploit hallucination tendencies to trigger fabricated tool calls, invented API endpoints, and false fact injection.

labs, hallucination, exploitation, intermediate

Intermediate

Lab: AI incident response simulation

Practice AI incident response procedures through a simulated prompt injection incident with escalation and containment.

labs, incident-response, simulation, intermediate

Intermediate

Lab: indirect prompt injection

Inject instructions through external data sources including documents, web pages, and emails that a target AI system processes as context.

lab, indirect-injection, external-data, documents, intermediate, hands-on

Intermediate

Indirect injection via web content

Plant prompt injection payloads in web pages consumed by RAG-enabled LLM applications.

labs, indirect-injection, web, intermediate

Intermediate

Lab: measuring instruction following

Quantitatively measure instruction following compliance to identify where models prioritize competing instructions.

labs, instruction-following, measurement, intermediate

Intermediate

LLM judge manipulation

Craft responses that exploit LLM-as-judge evaluation patterns to achieve high safety scores while embedding harmful content.

labs, llm-judge, manipulation, intermediate

Intermediate

Lab: log probability analysis for attacks

Use token log probabilities to extract information about training data, system prompts, and model internals.

labs, logprobs, analysis, intermediate

Intermediate

Lab: advanced MCP protocol testing

Perform advanced security testing of MCP servers including transport analysis, capability enumeration, and tool poisoning.

labs, mcp, advanced-testing, intermediate

Intermediate

MCP tool poisoning attack

Exploit MCP tool descriptions to inject instructions that redirect agent behavior.

labs, mcp, tool-poisoning, intermediate

Intermediate

Lab: Model Context Protocol fuzzing

Fuzz MCP protocol messages to discover parsing errors, validation gaps, and unexpected behavior in MCP servers.

labs, mcp-fuzzing, protocol, intermediate

Intermediate

Practical model fingerprinting lab

Fingerprint the underlying model behind an API endpoint to identify its type, version, and configuration.

fingerprinting, intermediate, lab, practical, labs, model

Intermediate

Lab: LLM security monitoring setup

Deploy a comprehensive security monitoring system for LLM applications with anomaly detection and alerting.

labs, monitoring, security-setup, intermediate

Intermediate

Lab: testing inter-agent communication

Test security of inter-agent communication in multi-agent systems including message injection and trust boundary violations.

labs, multi-agent, communication-testing, intermediate

Intermediate

Multilingual safety bypass lab

Exploit gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.

intermediate, bypass, lab, language, labs, multi

Intermediate

Multilingual injection attacks

Exploit language switching and low-resource language gaps to bypass safety training.

labs, multilingual, injection, intermediate

Intermediate

Multi-model attack chaining

Chain attacks across multiple LLMs in a pipeline to bypass per-model defenses.

labs, multi-model, chaining, intermediate

Intermediate

Lab: multi-model comparative security testing

Compare security postures across multiple LLM providers by running identical attack suites and analyzing differential responses.

labs, multi-model, comparison-testing, intermediate

Intermediate

Multi-model pipeline attack lab

Attack a pipeline where multiple models process data sequentially, exploiting trust between pipeline stages.

pipeline, intermediate, lab, labs, multi, model

Intermediate

Lab: multi-turn state tracking attacks

Exploit conversation state tracking to build persistent attack context across multiple conversation turns.

labs, multi-turn, state-tracking, intermediate

Intermediate

Lab: multimodal injection

Hands-on lab for embedding text instructions in images to exploit vision-enabled LLMs. Learn to craft visual prompt injections, test OCR-based attacks, and evaluate multimodal safety boundaries.

lab, multimodal, vision, image-injection, intermediate

Intermediate

Lab: multimodal injection (intermediate lab)

Embed prompt injection instructions in images for vision-enabled models, testing how visual content can carry adversarial payloads.

lab, multimodal, image-injection, vision, intermediate, hands-on

Intermediate

Lab: intermediate multimodal security testing

Test multimodal LLMs with crafted images containing embedded text, adversarial perturbations, and visual injection payloads.

labs, multimodal, intermediate-testing, intermediate

Intermediate

Output channel discovery lab

Discover alternative output channels in LLM applications that bypass standard output filtering.

discovery, channel, intermediate, output, lab, labs

Intermediate

Lab: output schema exploitation

Exploit structured output schemas to force models into revealing information through required field population.

labs, output-schema, exploitation, intermediate

Intermediate

Lab: prompt caching side-channel attacks

Exploit prompt caching mechanisms to detect cached prompts and extract information through timing side channels.

labs, prompt-caching, side-channel, intermediate

Intermediate

Prompt leakage techniques lab

Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.

intermediate, techniques, lab, prompt, labs, leakage

Intermediate

Automated red teaming with PyRIT

Use Microsoft's PyRIT framework to run automated multi-turn attack campaigns.

labs, pyrit, automated, intermediate

Intermediate

Lab: advanced RAG security testing

Test RAG systems for chunking exploitation, reranking manipulation, and cross-document injection attacks.

labs, rag, advanced-testing, intermediate

Intermediate

RAG context poisoning

Poison a vector database to inject adversarial content into RAG retrieval results.

labs, rag, poisoning, intermediate

Intermediate

Reasoning chain hijacking

Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.

intermediate, chain, lab, hijacking, reasoning, labs

Intermediate

Lab: role hierarchy exploitation

Exploit multi-role LLM architectures by manipulating the instruction hierarchy between system, user, and assistant roles.

labs, role-hierarchy, exploitation, intermediate

Intermediate

Lab: safety filter benchmarking

Benchmark safety filters across providers using standardized test suites to compare detection rates and false positives.

labs, safety-filter, benchmarking, intermediate

Intermediate

Semantic deception crafting lab

Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.

crafting, intermediate, semantic, lab, deception, labs

Intermediate

Semantic injection crafting

Craft semantically coherent injections that evade both classifiers and human review.

labs, semantic, injection, intermediate

Intermediate

Lab: semantic search manipulation

Manipulate semantic search results by crafting documents that achieve target similarity scores against specific queries.

labs, semantic-search, manipulation, intermediate

Intermediate

Lab: testing session isolation

Test session isolation in multi-tenant LLM deployments to detect cross-session data leakage and context contamination.

labs, session-isolation, testing, intermediate

Intermediate

Session state exploitation lab

Exploit session management vulnerabilities in LLM applications to access other users' data or escalate privileges.

exploitation, intermediate, session, lab, labs, state

Intermediate

Lab: streaming response manipulation

Exploit streaming response mechanisms to bypass output filters that only check complete responses.

labs, streaming, response-manipulation, intermediate

Intermediate

Safety bypass via structured output

Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.

intermediate, bypass, output, lab, structured, labs

Intermediate

Structured output exploitation

Exploit JSON mode and structured output constraints to bypass safety filters.

labs, structured-output, exploitation, intermediate

Intermediate

Lab: supply chain audit

Audit an ML project's dependencies for vulnerabilities, covering model files, Python packages, container images, and training data provenance.

lab, supply-chain, audit, dependencies, security, intermediate, hands-on

Intermediate

Lab: ML supply chain scan

Hands-on lab for auditing machine learning model dependencies, detecting malicious packages in ML pipelines, and scanning model files for backdoors and supply chain threats.

lab, supply-chain, security, scanning, dependencies, intermediate

Intermediate

Lab: system prompt reconstruction (intermediate lab)

Reconstruct full system prompts through iterative partial extraction and cross-referencing multiple leakage vectors.

labs, system-prompt, reconstruction, intermediate

Intermediate

Advanced token smuggling

Use Unicode homoglyphs, zero-width characters, and tokenizer quirks to smuggle payloads.

labs, token-smuggling, advanced, intermediate

Intermediate

Lab: tool result injection attacks

Inject adversarial content through tool call results to poison model reasoning and redirect subsequent actions.

labs, tool-result, injection, intermediate

Intermediate

Lab: Unicode normalization bypass attacks

Exploit Unicode normalization differences between input validation and model processing to smuggle injection payloads.

labs, unicode, normalization-bypass, intermediate

Intermediate

Lab: testing vector database access control

Test vector database access controls for bypass vulnerabilities including namespace traversal and filter manipulation.

labs, vector-db, access-control, intermediate

Intermediate

Agent goal-hijacking

Redirect an AI agent's objectives through carefully crafted inputs that override its primary task.

labs, agent, goal-hijacking, intermediate

Intermediate

Assistant prefill injection attacks

Exploit assistant message prefilling to prime model responses and bypass safety alignment.

labs, assistant-prefill, injection, intermediate

Intermediate

Chat history manipulation attacks

Inject false messages into conversation history to alter model behavior in subsequent turns.

labs, chat-history, manipulation, intermediate

Intermediate

Mapping input classifier boundaries

Systematically probe input classifiers to map their decision boundaries and find bypass regions.

labs, classifier, boundary-mapping, intermediate

Intermediate

Context window boundary exploitation

Exploit the boundaries of context windows to push safety instructions beyond the model's attention.

labs, context-window, boundary, intermediate

Intermediate

Crescendo attack implementation

Implement Microsoft's Crescendo multi-turn escalation attack with automated conversation management.

labs, crescendo, implementation, intermediate

Intermediate

Cross-plugin exploit chaining

Chain exploits across multiple plugins/tools in an agent system to achieve multi-step attacks.

labs, cross-plugin, chaining, intermediate

Intermediate

Function schema manipulation

Exploit function calling schemas to inject instructions through parameter descriptions and enums.

labs, function-schema, manipulation, intermediate

Intermediate

Building custom garak probes

Develop custom probe modules for NVIDIA Garak to test organization-specific vulnerability patterns.

labs, garak, custom-probes, intermediate

Intermediate

Latency-based guardrail detection

Use timing side channels to identify and characterize guardrail implementations in LLM applications.

labs, guardrails, latency, detection, intermediate

Intermediate

Custom safety evaluations with Inspect AI

Build custom safety evaluation tasks in Inspect AI to test application-specific vulnerability classes.

labs, inspect-ai, custom-evaluations, intermediate

Intermediate

Comparative testing with JailbreakBench

Run standardized jailbreak benchmarks across multiple models and compare vulnerability profiles.

labs, jailbreakbench, comparative, intermediate

Intermediate

JSON mode safety bypass

Exploit structured JSON output modes to bypass safety filters that only check plain-text responses.

labs, json-mode, safety-bypass, intermediate

Intermediate

LLM Guard evasion techniques

Develop evasion techniques against LLM Guard input scanners and output detectors.

labs, llm-guard, evasion, intermediate

Intermediate

Implementing many-shot jailbreaking

Implement Anthropic's many-shot jailbreaking technique with scaling analysis across conversation lengths.

labs, many-shot, jailbreaking, intermediate

Intermediate

Data exfiltration via MCP resources (intermediate lab)

Exploit MCP resource URIs to access and exfiltrate data beyond intended scope boundaries.

labs, mcp, resource, exfiltration, intermediate

Intermediate

Exploiting multilingual safety gaps

Systematically identify and exploit safety training gaps across low-resource languages.

labs, multilingual, safety-gaps, intermediate

Intermediate

Multi-turn state tracking exploitation

Exploit state management in multi-turn conversations to gradually shift model behavior over many turns.

labs, multi-turn, state-tracking, intermediate

Intermediate

NeMo Guardrails bypass techniques

Test and bypass NVIDIA NeMo Guardrails configurations using known evasion techniques.

labs, nemo-guardrails, bypass, intermediate

Intermediate

Output filter bypass via encoding

Bypass output safety filters by instructing the model to encode responses in Base64, hex, or custom schemes.

labs, output-filter, bypass, encoding, intermediate

Intermediate

PDF document injection for RAG systems

Craft adversarial PDF documents that inject instructions when processed by RAG document loaders.

labs, pdf, injection, rag, intermediate

Intermediate

Prompt template variable injection

Inject adversarial content through template variable expansion in prompt management systems.

labs, prompt-template, variable-injection, intermediate

Intermediate

Custom attack strategies in PyRIT

Build custom attack strategies and converters in Microsoft's PyRIT framework for targeted campaigns.

labs, pyrit, custom-strategies, intermediate

Intermediate

RAG document injection campaign

Design and execute a document injection campaign against a RAG-powered application with vector search.

labs, rag, document-injection, intermediate

Intermediate

Semantic search poisoning

Craft adversarial documents that rank highly in semantic search for targeted queries in RAG systems.

labs, semantic-search, poisoning, embeddings, intermediate

Intermediate

Exfiltration via streaming responses

Exploit streaming API responses to exfiltrate data that might be caught by batch output filters.

labs, streaming, response, exfiltration, intermediate

Intermediate

System prompt extraction via token probabilities

Exploit token probability APIs to reconstruct hidden system prompts character by character.

labs, logprobs, system-prompt, intermediate

Intermediate

Token boundary manipulation

Exploit tokenizer-specific behavior by crafting inputs that split across token boundaries in unexpected ways.

labs, tokenizer, boundary-manipulation, intermediate

Intermediate

Tool result injection attacks

Craft malicious tool return values that inject instructions back into the model's reasoning chain.

labs, tool-result, injection, intermediate

Intermediate

Injecting text into VLMs via images

Embed adversarial text in images that vision-language models read and follow as instructions.

labs, vision, text-injection, images, intermediate

Intermediate

Lab: data exfiltration via webhooks

Test data exfiltration paths through LLM-triggered webhooks, API callbacks, and external URL rendering.

labs, webhook, exfiltration, intermediate

Intermediate

Simulation: startup AI assessment

Red team a startup's AI-powered product with limited scope and budget, making pragmatic tradeoffs between thoroughness and time constraints.

simulation, startup, assessment, budget-constrained, intermediate

Intermediate

Code injection via Markdown

Injecting executable payloads through markdown rendering in LLM outputs, exploiting the gap between text generation and content rendering in web-based LLM interfaces.

prompt-injection, markdown, code-injection, xss, red-teaming, intermediate

Intermediate

Context window stuffing

Techniques for filling the LLM context window to push system instructions out of active memory, manipulating token budgets to dilute or displace defensive prompts.

prompt-injection, context-window, token-manipulation, red-teaming, intermediate

Intermediate

Delimiter escape attacks

Techniques for escaping delimiters used to separate system and user content in LLM applications, breaking out of sandboxed input regions to inject instructions.

prompt-injection, delimiter-escape, sandbox-escape, red-teaming, intermediate

Intermediate

Encoding-based evasion

Using Base64, ROT13, hexadecimal, Unicode, and other encoding schemes to evade input detection systems and bypass content filters in LLM applications.

prompt-injection, encoding, base64, rot13, unicode, evasion, red-teaming, intermediate

Intermediate

Few-shot injection

Using crafted few-shot examples within user input to steer LLM behavior toward unintended outputs, exploiting in-context learning to override safety training.

prompt-injection, few-shot, in-context-learning, red-teaming, intermediate

Intermediate

Payload obfuscation techniques

Methods for disguising prompt injection payloads through encoding, splitting, substitution, and other obfuscation techniques to bypass input filters and detection systems.

prompt-injection, obfuscation, evasion, payload-crafting, red-teaming, intermediate

Intermediate

Role-play injection

Using fictional scenarios, character role-play, and narrative framing to bypass LLM safety filters by having the model operate within a permissive fictional context.

prompt-injection, role-play, jailbreak, fictional-framing, red-teaming, intermediate

Intermediate

Injection via translation

Using translation requests and low-resource languages to bypass content filters, exploiting the uneven distribution of safety training across languages.

prompt-injection, translation, multilingual, low-resource-languages, red-teaming, intermediate

Intermediate

Creating a virtual persona

Creating persistent alternate personas within LLM conversations to bypass safety training, establishing character identities that override the model's default behavioral constraints.

prompt-injection, persona, jailbreak, DAN, character-creation, red-teaming, intermediate

Intermediate