# intermediate
155 articles tagged with “intermediate”
Intermediate-level certification prep guide
Comprehensive study guide for intermediate-level certification preparation.
Capstone: compliance assessment simulation
Conduct a simulated compliance assessment against EU AI Act, NIST AI RMF, and ISO 42001, producing a comprehensive gap analysis report.
Capstone: red team program design
Design a complete AI red team program for a fictional enterprise, producing a comprehensive program charter document.
Capstone: industry vertical deep dive
Select an industry vertical, threat model the AI systems within it, and produce a sector-specific AI security testing guide.
LLM Guard and Protect AI Guardian
Input/output scanning, PII detection, toxicity filtering, integration patterns, and bypass techniques for LLM Guard and the Protect AI Guardian ecosystem.
NVIDIA NeMo Guardrails
Architecture, configuration, Colang programming, integration patterns, and bypass techniques for NVIDIA's open-source NeMo Guardrails framework.
AI anomaly detection
Detect jailbreak attempts, unusual usage patterns, output drift, and embedding-space anomalies in AI systems using statistical and ML-based methods.
AI monitoring and observability
What to monitor in AI systems, key metrics for detecting abuse and drift, alerting strategies, and observability architecture for LLM applications.
AI logging architecture
What to capture in AI system logs (prompts, completions, latency, tokens, tool calls), along with storage strategies, retention policies, and privacy considerations.
Adversarial ML: core concepts
History and fundamentals of adversarial machine learning (perturbation attacks, evasion vs. poisoning, robustness) as a bridge from classical adversarial ML to LLM-specific attacks.
AI threat models: white-box, black-box, and grey-box
Access levels in AI security testing: what is possible at each level, realistic scenarios, and a comparison with traditional security threat modeling.
Embeddings and vector spaces for red teamers
Understand how embeddings capture semantic meaning, how vector operations work, and why red teamers need to understand embedding spaces for RAG attacks and similarity exploitation.
Lab: exploring embedding spaces
Hands-on lab using Python to visualize embedding spaces, measure semantic similarity, and demonstrate how adversarial documents can be crafted to match target queries.
RAG architecture: how retrieval systems work
End-to-end anatomy of a retrieval-augmented generation pipeline (document ingestion, chunking, embedding, indexing, retrieval, context assembly, and generation), with an analysis of the attack surface at each stage.
Attacks on semantic similarity and vector search
How cosine similarity and nearest neighbor search work, and how attackers exploit vector databases through adversarial documents, embedding collisions, and threshold bypasses.
Inference: sampling, temperature, and generation
How LLMs generate text during inference (greedy decoding, top-k, top-p, temperature) and how these parameters affect attack success rates.
Tokenization and its security implications
How BPE and SentencePiece tokenizers work, and how tokenizer behavior creates exploitable attack surfaces, including boundary attacks, homoglyphs, and encoding tricks.
Pre-training → fine-tuning → RLHF pipeline
Understand the three stages of creating an aligned LLM (pre-training, supervised fine-tuning, and RLHF/DPO) and the security implications at each stage.
The transformer architecture for attackers
Deep dive into the transformer architecture (attention, feed-forward layers, and residual connections) from the perspective of which components are exploitable.
Agent architectures and tool use patterns
How the ReAct, Plan-and-Execute, and LangGraph agent patterns work (tool definition, invocation, and result handling) and where injection occurs in each architecture.
Common AI deployment patterns and attack surfaces
Analysis of deployment patterns (chatbots, copilots, autonomous agents, batch processing, and fine-tuned models) with their unique attack surfaces and security considerations.
Lab: mapping the attack surface of an AI system
Hands-on lab that walks you through reconnaissance of an AI system: identifying components, mapping data flows, inventorying tools, and documenting the attack surface.
CTF: Code Gen Exploit
Manipulate AI code generation to produce vulnerable, backdoored, or malicious code. Explore how prompt manipulation influences code security, from subtle vulnerability injection to full backdoor insertion.
CTF: RAG Heist
Extract sensitive information from a Retrieval-Augmented Generation system by exploiting retrieval mechanisms, document parsing, embedding manipulation, and context window management vulnerabilities.
Intermediate AI red team labs
Overview of intermediate-level hands-on labs covering agent exploitation, RAG poisoning, multi-turn attacks, automated campaigns, and advanced injection techniques for AI red teaming.
Advanced encoding chain attacks
Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.
Lab: advanced function calling exploitation
Exploit advanced function calling patterns including nested calls, parallel execution, and schema manipulation.
Lab: automated adversarial prompt generation
Build automated systems for generating adversarial prompts using mutation, crossover, and selection strategies.
Agent memory manipulation
Exploit persistent memory in LLM agents to plant false context that persists across sessions.
Lab: Agent Prompt Leaking
Hands-on lab for extracting system prompts from tool-using AI agents by exploiting tool interactions, observation channels, and multi-step reasoning to leak protected instructions.
Lab: agent workflow security testing
Test multi-step agent workflows for injection points, state manipulation, and unauthorized action execution.
Lab: API abuse patterns
Discover and exploit API abuse patterns including parameter injection, header manipulation, and endpoint confusion.
API parameter fuzzing for LLMs
Systematically fuzz LLM API parameters to discover unexpected behaviors and bypasses.
Lab: API versioning security gaps
Exploit version differences in LLM APIs to find deprecated endpoints and backward-compatible safety bypasses.
Lab: automated red team pipeline
Hands-on lab for building a continuous AI red team testing pipeline using promptfoo, GitHub Actions, and automated attack generation to catch safety regressions before deployment.
Lab: automated red team testing
Set up promptfoo for continuous automated security testing of AI applications, with custom evaluators and CI/CD integration.
Lab: testing canary token effectiveness
Deploy and test canary tokens in system prompts to evaluate their effectiveness at detecting prompt extraction.
Lab: chain-of-thought exploitation
Exploit chain-of-thought reasoning to leak internal model reasoning, bypass safety filters, and manipulate decision processes.
Lab: CI/CD pipeline security for LLM apps
Test CI/CD pipeline security for LLM applications including prompt template injection, model artifact tampering, and deployment attacks.
Lab: advanced classifier evasion
Evade advanced input classifiers using adversarial perturbation, semantic paraphrasing, and gradient-free optimization.
Lab: ML classifier evasion
Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.
Lab: cloud AI platform security testing
Test cloud AI platform configurations including AWS Bedrock guardrails, Azure content filters, and GCP safety settings.
Lab: code generation security testing
Test LLM code generation for insecure patterns, injection vulnerabilities, and code execution safety issues.
Lab: AI compliance testing fundamentals
Test LLM applications against OWASP LLM Top 10 and NIST AI RMF requirements using structured test procedures.
Lab: context window attack optimization
Optimize prompt injection placement within the context window to maximize attack effectiveness using attention dynamics.
Lab: context window poisoning
Exploit context window management to inject persistent adversarial content that influences future model responses.
Lab: conversation splitting attack
Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.
Conversation state poisoning
Exploit conversation history manipulation to alter model behavior across turns.
Cross-context injection
Inject prompts that persist across separate conversation contexts in shared deployments.
Custom jailbreak wordlist generation
Generate targeted jailbreak wordlists using mutation strategies and seed prompts.
Lab: data exfiltration channels
Hands-on lab for extracting data from AI systems through markdown image rendering, invisible links, tool call parameters, and other covert exfiltration channels.
Lab: data exfiltration channels (intermediate lab)
Extract sensitive information from AI systems through various exfiltration channels including crafted links, image tags, tool calls, and side-channel leakage.
Data exfiltration via side channels
Extract data from LLM systems using timing, token probability, and formatting side channels.
Building defense bypass chains
Combine multiple bypass techniques into reliable chains that defeat layered defenses.
Lab: measuring defense effectiveness
Hands-on lab for quantifying AI guardrail robustness using attack success rates, evasion metrics, false positive rates, and statistical analysis of defense performance.
Lab: defense regression testing setup
Build a regression testing framework to continuously verify that LLM defenses remain effective against known attack patterns.
Lab: testing defense effectiveness
Systematically test and measure the robustness of AI guardrails using structured methodology, metrics, and repeatable test suites.
Lab: document-based RAG injection
Inject adversarial content into documents that will be processed by a RAG system to influence model responses.
Lab: advanced embedding manipulation
Manipulate embedding vectors to achieve target similarity scores for RAG poisoning and retrieval manipulation.
Practical embedding manipulation
Manipulate text to achieve target embedding similarities for poisoning retrieval-augmented generation systems.
Embedding similarity attacks
Manipulate text to achieve target embedding similarity scores for retrieval poisoning.
Lab: encoding bypass
Hands-on lab for systematically testing Base64, ROT13, Unicode homoglyph, hexadecimal, and combined encoding bypasses against LLM safety filters with measured success rates.
Fine-tuning API security probing
Probe fine-tuning APIs for security weaknesses including insufficient validation and unsafe default configurations.
Lab: testing the safety impact of fine-tuning
Measure how fine-tuning affects model safety by comparing safety benchmark scores before and after fine-tuning.
Lab: guardrail bypass techniques
Practice guardrail bypass techniques against NeMo Guardrails, LLM Guard, and custom classifier-based defenses.
Guardrail fingerprinting
Systematically map the rules and thresholds of input/output guardrail systems.
Lab: guardrail rule extraction
Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.
Lab: exploiting LLM hallucinations
Exploit hallucination tendencies to trigger fabricated tool calls, invented API endpoints, and false fact injection.
Lab: AI incident response simulation
Practice AI incident response procedures through a simulated prompt injection incident with escalation and containment.
Lab: indirect prompt injection
Inject instructions through external data sources including documents, web pages, and emails that a target AI system processes as context.
Indirect injection via web content
Plant prompt injection payloads in web pages consumed by RAG-enabled LLM applications.
Lab: measuring instruction following
Quantitatively measure instruction-following compliance to identify how models prioritize competing instructions.
LLM judge manipulation
Craft responses that exploit LLM-as-judge evaluation patterns to achieve high safety scores while embedding harmful content.
Lab: log probability analysis for attacks
Use token log probabilities to extract information about training data, system prompts, and model internals.
Lab: advanced MCP security testing
Perform advanced security testing of MCP servers including transport analysis, capability enumeration, and tool poisoning.
MCP tool poisoning attack
Exploit MCP tool descriptions to inject instructions that redirect agent behavior.
Lab: fuzzing the Model Context Protocol
Fuzz MCP messages to discover parsing errors, validation gaps, and unexpected behavior in MCP servers.
Model fingerprinting hands-on lab
Fingerprint the underlying model behind an API endpoint to identify its type, version, and configuration.
Lab: LLM security monitoring setup
Deploy a comprehensive security monitoring system for LLM applications with anomaly detection and alerting.
Lab: testing inter-agent communication
Test the security of inter-agent communication in multi-agent systems, including message injection and trust boundary violations.
Multilingual safety bypass lab
Exploit gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.
Multilingual injection attacks
Exploit language switching and low-resource language gaps to bypass safety training.
Cross-model attack chaining
Chain attacks across multiple LLMs in a pipeline to bypass per-model defenses.
Lab: multi-model comparative security testing
Compare security postures across multiple LLM providers by running identical attack suites and analyzing differential responses.
Multi-model pipeline attack lab
Attack a pipeline where multiple models process data sequentially, exploiting trust between pipeline stages.
Lab: multi-turn state tracking attacks
Exploit conversation state tracking to build persistent attack context across multiple conversation turns.
Lab: multimodal injection
Hands-on lab for embedding text instructions in images to exploit vision-enabled LLMs. Learn to craft visual prompt injections, test OCR-based attacks, and evaluate multimodal safety boundaries.
Lab: multimodal injection (intermediate lab)
Embed prompt injection instructions in images for vision-enabled models, testing how visual content can carry adversarial payloads.
Lab: intermediate multimodal security testing
Test multimodal LLMs with crafted images containing embedded text, adversarial perturbations, and visual injection payloads.
Output channel discovery lab
Discover alternative output channels in LLM applications that bypass standard output filtering.
Lab: output schema exploitation
Exploit structured output schemas to force models into revealing information through required field population.
Lab: prompt caching side-channel attacks
Exploit prompt caching mechanisms to detect cached prompts and extract information through timing side channels.
Prompt leaking techniques lab
Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.
Automated red teaming with PyRIT
Use Microsoft's PyRIT framework to run automated multi-turn attack campaigns.
Lab: advanced RAG security testing
Test RAG systems for chunking exploitation, reranking manipulation, and cross-document injection attacks.
RAG context poisoning
Poison a vector database to inject adversarial content into RAG retrieval results.
Reasoning chain hijacking
Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.
Lab: role hierarchy exploitation
Exploit multi-role LLM architectures by manipulating the instruction hierarchy between system, user, and assistant roles.
Lab: safety filter benchmarking
Benchmark safety filters across providers using standardized test suites to compare detection rates and false positives.
Semantic deception crafting lab
Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.
Semantic injection crafting
Craft semantically coherent injections that evade both classifiers and human review.
Lab: semantic search manipulation
Manipulate semantic search results by crafting documents that achieve target similarity scores against specific queries.
Lab: testing session isolation
Test session isolation in multi-tenant LLM deployments to detect cross-session data leakage and context contamination.
Session state exploitation lab
Exploit session management vulnerabilities in LLM applications to access other users' data or escalate privileges.
Lab: streaming response manipulation
Exploit streaming response mechanisms to bypass output filters that only check complete responses.
Safety bypass via structured output
Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.
Structured output exploitation
Exploit JSON mode and structured output constraints to bypass safety filters.
Lab: supply chain audit
Audit an ML project's dependencies for vulnerabilities, covering model files, Python packages, container images, and training data provenance.
Lab: ML supply chain scan
Hands-on lab for auditing machine learning model dependencies, detecting malicious packages in ML pipelines, and scanning model files for backdoors and supply chain threats.
Lab: system prompt reconstruction (intermediate lab)
Reconstruct full system prompts through iterative partial extraction and cross-referencing multiple leakage vectors.
Advanced token smuggling
Use Unicode homoglyphs, zero-width characters, and tokenizer quirks to smuggle payloads.
Lab: tool result injection attacks
Inject adversarial content through tool call results to poison model reasoning and redirect subsequent actions.
Lab: Unicode normalization bypass attacks
Exploit Unicode normalization differences between input validation and model processing to smuggle injection payloads.
Lab: testing vector database access control
Test vector database access controls for bypass vulnerabilities including namespace traversal and filter manipulation.
Agent goal-hijacking
Redirect an AI agent's objectives through carefully crafted inputs that override its primary task.
Assistant prefill injection attacks
Exploit assistant message prefilling to prime model responses and bypass safety alignment.
Chat history manipulation attacks
Inject false messages into conversation history to alter model behavior in subsequent turns.
Mapping input classifier boundaries
Systematically probe input classifiers to map their decision boundaries and find bypass regions.
Context window boundary exploitation
Exploit the boundaries of context windows to push safety instructions beyond the model's attention.
Crescendo attack implementation
Implement Microsoft's Crescendo multi-turn escalation attack with automated conversation management.
Chaining cross-plugin exploits
Chain exploits across multiple plugins/tools in an agent system to achieve multi-step attacks.
Function schema manipulation
Exploit function calling schemas to inject instructions through parameter descriptions and enums.
Building custom garak probes
Develop custom probe modules for NVIDIA Garak to test organization-specific vulnerability patterns.
Latency-based guardrail detection
Use timing side channels to identify and characterize guardrail implementations in LLM applications.
Custom safety evaluations with Inspect AI
Build custom safety evaluation tasks in Inspect AI to test application-specific vulnerability classes.
Comparative testing with JailbreakBench
Run standardized jailbreak benchmarks across multiple models and compare vulnerability profiles.
Safety bypass in JSON mode
Exploit structured JSON output modes to bypass safety filters that only check plain-text responses.
LLM Guard evasion techniques
Develop evasion techniques against LLM Guard input scanners and output detectors.
Implementing many-shot jailbreaking
Implement Anthropic's many-shot jailbreaking technique with scaling analysis across conversation lengths.
Data exfiltration via MCP resources (intermediate lab)
Exploit MCP resource URIs to access and exfiltrate data beyond intended scope boundaries.
Exploiting multilingual safety gaps
Systematically identify and exploit safety training gaps across low-resource languages.
Multi-turn state tracking exploitation
Exploit state management in multi-turn conversations to gradually shift model behavior over many turns.
NeMo Guardrails bypass techniques
Test and bypass NVIDIA NeMo Guardrails configurations using known evasion techniques.
Output filter bypass via encoding
Bypass output safety filters by instructing the model to encode responses in Base64, hex, or custom schemes.
PDF document injection for RAG systems
Craft adversarial PDF documents that inject instructions when processed by RAG document loaders.
Prompt template variable injection
Inject adversarial content through template variable expansion in prompt management systems.
Custom attack strategies in PyRIT
Build custom attack strategies and converters in Microsoft's PyRIT framework for targeted campaigns.
RAG document injection campaign
Design and execute a document injection campaign against a RAG-powered application with vector search.
Semantic search poisoning
Craft adversarial documents that rank highly in semantic search for targeted queries in RAG systems.
Exfiltration via streaming responses
Exploit streaming API responses to exfiltrate data that might be caught by batch output filters.
System prompt extraction via token probabilities
Exploit token probability APIs to reconstruct hidden system prompts character by character.
Token boundary manipulation
Exploit tokenizer-specific behavior by crafting inputs that split across token boundaries in unexpected ways.
Tool result injection attacks
Craft malicious tool return values that inject instructions back into the model's reasoning chain.
Injecting text into VLMs via images
Embed adversarial text in images that vision-language models read and follow as instructions.
Lab: data exfiltration via webhooks
Test data exfiltration paths through LLM-triggered webhooks, API callbacks, and external URL rendering.
Simulation: startup AI assessment
Red team a startup's AI-powered product with limited scope and budget, making pragmatic tradeoffs between thoroughness and time constraints.
Code injection via Markdown
Injecting executable payloads through markdown rendering in LLM outputs, exploiting the gap between text generation and content rendering in web-based LLM interfaces.
Context window stuffing
Techniques for filling the LLM context window to push system instructions out of active memory, manipulating token budgets to dilute or displace defensive prompts.
Delimiter escape attacks
Techniques for escaping delimiters used to separate system and user content in LLM applications, breaking out of sandboxed input regions to inject instructions.
Encoding-based evasion
Using Base64, ROT13, hexadecimal, Unicode, and other encoding schemes to evade input detection systems and bypass content filters in LLM applications.
Few-shot injection
Using crafted few-shot examples within user input to steer LLM behavior toward unintended outputs, exploiting in-context learning to override safety training.
Payload obfuscation techniques
Methods for disguising prompt injection payloads through encoding, splitting, substitution, and other obfuscation techniques to bypass input filters and detection systems.
Role-play injection
Using fictional scenarios, character role-play, and narrative framing to bypass LLM safety filters by having the model operate within a permissive fictional context.
Injection via translation
Using translation requests and low-resource languages to bypass content filters, exploiting the uneven distribution of safety training across languages.
Virtual persona creation
Creating persistent alternate personas within LLM conversations to bypass safety training, establishing character identities that override the model's default behavioral constraints.