# expert
69 articles tagged with “expert”
Practice exam 3: Expert Red Team
A 25-question expert-level practice exam covering research techniques, automation, fine-tuning attacks, supply chain security, and incident response.
Comprehensive expert practice exam 2
Second expert-level comprehensive exam covering frontier research, novel attacks, and advanced defense.
Expert certification practice exam
Expert-level practice exam covering frontier research, novel attacks, and advanced defense architectures.
Preparation guide for the expert-level exam
Advanced study guide for expert-level assessments covering frontier research and novel techniques.
Capstone: red team an agentic system
Red team a multi-agent system with MCP servers, function calling, and inter-agent communication, producing an attack tree and comprehensive findings report.
Capstone: full red team engagement
Scope, plan, execute, and report a complete AI red team engagement against a multi-component AI application including chatbot, RAG, agent, and API layers.
Capstone: vulnerability research project
Identify and responsibly disclose a novel AI vulnerability class, producing an advisory-format writeup, proof of concept, and MITRE ATLAS mapping.
CTF fall 2026: advanced attack research
An expert-level CTF requiring novel technique development, research-grade problem solving, and original vulnerability discovery in AI systems.
Lab: federated learning poisoning attack
Hands-on lab for understanding and simulating poisoning attacks against federated learning systems, where a malicious participant corrupts the shared model through crafted gradient updates.
CTF: agent escalation
Five-flag CTF challenge focused on exploiting agentic AI systems. Progress from basic tool abuse through privilege escalation, indirect injection, memory poisoning, and multi-agent chain attacks.
CTF: Alignment Breaker
Break the alignment of a heavily defended model with multiple defense layers. Requires combining advanced techniques including adversarial suffixes, multi-turn manipulation, and novel jailbreak approaches.
CTF: Boss Rush
Chain attacks across multiple AI models in sequence. Each model guards the next, requiring different attack techniques at each stage. Defeat all five models to extract the final flag in this ultimate red teaming challenge.
Expert AI red team labs
Advanced labs tackling cutting-edge AI security challenges including quantization exploits, reward hacking, agent exploitation, multi-agent attacks, and watermark removal.
Activation steering for exploitation
Use activation steering techniques to manipulate model behavior at the representation level without prompt modification.
Lab: adversarial machine learning from scratch
Hands-on expert lab for implementing gradient-based adversarial attacks against language models from scratch without frameworks, building intuition for how adversarial perturbations exploit model gradients.
Lab: adversarial suffix optimization
Implement GCG-style adversarial suffix attacks that automatically discover token sequences causing language models to comply with harmful requests. Covers gradient-based optimization, transferability analysis, and defense evaluation.
Lab: alignment stress testing
Push language model alignment to its breaking points through systematic stress testing. Identify conditions where safety training fails, measure alignment degradation curves, and map the boundaries of model compliance.
Evading backdoor detection
Insert backdoors into fine-tuned models that evade state-of-the-art detection methods.
Lab: build an agent security scanner
Build an automated security scanner for agentic AI systems that detects vulnerabilities in tool use, permission handling, memory management, and multi-step execution flows. Cover agent-specific attack surfaces that traditional LLM testing misses.
Lab: build an AI fuzzer
Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.
Lab: build a behavioral diffing tool
Build a tool that systematically compares language model behavior across versions, configurations, and providers. Detect safety regressions, capability changes, and behavioral drift with automated differential analysis.
Lab: build a guardrail evaluator
Build an automated framework for evaluating AI guardrails and safety filters. Test input filters, output classifiers, content moderation systems, and defense-in-depth architectures for coverage gaps and bypass vulnerabilities.
Lab: build jailbreak automation
Build an automated jailbreak testing framework that generates, mutates, and evaluates attack prompts at scale. Covers prompt mutation engines, success classifiers, and campaign management for systematic red team testing.
Lab: cross-architecture transfer
Develop attacks on one model architecture that reliably transfer to fundamentally different architectures.
Lab: defense engineering
Expert-level lab for building and evaluating a production-grade multi-layer AI defense system including input classifiers, output monitors, semantic analysis, and adversarial robustness testing.
Exploiting emergent behavior
Identify and exploit emergent behaviors in frontier models that arise from scale and are not present in smaller models.
Lab: probing emergent capabilities
Systematically test large language models for undocumented capabilities including hidden knowledge, unreported skills, and behaviors that emerge only under specific conditions. Build a structured probing framework for capability discovery.
Lab: limits of formal verification
Explore the limits of formal verification for neural network behavior and identify gaps exploitable by adversaries.
Lab: full-stack AI exploitation
Chain vulnerabilities across UI, API, model, and infrastructure layers to achieve full-stack compromise of an AI application. Learn to identify cross-layer attack paths and build exploit chains that no single-layer defense can prevent.
Lab: exploiting computer use agents
Hands-on lab exploring attack vectors against GUI-based computer use agents that interact with screens through vision and action primitives.
Lab: deploy a honeypot AI
Build and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. Learn honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.
Interpretability-guided attack design
Use mechanistic interpretability to identify exploitable circuits and design targeted attacks.
Model extraction via API access
Extract a functionally equivalent model using only API query access.
Lab: exploiting MoE routing
Exploit Mixture-of-Experts routing mechanisms to selectively activate or suppress expert modules in MoE models.
Lab: coordinating multi-agent attacks
Hands-on lab building and deploying coordinated multi-agent attack systems that divide red teaming tasks across specialized attacking agents.
Neural backdoor engineering
Engineer sophisticated neural backdoors that activate on specific trigger patterns while evading detection methods.
Lab: novel jailbreak research
Systematic methodology for discovering new jailbreak techniques against large language models. Learn to identify unexplored attack surfaces, develop novel attack vectors, and validate findings with scientific rigor.
Lab: ML pipeline poisoning
Compromise an end-to-end machine learning pipeline by attacking data ingestion, preprocessing, training, evaluation, and deployment stages. Learn to identify and exploit weaknesses across the full ML lifecycle.
Lab: exploiting weaknesses in quantized models
Hands-on lab exploring how model quantization degrades safety alignment, with techniques to find and exploit precision-related vulnerabilities.
Exploiting quantization-induced vulnerabilities
Exploit behavioral differences between full-precision and quantized models.
Exploiting chain-of-thought in reasoning models
Exploit extended thinking and chain-of-thought reasoning in models like o1 and DeepSeek-R1.
Lab: model registry compromise
Explore techniques for compromising model registries and substituting malicious models into production pipelines. Learn to detect model tampering, verify model provenance, and secure the model supply chain.
Attacks via representation engineering
Manipulate internal model representations to alter behavior without prompt modification.
Lab: vulnerability research methodology
Systematic methodology lab for discovering novel AI vulnerabilities including hypothesis generation, attack surface mapping, experimental design, validation protocols, and responsible disclosure.
Lab: reward hacking in RLHF
Hands-on lab exploring how to game reward models used in RLHF alignment, exposing gaps between reward proxy signals and actual safety objectives.
Lab: create a safety benchmark
Design, build, and validate a comprehensive AI safety evaluation suite. Learn benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.
End-to-end training-time attacks
Execute a complete training-time attack from data poisoning through model deployment to triggered exploitation.
Activation patching for targeted behavior modification
Use activation patching techniques to modify specific model behaviors without retraining or fine-tuning.
Lab: detecting alignment faking
Implement detection methods for alignment faking behaviors where models behave differently during evaluation versus deployment.
Poisoning the continuous learning pipeline
Exploit continuous/online learning pipelines by injecting adversarial examples that gradually shift model behavior.
Framework for testing deceptive alignment
Build a testing framework for detecting mesa-optimization and deceptive alignment in fine-tuned models.
Security research on emergent capabilities
Design evaluations that discover security-relevant emergent capabilities in frontier language models.
Federated learning poisoning attack
Execute model poisoning attacks in a federated learning setting through adversarial participant manipulation.
Formal verification of LLM safety properties
Apply formal verification methods to prove or disprove safety properties of LLM-based applications.
Gradient-based universal prompt search
Implement gradient-based optimization to search for universal adversarial prompts that transfer across contexts.
KV cache manipulation attacks
Manipulate key-value caches in inference servers to inject or alter cached context across requests.
Mechanistic interpretability for exploit discovery
Use mechanistic interpretability tools to discover exploitable circuits and features in transformer models.
Membership inference against production LLMs
Implement membership inference attacks to determine whether specific data was used in training an LLM.
Model extraction via knowledge distillation
Extract a functionally equivalent model from a commercial API using systematic distillation queries.
Backdoor propagation via model merging
Demonstrate how backdoors propagate through model merging techniques like TIES, DARE, and spherical interpolation.
Exploiting MoE routing
Exploit Mixture-of-Experts routing mechanisms to activate specific expert networks for adversarial purposes.
Multi-agent consensus poisoning
Poison multi-agent consensus mechanisms to manipulate collective decision-making in agent swarms.
Representation engineering for behavior steering
Use representation engineering to steer model behavior by manipulating activation vectors during inference.
Reward hacking in RLHF pipelines
Craft adversarial training examples that exploit reward model weaknesses in RLHF training pipelines.
Removing safety alignment via minimal-data fine-tuning
Determine the minimum number of fine-tuning examples needed to remove safety alignment from various model families.
Side-channel attacks on speculative decoding
Exploit speculative decoding implementations to extract information about draft and verifier model behavior.
System prompt reconstruction via token probabilities
Reconstruct complete system prompts using token-by-token probability analysis through API access.
LLM watermark removal attacks
Develop techniques to remove or corrupt watermarks embedded in LLM-generated text.
Lab: AI watermark detection and removal
Hands-on lab exploring techniques for detecting and removing statistical watermarks embedded in AI-generated text, and evaluating watermark robustness.