# hands-on
68 articles tagged “hands-on”
Skill verification overview
Overview of timed skill verification labs for AI red teaming, including format, pass/fail criteria, and preparation guidance.
Skill verification: agent exploitation (assessment)
Timed skill verification lab: exploit an agent system to perform unauthorized actions within 25 minutes.
Skill verification: defense implementation
Timed skill verification lab: build a working guardrail system that passes automated attack tests within 45 minutes.
Skill verification: jailbreaking
Timed skill verification lab: bypass safety measures on a defended AI system within 30 minutes using jailbreak techniques.
Skill verification: prompt injection (assessment)
Timed skill verification lab: extract a system prompt from a defended AI system within 15 minutes using prompt injection techniques.
Skill verification: reconnaissance
Timed skill verification lab: profile an unknown AI system in 20 minutes by identifying the model, extracting configuration, and mapping capabilities.
Skill verification: report writing
Timed skill verification lab: write a professional AI red team finding report from provided evidence within 30 minutes.
Lab: exploring embedding spaces
Hands-on lab using Python to visualize embedding spaces, measure semantic similarity, and demonstrate how adversarial documents can be crafted to match target queries.
Lab: adversarial examples in audio
Hands-on lab for crafting adversarial audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.
Lab: cloud AI security assessment
Conduct an end-to-end security assessment of a cloud-deployed AI service, covering API security, model vulnerabilities, data handling, and infrastructure configuration.
Lab: custom test harness for specific applications
Build a tailored testing framework for a specific AI application, with custom attack generators, domain-specific evaluators, and application-aware reporting.
Lab: poisoning attack on federated learning
Hands-on lab for understanding and simulating poisoning attacks against federated learning systems, where a malicious participant corrupts the shared model through crafted gradient updates.
Lab: purple team exercise
Simultaneously attack and defend an AI application in a structured exercise where red team findings immediately inform blue team defensive improvements.
Lab: transfer attack development (advanced lab)
Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.
Lab: build your first defense (beginner lab)
Create a simple input filter that blocks common prompt injection patterns, then test it against the attack techniques you have learned in previous labs.
Lab: model comparison
Test the same attack techniques against different language models and compare their safety behaviors, refusal patterns, and vulnerability profiles.
Lab: context manipulation
Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.
Lab: defense bypass fundamentals
Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.
Lab: delimiter escape attacks
Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.
Lab: ethical red teaming
Practice responsible AI red teaming with proper documentation, scope management, and ethical decision-making frameworks.
Lab: your first prompt injection
Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.
Lab: your first jailbreak
Try basic jailbreak techniques against a local model using Ollama, learning the difference between prompt injection and jailbreaking through hands-on experimentation.
Lab: setting up garak and your first scan
Install and configure garak, the LLM vulnerability scanner, then run your first automated scan against a local model to discover common vulnerabilities.
Lab: injection detection tool
Build a basic prompt injection detection tool using pattern matching, heuristics, and LLM-based classification to identify malicious inputs before they reach the target model.
Lab: overview of injection techniques
Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.
Lab: instruction-following priority
Test how language models prioritize conflicting instructions from system prompts, user messages, and embedded directives to understand the instruction hierarchy.
Lab: multilingual injection
Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.
Lab: output format exploitation
Manipulate output formats like JSON, CSV, code blocks, and structured data to extract information that models would normally refuse to provide in natural language.
Lab: output steering
Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.
Lab: crafting payloads
Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.
Lab: prompt leaking fundamentals
Learn to extract system prompts from chatbots using common leaking techniques including direct requests, indirect elicitation, and format manipulation.
Lab: setting up promptfoo and your first evaluation
Install and configure promptfoo for systematic LLM evaluation, then run your first red team evaluation to test model safety boundaries.
Lab: setting up PyRIT and your first attack
Install and configure Microsoft's PyRIT (Python Risk Identification Toolkit) for automated red teaming, then run your first orchestrated attack against a local model.
Lab: role-play attacks
Use persona-based approaches to bypass AI safety measures by assigning alternate identities, characters, and scenarios that override the model's trained refusal behaviors.
Lab: Mapping safety boundaries
Systematically discover what a language model will and will not do by probing its safety boundaries across multiple categories and documenting the results.
Lab: system prompt override
Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.
Lab: adversarial suffix optimization
Implement GCG-style adversarial suffix attacks that automatically discover token sequences causing language models to comply with harmful requests. Covers gradient-based optimization, transferability analysis, and defense evaluation.
Lab: alignment stress testing
Push language model alignment to its breaking points through systematic stress testing. Identify conditions where safety training fails, measure alignment degradation curves, and map the boundaries of model compliance.
Lab: build an agent security scanner
Build an automated security scanner for agentic AI systems that detects vulnerabilities in tool use, permission handling, memory management, and multi-step execution flows. Cover agent-specific attack surfaces that traditional LLM testing misses.
Lab: build an AI fuzzer
Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.
Lab: build a behavior diffing tool
Build a tool that systematically compares language model behavior across versions, configurations, and providers. Detect safety regressions, capability changes, and behavioral drift with automated differential analysis.
Lab: build a guardrail evaluator
Build an automated framework for evaluating AI guardrails and safety filters. Test input filters, output classifiers, content moderation systems, and defense-in-depth architectures for coverage gaps and bypass vulnerabilities.
Lab: build jailbreak automation
Build an automated jailbreak testing framework that generates, mutates, and evaluates attack prompts at scale. Covers prompt mutation engines, success classifiers, and campaign management for systematic red team testing.
Lab: investigating emergent capabilities
Systematically test large language models for undocumented capabilities including hidden knowledge, unreported skills, and behaviors that emerge only under specific conditions. Build a structured probing framework for capability discovery.
Lab: full-stack AI exploitation
Chain vulnerabilities across UI, API, model, and infrastructure layers to achieve full-stack compromise of an AI application. Learn to identify cross-layer attack paths and build exploit chains that no single-layer defense can prevent.
Lab: exploiting computer use agents
Hands-on lab exploring attack vectors against GUI-based computer use agents that interact with screens through vision and action primitives.
Lab: deploy a honeypot AI
Build and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. Learn honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.
Lab: multi-agent attack coordination
Hands-on lab building and deploying coordinated multi-agent attack systems that divide red teaming tasks across specialized attacking agents.
Lab: researching novel jailbreaks
Systematic methodology for discovering new jailbreak techniques against large language models. Learn to identify unexplored attack surfaces, develop novel attack vectors, and validate findings with scientific rigor.
Lab: ML pipeline poisoning
Compromise an end-to-end machine learning pipeline by attacking data ingestion, preprocessing, training, evaluation, and deployment stages. Learn to identify and exploit weaknesses across the full ML lifecycle.
Lab: exploiting weaknesses of quantized models
Hands-on lab exploring how model quantization degrades safety alignment, with techniques to find and exploit precision-related vulnerabilities.
Lab: model registry compromise
Explore techniques for compromising model registries and substituting malicious models into production pipelines. Learn to detect model tampering, verify model provenance, and secure the model supply chain.
Lab: reward hacking in RLHF
Hands-on lab exploring how to game reward models used in RLHF alignment, exposing gaps between reward proxy signals and actual safety objectives.
Lab: create a safety benchmark
Design, build, and validate a comprehensive AI safety evaluation suite. Learn benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.
Lab: AI watermark detection and removal
Hands-on lab exploring techniques for detecting and removing statistical watermarks embedded in AI-generated text, and evaluating watermark robustness.
Labs and hands-on practice
Progressive hands-on lab exercises spanning beginner to expert difficulty, including CTF challenges and full engagement simulations for AI red teaming skill development.
Lab: automated red team testing
Set up promptfoo for continuous automated security testing of AI applications, with custom evaluators and CI/CD integration.
Lab: data exfiltration channels (advanced lab)
Extract sensitive information from AI systems through various exfiltration channels including crafted links, image tags, tool calls, and side-channel leakage.
Lab: testing defense effectiveness
Systematically test and measure the robustness of AI guardrails using structured methodology, metrics, and repeatable test suites.
Lab: indirect prompt injection
Inject instructions through external data sources including documents, web pages, and emails that a target AI system processes as context.
Lab: multimodal injection (advanced lab)
Embed prompt injection instructions in images for vision-enabled models, testing how visual content can carry adversarial payloads.
Lab: supply chain audit
Audit an ML project's dependencies for vulnerabilities, covering model files, Python packages, container images, and training data provenance.
Lab: Crafting adversarial audio examples
Hands-on lab for creating adversarial audio examples with Python audio processing, targeting Whisper transcription with injected text.
Lab: Adversarial attacks on video models
Hands-on lab in which you craft adversarial video frames using frame-level perturbation with OpenCV and PyTorch to exploit video models.
Lab: Crafting image-based injections
Hands-on lab for crafting image-based prompt injections, testing them against VLMs, and measuring success rates across different injection techniques.
Lab: Attacking federated learning
Hands-on lab that implements model poisoning attacks in a simulated federated learning setup with the Flower framework: Byzantine attacks, model replacement, and measuring attack impact.
Lab: Exploiting quantized models
Hands-on lab comparing attack success rates across quantization levels: testing jailbreaks on FP16 vs INT8 vs INT4, measuring safety degradation, and designing quantization-aware exploits.
Lab: poisoning a training dataset
Hands-on lab that demonstrates dataset poisoning and fine-tuning to show behavioral change, with step-by-step Python code, measurement of the backdoor trigger, and troubleshooting guidance.