# hands-on
標記為「hands-on」的 136 篇文章
Skill Verification Overview
Overview of timed skill verification labs for AI red teaming, including format, pass/fail criteria, and preparation guidance.
Skill Verification: Agent Exploitation (Assessment)
Timed skill verification lab: exploit an agent system to perform unauthorized actions within 25 minutes.
Skill Verification: Defense Implementation
Timed skill verification lab: build a working guardrail system that passes automated attack tests within 45 minutes.
Skill Verification: Jailbreaking
Timed skill verification lab: bypass safety measures on a defended AI system within 30 minutes using jailbreak techniques.
Skill Verification: Prompt Injection (Assessment)
Timed skill verification lab: extract a system prompt from a defended AI system within 15 minutes using prompt injection techniques.
Skill Verification: Reconnaissance
Timed skill verification lab: profile an unknown AI system in 20 minutes by identifying the model, extracting configuration, and mapping capabilities.
Skill Verification: Report Writing
Timed skill verification lab: write a professional AI red team finding report from provided evidence within 30 minutes.
Lab: Exploring Embedding Spaces
Hands-on lab using Python to visualize embedding spaces, measure semantic similarity, and demonstrate how adversarial documents can be crafted to match target queries.
Lab: Audio Adversarial Examples
Hands-on lab for crafting adversarial audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.
Lab: Cloud AI Security Assessment
Conduct an end-to-end security assessment of a cloud-deployed AI service, covering API security, model vulnerabilities, data handling, and infrastructure configuration.
Lab: Custom Test Harness for Specific Applications
Build a tailored testing framework for a specific AI application, with custom attack generators, domain-specific evaluators, and application-aware reporting.
Lab: Federated Learning Poisoning Attack
Hands-on lab for understanding and simulating poisoning attacks against federated learning systems, where a malicious participant corrupts the shared model through crafted gradient updates.
Lab: Purple Team Exercise
Simultaneously attack and defend an AI application in a structured exercise where red team findings immediately inform blue team defensive improvements.
Lab: Transfer Attack Development (Advanced Lab)
Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.
Lab: Build Your First Defense (Beginner Lab)
Create a simple input filter that blocks common prompt injection patterns, then test it against the attack techniques you have learned in previous labs.
Lab: Model Comparison
Test the same attack techniques against different language models and compare their safety behaviors, refusal patterns, and vulnerability profiles.
Lab: Context Manipulation
Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.
Lab: Defense Bypass Basics
Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.
Lab: Delimiter Escape Attacks
Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.
Lab: Ethical Red Teaming
Practice responsible AI red teaming with proper documentation, scope management, and ethical decision-making frameworks.
Lab: Your First Prompt Injection
Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.
Lab: Your First Jailbreak
Try basic jailbreak techniques against a local model using Ollama, learning the difference between prompt injection and jailbreaking through hands-on experimentation.
Lab: Garak Setup and First Scan
Install and configure garak, the LLM vulnerability scanner, then run your first automated scan against a local model to discover common vulnerabilities.
Lab: Injection Detection Tool
Build a basic prompt injection detection tool using pattern matching, heuristics, and LLM-based classification to identify malicious inputs before they reach the target model.
Lab: Injection Techniques Survey
Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.
Lab: Instruction Following Priority
Test how language models prioritize conflicting instructions from system prompts, user messages, and embedded directives to understand the instruction hierarchy.
Lab: Multi-Language Injection
Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.
Lab: Output Format Exploitation
Manipulate output formats like JSON, CSV, code blocks, and structured data to extract information that models would normally refuse to provide in natural language.
Lab: Output Steering
Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.
Lab: Payload Crafting
Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.
Lab: Prompt Leaking Basics
Learn to extract system prompts from chatbots using common leaking techniques including direct requests, indirect elicitation, and format manipulation.
Lab: Promptfoo Setup and First Eval
Install and configure promptfoo for systematic LLM evaluation, then run your first red team evaluation to test model safety boundaries.
Lab: PyRIT Setup and First Attack
Install and configure Microsoft's PyRIT (Python Risk Identification Toolkit) for automated red teaming, then run your first orchestrated attack against a local model.
Lab: Role-Play Attacks
Use persona-based approaches to bypass AI safety measures by assigning alternate identities, characters, and scenarios that override the model's trained refusal behaviors.
Lab: Mapping Safety Boundaries
Systematically discover what a language model will and won't do by probing its safety boundaries across multiple categories and documenting the results.
Lab: System Prompt Override
Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.
Lab: Adversarial Suffix Optimization
Implement GCG-style adversarial suffix attacks that automatically discover token sequences causing language models to comply with harmful requests. Covers gradient-based optimization, transferability analysis, and defense evaluation.
Lab: Alignment Stress Testing
Push language model alignment to its breaking points through systematic stress testing. Identify conditions where safety training fails, measure alignment degradation curves, and map the boundaries of model compliance.
Lab: Build Agent Security Scanner
Build an automated security scanner for agentic AI systems that detects vulnerabilities in tool use, permission handling, memory management, and multi-step execution flows. Cover agent-specific attack surfaces that traditional LLM testing misses.
Lab: Build an AI Fuzzer
Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.
Lab: Build Behavior Diff Tool
Build a tool that systematically compares language model behavior across versions, configurations, and providers. Detect safety regressions, capability changes, and behavioral drift with automated differential analysis.
Lab: Build Guardrail Evaluator
Build an automated framework for evaluating AI guardrails and safety filters. Test input filters, output classifiers, content moderation systems, and defense-in-depth architectures for coverage gaps and bypass vulnerabilities.
Lab: Build Jailbreak Automation
Build an automated jailbreak testing framework that generates, mutates, and evaluates attack prompts at scale. Covers prompt mutation engines, success classifiers, and campaign management for systematic red team testing.
Lab: Emergent Capability Probing
Systematically test large language models for undocumented capabilities including hidden knowledge, unreported skills, and behaviors that emerge only under specific conditions. Build a structured probing framework for capability discovery.
Lab: Full-Stack AI Exploitation
Chain vulnerabilities across UI, API, model, and infrastructure layers to achieve full-stack compromise of an AI application. Learn to identify cross-layer attack paths and build exploit chains that no single-layer defense can prevent.
Lab: Computer Use Agent Exploitation
Hands-on lab exploring attack vectors against GUI-based computer use agents that interact with screens through vision and action primitives.
Lab: Deploy Honeypot AI
Build and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. Learn honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.
Lab: Multi-Agent Attack Coordination
Hands-on lab building and deploying coordinated multi-agent attack systems that divide red teaming tasks across specialized attacking agents.
Lab: Novel Jailbreak Research
Systematic methodology for discovering new jailbreak techniques against large language models. Learn to identify unexplored attack surfaces, develop novel attack vectors, and validate findings with scientific rigor.
Lab: ML Pipeline Poisoning
Compromise an end-to-end machine learning pipeline by attacking data ingestion, preprocessing, training, evaluation, and deployment stages. Learn to identify and exploit weaknesses across the full ML lifecycle.
Lab: Exploiting Quantized Model Weaknesses
Hands-on lab exploring how model quantization degrades safety alignment, with techniques to find and exploit precision-related vulnerabilities.
Lab: Model Registry Compromise
Explore techniques for compromising model registries and substituting malicious models into production pipelines. Learn to detect model tampering, verify model provenance, and secure the model supply chain.
Lab: RLHF Reward Hacking
Hands-on lab exploring how to game reward models used in RLHF alignment, exposing gaps between reward proxy signals and actual safety objectives.
Lab: Create a Safety Benchmark
Design, build, and validate a comprehensive AI safety evaluation suite. Learn benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.
Lab: AI Watermark Detection & Removal
Hands-on lab exploring techniques for detecting and removing statistical watermarks embedded in AI-generated text, and evaluating watermark robustness.
Labs & Hands-On Practice
Progressive hands-on lab exercises spanning beginner to expert difficulty, including CTF challenges and full engagement simulations for AI red teaming skill development.
Lab: Automated Red Team Testing
Set up promptfoo for continuous automated security testing of AI applications, with custom evaluators and CI/CD integration.
Lab: Data Exfiltration Channels (Intermediate Lab)
Extract sensitive information from AI systems through various exfiltration channels including crafted links, image tags, tool calls, and side-channel leakage.
Lab: Defense Effectiveness Testing
Systematically test and measure the robustness of AI guardrails using structured methodology, metrics, and repeatable test suites.
Lab: Indirect Prompt Injection
Inject instructions through external data sources including documents, web pages, and emails that a target AI system processes as context.
Lab: Multimodal Injection (Intermediate Lab)
Embed prompt injection instructions in images for vision-enabled models, testing how visual content can carry adversarial payloads.
Lab: Supply Chain Audit
Audit an ML project's dependencies for vulnerabilities, covering model files, Python packages, container images, and training data provenance.
Lab: Crafting Audio Adversarial Examples
Hands-on lab creating adversarial audio examples using Python audio processing, targeting Whisper transcription with injected text.
Lab: Video Model Adversarial Attacks
Hands-on lab crafting adversarial video frames using frame-level perturbation with OpenCV and PyTorch for video model exploitation.
Lab: Crafting Image-Based Injections
Hands-on lab for creating image-based prompt injections, testing against VLMs, and measuring success rates across different injection techniques.
Lab: Attacking Federated Learning
Hands-on lab implementing model poisoning attacks in a simulated federated learning setup using the Flower framework: Byzantine attacks, model replacement, and measuring attack impact.
Lab: Exploiting Quantized Models
Hands-on lab comparing attack success rates across quantization levels: testing jailbreaks on FP16 vs INT8 vs INT4, measuring safety degradation, and crafting quantization-aware exploits.
Lab: Poisoning a Training Dataset
Hands-on lab demonstrating dataset poisoning and fine-tuning to show behavioral change, with step-by-step Python code, backdoor trigger measurement, and troubleshooting guidance.
技能驗證概覽
AI 紅隊計時技能驗證實驗室概覽,包含格式、通過/失敗標準與準備指引。
技能驗證:代理利用(評量)
限時技能驗證實驗室:在 25 分鐘內透過操控工具使用利用代理系統執行未授權動作。
技能驗證:防禦實作
限時技能驗證實驗室:在 45 分鐘內建構通過自動化攻擊測試的可運作護欄系統。
技能驗證:越獄
限時技能驗證實驗室:在 30 分鐘內使用越獄技術繞過防禦 AI 系統的安全措施。
技能驗證:提示詞注入(評估)
計時技能驗證實驗室:使用提示詞注入技術在 15 分鐘內從防禦 AI 系統萃取系統提示詞。
技能驗證:偵察
計時技能驗證實驗室:在 20 分鐘內透過識別模型、萃取設定並描繪能力,剖析未知的 AI 系統。
技能驗證:報告撰寫
限時技能驗證實驗室:在 30 分鐘內從提供的證據撰寫專業 AI 紅隊發現報告。
實作:探索 Embedding 空間
使用 Python 視覺化 embedding 空間、量測語意相似度,並展示如何打造與目標查詢匹配之對抗文件之實作。
實驗室: Audio Adversarial Examples
Hands-on lab for crafting adversarial audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.
實驗室: Cloud AI 安全 評量
Conduct an end-to-end security assessment of a cloud-deployed AI service, covering API security, model vulnerabilities, data handling, and infrastructure configuration.
實驗室: Custom Test Harness for Specific Applications
Build a tailored testing framework for a specific AI application, with custom attack generators, domain-specific evaluators, and application-aware reporting.
實驗室: Federated Learning 投毒 攻擊
Hands-on lab for understanding and simulating poisoning attacks against federated learning systems, where a malicious participant corrupts the shared model through crafted gradient updates.
實驗室: Purple Team 練習
Simultaneously attack and defend an AI application in a structured exercise where red team findings immediately inform blue team defensive improvements.
實驗室: Transfer 攻擊 Development (進階 實驗室)
Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.
實驗室: Build Your First 防禦 (初階 實驗室)
Create a simple input filter that blocks common prompt injection patterns, then test it against the attack techniques you have learned in previous labs.
實驗室: 模型 Comparison
Test the same attack techniques against different language models and compare their safety behaviors, refusal patterns, and vulnerability profiles.
實驗室: Context Manipulation
Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.
實驗室: 防禦 Bypass Basics
Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.
實驗室: Delimiter Escape 攻擊s
Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.
實驗室: Ethical 紅隊演練
Practice responsible AI red teaming with proper documentation, scope management, and ethical decision-making frameworks.
實驗室: Your First 提示詞注入
Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.
實驗室: Your First 越獄
Try basic jailbreak techniques against a local model using Ollama, learning the difference between prompt injection and jailbreaking through hands-on experimentation.
實驗室: Garak Setup and First Scan
Install and configure garak, the LLM vulnerability scanner, then run your first automated scan against a local model to discover common vulnerabilities.
實驗室: Injection Detection 工具
Build a basic prompt injection detection tool using pattern matching, heuristics, and LLM-based classification to identify malicious inputs before they reach the target model.
實驗室: Injection Techniques Survey
Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.
實驗室: Instruction Following Priority
Test how language models prioritize conflicting instructions from system prompts, user messages, and embedded directives to understand the instruction hierarchy.
實驗室: Multi-Language Injection
Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.
實驗室: Output Format 利用ation
Manipulate output formats like JSON, CSV, code blocks, and structured data to extract information that models would normally refuse to provide in natural language.
實驗室: Output Steering
Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.
實驗室: Payload Crafting
Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.
實驗室: Prompt Leaking Basics
Learn to extract system prompts from chatbots using common leaking techniques including direct requests, indirect elicitation, and format manipulation.
實驗室: Promptfoo Setup and First Eval
Install and configure promptfoo for systematic LLM evaluation, then run your first red team evaluation to test model safety boundaries.
實驗室: PyRIT Setup and First 攻擊
Install and configure Microsoft's PyRIT (Python Risk Identification Toolkit) for automated red teaming, then run your first orchestrated attack against a local model.
實驗室: Role-Play 攻擊s
Use persona-based approaches to bypass AI safety measures by assigning alternate identities, characters, and scenarios that override the model's trained refusal behaviors.
實驗室: Mapping Safety Boundaries
Systematically discover what a language model will and won't do by probing its safety boundaries across multiple categories and documenting the results.
實驗室: System Prompt Override
Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.
實驗室: Adversarial Suffix Optimization
Implement GCG-style adversarial suffix attacks that automatically discover token sequences causing language models to comply with harmful requests. Covers gradient-based optimization, transferability analysis, and defense evaluation.
實驗室: Alignment Stress Testing
Push language model alignment to its breaking points through systematic stress testing. Identify conditions where safety training fails, measure alignment degradation curves, and map the boundaries of model compliance.
實驗室: Build 代理 安全 Scanner
Build an automated security scanner for agentic AI systems that detects vulnerabilities in tool use, permission handling, memory management, and multi-step execution flows. Cover agent-specific attack surfaces that traditional LLM testing misses.
實驗室: Build an AI Fuzzer
Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.
實驗室: Build Behavior Diff 工具
Build a tool that systematically compares language model behavior across versions, configurations, and providers. Detect safety regressions, capability changes, and behavioral drift with automated differential analysis.
實驗室: Build Guardrail Evaluator
Build an automated framework for evaluating AI guardrails and safety filters. Test input filters, output classifiers, content moderation systems, and defense-in-depth architectures for coverage gaps and bypass vulnerabilities.
實驗室: Build 越獄 Automation
Build an automated jailbreak testing framework that generates, mutates, and evaluates attack prompts at scale. Covers prompt mutation engines, success classifiers, and campaign management for systematic red team testing.
實驗室: Emergent Capability Probing
Systematically test large language models for undocumented capabilities including hidden knowledge, unreported skills, and behaviors that emerge only under specific conditions. Build a structured probing framework for capability discovery.
實驗室: Full-Stack AI 利用ation
Chain vulnerabilities across UI, API, model, and infrastructure layers to achieve full-stack compromise of an AI application. Learn to identify cross-layer attack paths and build exploit chains that no single-layer defense can prevent.
實作:Computer Use 代理攻擊
探索針對以 GUI 為本之 computer use 代理(透過視覺與動作原語與畫面互動)之攻擊向量的實作課程。
實驗室: Deploy Honeypot AI
Build and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. Learn honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.
實驗室: Multi-代理 攻擊 Coordination
Hands-on lab building and deploying coordinated multi-agent attack systems that divide red teaming tasks across specialized attacking agents.
實驗室: Novel 越獄 Research
Systematic methodology for discovering new jailbreak techniques against large language models. Learn to identify unexplored attack surfaces, develop novel attack vectors, and validate findings with scientific rigor.
實驗室: ML Pipeline 投毒
Compromise an end-to-end machine learning pipeline by attacking data ingestion, preprocessing, training, evaluation, and deployment stages. Learn to identify and exploit weaknesses across the full ML lifecycle.
實作:利用量化模型弱點
實作探索模型量化如何退化安全對齊之實作,含尋找並利用與精度相關漏洞之技術。
實驗室: 模型 Registry Compromise
Explore techniques for compromising model registries and substituting malicious models into production pipelines. Learn to detect model tampering, verify model provenance, and secure the model supply chain.
實作:RLHF 獎勵 hacking
探索如何鑽營用於 RLHF 對齊之獎勵模型——暴露獎勵代理訊號與實際安全目標之間缺口之實作。
實驗室: Create a Safety Benchmark
Design, build, and validate a comprehensive AI safety evaluation suite. Learn benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.
實驗室: AI Watermark Detection & Removal
Hands-on lab exploring techniques for detecting and removing statistical watermarks embedded in AI-generated text, and evaluating watermark robustness.
實驗與動手練習
從入門到專家難度的漸進式實作實驗,包含 CTF 挑戰與完整案件模擬,用於 AI 紅隊演練技能發展。
實驗室: Automated 紅隊 Testing
Set up promptfoo for continuous automated security testing of AI applications, with custom evaluators and CI/CD integration.
實驗室: Data Exfiltration Channels (中階 實驗室)
Extract sensitive information from AI systems through various exfiltration channels including crafted links, image tags, tool calls, and side-channel leakage.
實驗室: 防禦 Effectiveness Testing
Systematically test and measure the robustness of AI guardrails using structured methodology, metrics, and repeatable test suites.
實驗室: Indirect 提示詞注入
Inject instructions through external data sources including documents, web pages, and emails that a target AI system processes as context.
實驗室: Multimodal Injection (中階 實驗室)
Embed prompt injection instructions in images for vision-enabled models, testing how visual content can carry adversarial payloads.
實驗室: Supply Chain Audit
Audit an ML project's dependencies for vulnerabilities, covering model files, Python packages, container images, and training data provenance.
實作:打造音訊對抗範例
使用 Python 音訊處理建立對抗音訊範例之實作,針對 Whisper 轉譯注入文字。
實驗室: Video 模型 Adversarial 攻擊s
Hands-on lab crafting adversarial video frames using frame-level perturbation with OpenCV and PyTorch for video model exploitation.
實驗室: Crafting Image-Based Injections
Hands-on lab for creating image-based prompt injections, testing against VLMs, and measuring success rates across different injection techniques.
實驗室: 攻擊ing Federated Learning
Hands-on lab implementing model poisoning attacks in a simulated federated learning setup using the Flower framework: Byzantine attacks, model replacement, and measuring attack impact.
實驗室: 利用ing Quantized 模型s
Hands-on lab comparing attack success rates across quantization levels: testing jailbreaks on FP16 vs INT8 vs INT4, measuring safety degradation, and crafting quantization-aware exploits.
實驗室: 投毒 a 訓練 Dataset
Hands-on lab demonstrating dataset poisoning and fine-tuning to show behavioral change, with step-by-step Python code, backdoor trigger measurement, and troubleshooting guidance.