# hands-on
標記為「hands-on」的 68 篇文章
技能驗證概覽
AI 紅隊計時技能驗證實驗室概覽,包含格式、通過/失敗標準與準備指引。
技能驗證:代理利用(評量)
限時技能驗證實驗室:在 25 分鐘內透過操控工具使用利用代理系統執行未授權動作。
技能驗證:防禦實作
限時技能驗證實驗室:在 45 分鐘內建構通過自動化攻擊測試的可運作護欄系統。
技能驗證:越獄
限時技能驗證實驗室:在 30 分鐘內使用越獄技術繞過防禦 AI 系統的安全措施。
技能驗證:提示詞注入(評估)
計時技能驗證實驗室:使用提示詞注入技術在 15 分鐘內從防禦 AI 系統萃取系統提示詞。
技能驗證:偵察
計時技能驗證實驗室:在 20 分鐘內透過識別模型、萃取設定並描繪能力,剖析未知的 AI 系統。
技能驗證:報告撰寫
限時技能驗證實驗室:在 30 分鐘內從提供的證據撰寫專業 AI 紅隊發現報告。
實作:探索嵌入空間
使用 Python 視覺化嵌入空間、測量語意相似度,並展示如何打造與目標查詢匹配的對抗性文件之動手實作。
實作:音訊對抗樣本
動手實作 for crafting 對抗性 audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.
實作:雲端 AI 安全評估
Conduct an end-to-end security assessment of a cloud-deployed AI service, covering API security, model vulnerabilities, data handling, and infrastructure configuration.
Lab: Custom Test Harness for Specific Applications
建構 a tailored testing framework for a specific AI application, with custom attack generators, domain-specific evaluators, and application-aware reporting.
Lab: Federated Learning Poisoning Attack
動手實作 for understanding and simulating 投毒 attacks against federated learning systems, where a malicious participant corrupts the shared model through crafted 梯度 updates.
實作:紫隊演練
Simultaneously attack and defend an AI application in a structured exercise where red team findings immediately inform blue team defensive improvements.
Lab: Transfer Attack Development (Advanced Lab)
開發 對抗性 attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box 漏洞利用.
Lab: Build Your First Defense (Beginner Lab)
Create a simple input filter that blocks common 提示詞注入 patterns, then test it against the attack 技術 you have learned in previous labs.
實作:模型比較
測試 the same attack 技術 against different 語言模型 and compare their safety behaviors, refusal patterns, and 漏洞 profiles.
實作:上下文操控
Manipulate the conversational context to change model behavior, using 技術 like context flooding, false history injection, and persona shifting.
實作:防禦繞過基礎
學習 basic 技術 to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.
實作:分隔符逃脫攻擊
Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.
Lab: Ethical Red Teaming
Practice responsible AI 紅隊演練 with proper documentation, scope management, and ethical decision-making frameworks.
實作:你的第一個提示詞注入
動手實作 where you perform your first 提示詞注入 attacks against a chatbot, progressing from basic overrides to more sophisticated 技術.
實作:你的第一個越獄嘗試
Try basic 越獄 技術 against a local model using Ollama, learning the difference between 提示詞注入 and 越獄ing through hands-on experimentation.
實作:Garak 設定與首次掃描
Install and configure garak, the LLM 漏洞 scanner, then run your first automated scan against a local model to discover common vulnerabilities.
Lab: Injection Detection Tool
建構 a basic 提示詞注入 detection tool using pattern matching, heuristics, and LLM-based classification to identify malicious inputs before they reach the target model.
實作:注入技術概覽
Survey and test ten fundamental 提示詞注入 技術 against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.
Lab: Instruction Following Priority
測試 how 語言模型 prioritize conflicting instructions from 系統提示詞s, user messages, and embedded directives to understand the instruction hierarchy.
實作:多語言注入
測試 提示詞注入 技術 using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.
實作:輸出格式利用
Manipulate output formats like JSON, CSV, code blocks, and structured data to extract information that models would normally refuse to provide in natural language.
實作:輸出引導
學習 to steer LLM outputs into specific formats, tones, and content using injection 技術 that manipulate how the model structures and presents its responses.
實作:載荷製作
學習 to craft effective 提示詞注入 payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.
Lab: Prompt Leaking Basics
學習 to extract 系統提示詞s from chatbots using common leaking 技術 including direct requests, indirect elicitation, and format manipulation.
Lab: Promptfoo Setup and First Eval
Install and configure promptfoo for systematic LLM evaluation, then run your first red team evaluation to test model safety boundaries.
Lab: PyRIT Setup and First Attack
Install and configure Microsoft's PyRIT (Python Risk Identification Toolkit) for automated 紅隊演練, then run your first orchestrated attack against a local model.
實作:角色扮演攻擊
Use persona-based approaches to bypass AI safety measures by assigning alternate identities, characters, and scenarios that override the model's trained refusal behaviors.
Lab: Mapping Safety Boundaries
系統性 discover what a language model will and won't do by probing its safety boundaries across multiple categories and documenting the results.
實作:系統提示詞覆寫
探索 技術 to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.
實作:對抗性後綴優化
實作 GCG-style 對抗性 suffix attacks that automatically discover 符元 sequences causing 語言模型 to comply with harmful requests. Covers 梯度-based optimization, transferability analysis, and defense evaluation.
實作:對齊壓力測試
Push language model 對齊 to its breaking points through systematic stress testing. Identify conditions where safety training fails, measure 對齊 degradation curves, and map the boundaries of model compliance.
Lab: Build Agent Security Scanner
建構 an automated security scanner for agentic AI systems that detects vulnerabilities in tool use, permission handling, memory management, and multi-step execution flows. Cover agent-specific 攻擊面s that traditional LLM testing misses.
實作:建構 AI 模糊測試器
建構 a comprehensive fuzzing framework for large 語言模型 that 系統性地 generates malformed, unexpected, and 對抗性 inputs to discover crashes, unexpected behaviors, and safety bypasses.
Lab: Build Behavior Diff Tool
建構 a tool that 系統性地 compares language model behavior across versions, configurations, and providers. Detect safety regressions, capability changes, and behavioral drift with automated differential analysis.
Lab: Build Guardrail Evaluator
建構 an automated framework for evaluating AI 護欄s and safety filters. 測試 input filters, output classifiers, content moderation systems, and defense-in-depth architectures for coverage gaps and bypass vulnerabilities.
Lab: Build Jailbreak Automation
建構 an automated 越獄 testing framework that generates, mutates, and evaluates attack prompts at scale. Covers prompt mutation engines, success classifiers, and campaign management for systematic red team testing.
實作:湧現能力探測
系統性 test large 語言模型 for undocumented capabilities including hidden knowledge, unreported skills, and behaviors that emerge only under specific conditions. 建構 a structured probing framework for capability discovery.
Lab: Full-Stack AI Exploitation
串聯 vulnerabilities across UI, API, model, and infrastructure layers to achieve full-stack compromise of an AI application. 學習 to identify cross-layer attack paths and build exploit chains that no single-layer defense can prevent.
實作:Computer Use 代理攻擊
探索針對以 GUI 為本之 computer use 代理(透過視覺與動作原語與畫面互動)之攻擊向量的實作課程。
Lab: Deploy Honeypot AI
建構 and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. 學習 honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.
Lab: Multi-Agent Attack Coordination
動手實作 building and deploying coordinated multi-agent attack systems that divide 紅隊演練 tasks across specialized attacking agents.
實作:新型越獄研究
Systematic methodology for discovering new 越獄 技術 against large 語言模型. 學習 to identify unexplored 攻擊面s, develop novel attack vectors, and validate findings with scientific rigor.
實作:ML 流水線投毒
Compromise an end-to-end machine learning pipeline by attacking data ingestion, preprocessing, training, evaluation, and deployment stages. 學習 to identify and exploit weaknesses across the full ML lifecycle.
實作:利用量化模型弱點
實作探索模型量化如何退化安全對齊之實作,含尋找並利用與精度相關漏洞之技術。
Lab: Model Registry Compromise
探索 技術 for compromising model registries and substituting malicious models into production pipelines. 學習 to detect model tampering, verify model provenance, and secure the model supply chain.
實作:RLHF 獎勵 hacking
探索如何鑽營用於 RLHF 對齊之獎勵模型——暴露獎勵代理訊號與實際安全目標之間缺口之實作。
Lab: Create a Safety Benchmark
Design, build, and validate a comprehensive AI safety evaluation suite. 學習 benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.
Lab: AI Watermark Detection & Removal
動手實作 exploring 技術 for detecting and removing statistical 浮水印s embedded in AI-generated text, and evaluating 浮水印 robustness.
實作實驗與動手練習
從入門到專家難度的漸進式動手實作練習,包含 CTF 挑戰與完整案件模擬,用於培養 AI 紅隊演練技能。
實驗室: Automated Red Team 測試
Set up promptfoo for continuous automated 安全測試 of AI applications,with custom evaluators與CI/CD integration.
實驗室: Data Exfiltration Channels (中階 實驗室)
萃取sensitive information from AI systems through various exfiltration channels,包括 crafted links,image tags,tool calls,side-channel leakage.
實驗室: 防禦 Effectiveness 測試
Systematically test與measure the robustness of AI guardrails using structured methodology,metrics,repeatable test suites.
實驗室: Indirect 提示詞 注入
注入instructions through external data sources,包括 documents,web pages,emails that a target AI system processes as context.
實驗室: Multimodal 注入 (中階 實驗室)
Embed 提示詞注入 instructions in images for vision-enabled models,testing how visual content can carry adversarial payloads.
實驗室: 供應鏈 Audit
Audit an ML project's dependencies for vulnerabilities,covering model files,Python packages,container images,training data provenance.
實作:打造音訊對抗範例
使用 Python 音訊處理建立對抗音訊範例之實作,針對 Whisper 轉譯注入文字。
實驗室:影片模型對抗性攻擊
使用 OpenCV 和 PyTorch 進行影格層面擾動,實作對抗性影片影格的實作實驗室。
實驗室:製作基於圖像的注入
建立基於圖像的提示詞注入、對 VLM 進行測試,以及衡量不同注入技術成功率的實作實驗室。
實作:攻擊聯邦學習
動手實作實驗,在使用 Flower 框架模擬的聯邦學習環境中實作模型投毒攻擊:拜占庭攻擊、模型替換,並量測攻擊影響。
實作:利用量化後的模型
動手實驗:比較不同量化等級下的攻擊成功率,在 FP16、INT8、INT4 上測試越獄,量測安全性退化,並設計量化感知的漏洞利用。
實作:投毒預訓練資料集
動手實驗——在公開可爬取資源中植入投毒內容,觀察對小型預訓練模型的影響與偵測機制。