# lab
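231 articles tagged "lab"
Lab: Systematic Guardrail Bypass
A hands-on lab for methodically probing, classifying, and bypassing the input/output guardrails of production AI systems using a structured red team workflow.
Lab: Tool Comparison (Same Target, 4 Tools)
A hands-on comparison of Garak, PyRIT, promptfoo, and Inspect AI against the same target model. Evaluate each tool's coverage, speed, finding quality, and practical trade-offs.
Lab: Exploring Embedding Spaces
A hands-on lab that uses Python to visualize embedding spaces, measure semantic similarity, and show how to craft adversarial documents that match target queries.
Lab: Mapping an AI System's Attack Surface
A hands-on session on reconnaissance against AI systems: identify components, map data flows, enumerate tools, and write attack surface documentation.
Lab: Simulated Robot Control Exploitation
Hands-on lab exercises exploiting LLM-controlled robots in simulation: environment setup, injection attacks, safety bypass testing, and multi-step exploitation chains using PyBullet.
Advanced RAG Manipulation Lab
Execute sophisticated RAG manipulation, including embedding space attacks, metadata poisoning, and retrieval algorithm gaming.
Adversarial Robustness Evaluation
Build a complete adversarial robustness evaluation framework for assessing a model's security posture.
Lab: Generating Adversarial Suffixes
Implement the Greedy Coordinate Gradient (GCG) algorithm to generate adversarial suffixes that cause language models to comply with harmful requests by appending optimized token sequences.
Lab: Building Attack Chains
Hands-on lab for chaining three or more distinct vulnerabilities into a complete exploit sequence that achieves objectives impossible with any single technique alone.
Lab: Audio Adversarial Examples
Hands-on lab for crafting adversarial audio perturbations that cause speech-to-text models and voice assistants to misinterpret spoken commands, demonstrating attacks on audio AI systems.
AutoDAN Implementation Lab
Implement the AutoDAN methodology for generating stealthy human-readable jailbreak prompts using LLM feedback.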
Lab: Backdoor Detection in Fine-Tuned Models
Analyze a fine-tuned language model to find and characterize an inserted backdoor, using behavioral probing, activation analysis, and statistical testing techniques.
Lab: Continuous Automated Red Teaming Pipeline
Build an end-to-end CART pipeline that continuously generates, executes, and scores adversarial attacks against LLM applications, with alerting and trend tracking.
Lab: Cloud AI Security Assessment
Conduct an end-to-end security assessment of a cloud-deployed AI service, covering API security, model vulnerabilities, data handling, and infrastructure configuration.
Lab: Cloud AI Assessment
Hands-on lab on conducting an end-to-end security assessment of a cloud-deployed AI system, including infrastructure review, API testing, model security evaluation, and data flow analysis.
Lab: Containerized Model Breakout
Explore techniques for escaping from containerized AI applications to the host system, testing container isolation boundaries in ML deployment environments.
Lab: Cross-Model Transfer Attacks
Test whether jailbreaks discovered on one language model transfer effectively to others, building a systematic methodology for cross-model vulnerability research.
Cross-Plugin Exploit Chain
Chain exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.
Custom Safety Classifier Training
Train a custom input safety classifier and then develop payloads that reliably evade it to understand classifier limitations.
Lab: Building a Production Red Team Harness
Build a full-featured, production-quality red team harness with multi-model support, async testing, structured result storage, and HTML reporting.
Lab: Custom Test Harness for Specific Applications
Build a tailored testing framework for a specific AI application, with custom attack generators, domain-specific evaluators, and application-aware reporting.
Defense vs Red Team War Game
Alternate between attacking and defending an LLM application to develop skills in both offensive and defensive operations.
Lab: Ensemble Attacks
Use multiple language models collaboratively to discover attack strategies that bypass any single model's defenses, leveraging model diversity for more effective red teaming.
Lab: Federated Learning Poisoning Attack
Hands-on lab for understanding and simulating poisoning attacks against federated learning systems, where a malicious participant corrupts the shared model through crafted gradient updates.
Lab: Inserting a Fine-Tuning Backdoor
Advanced lab demonstrating how fine-tuning can insert hidden backdoors into language models that activate on specific trigger phrases while maintaining normal behavior otherwise.
Lab: Build an AI Fuzzing Harness
Create a systematic fuzzing framework for testing LLM boundaries, generating and mutating inputs to discover unexpected model behaviors and safety edge cases.
Lab: GPU Side-Channel Attacks
Demonstrate information leakage through GPU memory residuals and timing side channels, showing how shared GPU infrastructure can expose sensitive model data.
Gradient-Guided Data Poisoning
Use gradient information from open-source models to craft optimally poisoned training examples.
Lab: Chaining Guardrail Bypasses
Advanced lab on identifying, isolating, and chaining multiple guardrail bypass techniques to defeat layered defense systems in production LLM applications.
Lab: AI Incident Investigation
Investigate logs and artifacts from a compromised AI system to reconstruct the attack chain, identify the vulnerability exploited, and determine the scope of the breach.
Lab: Inference Server Exploitation
Attack vLLM, TGI, and Triton inference servers to discover information disclosure vulnerabilities, denial-of-service vectors, and configuration weaknesses in model serving infrastructure.
Lab: MCP Full Exploitation
Hands-on lab for conducting a complete Model Context Protocol server compromise including tool enumeration, permission escalation, cross-tool attacks, and data exfiltration through MCP channels.
Lab: Model Security Comparison
Systematically compare the safety posture of major language models using a standardized test suite, building quantitative security profiles for GPT-4, Claude, and Gemini.
Model Distillation Security Lab
Extract model capabilities through distillation techniques using only black-box API access.
Lab: Model Serving Framework Attacks
Exploit vulnerabilities in TensorFlow Serving, TorchServe, and Triton Inference Server, targeting model loading, API endpoints, and management interfaces.
Lab: Multi-Model Comparative Red Teaming
Test the same attack suite across GPT-4, Claude, Llama, and Gemini. Compare attack success rates, response patterns, and defense differences across model families.
Multi-Objective Attack Optimization
Optimize attack payloads for multiple simultaneous objectives: jailbreaking, data extraction, and defense evasion.
Lab: Advanced Multi-Turn Attacks
Hands-on lab for executing 20+ turn conversation steering strategies that gradually erode AI safety boundaries to extract protected information through patient, escalating dialogue.
Multimodal Attack Chain Lab
Chain attacks across text, image, and structured data modalities to exploit multimodal system vulnerabilities.
Lab: Multimodal Attack Pipeline
Build an automated multimodal attack pipeline that generates adversarial images, combines them with text prompts, and tests against vision-language models (VLMs).
Lab: Implementing PAIR Attack
Implement the PAIR (Prompt Automatic Iterative Refinement) algorithm where an attacker LLM iteratively refines jailbreak prompts against a target LLM until a successful attack is found.
Lab: PAIR Attack Implementation
Build a complete Prompt Automatic Iterative Refinement system that uses an attacker LLM to automatically generate and refine jailbreak prompts against a target model.
Production Environment Simulation Lab
Test attacks against a simulated production environment with realistic logging, monitoring, and alerting.
Lab: Purple Team Exercise
Simultaneously attack and defend an AI application in a structured exercise where red team findings immediately inform blue team defensive improvements.
Lab: Purple Team Exercise (Advanced Lab)
Hands-on lab for conducting simultaneous attack and defense operations against an AI system with real-time metrics tracking, adaptive defense deployment, and coordinated red-blue team workflows.
Lab: RAG Full Chain Attack
Hands-on lab for executing a complete RAG attack chain from document injection through retrieval manipulation to data exfiltration, targeting every stage of the Retrieval-Augmented Generation pipeline.
Lab: Reasoning Model Exploitation
Attack reasoning models like o1, o3, and DeepSeek-R1 by exploiting chain-of-thought manipulation, reasoning budget exhaustion, and thought-injection techniques.
Lab: Red Team Orchestration
Build an orchestration system that coordinates multiple attack strategies simultaneously, managing parallel attack campaigns and synthesizing results into comprehensive risk assessments.
Lab: Safety Regression Testing at Scale
Build automated pipelines that detect safety degradation across model versions, ensuring that updates and fine-tuning do not introduce new vulnerabilities or weaken existing protections.
Representation Probing for Vulnerabilities
Probe internal model representations to identify exploitable features and develop representation-level attacks.
Reward Model Analysis Lab
Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.
Lab: Model Routing Exploitation
Attack systems that route requests to different models based on complexity or content, exploiting routing logic to reach less-defended models or bypass safety filters.
Lab: TAP Algorithm Implementation
Implement the TAP (Tree of Attacks with Pruning) algorithm that uses tree-based search over attack prompts with branch pruning to efficiently find jailbreaks.
Lab: Training Data Extraction at Scale
Extract memorized training data from language models using prefix-based extraction, divergence testing, and membership inference. Measure extraction rates and assess privacy risks.
Lab: Transfer Attack Development
Hands-on lab for crafting adversarial prompts on open-weight models like Llama that transfer to closed-source models like Claude and GPT-4, using iterative refinement and cross-model evaluation.
Lab: Transfer Attack Development (Advanced Lab)
Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.
Lab: Anthropic Claude API Basics
Set up the Anthropic Claude API for red teaming, learn authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.
Multi-Provider API Exploration
Explore and compare API behaviors across OpenAI, Anthropic, and Google AI to understand provider-specific security characteristics.
Lab: API Key Security
Learn common API key exposure vectors, secure key management with .env files, detect keys in git history, implement key rotation, and apply least-privilege principles.
Lab: API-Based Model Testing
Learn to test language models through their APIs including OpenAI, Anthropic, and local Ollama endpoints. Build reusable API testing functions with proper error handling.
Basic Automated Testing Setup
Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.
Lab: Build Your First Defense
Hands-on lab for building a basic AI input defense system using keyword matching, regex filters, and simple classifiers to block prompt injection attempts.
Basic Defense Mechanism Testing
Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.
Lab: Build Your First Defense (Beginner Lab)
Create a simple input filter that blocks common prompt injection patterns, then test it against the attack techniques you have learned in previous labs.
Security Finding Documentation Exercise
Practice documenting security findings in a professional format with reproducible steps and impact assessment.
Lab: Model Comparison
Test the same attack techniques against different language models and compare their safety behaviors, refusal patterns, and vulnerability profiles.
Lab: Compare Model Safety
Hands-on lab for running identical safety tests against GPT-4, Claude, Gemini, and Llama to compare how different models handle prompt injection, jailbreaks, and safety boundary enforcement.
Lab: Context Manipulation
Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.
Lab: Context Overflow Attacks
Explore context window overflow attacks that push system instructions out of the model's attention by filling the context with padding content, and measure instruction-following degradation.
Conversation History Analysis
Analyze how conversation history influences model behavior and identify opportunities for context manipulation.
Lab: Defense Bypass Basics
Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.
Lab: Basic Defense Evasion
Encounter and bypass basic LLM defenses including keyword filters, content classifiers, and input preprocessing through encoding tricks, synonym substitution, and payload splitting.
Lab: Delimiter Injection Attacks
Craft payloads that exploit XML, markdown, and special token delimiters to break instruction boundaries and manipulate how LLMs parse system prompts versus user input.
Lab: Delimiter Escape Attacks
Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.
Embedding Basics for Security
Understand text embeddings and their security relevance by generating, comparing, and manipulating embedding vectors.
Lab: Encoding & Obfuscation Techniques
Hands-on lab exploring how text encoding, character substitution, and obfuscation methods can bypass LLM content filters and safety mechanisms.
Lab: Setting Up Your Red Team Environment
A step-by-step guide to setting up a complete AI red teaming environment: Python, API clients, scanning tools, and local models.
Error Message Intelligence Gathering
Extract system architecture information from error messages and response patterns in LLM applications.
Lab: Ethical Red Teaming
Practice responsible AI red teaming with proper documentation, scope management, and ethical decision-making frameworks.
Lab: Ethical Red Teaming (Beginner Lab)
Hands-on lab for practicing responsible AI red teaming with proper documentation, scope management, ethical boundaries, and disclosure procedures.
Lab: Few-Shot Manipulation Attacks
Craft fake few-shot examples that teach the model to bypass its safety training by demonstrating the desired adversarial behavior through fabricated conversation examples.
Lab: Your First Prompt Injection
Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.
Lab: Your First Jailbreak Attempt
Try basic jailbreak techniques against a local model using Ollama, learning the difference between prompt injection and jailbreaking through hands-on experimentation.
Lab: Scanning with Garak
Install the Garak LLM vulnerability scanner and run it against a local model. Learn to configure probes, generators, and evaluators for a complete scan.
Lab: Garak Setup and First Scan
Install and configure garak, the LLM vulnerability scanner, then run your first automated scan against a local model to discover common vulnerabilities.
Hello World Prompt Injection
Write and test your first prompt injection payload against a simple chatbot to understand the fundamental attack mechanism.
Lab: Injection Detection Tool
Build a basic prompt injection detection tool using pattern matching, heuristics, and LLM-based classification to identify malicious inputs before they reach the target model.
Lab: Injection Techniques Overview
Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.
Lab: Instruction Following Priority
Test how language models prioritize conflicting instructions from system prompts, user messages, and embedded directives to understand the instruction hierarchy.
Lab: Basic Jailbreak Techniques
Hands-on exploration of jailbreak techniques including role-play, DAN-style prompts, and academic framing against multiple models.
JSON Output Exploitation Basics
Exploit LLM JSON mode to bypass content filters and extract information through structured output channels.
Lab: Markdown Injection
Inject images, links, and formatting into LLM responses that exfiltrate data or alter display rendering in chat interfaces.
Model Security Comparison Lab
Compare the security posture of different LLMs by running identical test suites across providers.
Lab: Multilingual Injection
Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.
Lab: Multi-Turn Escalation Attacks
Build trust over several conversation messages, then exploit established rapport to bypass safety mechanisms through gradual, multi-turn escalation techniques.
Lab: Setting Up Ollama for Local LLM Testing
Install and configure Ollama for local LLM red teaming, download models, perform basic interactions, and compare safety behavior between local and API-hosted models.
Output Format Control Lab
Practice controlling LLM output formats to understand how format constraints affect safety mechanism effectiveness.
Lab: Output Format Exploitation
Manipulate output formats like JSON, CSV, code blocks, and structured data to extract information that models would normally refuse to provide in natural language.
Lab: Output Format Manipulation
Hands-on lab on forcing LLMs to output in specific formats including JSON injection, markdown manipulation, and structured output exploitation for data exfiltration.
Lab: Output Steering
Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.
Lab: Payload Crafting
Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.
Lab: Encoding Bypasses
Test Base64, ROT13, hex, and Unicode encoding techniques to evade LLM content filters while preserving the semantic meaning of adversarial payloads.
LLM Playground Security Testing
Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.
Lab: Prompt Leaking Basics
Learn to extract system prompts from chatbots using common leaking techniques including direct requests, indirect elicitation, and format manipulation.
Prompt Structure Analysis Lab
Reverse-engineer the prompt structure of target applications by analyzing response patterns and behavioral cues.
Lab: Promptfoo Setup and First Eval
Install and configure promptfoo for systematic LLM evaluation, then run your first red team evaluation to test model safety boundaries.
Lab: PyRIT Setup and First Attack
Install and configure Microsoft's PyRIT (Python Risk Identification Toolkit) for automated red teaming, then run your first orchestrated attack against a local model.
Rate Limit and Quota Mapping
Map the rate limits, quotas, and throttling behavior of LLM APIs to plan efficient testing strategies.
Response Analysis Fundamentals
Learn to analyze LLM responses for information leakage, inconsistencies, and indicators of safety mechanism activation.
Lab: Analyzing LLM Responses
Learn to identify safety refusals, partial compliance, inconsistent behavior, and response fingerprinting techniques when analyzing LLM outputs during red team testing.
Lab: Role-Play Attacks
Use persona-based approaches to bypass AI safety measures by assigning alternate identities, characters, and scenarios that override the model's trained refusal behaviors.
Lab: Role-Play & Persona Attacks
Hands-on lab exploring role-play and persona-based jailbreak techniques, where you craft scenarios that trick LLMs into adopting alternative identities to bypass safety guardrails.
Lab: Mapping Safety Boundaries
Systematically discover what a language model will and won't do by probing its safety boundaries across multiple categories and documenting the results.
Safety Boundary Mapping
Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics and documenting refusal patterns.
Safety Training Boundary Probing
Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.
Lab: Building a Simple Test Harness
Build a reusable Python test harness that automates sending test prompts, recording results, and calculating attack success metrics.
Lab: Structured Output Manipulation
Explore how JSON mode, schema injection, and forced output formats can bypass LLM safety filters and manipulate structured responses.
Lab: System Prompt Extraction
Hands-on techniques for extracting hidden system prompts from LLM-powered applications using direct asking, role-play, instruction replay, and other methods.
Lab: System Prompt Override
Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.
Lab: System Prompt Reconstruction
Use various techniques to systematically extract and reconstruct the full system prompt of an LLM application, combining direct, indirect, and incremental extraction methods.
Red Team Tool Installation and Configuration
Install and configure essential AI red teaming tools: Garak, PyRIT, Promptfoo, and custom Python harnesses.
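Expert AI Red Team Labs
Advanced labs tackling cutting-edge AI security challenges, including quantization exploitation, reward hacking, agent exploitation, multi-agent attacks, and watermark removal.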
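Exploitation Through Activation Vector Manipulation
Use activation vector manipulation techniques to steer model behavior at the representation level, without modifying the prompt.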
Lab: Adversarial ML From Scratch
Hands-on expert lab for implementing gradient-based adversarial attacks against language models from scratch without frameworks, building intuition for how adversarial perturbations exploit model gradients.
Lab: Adversarial Suffix Optimization
Implement GCG-style adversarial suffix attacks that automatically discover token sequences causing language models to comply with harmful requests. Covers gradient-based optimization, transferability analysis, and defense evaluation.
Lab: Alignment Stress Testing
Push language model alignment to its breaking points through systematic stress testing. Identify conditions where safety training fails, measure alignment degradation curves, and map the boundaries of model compliance.
Lab: Build Agent Security Scanner
Build an automated security scanner for agentic AI systems that detects vulnerabilities in tool use, permission handling, memory management, and multi-step execution flows. Cover agent-specific attack surfaces that traditional LLM testing misses.
Lab: Building an AI Fuzzer
Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.
Lab: Build Behavior Diff Tool
Build a tool that systematically compares language model behavior across versions, configurations, and providers. Detect safety regressions, capability changes, and behavioral drift with automated differential analysis.
Lab: Build Guardrail Evaluator
Build an automated framework for evaluating AI guardrails and safety filters. Test input filters, output classifiers, content moderation systems, and defense-in-depth architectures for coverage gaps and bypass vulnerabilities.
Lab: Build Jailbreak Automation
Build an automated jailbreak testing framework that generates, mutates, and evaluates attack prompts at scale. Covers prompt mutation engines, success classifiers, and campaign management for systematic red team testing.
Cross-Architecture Transfer Lab
Develop attacks on one model architecture that reliably transfer to fundamentally different architectures.
Lab: Defense Engineering Lab
Expert-level lab for building and evaluating a production-grade multi-layer AI defense system including input classifiers, output monitors, semantic analysis, and adversarial robustness testing.
Emergent Behavior Exploitation
Identify and exploit emergent behaviors in frontier models that arise from scale and are not present in smaller models.
Lab: Emergent Capability Probing
Systematically test large language models for undocumented capabilities including hidden knowledge, unreported skills, and behaviors that emerge only under specific conditions. Build a structured probing framework for capability discovery.
Formal Verification Limits Lab
Explore the limits of formal verification for neural network behavior and identify gaps exploitable by adversaries.
Lab: Full-Stack AI Exploitation
Chain vulnerabilities across UI, API, model, and infrastructure layers to achieve full-stack compromise of an AI application. Learn to identify cross-layer attack paths and build exploit chains that no single-layer defense can prevent.
Lab: Computer Use Agent Attacks
A hands-on session exploring attack vectors against GUI-based computer use agents, which interact with the screen through vision and action primitives.
Lab: Deploy Honeypot AI
Build and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. Learn honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.
Interpretability-Guided Attack Design
Use mechanistic interpretability to identify exploitable circuits and design targeted attacks.
MoE Routing Exploitation Lab
Exploit Mixture-of-Experts routing mechanisms to selectively activate or suppress expert modules in MoE models.
Lab: Multi-Agent Attack Coordination
Hands-on lab building and deploying coordinated multi-agent attack systems that divide red teaming tasks across specialized attacking agents.
Neural Backdoor Engineering
Engineer sophisticated neural backdoors that activate on specific trigger patterns while evading detection methods.
Lab: Novel Jailbreak Research
Systematic methodology for discovering new jailbreak techniques against large language models. Learn to identify unexplored attack surfaces, develop novel attack vectors, and validate findings with scientific rigor.
Lab: ML Pipeline Poisoning
Compromise an end-to-end machine learning pipeline by attacking data ingestion, preprocessing, training, evaluation, and deployment stages. Learn to identify and exploit weaknesses across the full ML lifecycle.
Lab: Exploiting Quantized Model Weaknesses
A hands-on lab exploring how model quantization degrades safety alignment, including techniques for finding and exploiting precision-related vulnerabilities.
Lab: Model Registry Compromise
Explore techniques for compromising model registries and substituting malicious models into production pipelines. Learn to detect model tampering, verify model provenance, and secure the model supply chain.
Lab: Vulnerability Research Methodology
Systematic methodology lab for discovering novel AI vulnerabilities including hypothesis generation, attack surface mapping, experimental design, validation protocols, and responsible disclosure.
Lab: RLHF Reward Hacking
A hands-on lab exploring how to game the reward models used for RLHF alignment, exposing the gap between reward proxy signals and actual safety objectives.
Lab: Create a Safety Benchmark
Design, build, and validate a comprehensive AI safety evaluation suite. Learn benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.
End-to-End Training-Time Attacks
Execute a complete training-time attack from data poisoning through model deployment to triggered exploitation.
Lab: AI Watermark Detection & Removal
Hands-on lab exploring techniques for detecting and removing statistical watermarks embedded in AI-generated text, and evaluating watermark robustness.
Advanced Encoding Chain Attacks
Chain multiple encoding schemes to bypass multi-layer input filters in production LLM applications.
Lab: Advanced RAG Poisoning
Hands-on lab on crafting documents that reliably get retrieved and influence RAG responses for specific target queries.
Lab: Exploiting AI Agents
Hands-on lab on exploiting AI agents with tool access through indirect prompt injection, tool-call chaining, and privilege escalation techniques.
Lab: Agent Memory Manipulation
Hands-on lab on injecting persistent instructions into an agent's memory and context that affect future interactions and conversations.
Lab: Agent Prompt Leaking
Hands-on lab on extracting system prompts from tool-using AI agents by exploiting tool interactions, observation channels, and multi-step reasoning to leak protected instructions.
API Abuse Pattern Lab
Discover and exploit API abuse patterns, including parameter injection, header manipulation, and endpoint confusion.
Lab: Automated Red Team Pipeline
Hands-on lab on building a continuous AI red team testing pipeline using promptfoo, GitHub Actions, and automated attack generation to catch safety regressions before deployment.
Lab: Automated Red Team Testing
Set up promptfoo for continuous automated security testing of AI applications, with custom evaluators and CI/CD integration.
Lab: Azure Content Filter Evasion
Hands-on lab on mapping and testing Azure OpenAI Service content filtering categories, severity levels, and bypass techniques.
Lab: AWS Bedrock Guardrail Testing
Hands-on lab on systematically testing and bypassing AWS Bedrock's built-in guardrails, including content filters, denied topics, and word filters.
Lab: Chunking Exploitation
Hands-on lab on crafting documents that split across chunks in ways that hide malicious content from chunk-level filtering while maintaining attack effectiveness.
Lab: Citation Fabrication
Hands-on lab on getting RAG systems to cite documents that don't exist or misattribute quotes to legitimate sources.
ML Classifier Evasion Lab
Develop payloads that evade machine learning-based input classifiers through adversarial text perturbation.
Lab: Context Window Overflow Attacks
Hands-on lab exploring how overflowing a model's context window with padding content can push safety instructions out of the attention window and enable injection attacks.
Context Window Poisoning Lab
Exploit context window management to inject persistent adversarial content that influences future model responses.
Lab: Context Window Stuffing Attacks
Hands-on lab demonstrating how oversized inputs can overwhelm an LLM's context window to dilute safety instructions, push system prompts out of the attention window, or cause instruction amnesia.
Conversation Splitting Attack Lab
Split adversarial payloads across multiple conversation turns to bypass per-message safety classification.
Lab: Data Exfiltration Channels
Hands-on lab on extracting data from AI systems through markdown image rendering, invisible links, tool call parameters, and other covert exfiltration channels.
Lab: Data Exfiltration Channels (Intermediate Lab)
Extract sensitive information from AI systems through various exfiltration channels, including crafted links, image tags, tool calls, and side-channel leakage.
Lab: Data Exfiltration Techniques
Hands-on lab on extracting sensitive data from AI systems, including system prompt extraction, context leakage via markdown rendering, and URL-based data exfiltration.
Lab: Systematic Defense Bypass
Hands-on lab on methodically identifying, categorizing, and bypassing common LLM defense mechanisms, including input filters, output filters, and system prompt hardening.
Lab: Defense Effectiveness Measurement
Hands-on lab on quantifying AI guardrail robustness using attack success rates, evasion metrics, false positive rates, and statistical analysis of defense performance.
Lab: Defense Effectiveness Testing
Systematically test and measure the robustness of AI guardrails using structured methodology, metrics, and repeatable test suites.
Document-Based RAG Injection Lab
Inject adversarial content into documents that will be processed by a RAG system to influence model responses.
Practical Embedding Manipulation
Manipulate text to achieve target embedding similarities for poisoning retrieval-augmented generation systems.
Lab: Embedding Space Manipulation
A hands-on lab demonstrating how to use sentence-transformers to craft documents that are retrieved for specific queries via embedding collision attacks.
Lab: Encoding Bypass Lab
Hands-on lab on systematically testing Base64, ROT13, Unicode homoglyph, hexadecimal, and combined encoding bypasses against LLM safety filters with measured success rates.
Lab: Encoding Evasion Techniques
Hands-on lab using Base64, ROT13, Unicode normalization, and custom encoding schemes to evade input filters and safety classifiers in language model systems.
Lab: Few-Shot Example Injection
Hands-on lab exploring how injected few-shot examples can steer language model outputs toward attacker-chosen behaviors by exploiting in-context learning.
Fine-Tuning API Security Probing
Probe fine-tuning APIs for security weaknesses, including insufficient validation and unsafe default configurations.
Lab: Function Calling & Tool Use Abuse
Hands-on lab exploring how attackers can manipulate LLM function calling and tool use to execute unauthorized actions, exfiltrate data, and chain tool calls for maximum impact.
Lab: Function Calling Injection
Hands-on lab on exploiting function calling mechanisms by crafting inputs that manipulate which functions get called and with what parameters.
Guardrail Rule Extraction Lab
Systematically extract the rules and decision boundaries of guardrail systems through targeted probing.
實驗室: Image-Based 提示詞 注入
動手lab exploring how text instructions embedded in images can be used to perform 提示詞注入 對抗 vision-language models (VLMs) that process visual input.
實驗室: Indirect 提示詞 注入
注入instructions through external data sources,包括 documents,web pages,emails that a target AI system processes as context.
實驗室: Indirect 提示詞 注入 Chains
動手實驗室,主題為setting up indirect 提示詞注入 scenarios through web pages,emails,documents,testing multi-hop injection chains 對抗 AI systems.
實驗室: JSON Input 注入
動手lab exploring how adversarial payloads injected through structured JSON inputs can manipulate language model behavior,bypass schema validation,exploit parsing inconsistencies.
實驗室: Building an LLM Judge Evaluator
動手實驗室,主題為building an LLM-based evaluator to score red team attack outputs,compare model vulnerability,lay the foundation for automated attack campaigns.
實驗室: Markdown-Based 注入
動手lab exploring how Markdown rendering in AI-generated outputs can be exploited to inject hidden content,exfiltrate data through image tags,manipulate displayed information.
實驗室: MCP 工具 Abuse Scenarios
動手實驗室,主題為exploiting 模型 上下文 Protocol tool definitions through malicious tool descriptions,tool confusion attacks,server impersonation.
實驗室: MCP Server 利用
動手實驗室,主題為attacking a vulnerable MCP server,demonstrating tool shadowing,consent phishing,server impersonation techniques.
實驗室: 代理 記憶 投毒
動手lab exploring how conversational memory in AI agents can be poisoned to alter future behavior,inject persistent instructions,exfiltrate data across sessions.
實驗室: RAG Metadata 注入
動手實驗室,主題為exploiting metadata fields like titles,descriptions,timestamps to manipulate RAG retrieval ranking與influence responses.
實驗室: Basic 模型 萃取
動手實驗室,主題為API-based 模型萃取 attacks,querying a target model to approximate its behavior,measuring fidelity,understanding query budgets.
Practical 模型 Fingerprinting 實驗室
Fingerprint the underlying model behind an API endpoint to identify its type,version,configuration.
Multi-Language 安全 繞過 實驗室
利用 gaps in safety training coverage across languages to bypass safety mechanisms using code-switching.
Multi-模型 流水線 攻擊 實驗室
攻擊 a pipeline where multiple models process data sequentially,exploiting trust between pipeline stages.
實驗室: Multi-Turn Progressive 提升
動手lab building multi-turn attack sequences that progressively escalate from benign queries to full safety bypass through gradual trust establishment與boundary pushing.
Lab: Multi-Turn Attack Campaigns
Hands-on lab on executing multi-turn crescendo attacks against LLMs, measuring safety degradation over conversation length, and building persistent attack campaigns.
Lab: Multimodal Injection
Hands-on lab on embedding text instructions in images to exploit vision-enabled LLMs. Learn to craft visual prompt injections, test OCR-based attacks, and evaluate multimodal safety boundaries.
Lab: Multimodal Injection (Intermediate Lab)
Embed prompt injection instructions in images for vision-enabled models, testing how visual content can carry adversarial payloads.
Output Channel Discovery Lab
Discover alternative output channels in LLM applications that bypass standard output filtering.
Lab: Agent Permission Escalation
Hands-on lab on tricking AI agents into performing actions beyond their intended scope through privilege escalation techniques.
Prompt Leakage Technique Lab
Practice multiple system prompt extraction techniques and measure their effectiveness across different targets.
Lab: Regression Testing with promptfoo
Hands-on lab on setting up promptfoo to run automated regression tests against LLM applications, ensuring that safety properties hold across model updates and prompt changes.
Lab: Running PyRIT Campaigns
Hands-on lab on configuring and running automated red teaming campaigns against LLM applications using Microsoft's PyRIT framework.
Lab: RAG Pipeline Poisoning
Hands-on lab on setting up a RAG pipeline with LlamaIndex, injecting malicious documents, testing retrieval poisoning, and measuring injection success rates.
Reasoning Chain Hijacking
Hijack the model's chain-of-thought reasoning to redirect its conclusions toward attacker-controlled objectives.
Lab: Recursive Injection Chains
Hands-on lab building self-reinforcing prompt injection chains where each model response amplifies the injection, creating feedback loops that progressively override safety controls.
Lab: Re-ranking Attacks
Hands-on lab on manipulating the re-ranking stage of RAG pipelines to promote or suppress specific documents in retrieval results.
Semantic Deception Crafting Lab
Craft semantically coherent prompts that conceal adversarial intent from both human reviewers and ML classifiers.
Session State Exploitation Lab
Exploit session management vulnerabilities in LLM applications to access other users' data or escalate privileges.
Structured Output Safety Bypass
Use JSON mode and structured output constraints to extract information that would be refused in free-form responses.
Lab: Supply Chain Audit
Audit an ML project's dependencies for vulnerabilities, covering model files, Python packages, container images, and training data provenance.
Lab: ML Supply Chain Scan
Hands-on lab on auditing machine learning model dependencies, detecting malicious packages in ML pipelines, and scanning model files for backdoors and supply chain threats.
Lab: Token Smuggling & Encoding Bypass
Hands-on lab exploring how tokenization boundaries, token-level manipulations, and encoding tricks can smuggle attack payloads past safety classifiers.
Lab: Tool Result Poisoning
Hands-on lab on poisoning tool outputs to redirect agent behavior by injecting malicious content through tool results.
Lab: Language Translation Bypass
Hands-on lab exploiting language translation capabilities to bypass safety filters by submitting prompts in low-resource languages, code-switching, and using transliteration tricks.
Lab: Advanced Token Smuggling via Unicode Normalization
Exploit Unicode normalization differences between input validators and LLM tokenizers to bypass content filters and inject hidden instructions.
Lab: Workflow Hijacking
Hands-on lab on redirecting multi-step agent workflows to perform unintended actions by manipulating intermediate states and control flow.
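Hands-On: Crafting Adversarial Audio Examples
Hands-on exercise in building adversarial audio examples with Python audio processing, targeting Whisper transcription to inject text.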
Lab: Multi-Modal Attack Chain
Hands-on lab for building and executing a multi-step attack chain that combines image injection, document exploitation, and text-based techniques against a multimodal AI system.
Lab: Adversarial Attacks on Video Models
Hands-on lab implementing adversarial video frames through frame-level perturbations using OpenCV and PyTorch.
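Lab: Crafting Image-Based Injections
Hands-on lab on creating image-based prompt injections, testing them against VLMs, and measuring the success rates of different injection techniques.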
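Red Team Labs and Operations
The operational foundations of AI red teaming: lab environments, evidence handling, engagement workflows, and team management for professional AI security assessments.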
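Hands-On: Attacking Federated Learning
Hands-on lab implementing model poisoning attacks in a federated learning environment simulated with the Flower framework: Byzantine attacks, model replacement, and measuring attack impact.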
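Hands-On: Exploiting Quantized Models
Hands-on lab: compare attack success rates across quantization levels, test jailbreaks on FP16, INT8, and INT4, measure safety degradation, and design quantization-aware exploits.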
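Hands-On: Implanting Backdoors via Fine-Tuning
Hands-on lab: implant a trigger-based backdoor during fine-tuning, then measure its impact on legitimate tasks and how difficult it is to detect.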
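Hands-On: Poisoning Pretraining Datasets
Hands-on lab: plant poisoned content in publicly crawlable sources, then observe its effect on a small pretrained model and on detection mechanisms.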