# expert

Lab: Federated Learning Poisoning Attack

Hands-on lab for understanding and simulating poisoning attacks against federated learning systems, where a malicious participant corrupts the shared model through crafted gradient updates.

lab, federated-learning, poisoning, expert, hands-on

Expert
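
The core mechanic can be shown with plain federated averaging (FedAvg). The sketch below makes several assumptions: equal client weights, a model stored as a flat list of floats, and an attacker who knows the honest updates. Under those conditions a single malicious client can scale its update so the averaged global model lands exactly on a target of its choosing (the "model replacement" idea). A coordinate-wise median aggregator blunts the same update. All function names are hypothetical.

```python
from statistics import median

def fedavg(global_model, updates):
    """Equal-weight FedAvg: add the mean of all client updates to the global model."""
    n = len(updates)
    return [g + sum(u[i] for u in updates) / n for i, g in enumerate(global_model)]

def median_aggregate(global_model, updates):
    """Coordinate-wise median: a common robust alternative to the mean."""
    return [g + median(u[i] for u in updates) for i, g in enumerate(global_model)]

def replacement_update(global_model, target, honest_updates, n_clients):
    """Hypothetical attacker helper: craft one update so that equal-weight FedAvg
    lands exactly on `target`. Assumes the honest updates are known; a real
    attacker could only estimate them (e.g. assume they roughly cancel out)."""
    honest_sum = [sum(u[i] for u in honest_updates) for i in range(len(global_model))]
    # n * (target - global) cancels the 1/n averaging; subtracting honest_sum removes their pull.
    return [n_clients * (t - g) - s for t, g, s in zip(target, global_model, honest_sum)]

if __name__ == "__main__":
    global_model = [0.0, 1.0]
    honest = [[0.5, -0.5], [0.25, 0.25], [-0.5, 0.0]]
    evil = replacement_update(global_model, [2.0, -1.0], honest, n_clients=4)
    print(fedavg(global_model, honest + [evil]))            # [2.0, -1.0]: mean is hijacked
    print(median_aggregate(global_model, honest + [evil]))  # [0.375, 0.75]: stays near honest values
```

The scaling factor equals the number of clients because FedAvg divides every contribution by that number. This is also why norm clipping and robust aggregators are the usual first defenses to test in this lab.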

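CTF: Agent Escalation

A five-flag CTF challenge focused on exploiting agentic AI systems. It progresses from basic tool abuse to privilege escalation, indirect injection, memory poisoning, and multi-agent chain attacks.

ctf, agent, escalation, expert

Expert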

CTF: Alignment Breaker

Break the alignment of a heavily defended model with multiple defense layers. Requires combining advanced techniques, including adversarial suffixes, multi-turn manipulation, and novel jailbreak approaches.

ctf, alignment, safety, expert

Expert

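CTF: Boss Rush

Chain attacks sequentially across multiple AI models. Each model guards the next, and every stage requires a different attack technique. Defeat all five models to capture the final flag: the ultimate red team challenge.

ctf, boss-rush, multi-model, chain, expert

Expert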

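Expert AI Red Team Labs

Advanced labs tackling cutting-edge AI security challenges, including quantization exploitation, reward hacking, agent exploitation, multi-agent attacks, and watermark removal.

lab, expert, overview

Expert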

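Activation Steering Exploitation

Use activation steering techniques to manipulate model behavior at the representation level, without modifying the prompt.

lab, steering, expert, labs, activation

Expert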

Lab: Adversarial ML From Scratch

Hands-on expert lab for implementing gradient-based adversarial attacks against language models from scratch, without frameworks, to build intuition for how adversarial perturbations exploit model gradients.

lab, expert, adversarial-ml, gradients, from-scratch

Expert

Lab: Adversarial Suffix Optimization

Implement GCG-style adversarial suffix attacks that automatically discover token sequences causing language models to comply with harmful requests. Covers gradient-based optimization, transferability analysis, and defense evaluation.

lab, expert, adversarial-suffix, GCG, optimization, hands-on

Expert

Lab: Alignment Stress Testing

Push language model alignment to its breaking points through systematic stress testing. Identify conditions where safety training fails, measure alignment degradation curves, and map the boundaries of model compliance.

lab, expert, alignment, stress-testing, safety, hands-on

Expert

Backdoor Detection Evasion

Insert backdoors into fine-tuned models that evade state-of-the-art detection methods.

labs, backdoor, detection-evasion, expert

Expert

Lab: Build Agent Security Scanner

Build an automated security scanner for agentic AI systems that detects vulnerabilities in tool use, permission handling, memory management, and multi-step execution flows. Covers agent-specific attack surfaces that traditional LLM testing misses.

lab, expert, agent, scanner, security, hands-on

Expert

Lab: Build an AI Fuzzer

Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.

lab, expert, fuzzer, testing, adversarial, hands-on

Expert

Lab: Build Behavior Diff Tool

Build a tool that systematically compares language model behavior across versions, configurations, and providers. Detect safety regressions, capability changes, and behavioral drift with automated differential analysis.

lab, expert, behavior-diff, regression, comparison, hands-on

Expert
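
One way to turn "detect safety regressions" into a decision rule is a paired significance test on per-prompt refusal outcomes. The sketch below runs two model versions on the same prompt set and applies McNemar's test, which suits paired binary data better than comparing two independent proportions. It assumes refusal labels come from a classifier you supply. The function name and the 0.05 threshold are illustrative choices.

```python
def mcnemar_regression(old_refused, new_refused, chi2_crit=3.841):
    """Compare per-prompt refusal outcomes of two model versions on the SAME prompts.

    old_refused / new_refused: lists of booleans aligned by prompt index.
    Returns (is_regression, statistic). Uses McNemar's test with continuity
    correction; 3.841 is the chi-square (1 df) critical value at alpha = 0.05.
    """
    if len(old_refused) != len(new_refused):
        raise ValueError("results must be aligned per prompt")
    # Only discordant pairs (outcome changed between versions) carry information.
    lost = sum(1 for before, after in zip(old_refused, new_refused) if before and not after)
    gained = sum(1 for before, after in zip(old_refused, new_refused) if after and not before)
    if lost + gained == 0:
        return False, 0.0
    stat = (abs(lost - gained) - 1) ** 2 / (lost + gained)
    # Flag only a significant change in the unsafe direction (fewer refusals).
    return stat > chi2_crit and lost > gained, stat
```

Reporting the discordant prompts themselves, not just the statistic, is usually what makes such a diff actionable.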

Lab: Build Guardrail Evaluator

Build an automated framework for evaluating AI guardrails and safety filters. Test input filters, output classifiers, content moderation systems, and defense-in-depth architectures for coverage gaps and bypass vulnerabilities.

lab, expert, guardrails, evaluation, defense-testing, hands-on

Expert

Lab: Build Jailbreak Automation

Build an automated jailbreak testing framework that generates, mutates, and evaluates attack prompts at scale. Covers prompt mutation engines, success classifiers, and campaign management for systematic red team testing.

lab, expert, automation, jailbreak, framework, hands-on

Expert

Cross-Architecture Transfer Lab

Develop attacks on one model architecture that reliably transfer to fundamentally different architectures.

architecture, cross, transfer, lab, expert, labs

Expert

Lab: Defense Engineering Lab

Expert-level lab for building and evaluating a production-grade multi-layer AI defense system, including input classifiers, output monitors, semantic analysis, and adversarial robustness testing.

lab, expert, defense-engineering, production, multi-layer

Expert

Emergent Behavior Exploitation

Identify and exploit emergent behaviors in frontier models that arise from scale and are not present in smaller models.

exploitation, lab, expert, behavior, emergent, labs

Expert

Lab: Emergent Capability Probing

Systematically test large language models for undocumented capabilities, including hidden knowledge, unreported skills, and behaviors that emerge only under specific conditions. Build a structured probing framework for capability discovery.

lab, expert, emergent, capability, probing, hands-on

Expert

Formal Verification Limits Lab

Explore the limits of formal verification for neural network behavior and identify gaps exploitable by adversaries.

lab, expert, formal, verification, labs, limits

Expert

Lab: Full-Stack AI Exploitation

Chain vulnerabilities across UI, API, model, and infrastructure layers to achieve full-stack compromise of an AI application. Learn to identify cross-layer attack paths and build exploit chains that no single-layer defense can prevent.

lab, expert, full-stack, exploit-chain, multi-layer, hands-on

Expert

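Lab: Computer Use Agent Attacks

A hands-on lab exploring attack vectors against GUI-based computer use agents, which interact with the screen through vision and action primitives.

lab, expert, agents, computer-use, gui, hands-on

Expert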

Lab: Deploy Honeypot AI

Build and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. Learn honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.

lab, expert, honeypot, deception, detection, hands-on

Expert

Interpretability-Guided Attack Design

Use mechanistic interpretability to identify exploitable circuits and design targeted attacks.

lab, expert, guided, attack, labs, interpretability

Expert

Model Extraction via API Access

Extract a functionally equivalent model using only API query access.

labs, model-extraction, api, expert

Expert

MoE Routing Exploitation Lab

Exploit Mixture-of-Experts routing mechanisms to selectively activate or suppress expert modules in MoE models.

routing, moe, exploitation, lab, expert, labs

Expert

Lab: Multi-Agent Attack Coordination

Hands-on lab for building and deploying coordinated multi-agent attack systems that divide red teaming tasks across specialized attacking agents.

lab, expert, multi-agent, coordination, hands-on

Expert

Neural Backdoor Engineering

Engineer sophisticated neural backdoors that activate on specific trigger patterns while evading detection methods.

neural, lab, labs, expert, backdoor, engineering

Expert

Lab: Novel Jailbreak Research

A systematic methodology for discovering new jailbreak techniques against large language models. Learn to identify unexplored attack surfaces, develop novel attack vectors, and validate findings with scientific rigor.

lab, expert, jailbreak, research, novel-attacks, hands-on

Expert

Lab: ML Pipeline Poisoning

Compromise an end-to-end machine learning pipeline by attacking the data ingestion, preprocessing, training, evaluation, and deployment stages. Learn to identify and exploit weaknesses across the full ML lifecycle.

lab, expert, pipeline, poisoning, supply-chain, hands-on

Expert

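Lab: Exploiting Quantized Model Weaknesses

A hands-on lab exploring how model quantization degrades safety alignment, including techniques for finding and exploiting precision-dependent vulnerabilities.

lab, expert, quantization, model-compression, hands-on

Expert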

Quantization-Induced Vulnerability Exploitation

Exploit behavioral differences between full-precision and quantized models.

labs, quantization, exploitation, expert

Expert

Reasoning Model Chain-of-Thought Exploitation

Exploit extended thinking and chain-of-thought reasoning in models such as o1 and DeepSeek-R1.

labs, reasoning, chain-of-thought, expert

Expert

Lab: Model Registry Compromise

Explore techniques for compromising model registries and substituting malicious models into production pipelines. Learn to detect model tampering, verify model provenance, and secure the model supply chain.

lab, expert, registry, supply-chain, model-integrity, hands-on

Expert

Representation Engineering Attacks

Manipulate internal model representations to alter behavior without modifying the prompt.

labs, representation-engineering, attacks, expert

Expert

Lab: Vulnerability Research Methodology

A systematic methodology lab for discovering novel AI vulnerabilities, covering hypothesis generation, attack surface mapping, experimental design, validation protocols, and responsible disclosure.

lab, expert, research, methodology, vulnerability-discovery

Expert

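Lab: RLHF Reward Hacking

A hands-on lab exploring how to game the reward models used in RLHF alignment, exposing the gap between the proxy reward signal and the actual safety objective.

lab, expert, rlhf, reward-hacking, alignment, hands-on

Expert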

Lab: Create a Safety Benchmark

Design, build, and validate a comprehensive AI safety evaluation suite. Learn benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.

lab, expert, benchmark, evaluation, safety, hands-on

Expert
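
Benchmark scores computed on a finite test set carry sampling uncertainty, so reporting a confidence interval for each risk category is one standard part of statistical validation. The sketch below uses the Wilson score interval, which behaves better than the plain normal approximation near 0% or 100%, where refusal and attack-success rates often sit. The function name and the default z = 1.96 (about 95%) are illustrative.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (e.g. a category pass rate)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)
```

For example, 0 failures out of 10 test cases still leaves an upper bound of roughly 28%, which argues for larger per-category sample sizes before any "0% bypass" claim.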

End-to-End Training-Time Attacks

Execute a complete training-time attack, from data poisoning through model deployment to triggered exploitation.

attacks, time, lab, expert, training, labs

Expert

Activation Patching for Targeted Behavior Modification

Use activation patching techniques to modify specific model behaviors without retraining or fine-tuning.

labs, activation-patching, behavior-modification, expert

Expert

Alignment Faking Detection Lab

Implement detection methods for alignment-faking behaviors, where models behave differently during evaluation than during deployment.

labs, alignment-faking, detection, expert

Expert

Continuous Learning Pipeline Poisoning

Exploit continuous/online learning pipelines by injecting adversarial examples that gradually shift model behavior.

labs, continuous-learning, pipeline-poisoning, expert

Expert

Deceptive Alignment Testing Framework

Build a testing framework for detecting mesa-optimization and deceptive alignment in fine-tuned models.

labs, deceptive-alignment, testing, expert

Expert

Emergent Capability Security Probing

Design evaluations that discover security-relevant emergent capabilities in frontier language models.

labs, emergent-capabilities, security-probing, expert

Expert

Federated Learning Poisoning Attacks

Execute model poisoning attacks in a federated learning setting by manipulating adversarial participants.

labs, federated-learning, poisoning, expert

Expert

Formal Verification of LLM Safety Properties

Apply formal verification methods to prove or disprove safety properties of LLM-based applications.

labs, formal-verification, safety-properties, expert

Expert

Gradient-Based Universal Prompt Search

Implement gradient-based optimization to search for universal adversarial prompts that transfer across contexts.

labs, gradient-based, universal-prompt, expert

Expert

KV Cache Manipulation Attacks

Manipulate key-value caches in inference servers to inject or alter cached context across requests.

labs, kv-cache, manipulation, inference, expert

Expert

Mechanistic Interpretability for Exploit Discovery

Use mechanistic interpretability tools to discover exploitable circuits and features in transformer models.

labs, mechanistic-interpretability, exploit-discovery, expert

Expert

Membership Inference Against Production LLMs

Implement membership inference attacks to determine whether specific data was used to train an LLM.

labs, membership-inference, production, expert

Expert
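
The simplest baseline in this family is a loss-threshold attack: examples a model was trained on tend to have lower loss than unseen ones. The sketch below assumes you already have per-token log-probabilities for each candidate text. Whether a given production API exposes these is an assumption, since many expose only top-k log-probs or none. It also assumes a small labeled set of known members and non-members for choosing the threshold. All helper names are hypothetical.

```python
def mean_nll(token_logprobs):
    """Mean negative log-likelihood of a sequence from per-token log-probabilities."""
    return -sum(token_logprobs) / len(token_logprobs)

def loss_threshold_attack(losses, threshold):
    """Predict 'member' (True) when an example's loss is below the threshold."""
    return [loss < threshold for loss in losses]

def tpr_fpr(predictions, is_member):
    """True/false positive rates of membership predictions against ground truth."""
    tp = sum(1 for p, m in zip(predictions, is_member) if p and m)
    fp = sum(1 for p, m in zip(predictions, is_member) if p and not m)
    pos = sum(1 for m in is_member if m)
    neg = len(is_member) - pos
    return (tp / pos if pos else 0.0, fp / neg if neg else 0.0)
```

Sweeping the threshold traces an ROC curve. Reporting the true positive rate at a low false positive rate gives a more honest picture than average accuracy, because a weak attack can look decent on average while being useless for confident claims about individual examples.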

Model Extraction via Knowledge Distillation

Extract a functionally equivalent model from a commercial API using systematic distillation queries.

labs, model-extraction, knowledge-distillation, expert

Expert

Model Merging Backdoor Propagation

Demonstrate how backdoors propagate through model merging techniques such as TIES, DARE, and spherical interpolation.

labs, model-merging, backdoor, propagation, expert

Expert
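
To reason about whether a backdoor survives merging, it helps to see what a merge does to the fine-tuning deltas. DARE (Drop And REscale) randomly zeroes a fraction p of the delta parameters and rescales the survivors by 1/(1 - p), so the expected delta is preserved. The sketch below works on flat parameter lists. For the merge step it assumes a simple average of processed deltas added to the base and omits TIES-style sign election. The function names are hypothetical.

```python
import random

def dare(base, finetuned, p, rng):
    """Drop-and-rescale the delta (finetuned - base); returns processed deltas."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    out = []
    for b, f in zip(base, finetuned):
        delta = f - b
        keep = rng.random() >= p          # keep each delta with probability 1 - p
        out.append(delta / (1 - p) if keep else 0.0)
    return out

def merge_deltas(base, deltas_list):
    """Add the element-wise mean of several (e.g. DARE-processed) deltas to the base."""
    k = len(deltas_list)
    return [b + sum(d[i] for d in deltas_list) / k for i, b in enumerate(base)]
```

Because rescaling preserves the delta in expectation, a backdoor spread across many parameters can survive aggressive dropping. Averaging with other task vectors, by contrast, dilutes each delta by the number of merged models, which is one variable worth measuring in this lab.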

MoE Routing Exploitation

Exploit Mixture-of-Experts routing mechanisms to activate specific expert networks for adversarial purposes.

labs, moe, routing, exploitation, expert

Expert
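
The mechanism these labs target is the router's top-k selection. In a typical sparse MoE layer, a linear router produces one logit per expert for each token, only the k largest are kept, and the kept logits are renormalized with a softmax. This matches the Mixtral-style variant; other designs normalize differently. The sketch below shows only that gating step, with toy router weights and hypothetical function names. It illustrates how a change in the input representation changes which experts run at all.

```python
import math

def router_logits(x, weights):
    """One logit per expert: dot product of the token representation with each router row."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]

def top_k_gate(logits, k):
    """Select the k highest-scoring experts and softmax-normalize over just those."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in idx)                   # subtract max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in idx}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}    # expert index -> gate weight
```

Experts outside the returned dictionary contribute nothing to that token's output. This hard cutoff is why small input perturbations near a routing boundary can swap experts entirely.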

Multi-Agent Consensus Poisoning

Poison multi-agent consensus mechanisms to manipulate collective decision-making in agent swarms.

labs, multi-agent, consensus-poisoning, expert

Expert

Representation Engineering for Behavior Steering

Use representation engineering to steer model behavior by manipulating activation vectors during inference.

labs, representation-engineering, behavior-steering, expert

Expert
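
A common recipe for a steering vector is the difference between mean activations at one layer over two contrasting prompt sets. That vector is then added, scaled by a coefficient alpha, to the hidden state during inference. Projecting it out instead gives directional ablation. The sketch below operates on plain lists. Extracting real activations, for example through forward hooks in your model framework, is assumed and not shown. All names are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of equally sized activation vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def steering_vector(pos_acts, neg_acts):
    """Difference of means between two contrasting prompt sets at one layer."""
    return [a - b for a, b in zip(mean_vector(pos_acts), mean_vector(neg_acts))]

def apply_steering(hidden, vector, alpha):
    """Add the scaled steering vector to a hidden state (one token position)."""
    return [h + alpha * v for h, v in zip(hidden, vector)]

def ablate_direction(hidden, vector):
    """Remove the component of `hidden` along `vector` (directional ablation)."""
    norm_sq = sum(v * v for v in vector)
    if norm_sq == 0:
        return list(hidden)
    coef = sum(h * v for h, v in zip(hidden, vector)) / norm_sq
    return [h - coef * v for h, v in zip(hidden, vector)]
```

Choosing the layer and alpha is the empirical part. Too small a coefficient has no effect, and too large a coefficient tends to degrade fluency before it changes behavior cleanly.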

Reward Hacking in RLHF Pipelines

Craft adversarial training examples that exploit reward model weaknesses in RLHF training pipelines.

labs, reward-hacking, rlhf, expert

Expert

Minimal-Data Safety Fine-Tuning Removal

Determine the minimum number of fine-tuning examples needed to remove safety alignment from various model families.

labs, safety-removal, minimal-data, expert

Expert

Speculative Decoding Side-Channel Attacks

Exploit speculative decoding implementations to extract information about draft and verifier model behavior.

labs, speculative-decoding, side-channel, expert

Expert

Token Probability System Prompt Reconstruction

Reconstruct complete system prompts through API access using token-by-token probability analysis.

labs, token-probability, system-prompt, reconstruction, expert

Expert

LLM Watermark Removal Attacks

Develop techniques to remove or corrupt watermarks embedded in LLM-generated text.

labs, watermark, removal, expert

Expert

Lab: AI Watermark Detection & Removal

Hands-on lab exploring techniques for detecting and removing statistical watermarks embedded in AI-generated text, and for evaluating watermark robustness.

lab, expert, watermarking, detection, hands-on

Expert
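
For the widely studied "green list" scheme (Kirchenbauer et al., 2023), detection reduces to a one-proportion z-test. A secret key plus the previous token pseudo-randomly marks a fraction gamma of candidate tokens as "green", and watermarked text contains more green tokens than gamma would predict. The sketch below rests on several assumptions: integer token IDs, a SHA-256 hash as the seeding function, and an independent per-token green decision in place of an exact partition of the vocabulary. The key and gamma = 0.25 are placeholders, and real implementations differ in hashing and context width.

```python
import hashlib
import math

def is_green(prev_token, token, gamma, key=b"demo-key"):
    """Hypothetical green-list test: hash (key, previous token, token) to a uniform [0, 1)."""
    digest = hashlib.sha256(key + prev_token.to_bytes(8, "big") + token.to_bytes(8, "big")).digest()
    u = int.from_bytes(digest[:8], "big") / 2 ** 64
    return u < gamma

def watermark_z(tokens, gamma=0.25, key=b"demo-key"):
    """z-score of the observed green-token count against the expected gamma * T."""
    t = len(tokens) - 1                       # number of scored (prev, token) transitions
    if t < 1:
        raise ValueError("need at least two tokens")
    green = sum(is_green(a, b, gamma, key) for a, b in zip(tokens, tokens[1:]))
    return (green - gamma * t) / math.sqrt(t * gamma * (1 - gamma))
```

Removal attacks in this lab can be scored against the same statistic. Paraphrasing or token substitution that pushes z back toward zero defeats detection, while a large z on unmodified text confirms the watermark.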

# expert

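Mock Exam 3: Expert Red Team

Expert Comprehensive Mock Exam 2

Expert Certification Mock Exam

Expert Level Exam Preparation Guide

Capstone: Agentic Red Teaming

Capstone: Full Red Team Engagement Project

Capstone: Vulnerability Research Project

Fall 2026 CTF: Advanced Attack Research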
