# expert
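69 articles tagged "expert"
Mock Exam 3: Expert Red Team
A 25-question expert-level mock exam covering research techniques, automation, fine-tuning attacks, supply chain security, and incident response.
Expert Comprehensive Mock Exam 2
Second expert-level comprehensive exam covering frontier research, novel attacks, and advanced defenses.
Expert Certification Mock Exam
Expert-level mock exam covering frontier research, novel attacks, and advanced defense architectures.
Expert-Level Exam Preparation Guide
Advanced study guide for expert-level assessments covering frontier research and novel techniques.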
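Capstone: Agentic Red Team Exercise
Capstone exercise: conduct a full red team engagement against a multi-agent, tool-using system, covering tool abuse, memory attacks, and inter-agent manipulation.
Capstone Project: Full Red Team Engagement
Scope, plan, execute, and report on a full AI red team engagement against a multi-component AI application spanning chatbot, RAG, agent, and API layers.
Capstone Project: Vulnerability Research
Identify and responsibly disclose a novel class of AI vulnerability, producing an advisory-grade write-up, a proof of concept, and a MITRE ATLAS mapping.
Fall 2026 CTF: Advanced Attack Research
An expert-level CTF requiring novel technique development, research-grade problem solving, and original vulnerability discovery in AI systems.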
Lab: Federated Learning Poisoning Attack
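Hands-on lab for understanding and simulating poisoning attacks against federated learning systems, where a malicious participant corrupts the shared model through crafted gradient updates.
CTF: Agent Escalation
A five-flag CTF challenge focused on exploiting agentic AI systems. Progresses from basic tool abuse to privilege escalation, indirect injection, memory poisoning, and multi-agent chain attacks.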
CTF: Alignment Breaker
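Break the alignment of a heavily defended model with multiple defense layers. Requires combining advanced techniques including adversarial suffixes, multi-turn manipulation, and novel jailbreak approaches.
CTF: Boss Rush
Chain attacks in sequence across multiple AI models. Each model guards the next, and each stage requires a different attack technique. Defeat all five models to capture the final flag: the ultimate red team challenge.
Expert AI Red Team Labs
Advanced labs tackling cutting-edge AI security challenges, including quantization exploitation, reward hacking, agent exploitation, multi-agent attacks, and watermark removal.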
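Exploitation via Activation Vector Manipulation
Use activation vector manipulation to control model behavior at the representation level without modifying the prompt.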
Lab: Adversarial ML From Scratch
Hands-on expert lab for implementing gradient-based adversarial attacks against language models from scratch without frameworks, building intuition for how adversarial perturbations exploit model gradients.
Lab: Adversarial Suffix Optimization
Implement GCG-style adversarial suffix attacks that automatically discover token sequences causing language models to comply with harmful requests. Covers gradient-based optimization, transferability analysis, and defense evaluation.
Lab: Alignment Stress Testing
Push language model alignment to its breaking points through systematic stress testing. Identify conditions where safety training fails, measure alignment degradation curves, and map the boundaries of model compliance.
Backdoor Detection Evasion
Insert backdoors into fine-tuned models that evade state-of-the-art detection methods.
Lab: Build Agent Security Scanner
Build an automated security scanner for agentic AI systems that detects vulnerabilities in tool use, permission handling, memory management, and multi-step execution flows. Covers agent-specific attack surfaces that traditional LLM testing misses.
Lab: Build an AI Fuzzer
Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.
Lab: Build Behavior Diff Tool
Build a tool that systematically compares language model behavior across versions, configurations, and providers. Detect safety regressions, capability changes, and behavioral drift with automated differential analysis.
Lab: Build Guardrail Evaluator
Build an automated framework for evaluating AI guardrails and safety filters. Test input filters, output classifiers, content moderation systems, and defense-in-depth architectures for coverage gaps and bypass vulnerabilities.
Lab: Build Jailbreak Automation
Build an automated jailbreak testing framework that generates, mutates, and evaluates attack prompts at scale. Covers prompt mutation engines, success classifiers, and campaign management for systematic red team testing.
Cross-Architecture Transfer Lab
Develop attacks on one model architecture that reliably transfer to fundamentally different architectures.
Lab: Defense Engineering Lab
Expert-level lab for building and evaluating a production-grade multi-layer AI defense system including input classifiers, output monitors, semantic analysis, and adversarial robustness testing.
Emergent Behavior Exploitation
Identify and exploit emergent behaviors in frontier models that arise from scale and are not present in smaller models.
Lab: Emergent Capability Probing
Systematically test large language models for undocumented capabilities including hidden knowledge, unreported skills, and behaviors that emerge only under specific conditions. Build a structured probing framework for capability discovery.
Formal Verification Limits Lab
Explore the limits of formal verification for neural network behavior and identify gaps exploitable by adversaries.
Lab: Full-Stack AI Exploitation
Chain vulnerabilities across UI, API, model, and infrastructure layers to achieve full-stack compromise of an AI application. Learn to identify cross-layer attack paths and build exploit chains that no single-layer defense can prevent.
Lab: Computer Use Agent Attacks
Hands-on lab exploring attack vectors against GUI-based computer use agents, which interact with the screen through vision and action primitives.
Lab: Deploy Honeypot AI
Build and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. Learn honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.
Interpretability-Guided Attack Design
Use mechanistic interpretability to identify exploitable circuits and design targeted attacks.
Model Extraction via API Access
Extract a functionally equivalent model using only API query access.
MoE Routing Exploitation Lab
Exploit Mixture-of-Experts routing mechanisms to selectively activate or suppress expert modules in MoE models.
Lab: Multi-Agent Attack Coordination
Hands-on lab building and deploying coordinated multi-agent attack systems that divide red teaming tasks across specialized attacking agents.
Neural Backdoor Engineering
Engineer sophisticated neural backdoors that activate on specific trigger patterns while evading detection methods.
Lab: Novel Jailbreak Research
Systematic methodology for discovering new jailbreak techniques against large language models. Learn to identify unexplored attack surfaces, develop novel attack vectors, and validate findings with scientific rigor.
Lab: ML Pipeline Poisoning
Compromise an end-to-end machine learning pipeline by attacking data ingestion, preprocessing, training, evaluation, and deployment stages. Learn to identify and exploit weaknesses across the full ML lifecycle.
Lab: Exploiting Quantized Model Weaknesses
Hands-on lab exploring how model quantization degrades safety alignment, including techniques for finding and exploiting precision-related vulnerabilities.
Quantization-Induced Vulnerability Exploitation
Exploit behavioral differences between full-precision and quantized models.
Reasoning Model Chain-of-Thought Exploitation
Exploit extended thinking and chain-of-thought reasoning in models like o1 and DeepSeek-R1.
Lab: Model Registry Compromise
Explore techniques for compromising model registries and substituting malicious models into production pipelines. Learn to detect model tampering, verify model provenance, and secure the model supply chain.
Representation Engineering Attacks
Manipulate internal model representations to alter behavior without prompt modification.
Lab: Vulnerability Research Methodology
Systematic methodology lab for discovering novel AI vulnerabilities including hypothesis generation, attack surface mapping, experimental design, validation protocols, and responsible disclosure.
Lab: RLHF Reward Hacking
Hands-on lab exploring how to game the reward models used for RLHF alignment, exposing the gap between reward proxy signals and actual safety objectives.
Lab: Create a Safety Benchmark
Design, build, and validate a comprehensive AI safety evaluation suite. Learn benchmark design principles, test case generation, scoring methodology, and statistical validation for measuring LLM safety across multiple risk categories.
End-to-End Training-Time Attacks
Execute a complete training-time attack from data poisoning through model deployment to triggered exploitation.
Activation Patching for Targeted Behavior Modification
Use activation patching techniques to modify specific model behaviors without retraining or fine-tuning.
Alignment Faking Detection Lab
Implement detection methods for alignment faking behaviors where models behave differently during evaluation versus deployment.
Continuous Learning Pipeline Poisoning
Exploit continuous/online learning pipelines by injecting adversarial examples that gradually shift model behavior.
Deceptive Alignment Testing Framework
Build a testing framework for detecting mesa-optimization and deceptive alignment in fine-tuned models.
Emergent Capability Security Probing
Design evaluations that discover security-relevant emergent capabilities in frontier language models.
Federated Learning Poisoning Attack
Execute model poisoning attacks in a federated learning setting through adversarial participant manipulation.
Formal Verification of LLM Safety Properties
Apply formal verification methods to prove or disprove safety properties of LLM-based applications.
Gradient-Based Universal Prompt Search
Implement gradient-based optimization to search for universal adversarial prompts that transfer across contexts.
KV Cache Manipulation Attacks
Manipulate key-value caches in inference servers to inject or alter cached context across requests.
Mechanistic Interpretability for Exploit Discovery
Use mechanistic interpretability tools to discover exploitable circuits and features in transformer models.
Membership Inference Against Production LLMs
Implement membership inference attacks to determine whether specific data was used in training an LLM.
Model Extraction via Knowledge Distillation
Extract a functionally equivalent model from a commercial API using systematic distillation queries.
Model Merging Backdoor Propagation
Demonstrate how backdoors propagate through model merging techniques like TIES, DARE, and spherical interpolation.
MoE Routing Exploitation
Exploit Mixture-of-Experts routing mechanisms to activate specific expert networks for adversarial purposes.
Multi-Agent Consensus Poisoning
Poison multi-agent consensus mechanisms to manipulate collective decision-making in agent swarms.
Representation Engineering for Behavior Steering
Use representation engineering to steer model behavior by manipulating activation vectors during inference.
Reward Hacking in RLHF Pipelines
Craft adversarial training examples that exploit reward model weaknesses in RLHF training pipelines.
Minimal-Data Safety Fine-Tuning Removal
Determine the minimum number of fine-tuning examples needed to remove safety alignment from various model families.
Speculative Decoding Side-Channel Attacks
Exploit speculative decoding implementations to extract information about draft and verifier model behavior.
Token Probability System Prompt Reconstruction
Reconstruct complete system prompts using token-by-token probability analysis through API access.
LLM Watermark Removal Attacks
Develop techniques to remove or corrupt watermarks embedded in LLM-generated text.
Lab: AI Watermark Detection & Removal
Hands-on lab exploring techniques for detecting and removing statistical watermarks embedded in AI-generated text, and evaluating watermark robustness.