# adversarial

labs, embedding, adversarial, advanced

Embedding Adversarial Perturbation

Craft adversarial inputs that produce target embeddings for retrieval manipulation.

lab, transfer-attacks, cross-model, adversarial, advanced

Lab: Transfer Attack Development

Hands-on lab for crafting adversarial prompts on open-weight models like Llama that transfer to closed-source models like Claude and GPT-4, using iterative refinement and cross-model evaluation.

lab, transfer-attacks, adversarial, cross-model, advanced, hands-on

Lab: Transfer Attack Development (Advanced Lab)

Develop adversarial attacks on open-source models that transfer to closed-source models, leveraging weight access for black-box exploitation.

labs, reward-model, adversarial, advanced

Adversarial Reward Model Exploitation

Craft inputs that exploit reward model weaknesses to achieve high safety scores while containing harmful content.

labs, ctf, adversarial, optimization

Token Wars: Adversarial Optimization Challenge

Optimize adversarial tokens to achieve target model behaviors under strict character limits.

labs, ctf, token-smuggling, adversarial

Token Smuggler Extreme: Adversarial Token Crafting

Craft adversarial token sequences under extreme character limits to achieve target model behaviors.

lab, expert, fuzzer, testing, adversarial, hands-on

Lab: Building an AI Fuzzer

Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.

labs, adversarial, prompt-generation, intermediate

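Lab: Automated Adversarial Prompt Generation

Build an automated system that generates adversarial prompts using mutation, crossover, and selection strategies.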

embeddings, adversarial, vector-space, internals, exploit-primitives

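Embedding Space Attacks

Techniques for attacking the embedding layer of LLMs, including adversarial perturbations, embedding inversion, and semantic space manipulation.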

multimodal, 3d, adversarial, spatial

Adversarial Attacks on 3D Models

Adversarial attacks against AI systems that process 3D models, point clouds, and spatial data.

multimodal, adversarial, image, perturbation

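Adversarial Image Perturbations Against VLMs

Generate adversarial perturbations that cause vision-language models to misjudge inputs or follow injected instructions.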

attacks, typography, adversarial, multimodal

Adversarial Typography Attacks

Attack techniques that use fonts, sizes, and typographic styles to deceive OCR and vision-language models.

adversarial, audio, perturbation, attacks

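Adversarial Audio Examples

Techniques for crafting adversarial audio perturbations, including psychoacoustic hiding, frequency-domain attacks, and over-the-air adversarial audio.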

audio, adversarial, multimodal, voice, prompt-injection, speech-llm, research

Audio Modality Attacks

Comprehensive attack taxonomy for audio-enabled LLMs: adversarial audio generation, voice-based prompt injection, cross-modal split attacks, and ultrasonic perturbations.

lab, audio, adversarial, hands-on

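Lab: Crafting Adversarial Audio Examples

A hands-on lab on creating adversarial audio examples with Python audio processing, targeting Whisper transcription to inject text.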

speech-recognition, whisper, audio, adversarial

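Speech Recognition Attacks

Attacks on automatic speech recognition systems, including adversarial audio whose transcription differs from what a listener hears, hidden voice commands, and background audio injection.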

multimodal, audio, adversarial, speech, asr

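Adversarial Attacks on Speech and Audio Models

Techniques for crafting adversarial audio against speech recognition, voice assistants, and audio language models, including hidden commands and psychoacoustic masking.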

multimodal, vision, audio, video, cross-modal, vlm, adversarial

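Multimodal Security

Security assessment of multimodal AI systems (those handling images, audio, video, and cross-modal inputs), covering vision-language models, speech systems, video analysis, and cross-modal attack techniques.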

Beginner

Adversarial Attacks on Medical Imaging

Adversarial attacks against AI systems that use medical imaging, such as X-ray, CT, and MRI.

multimodal, medical-imaging, adversarial

multimodal, ocr, adversarial, text-recognition

OCR Adversarial Attacks

Use carefully crafted images to make OCR systems extract adversarial text for follow-on injection attacks.

multimodal, text-to-image, adversarial, diffusion, stable-diffusion

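Adversarial Attacks on Text-to-Image Models

Understand and evaluate adversarial attacks on text-to-image diffusion models, including prompt manipulation to bypass safety filters, concept erasure attacks, guidance mechanism perturbation, and training data membership inference.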

lab, video, adversarial, hands-on

Lab: Adversarial Attacks on Video Models

A hands-on lab that implements adversarial video frames through frame-level perturbations using OpenCV and PyTorch.

video, frame-injection, adversarial, temporal, video-understanding

Video Frame Injection Attacks

Inserting adversarial frames into video to exploit video understanding models: temporal injection, keyframe manipulation, subliminal frame attacks, and detection evasion.

multimodal, video, temporal, adversarial, frame-injection

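Attacks on Video Understanding Models

Techniques for attacking AI video understanding systems (such as Gemini 2.5 Pro) through frame injection, temporal manipulation, and adversarial video generation.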

multimodal, vlm, adversarial, vision, jailbreak

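Vision-Language Model Attacks

A comprehensive set of techniques for attacking vision-language models, including GPT-4o, Claude 4, and Gemini, covering adversarial images, typographic exploits, and multimodal jailbreaks.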

adversarial, images, perturbation, vlm

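Adversarial Image Examples for VLMs

Pixel-level perturbations that change VLM behavior, including PGD attacks on vision encoders, transferable adversarial images, and patch attacks.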

typographic, adversarial, vlm, visual-text, prompt-injection

Typographic Adversarial Attacks

How text rendered in images influences VLM behavior: adversarial typography, font-based prompt injection, visual instruction hijacking, and defenses against typographic manipulation.

jailbreak, safety-bypass, alignment, red-teaming, adversarial

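Jailbreak Techniques

Common patterns and advanced techniques for bypassing the safety alignment of large language models, including role-play, encoding tricks, many-shot attacks, and gradient-based methods.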

prompt-injection, suffix, gcg, adversarial

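Suffix Manipulation Attacks

Append optimized adversarial suffixes to user input, exploiting the model's tendency to continue text so that it produces the attacker's desired output.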

prompt-injection, token-level, adversarial, gcg

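Token-Level Adversarial Attacks

Use gradient-based optimization and token manipulation to discover adversarial suffixes that reliably trigger unsafe model behavior.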

prompt-injection, universal-trigger, adversarial, gcg, transfer

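Universal Adversarial Triggers

Discover and deploy universal adversarial trigger sequences that reliably override safety alignment across multiple large language model families, including gradient-based search, transfer attacks, and defense evasion.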

payloads, attack-reference, prompt-injection, jailbreaks, data-extraction, adversarial

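Attack Payload Reference

A categorized reference of common attack payloads for AI red teaming, including prompt injection, jailbreaks, data extraction, and adversarial inputs, with notes on effectiveness.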

walkthroughs, gcg, adversarial, optimization

GCG Adversarial Suffix Attack Walkthrough

Complete walkthrough of implementing the Greedy Coordinate Gradient adversarial suffix attack.