# exploitation

exploitationcollapseresearchmodelfrontier

模型崩潰利用

利用迭代式合成訓練下模型崩潰現象的攻擊。

cotexploitationreasoningmanipulation

Chain-of-Thought 利用ation

Techniques for manipulating reasoning chains in CoT-enabled models: false premise injection, logic bombs, reasoning hijacking, and chain corruption attacks.

frontier-researchreasoningexploitationchain-of-thought

推論模型利用

針對具推論能力模型的攻擊,包括 o1/o3 類模型以及其推論令牌利用。

frontier-researchsycophancyexploitationalignment

阿諛奉承利用

針對模型阿諛奉承行為的系統性利用,以及相應防禦。

frontier-researchworld-modelsexploitationagents

世界模型利用

利用已學世界模型進行對抗性目的的利用與相應防禦。

context-windowattention-decaypositional-encodinglost-in-the-middleexploitation

上下文視窗的內部機制

Transformer 上下文視窗中的注意力衰減、位置編碼限制與記憶管理如何創造可被紅隊操作利用的模式。

exploitationadvancedcrosslabpluginlabs

Cross-Plugin 利用鏈

鏈 exploits across multiple LLM plugins to achieve capabilities not available through any single plugin.

labslangchaincveexploitationadvanced

LangChain CVE 利用實驗室

Reproduce與analyze LangChain CVEs,包括 CVE-2023-29374與CVE-2023-36258 in a safe lab environment.

labsreasoning-tracechain-of-thoughtexploitationadvanced

Reasoning Trace 利用 in CoT 模型

利用 visible chain-of-thought reasoning traces in models like o1與DeepSeek-R1 to manipulate outputs.

exploitationoutputlabbeginnerlabsjson

JSON Output 利用 Basics

利用 LLM JSON mode to bypass content filters與extract information through structured output channels.

入門

Error Message 利用

Trigger與analyze error messages from LLM applications to reveal architecture與configuration details.

labserror-messagesexploitationbeginner

入門

Emergent Behavior 利用

Identify與exploit emergent behaviors in frontier models that arise from scale與are not present in smaller models.

exploitationlabexpertbehavioremergentlabs

routingmoeexploitationlabexpertlabs

MoE Routing 利用實驗室

利用 Mixture-of-Experts routing mechanisms to selectively activate or suppress expert modules in MoE models.

labsmoeroutingexploitationexpert

MoE Routing 利用

利用 Mixture-of-Experts routing mechanisms to activate specific expert networks for adversarial purposes.

labagentsexploitationtool-use

實驗室: Exploiting AI 代理

動手實驗室,主題為exploiting AI agents with tool access through indirect 提示詞注入,tool-call chaining,privilege escalation techniques.

labschain-of-thoughtexploitationintermediate

實驗：思考鏈利用

利用思考鏈推理，洩漏模型內部推理、繞過安全過濾器、操弄決策過程。

labshallucinationexploitationintermediate

實驗：利用 LLM 幻覺

利用幻覺傾向，觸發捏造的工具呼叫、虛構的 API 端點與虛假事實注入。

labmcpexploitationtool-shadowingconsent-phishingagentic-attacks

實驗室: MCP Server 利用

動手實驗室,主題為attacking a vulnerable MCP server,demonstrating tool shadowing,consent phishing,server impersonation techniques.

labsoutput-schemaexploitationintermediate

實驗：輸出 Schema 利用

利用結構化輸出 Schema，迫使模型透過必填欄位的填寫揭露資訊。

labsrole-hierarchyexploitationintermediate

實驗：角色階層利用

透過操弄 system、user 與 assistant 角色之間的指令階層，利用多角色 LLM 架構。

exploitationintermediatesessionlablabsstate

Session State 利用實驗室

利用 session management vulnerabilities in LLM applications to access other users data or escalate privileges.

labsstructured-outputexploitationintermediate

Structured Output 利用

利用 JSON mode與structured output constraints to bypass safety filters.

llmopsab-testingexploitationmanipulation

進階 A/B 測試利用

操弄 A/B 測試框架使模型選擇偏向較不安全的變體，或引入對抗性的模型候選。

modelsMoEroutingexploitation

MoE 路由利用

如何利用專家混合模型的路由機制以偏置輸出或建立側通道。

multimodaldocumentparsingexploitation

文件解析利用

利用文件解析器的弱點與邊緣案例,在 AI 處理前後偷渡惡意內容。

video-understandingexploitationmultimodal

視訊理解模型利用

以造成誤分類或指令注入之對抗視訊攻擊視訊字幕、視訊 Q&A 與動作辨識模型。

hierarchyinstructionexploitationinjectionprompt

指令階層利用

利用不同模型供應商間指令優先階層的模糊性。

training-pipelinecurriculum-learningexploitationordering

課程學習利用(訓練管道)

利用課程學習與資料排序,放大被投毒訓練樣本的效果。

training-pipelinerlhfpipelineexploitation

RLHF 管道利用

利用獎勵模型訓練、偏好資料收集與 RLHF 優化迴圈。

chainingattacksexploitationapiwalkthroughs

API 鏈式利用詳解

詳解代理系統中鏈接多個 API 呼叫以實現多步驟未授權行動的方法。

walkthroughsfunction-callingexploitationagent

函式呼叫利用指南

Complete walkthrough of exploiting function calling in OpenAI, Anthropic, and Google AI APIs.

infrastructureapiinferenceexploitationred-teaming

Inference Endpoint 利用ation

利用ing inference API endpoints for unauthorized access, data exfiltration, and service abuse through authentication flaws, input validation gaps, and misconfigured permissions.

walkthroughslogit-biasexploitationapi-parameters

Logit Bias Exploitation 詳解

Exploit logit bias API parameters to force models toward specific token sequences and bypass safety filters.

walkthroughsreasoningexploitationchain-of-thought

Reasoning Model Exploitation 詳解

Exploit extended thinking in reasoning models to inject false premises and manipulate conclusion generation.

attacksroleexploitationconfusionwalkthroughs

Role Confusion Exploitation 詳解

Exploit role boundaries in chat APIs by injecting assistant-role messages and system-level instructions.

walkthroughsapi-parametersexploitationtechnique

API Parameter Exploitation 詳解

Walkthrough of exploiting undocumented and misconfigured API parameters in LLM service endpoints.

walkthroughsbatch-apiexploitationtechnique

Batch API Exploitation 詳解

Walkthrough of exploiting batch processing APIs for large-scale prompt injection and output analysis.

walkthroughsinstruction-hierarchyexploitationtechnique

Instruction Hierarchy Exploitation 詳解

Walkthrough of exploiting how models prioritize system, user, and tool instructions to override safety directives.