AI Attack Surface Mapping
Intermediate · 4 min read · Updated 2026-03-15
A systematic methodology for identifying every attack vector in an AI system: input channels, data flows, tool integrations, and trust boundaries.
Attack surface mapping, for AI systems, is the process of identifying every interaction point between attackers and an AI deployment. Unlike a traditional web-application attack surface, which centers on HTTP endpoints and input fields, an AI attack surface includes semantic channels (natural-language inputs that influence model behavior), data channels (documents, knowledge bases, and training data), and capability channels (the tools and APIs the model can invoke).
The AI Attack Surface Model
An AI system's attack surface spans six dimensions:
AI attack surface dimensions:
1. Input channels
├── Direct user input (chat, API)
├── Indirect data (documents, email, web pages)
├── Multimodal inputs (images, audio, video)
└── System configuration (prompts, parameters)
2. Data channels
├── RAG knowledge bases
├── Training/fine-tuning data
├── User conversation history
└── External data sources
3. Capability channels
├── Tool/function calling
├── Code execution
├── API integrations
└── File system access
4. Model internals
├── Weights and activations
├── Tokenizer
├── Embedding space
└── Attention mechanisms
5. Infrastructure
├── API endpoints
├── Authentication/authorization
├── Rate limiting
└── Logging and monitoring
6. Human channels
├── Operators and administrators
├── Content moderators
├── End users (social engineering)
└── Developer teams
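The six dimensions above lend themselves to a simple coverage checklist for an assessment. A minimal sketch (the dictionary keys and the `coverage_gaps` helper are illustrative, not part of any standard tooling):

```python
# Minimal sketch: the six-dimension taxonomy as a coverage checklist.
# Dimension and item names follow the diagram above.
ATTACK_SURFACE_DIMENSIONS = {
    "input_channels": ["direct user input", "indirect data",
                       "multimodal inputs", "system configuration"],
    "data_channels": ["RAG knowledge bases", "training/fine-tuning data",
                      "user conversation history", "external data sources"],
    "capability_channels": ["tool/function calling", "code execution",
                            "API integrations", "file system access"],
    "model_internals": ["weights and activations", "tokenizer",
                        "embedding space", "attention mechanisms"],
    "infrastructure": ["API endpoints", "authentication/authorization",
                       "rate limiting", "logging and monitoring"],
    "human_channels": ["operators and administrators", "content moderators",
                       "end users", "developer teams"],
}

def coverage_gaps(assessed_dimensions):
    """Return the dimensions an assessment has not yet covered."""
    return sorted(set(ATTACK_SURFACE_DIMENSIONS) - set(assessed_dimensions))
```

Running `coverage_gaps(["input_channels", "data_channels"])` flags the four dimensions that still need review, which is a quick way to keep an engagement from silently skipping, say, the human channels.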
Mapping Methodology
Phase 1: System Decomposition
Break the AI system down into its component parts:
class AISystemDecomposition:
    """Decompose an AI system into mappable components."""

    def __init__(self, system_name):
        self.system = system_name
        self.components = {
            "inputs": [],
            "processing": [],
            "outputs": [],
            "data_stores": [],
            "integrations": [],
            "humans": []
        }

    def map_input_channels(self):
        """Identify every input channel into the AI system."""
        channels = [
            {
                "name": "User chat interface",
                "type": "direct",
                "format": "natural language",
                "attacker_controllable": True,
                "filtered": True  # has a content filter
            },
            {
                "name": "Document upload",
                "type": "indirect",
                "format": "PDF, DOCX, TXT",
                "attacker_controllable": True,
                "filtered": False  # no injection filtering on uploads
            },
            {
                "name": "RAG knowledge base",
                "type": "indirect",
                "format": "text chunks",
                "attacker_controllable": "partially",
                "filtered": False
            },
            {
                "name": "System prompt",
                "type": "configuration",
                "format": "text",
                "attacker_controllable": False,
                "filtered": False  # trusted input
            }
        ]
        self.components["inputs"] = channels
        return channels
Phase 2: Trust Boundary Analysis
Identify where data crosses trust boundaries:
def map_trust_boundaries(system_components):
    """
    Identify the trust boundaries in an AI system.
    Every boundary transition is a potential attack point.
    """
    boundaries = []
    # Boundary 1: external -> application
    boundaries.append({
        "name": "User Input Boundary",
        "from_zone": "untrusted (external)",
        "to_zone": "application layer",
        "data_crossing": ["user messages", "uploaded files"],
        "controls": ["input validation", "content filter"],
        "attack_types": ["prompt injection", "file upload attacks"]
    })
    # Boundary 2: application -> model
    boundaries.append({
        "name": "Model Context Boundary",
        "from_zone": "application layer",
        "to_zone": "model context window",
        "data_crossing": ["system prompt", "user message",
                          "retrieved documents", "conversation history"],
        "controls": ["instruction hierarchy"],
        "attack_types": ["indirect injection via retrieved docs",
                         "context window manipulation"]
    })
    # Boundary 3: model -> tools
    boundaries.append({
        "name": "Tool Execution Boundary",
        "from_zone": "model decisions",
        "to_zone": "external systems",
        "data_crossing": ["tool calls", "API requests",
                          "database queries", "file operations"],
        "controls": ["tool allowlisting", "parameter validation"],
        "attack_types": ["tool abuse", "data exfiltration",
                         "privilege escalation"]
    })
    # Boundary 4: data stores -> application
    boundaries.append({
        "name": "Data Retrieval Boundary",
        "from_zone": "data stores",
        "to_zone": "model context",
        "data_crossing": ["knowledge base chunks",
                          "user history", "cached responses"],
        "controls": ["access control", "content sanitization"],
        "attack_types": ["knowledge base poisoning",
                         "data extraction"]
    })
    return boundaries
Phase 3: Data Flow Mapping
Trace how data moves through the system:
def map_data_flows(system):
    """Map every data flow through the AI system."""
    flows = []
    # User input flow
    flows.append({
        "name": "User Query Processing",
        "path": [
            "User Input",
            "Input Filter",
            "Query Embedding",
            "RAG Retrieval",
            "Context Assembly",
            "Model Inference",
            "Output Filter",
            "Response Delivery"
        ],
        "sensitive_data": ["user query (may contain PII)"],
        "injection_points": ["User Input", "RAG Retrieval"]
    })
    # RAG retrieval flow
    flows.append({
        "name": "Knowledge Retrieval",
        "path": [
            "Query Embedding",
            "Vector Store Search",
            "Chunk Retrieval",
            "Re-ranking",
            "Context Insertion"
        ],
        "sensitive_data": ["knowledge base content"],
        "injection_points": ["Vector Store (if poisoned)",
                             "Context Insertion (if unfiltered)"]
    })
    # Tool execution flow
    flows.append({
        "name": "Tool Call Execution",
        "path": [
            "Model Decision",
            "Tool Call Parsing",
            "Parameter Validation",
            "Tool Execution",
            "Result Processing",
            "Response Integration"
        ],
        "sensitive_data": ["tool parameters", "tool results"],
        "injection_points": ["Model Decision (via injection)",
                             "Parameter Validation (bypass)"]
    })
    return flows
Phase 4: Attack Vector Prioritization
def prioritize_attack_vectors(trust_boundaries, data_flows):
    """Rank attack vectors by risk and accessibility."""
    vectors = []
    for boundary in trust_boundaries:
        for attack_type in boundary["attack_types"]:
            vectors.append({
                "vector": attack_type,
                "boundary": boundary["name"],
                # score_* are assessment-specific scoring helpers,
                # defined by the team running the engagement
                "accessibility": score_accessibility(boundary),
                "impact": score_impact(attack_type),
                "defense_coverage": score_defenses(boundary["controls"]),
                "priority": None  # computed below
            })
    # Compute priority scores
    for vector in vectors:
        vector["priority"] = (
            vector["accessibility"] * 0.3 +
            vector["impact"] * 0.4 +
            (1 - vector["defense_coverage"]) * 0.3
        )
    # Sort by priority (highest first)
    vectors.sort(key=lambda x: x["priority"], reverse=True)
    return vectors
Producing the Attack Surface Map
The final output is a structured document that guides red-team operations:
Attack Surface Map: [Target System Name]
Date: [Assessment Date]
1. System Overview
- Architecture type: RAG-augmented chatbot with tool access
- Model: [identified or suspected]
- Infrastructure: [cloud provider, deployment type]
2. Input Channels (ranked by exploitability)
Priority 1: Document upload (no injection filtering)
Priority 2: Direct chat input (basic content filtering)
Priority 3: RAG knowledge base (limited write access)
3. Trust Boundaries
[diagram showing boundary transitions]
4. Data Flows
[diagram showing sensitive data movement]
5. Tool Capabilities
- Email sending (high impact)
- Database queries (data exposure risk)
- File operations (restricted to upload directory)
6. Recommended Attack Sequence
1. Indirect injection via document upload
2. System prompt extraction via chat
3. Tool abuse for data exfiltration
4. Knowledge base poisoning for persistence
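As a worked example of the Phase 4 scoring, here is a minimal, self-contained sketch. The scoring helpers (`score_accessibility`, `score_impact`, `score_defenses`) and their heuristics are illustrative assumptions, since real scores depend on the system under assessment; the 0.3/0.4/0.3 weights follow the formula above.

```python
# Illustrative scoring heuristics -- assumptions, not a standard model.
IMPACT = {  # hypothetical impact weights per attack type, 0..1
    "prompt injection": 0.7,
    "file upload attacks": 0.8,
}

def score_accessibility(boundary):
    # Externally reachable boundaries are easier to attack.
    return 0.9 if "untrusted" in boundary["from_zone"] else 0.4

def score_impact(attack_type):
    return IMPACT.get(attack_type, 0.5)

def score_defenses(controls):
    # Crude coverage proxy: more controls, more coverage, capped at 1.0.
    return min(1.0, 0.25 * len(controls))

def prioritize(boundaries):
    """Score every attack type on every boundary, highest priority first."""
    vectors = []
    for b in boundaries:
        for attack in b["attack_types"]:
            priority = (score_accessibility(b) * 0.3 +
                        score_impact(attack) * 0.4 +
                        (1 - score_defenses(b["controls"])) * 0.3)
            vectors.append({"vector": attack, "boundary": b["name"],
                            "priority": round(priority, 2)})
    return sorted(vectors, key=lambda v: v["priority"], reverse=True)

boundaries = [{
    "name": "User Input Boundary",
    "from_zone": "untrusted (external)",
    "controls": ["input validation", "content filter"],
    "attack_types": ["prompt injection", "file upload attacks"],
}]
ranked = prioritize(boundaries)
```

Under these assumed weights, the unfiltered-upload vector outranks filtered chat injection, matching the priority order in the map template above.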
Knowledge Check
An AI system accepts user chat (filtered), processes uploaded documents (unfiltered), and has access to an email-sending tool. Which attack path should a red team prioritize?