Adversarial Image Examples for VLMs

Expert · 4 min read · Updated 2026-03-13

Pixel-level perturbations that change VLM behavior, including PGD attacks against the vision encoder, transferable adversarial images, and patch attacks.

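From Classification to Generation

Adversarial examples for image classifiers are well studied: tiny pixel perturbations that cause misclassification. Adversarial examples for VLMs extend the idea: instead of changing a class label, the attacker changes the text the model generates.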

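Attack Formulation

Given a VLM f that takes an image x and a text prompt t and produces an output y = f(x, t), the adversarial objective is:

x_adv = x + δ*,  δ* = argmin_δ L(f(x + δ, t), y_target)
subject to: ||δ||_p ≤ ε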

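where:

δ is the adversarial perturbation
ε bounds the perturbation magnitude (the imperceptibility constraint)
L is a loss function that measures the distance from the target output
||·||_p is usually the L∞ norm (which bounds the maximum change to any single pixel)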
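
The two constraints in this formulation, the ε bound on δ and the requirement that x + δ remains a valid image, are typically enforced with a projection step. The sketch below assumes the L∞ norm and pixel values in [0, 1]; `project_linf` is an illustrative name, not a library function.

```python
import torch

def project_linf(delta, image, epsilon):
    """Project delta onto the L-inf ball of radius epsilon, then keep image + delta in [0, 1]."""
    delta = delta.clamp(-epsilon, epsilon)
    return (image + delta).clamp(0.0, 1.0) - image

torch.manual_seed(0)
image = torch.full((1, 3, 4, 4), 0.5)   # mid-gray toy image
raw = torch.randn(1, 3, 4, 4)           # unconstrained perturbation
delta = project_linf(raw, image, epsilon=8 / 255)
print(delta.abs().max().item())         # at most 8/255, up to float rounding
```

Clamping to the ε ball first and to the pixel range second keeps both constraints satisfied, because the second clamp can only shrink a perturbation when the clean image is already in [0, 1].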

Projected Gradient Descent (PGD)

PGD is the workhorse for generating adversarial images:

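import torch
import torch.nn.functional as F

def pgd_attack_vlm(
    model,
    image: torch.Tensor,          # [1, 3, H, W], pixel values in [0, 1] (before any mean/std normalization)
    text_prompt: str,
    target_tokens: torch.Tensor,  # token IDs of the desired output
    epsilon: float = 8/255,       # L-inf bound
    step_size: float = 1/255,
    num_steps: int = 100,
    tokenizer=None
):
    """PGD attack: push the VLM toward generating the target text."""
    if tokenizer is None:
        raise ValueError("a tokenizer is required to encode text_prompt")
    # Assumes a Hugging Face-style tokenizer; the model call below is model-specific.
    prompt_ids = tokenizer(text_prompt, return_tensors="pt").input_ids.to(image.device)
    target_tokens = target_tokens.reshape(1, -1).to(image.device)
    # Teacher forcing: feed prompt + target so every target position gets a logit.
    input_ids = torch.cat([prompt_ids, target_tokens], dim=1)
    n_target = target_tokens.size(1)

    delta = torch.zeros_like(image, requires_grad=True)

    for step in range(num_steps):
        adv_image = image + delta

        # Forward pass through the VLM
        outputs = model(images=adv_image, input_ids=input_ids)
        logits = outputs.logits

        # Loss: cross-entropy against the target tokens.
        # The logit at position i predicts token i + 1, hence the one-step shift.
        loss = F.cross_entropy(
            logits[:, -n_target - 1:-1, :].reshape(-1, logits.size(-1)),
            target_tokens.reshape(-1)
        )

        loss.backward()

        with torch.no_grad():
            # Signed gradient step (descent, because the attack is targeted)
            delta -= step_size * delta.grad.sign()
            # Project back into the epsilon ball
            delta.clamp_(-epsilon, epsilon)
            # Keep pixel values valid
            delta.copy_(torch.clamp(image + delta, 0, 1) - image)

        delta.grad.zero_()

    return (image + delta).detach()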
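
To watch the loop's mechanics without loading a VLM, the same update rule can be run against a small stand-in model. The sketch below is only an illustration: `toy_model` is a hypothetical frozen linear layer that maps pixels to five "token" logits, and `targeted_pgd` is a simplified single-token version of the loop above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for a VLM: a frozen linear map from pixels to 5 logits.
toy_model = torch.nn.Linear(3 * 8 * 8, 5)
for p in toy_model.parameters():
    p.requires_grad_(False)

def targeted_pgd(model, image, target, epsilon=8 / 255, step_size=1 / 255, num_steps=40):
    """Targeted L-inf PGD; returns the adversarial image and the loss at each step."""
    delta = torch.zeros_like(image, requires_grad=True)
    losses = []
    for _ in range(num_steps):
        logits = model((image + delta).flatten(1))
        loss = F.cross_entropy(logits, target)
        losses.append(loss.item())
        loss.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()              # descend toward the target
            delta.clamp_(-epsilon, epsilon)                      # project onto the epsilon ball
            delta.copy_((image + delta).clamp(0, 1) - image)     # keep pixels valid
        delta.grad.zero_()
    return (image + delta).detach(), losses

image = torch.rand(1, 3, 8, 8)
target = torch.tensor([2])
adv, losses = targeted_pgd(toy_model, image, target)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
print(f"max |delta|: {(adv - image).abs().max().item():.4f}")
```

The target loss falls while the perturbation stays inside the ε bound. A real VLM replaces the linear layer with the full image-plus-text forward pass and a multi-token loss.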

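Key Parameters

Parameter	Typical range	Effect
`epsilon`	4/255 – 16/255	Higher values give a stronger attack but a more visible perturbation
`step_size`	epsilon/10 – epsilon/4	Commonly used PGD step sizes
`num_steps`	50–500	More steps generally improve convergence
`target_tokens`	Varies	Shorter targets are easier to achieve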
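
As a quick check on how these parameters interact, the sketch below works out the step-size range for a given `epsilon` and the fewest steps a pixel needs to reach the ε boundary. The helper names are illustrative, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction
import math

def step_size_range(epsilon):
    """Return the (low, high) step sizes from the table: epsilon/10 to epsilon/4."""
    return epsilon / 10, epsilon / 4

def min_steps_to_boundary(epsilon, step_size):
    """Fewest signed-gradient steps for one pixel to move from 0 to +/- epsilon."""
    return math.ceil(epsilon / step_size)

epsilon = Fraction(8, 255)
low, high = step_size_range(epsilon)
print(low, high)                                          # 4/1275 2/255
print(min_steps_to_boundary(epsilon, Fraction(1, 255)))   # 8
```

With the defaults used above (`epsilon=8/255`, `step_size=1/255`), a pixel can reach the boundary in 8 steps, so the remaining iterations mostly refine the sign pattern inside the ε ball.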

FGSM: A Fast Single-Step Attack

FGSM is a faster but weaker alternative:

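def fgsm_attack_vlm(model, image, text_prompt, target_tokens, epsilon=8/255, tokenizer=None):
    """Single-step targeted attack on a VLM."""
    if tokenizer is None:
        raise ValueError("a tokenizer is required to encode text_prompt")
    # Assumes a Hugging Face-style tokenizer; the model call below is model-specific.
    prompt_ids = tokenizer(text_prompt, return_tensors="pt").input_ids.to(image.device)
    target_tokens = target_tokens.reshape(1, -1).to(image.device)
    input_ids = torch.cat([prompt_ids, target_tokens], dim=1)  # teacher forcing
    n_target = target_tokens.size(1)

    # Work on a detached copy so the caller's tensor is left untouched
    image = image.clone().detach().requires_grad_(True)

    outputs = model(images=image, input_ids=input_ids)
    # The logit at position i predicts token i + 1, hence the one-step shift
    loss = F.cross_entropy(
        outputs.logits[:, -n_target - 1:-1, :].reshape(-1, outputs.logits.size(-1)),
        target_tokens.reshape(-1)
    )
    loss.backward()

    # Single signed gradient step toward the target
    adv_image = image - epsilon * image.grad.sign()
    return torch.clamp(adv_image, 0, 1).detach()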

FGSM is useful for quick feasibility tests, but it is rarely sufficient for complex VLM attacks whose target output is longer than a few tokens.
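
One way to see why FGSM is the weaker method: it is PGD with a single step of size ε starting from a zero perturbation. The sketch below checks that equivalence on a hypothetical toy network; `fgsm` and `pgd` are simplified single-label versions written for this illustration, not the VLM functions above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical small nonlinear stand-in model with 5 output classes.
toy_model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 8 * 8, 16),
    torch.nn.Tanh(),
    torch.nn.Linear(16, 5),
)

def fgsm(model, image, target, epsilon):
    """One signed-gradient step of size epsilon toward the target."""
    x = image.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), target).backward()
    return (x - epsilon * x.grad.sign()).clamp(0, 1).detach()

def pgd(model, image, target, epsilon, step_size, num_steps):
    """Iterated signed-gradient steps with projection after each one."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(num_steps):
        F.cross_entropy(model(image + delta), target).backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()
            delta.clamp_(-epsilon, epsilon)
            delta.copy_((image + delta).clamp(0, 1) - image)
        delta.grad.zero_()
    return (image + delta).detach()

image = torch.rand(1, 3, 8, 8)
target = torch.tensor([3])
eps = 8 / 255
fgsm_out = fgsm(toy_model, image, target, eps)
pgd_out = pgd(toy_model, image, target, eps, step_size=eps, num_steps=1)
print(torch.allclose(fgsm_out, pgd_out, atol=1e-6))
```

Running PGD with more, smaller steps lets the perturbation follow the loss surface as the gradient changes, which a single step cannot do.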

Adversarial Patch Attacks

Instead of perturbing the whole image, a patch attack modifies one small region:

def adversarial_patch_attack(
    model,
    image: torch.Tensor,             # [1, 3, H, W], pixel values in [0, 1]
    input_ids: torch.Tensor,         # [1, P + T]: prompt IDs followed by the target IDs
    target_tokens: torch.Tensor,
    patch_size: int = 64,
    patch_location: tuple = (0, 0),  # (row, col) of the patch's top-left corner
    num_steps: int = 200,
    step_size: float = 2/255
):
    """Optimize an adversarial patch for a VLM."""
    h, w = patch_location
    n_target = target_tokens.numel()
    patch = torch.rand(1, 3, patch_size, patch_size, device=image.device, requires_grad=True)

    optimizer = torch.optim.Adam([patch], lr=step_size)

    for step in range(num_steps):
        # Paste the patch onto a copy of the image; gradients flow back to the patch
        patched_image = image.clone()
        patched_image[:, :, h:h+patch_size, w:w+patch_size] = patch

        outputs = model(images=patched_image, input_ids=input_ids)
        # The logit at position i predicts token i + 1, hence the one-step shift
        loss = F.cross_entropy(
            outputs.logits[:, -n_target - 1:-1, :].reshape(-1, outputs.logits.size(-1)),
            target_tokens.reshape(-1)
        )

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Keep the patch within the valid pixel range
        with torch.no_grad():
            patch.clamp_(0, 1)

    return patch.detach()

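Why Patches Matter

Patch attacks have practical implications beyond Lp-bounded perturbations:

Can be printed and placed in a physical environment (for example, as an adversarial sticker)
Relatively independent of the underlying image, so they may transfer across contexts
Can be disguised as a logo, QR code, or decorative element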
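
The independence from the underlying image comes from how a patch is applied: it overwrites a region rather than being added to it. The sketch below pastes one patch onto two different backgrounds; `apply_patch` is an illustrative helper, and the sizes match the defaults used above (a 64×64 patch on an assumed 224×224 input).

```python
import torch

def apply_patch(image, patch, top, left):
    """Return a copy of `image` with `patch` pasted at (top, left)."""
    _, _, ph, pw = patch.shape
    _, _, height, width = image.shape
    if top < 0 or left < 0 or top + ph > height or left + pw > width:
        raise ValueError("patch does not fit inside the image at this location")
    out = image.clone()
    out[:, :, top:top + ph, left:left + pw] = patch
    return out

torch.manual_seed(0)
patch = torch.rand(1, 3, 64, 64)                      # stand-in for an optimized patch
backgrounds = [torch.rand(1, 3, 224, 224) for _ in range(2)]
patched = [apply_patch(img, patch, top=10, left=20) for img in backgrounds]

coverage = (64 * 64) / (224 * 224)                    # fraction of pixels replaced
print(f"patch covers {coverage:.1%} of the image")
```

The patched region is identical across both backgrounds, while every pixel outside it is left untouched.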

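Transferability Across VLMs

Adversarial examples crafted against one VLM often transfer to other VLMs, especially when the models share a vision encoder family:

Source model	Target model	Transfer rate (approximate)
LLaVA (CLIP ViT-L)	InstructBLIP (EVA-CLIP)	30–50%
LLaVA (CLIP ViT-L)	Qwen-VL (CLIP ViT-G)	40–60%
LLaVA (CLIP ViT-L)	GPT-4o (proprietary)	10–25%
Ensemble (3+ models)	Any single model	50–70%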
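
The source does not say how these rates are measured. One common convention, assumed here, is the fraction of adversarial images that succeed on the source model and also succeed on the target model. A minimal sketch with made-up success flags:

```python
def transfer_rate(source_success, target_success):
    """Fraction of source-successful adversarial images that also fool the target.

    Both arguments are equal-length sequences of booleans, one entry per image.
    This definition is an assumption; other papers normalize differently.
    """
    if len(source_success) != len(target_success):
        raise ValueError("success lists must have the same length")
    n_source = sum(1 for s in source_success if s)
    if n_source == 0:
        return 0.0
    n_both = sum(1 for s, t in zip(source_success, target_success) if s and t)
    return n_both / n_source

# Illustrative flags only, not measured results.
source_flags = [True, True, True, True, False]
target_flags = [True, False, True, False, True]
print(transfer_rate(source_flags, target_flags))  # 0.5
```

Images that fail on the source model are excluded from the denominator, so the fifth image does not count even though it happens to fool the target.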

Ensemble Attack Strategy

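def compute_target_loss(outputs, target_tokens):
    """Cross-entropy between the logits that predict the target span and the target IDs."""
    n_target = target_tokens.numel()
    # The logit at position i predicts token i + 1, hence the one-step shift
    logits = outputs.logits[:, -n_target - 1:-1, :]
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), target_tokens.reshape(-1))

def ensemble_pgd(models, image, input_ids, target_tokens, epsilon=8/255, num_steps=200):
    """PGD optimized against several VLMs at once to improve transfer.

    Assumes input_ids holds the prompt IDs followed by the target IDs, and that
    all models share a tokenizer so the same IDs are valid for each of them.
    """
    delta = torch.zeros_like(image, requires_grad=True)
    step_size = 4 * epsilon / num_steps  # total travel of up to 4 * epsilon over all steps

    for step in range(num_steps):
        total_loss = 0
        for model in models:
            outputs = model(images=image + delta, input_ids=input_ids)
            loss = compute_target_loss(outputs, target_tokens)
            total_loss += loss / len(models)  # equal weighting

        total_loss.backward()

        with torch.no_grad():
            delta -= step_size * delta.grad.sign()
            delta.clamp_(-epsilon, epsilon)
            delta.copy_(torch.clamp(image + delta, 0, 1) - image)

        delta.grad.zero_()

    return (image + delta).detach()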
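
Because differentiation is linear, the gradient of the averaged loss equals the average of the per-model gradients. This means each model's loss can be backpropagated separately, which avoids holding every model's computation graph in memory at once. The sketch below checks the equivalence on hypothetical toy linear models.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
models = [torch.nn.Linear(12, 4) for _ in range(3)]   # hypothetical stand-in models
x = torch.rand(1, 12)
target = torch.tensor([1])

def grad_of_mean_loss(models, x, target):
    """Build one averaged loss and backpropagate once."""
    delta = torch.zeros_like(x, requires_grad=True)
    total = sum(F.cross_entropy(m(x + delta), target) for m in models) / len(models)
    total.backward()
    return delta.grad.clone()

def mean_of_grads(models, x, target):
    """Backpropagate each model separately; gradients accumulate in delta.grad."""
    delta = torch.zeros_like(x, requires_grad=True)
    for m in models:
        (F.cross_entropy(m(x + delta), target) / len(models)).backward()
    return delta.grad.clone()

g_joint = grad_of_mean_loss(models, x, target)
g_separate = mean_of_grads(models, x, target)
print(torch.allclose(g_joint, g_separate, atol=1e-6))
```

Both routes give the same update direction, so the choice between them is a memory and scheduling decision rather than a change to the attack.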

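Universal Adversarial Perturbations

A universal adversarial perturbation (UAP) is a single noise pattern that works across many different input images: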

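def train_universal_perturbation(
    model, dataset, input_ids, target_tokens, epsilon=10/255, num_epochs=10,
    image_size=224, step_size=0.001
):
    """Train a universal perturbation over a dataset of images.

    Assumes every image is [1, 3, image_size, image_size] with values in [0, 1],
    and that input_ids holds the prompt IDs followed by the target IDs.
    """
    uap = torch.zeros(1, 3, image_size, image_size, requires_grad=True)

    for epoch in range(num_epochs):
        for image in dataset:
            # Clamp the perturbed image so the model only sees valid pixel values
            outputs = model(images=torch.clamp(image + uap, 0, 1), input_ids=input_ids)
            # compute_target_loss is the same helper that ensemble_pgd calls
            loss = compute_target_loss(outputs, target_tokens)
            loss.backward()

            with torch.no_grad():
                # One signed gradient step per image, then project onto the epsilon ball
                uap -= step_size * uap.grad.sign()
                uap.clamp_(-epsilon, epsilon)

            uap.grad.zero_()

    return uap.detach()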

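A UAP is less effective on any single image than a perturbation optimized for that image, but it is more practical to deploy because it requires no per-input optimization.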
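
At deployment time, applying a UAP is a single broadcast addition followed by a clamp. The sketch below assumes the images and the perturbation share a spatial size and that pixels lie in [0, 1]; `apply_uap` is an illustrative name, and the random tensor stands in for a trained perturbation.

```python
import torch

def apply_uap(images, uap, epsilon):
    """Add one shared perturbation to every image in a batch."""
    uap = uap.clamp(-epsilon, epsilon)        # re-assert the L-inf bound
    return (images + uap).clamp(0.0, 1.0)     # broadcasts over the batch dimension

torch.manual_seed(0)
images = torch.rand(4, 3, 224, 224)           # batch of 4 toy images
uap = 0.1 * torch.randn(1, 3, 224, 224)       # stand-in for a trained UAP
adv = apply_uap(images, uap, epsilon=10 / 255)
print(adv.shape, (adv - images).abs().max().item())
```

No gradient computation happens here, which is what makes a precomputed UAP cheap to apply to arbitrary inputs.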

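References

"Intriguing Properties of Neural Networks" - Szegedy et al. (2014) - The original discovery of adversarial examples in neural networks
"Towards Deep Learning Models Resistant to Adversarial Attacks" - Madry et al. (2018) - The PGD attack method widely used in adversarial ML
"Universal Adversarial Perturbations" - Moosavi-Dezfooli et al. (2017) - Input-agnostic adversarial perturbation techniques
"Adversarial Patch" - Brown et al. (2017) - Physical-world adversarial patch attacks on image classifiers, a technique that carries over to VLM systems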

Knowledge Check

In practical settings, what is the main advantage of an adversarial patch attack over a full-image perturbation attack?
