Image Steganography for AI Attacks
Using steganographic techniques to embed adversarial payloads in images that evade human inspection and automated detection while influencing AI model behavior.
Overview
Steganography -- the practice of hiding information within other data -- has a long history in information security. When applied to AI attacks, steganographic techniques offer a fundamentally different approach from adversarial perturbation: rather than optimizing pixel changes against a model's gradient, steganography embeds structured payloads that encode adversarial instructions in ways that survive image processing and evade detection.
The distinction is important. Adversarial perturbation attacks (Carlini et al., 2023) directly manipulate the model's visual features through gradient-based optimization. Steganographic attacks embed readable text or structured data in image regions where it is invisible to human reviewers but extractable by the model's visual processing. The two approaches are complementary: steganography provides stealth, adversarial perturbation provides precision.
For AI red teaming, steganographic techniques are particularly relevant in scenarios where images pass through human review before reaching the model (moderated upload pipelines), where automated image scanners look for visible anomalies, or where the attacker needs the payload to survive lossy compression and resizing.
This article covers classical steganographic techniques adapted for AI attacks, AI-specific steganographic methods that target visual encoder behavior, and defenses against steganographic injection.
Classical Steganography Adapted for AI
Least Significant Bit (LSB) Encoding
LSB encoding hides data in the least significant bits of pixel values. Changing the LSB of a pixel changes its value by at most 1 out of 256, which is imperceptible to human vision. The hidden data can be extracted by reading the LSBs in a known order.
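The bit-level mechanics fit in a few lines. A minimal standalone sketch (pixel values chosen arbitrarily) that hides one character in the LSBs of eight pixels, using the same MSB-first bit ordering as the class below:

```python
# Hide the 8 bits of one character in the LSBs of eight pixel values
char = ord("A")  # 0b01000001
pixels = [120, 57, 200, 33, 90, 14, 250, 77]

bits = [(char >> (7 - i)) & 1 for i in range(8)]  # MSB first
stego = [(p & 0xFE) | b for p, b in zip(pixels, bits)]

# Each pixel moves by at most 1 out of 256 -- invisible to the eye
assert all(abs(p - s) <= 1 for p, s in zip(pixels, stego))

# Recovery: read the LSBs back in the same order
recovered = 0
for s in stego:
    recovered = (recovered << 1) | (s & 1)
print(chr(recovered))  # -> A
```

The full implementation below generalizes this with a length header, multi-bit channels, and capacity checks.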
import numpy as np
from PIL import Image
from typing import Optional
class LSBSteganography:
"""Embed and extract hidden data using Least Significant Bit encoding.
Classical LSB steganography hides arbitrary binary data in the
least significant bits of pixel values. For AI attacks, the
    hidden data is adversarial text that the model's visual encoder
may detect as faint patterns in the image.
Note: Standard LSB steganography does not directly influence
    VLM behavior because the model processes higher-level visual
    features, not individual pixel LSBs. However, when combined
with extraction-then-injection pipelines (e.g., a preprocessing
step that extracts hidden text), LSB encoding can deliver payloads
that evade visual inspection.
"""
def encode(
self,
cover_image_path: str,
payload: str,
output_path: str,
bits_per_channel: int = 1,
) -> dict:
"""Encode a text payload into the LSBs of an image.
Args:
cover_image_path: Path to the cover image.
payload: Text to hide in the image.
            output_path: Path for the output steganographic image.
bits_per_channel: Number of LSBs to use per color channel (1-4).
"""
img = Image.open(cover_image_path).convert("RGB")
pixels = np.array(img)
# Convert payload to binary
payload_bytes = payload.encode("utf-8")
# Prepend length header (4 bytes, big-endian)
length_header = len(payload_bytes).to_bytes(4, byteorder="big")
data = length_header + payload_bytes
# Convert to bit array
bits = []
for byte in data:
for bit_pos in range(8):
bits.append((byte >> (7 - bit_pos)) & 1)
# Check capacity
capacity = pixels.size * bits_per_channel
if len(bits) > capacity:
raise ValueError(
f"Payload ({len(bits)} bits) exceeds image capacity "
f"({capacity} bits at {bits_per_channel} bits/channel)"
)
# Embed bits into pixel LSBs
flat_pixels = pixels.flatten()
        mask = (0xFF << bits_per_channel) & 0xFF  # Keeps high bits, clears the target LSBs
bit_idx = 0
for i in range(len(flat_pixels)):
if bit_idx >= len(bits):
break
# Clear the target LSBs
flat_pixels[i] = flat_pixels[i] & mask
# Set the payload bits
value = 0
for b in range(bits_per_channel):
if bit_idx < len(bits):
value |= bits[bit_idx] << (bits_per_channel - 1 - b)
bit_idx += 1
flat_pixels[i] |= value
# Reshape and save
stego_pixels = flat_pixels.reshape(pixels.shape)
stego_img = Image.fromarray(stego_pixels.astype(np.uint8))
stego_img.save(output_path)
return {
"output_path": output_path,
"payload_size_bytes": len(payload_bytes),
"bits_embedded": len(bits),
"capacity_used_pct": len(bits) / capacity * 100,
"bits_per_channel": bits_per_channel,
"psnr_db": self._compute_psnr(pixels, stego_pixels),
}
def decode(
self,
stego_image_path: str,
bits_per_channel: int = 1,
) -> str:
"""Extract the hidden payload from a steganographic image."""
img = Image.open(stego_image_path).convert("RGB")
flat_pixels = np.array(img).flatten()
# Extract bits
bits = []
for pixel_val in flat_pixels:
for b in range(bits_per_channel):
bits.append((pixel_val >> (bits_per_channel - 1 - b)) & 1)
# Read length header (4 bytes = 32 bits)
length_bits = bits[:32]
payload_length = 0
for bit in length_bits:
payload_length = (payload_length << 1) | bit
# Read payload
payload_bits = bits[32 : 32 + payload_length * 8]
payload_bytes = bytearray()
for i in range(0, len(payload_bits), 8):
byte_val = 0
for bit in payload_bits[i : i + 8]:
byte_val = (byte_val << 1) | bit
payload_bytes.append(byte_val)
return payload_bytes.decode("utf-8", errors="replace")
def _compute_psnr(
self, original: np.ndarray, modified: np.ndarray
) -> float:
"""Compute Peak Signal-to-Noise Ratio between images."""
mse = np.mean((original.astype(float) - modified.astype(float)) ** 2)
if mse == 0:
return float("inf")
        return 10 * np.log10(255.0 ** 2 / mse)

DCT-Domain Steganography
JPEG images use Discrete Cosine Transform (DCT) compression. DCT-domain steganography hides data in the quantized DCT coefficients, making the hidden data survive JPEG compression (which destroys LSB-encoded data in the spatial domain).
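The survival property can be illustrated numerically. A minimal sketch, assuming an orthonormal 8x8 DCT built by hand and a single illustrative quantization step `q`; the mid-frequency coefficient index (2, 3) is arbitrary:

```python
import numpy as np

# Orthonormal 8x8 DCT-II basis, as used by JPEG
n = 8
C = np.array([[np.sqrt((1 if k == 0 else 2) / n)
               * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
               for i in range(n)] for k in range(n)])

rng = np.random.default_rng(0)
block = rng.integers(0, 256, (n, n)).astype(float)
q = 16  # illustrative quantization step for one mid-frequency coefficient

# Embed bit=1 as the parity of the quantized coefficient at (2, 3)
coeffs = C @ block @ C.T
level = int(np.round(coeffs[2, 3] / q))
if level % 2 != 1:
    level += 1
coeffs[2, 3] = level * q

# Back to (rounded) pixels, then simulate a JPEG-style requantization
stego_pixels = np.round(C.T @ coeffs @ C)
requant_level = int(np.round((C @ stego_pixels @ C.T)[2, 3] / q))

# The parity -- and therefore the bit -- survives the round trip
recovered = requant_level % 2
print(recovered)  # -> 1
```

Pixel rounding perturbs the coefficient by far less than q/2, so the quantized level is unchanged; spatial-domain LSBs have no such margin.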
class DCTSteganography:
"""Embed payloads in the DCT domain for JPEG-robust steganography.
JPEG compression operates in the DCT domain, so payloads
embedded in DCT coefficients survive JPEG re-compression.
    This is critical for AI attacks because many image processing
    pipelines re-encode images as JPEG before model processing.
"""
def embed_in_dct(
self,
cover_image_path: str,
payload: str,
output_path: str,
coefficient_selection: str = "mid_frequency",
) -> dict:
"""Embed payload in DCT coefficients of a JPEG image.
Args:
cover_image_path: Path to cover JPEG image.
payload: Text payload to embed.
            output_path: Output path for steganographic JPEG.
coefficient_selection: Which DCT coefficients to modify.
'low_frequency': More robust but more visible
'mid_frequency': Balance of robustness and stealth
'high_frequency': Stealthiest but least robust
"""
        # The implementation modifies quantized DCT coefficients
# in the JPEG file structure. This requires low-level
# JPEG manipulation (e.g., using jpegio or libjpeg).
robustness_map = {
"low_frequency": {
"survives_recompression": True,
"survives_resizing": True,
"visual_impact": "Moderate",
"psnr_typical": "35-40 dB",
},
"mid_frequency": {
"survives_recompression": True,
"survives_resizing": False,
"visual_impact": "Low",
"psnr_typical": "40-45 dB",
},
"high_frequency": {
"survives_recompression": False,
"survives_resizing": False,
"visual_impact": "Very Low",
"psnr_typical": "45-55 dB",
},
}
return {
"output_path": output_path,
"payload_size": len(payload.encode()),
"coefficient_selection": coefficient_selection,
"robustness": robustness_map.get(coefficient_selection, {}),
"note": (
                "DCT-domain embedding survives JPEG recompression, "
"making it effective for payloads that pass through "
"image processing pipelines"
),
        }

AI-Specific Steganographic Techniques
Visual Pattern Steganography
Unlike classical steganography that hides data for extraction by a decoder, AI-specific steganography creates visual patterns that directly influence the model's visual encoder. The "hidden" information is not binary data but visual features that the encoder interprets as meaningful.
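The Hermitian-symmetry detail in the frequency-pattern code below is easy to verify in isolation: placing one component together with its conjugate mirror makes the spectrum Hermitian, so the inverse FFT is real-valued. Dimensions, frequency, and amplitude here are illustrative:

```python
import numpy as np

h, w = 64, 64
spectrum = np.zeros((h, w), dtype=complex)

# One mid-frequency component plus its conjugate mirror
fx, fy, amp, phase = 12, 20, 0.02, np.pi / 3
spectrum[fx, fy] = amp * np.exp(1j * phase)
spectrum[h - fx, w - fy] = np.conj(spectrum[fx, fy])

pattern = np.fft.ifft2(spectrum)
print(np.max(np.abs(pattern.imag)) < 1e-12)  # -> True

# The real part is a faint 2-D sinusoid; at this amplitude it shifts a
# [0, 1] image by well under the ~0.04 level flagged as perceptible.
```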
class VisualPatternSteganography:
"""Create images with patterns that influence VLM behavior
without being noticeable to human reviewers.
This technique exploits the gap between human visual perception
and the visual encoder's feature extraction. Patterns at specific
spatial frequencies, below human contrast sensitivity, can activate
features in the visual encoder that correspond to text or instructions.
    Reference: Zou et al., "Universal and Transferable Adversarial
    Attacks on Aligned Language Models" (2023).
"""
def __init__(self, target_resolution: tuple[int, int] = (224, 224)):
self.target_resolution = target_resolution
def create_frequency_pattern(
self,
base_image_path: str,
target_text: str,
frequency_band: str = "mid",
amplitude: float = 0.02,
output_path: Optional[str] = None,
) -> dict:
"""Create a frequency-domain pattern that encodes target text.
The pattern is a structured noise signal in a specific frequency
band. When processed by CLIP-family visual encoders, these
patterns produce features that overlap with the target text's
        embedding, effectively encoding text-like information in the image
without any visible text.
Args:
base_image_path: Cover image.
target_text: Text to "encode" as visual patterns.
frequency_band: 'low' (2-8 cycles), 'mid' (8-32), 'high' (32-112).
amplitude: Pattern amplitude (0.01-0.05 typical).
output_path: Where to save the result.
"""
img = Image.open(base_image_path).convert("RGB")
img_array = np.array(img).astype(float) / 255.0
# Create frequency-domain pattern
h, w = img_array.shape[:2]
# Define frequency band
bands = {
"low": (2, 8),
"mid": (8, 32),
"high": (32, min(h, w) // 2),
}
freq_low, freq_high = bands.get(frequency_band, bands["mid"])
# Generate structured pattern in frequency domain
pattern = np.zeros((h, w), dtype=complex)
# Create a pseudo-encoding of the target text as frequency components
# Each character influences specific frequency components
np.random.seed(hash(target_text) % (2**31))
for i, char in enumerate(target_text):
freq_x = freq_low + (ord(char) * (i + 1)) % (freq_high - freq_low)
freq_y = freq_low + (ord(char) * (i + 2)) % (freq_high - freq_low)
if freq_x < h // 2 and freq_y < w // 2:
phase = (ord(char) / 128.0) * 2 * np.pi
pattern[freq_x, freq_y] = amplitude * np.exp(1j * phase)
                # Hermitian symmetry for real output
pattern[h - freq_x, w - freq_y] = np.conj(pattern[freq_x, freq_y])
# Convert to spatial domain
spatial_pattern = np.real(np.fft.ifft2(pattern))
# Normalize and apply to all channels
spatial_pattern = spatial_pattern / (np.max(np.abs(spatial_pattern)) + 1e-10)
spatial_pattern = spatial_pattern * amplitude
for channel in range(3):
img_array[:, :, channel] += spatial_pattern
# Clip and convert back
img_array = np.clip(img_array, 0, 1)
result = Image.fromarray((img_array * 255).astype(np.uint8))
if output_path:
result.save(output_path)
return {
"output_path": output_path,
"frequency_band": frequency_band,
"amplitude": amplitude,
"target_text": target_text,
"psnr_db": self._compute_psnr_from_arrays(
np.array(Image.open(base_image_path).convert("RGB")),
np.array(result),
),
"human_perceptible": amplitude > 0.04,
}
def create_texture_pattern(
self,
base_image_path: str,
pattern_type: str = "noise",
seed: int = 42,
strength: float = 0.015,
output_path: Optional[str] = None,
) -> dict:
"""Overlay a subtle texture pattern that influences visual encoder features.
Texture patterns at specific scales activate different features
in convolutional and transformer-based visual encoders. This
        technique does not encode specific text but biases the model's
visual features toward specific semantic regions.
"""
img = Image.open(base_image_path).convert("RGB")
img_array = np.array(img).astype(float) / 255.0
np.random.seed(seed)
h, w = img_array.shape[:2]
if pattern_type == "noise":
texture = np.random.randn(h, w) * strength
elif pattern_type == "grid":
texture = np.zeros((h, w))
spacing = 16
texture[::spacing, :] = strength
texture[:, ::spacing] = strength
elif pattern_type == "wave":
x = np.arange(w)
y = np.arange(h)
xx, yy = np.meshgrid(x, y)
texture = strength * np.sin(2 * np.pi * xx / 32) * np.sin(2 * np.pi * yy / 32)
else:
texture = np.zeros((h, w))
for channel in range(3):
img_array[:, :, channel] += texture
img_array = np.clip(img_array, 0, 1)
result = Image.fromarray((img_array * 255).astype(np.uint8))
if output_path:
result.save(output_path)
return {
"output_path": output_path,
"pattern_type": pattern_type,
"strength": strength,
}
def _compute_psnr_from_arrays(
self, original: np.ndarray, modified: np.ndarray
) -> float:
mse = np.mean((original.astype(float) - modified.astype(float)) ** 2)
if mse == 0:
return float("inf")
        return float(10 * np.log10(255.0 ** 2 / mse))

Robustness Testing
Survival Under Image Processing
A critical consideration for steganographic attacks is whether the payload survives the image processing pipeline between the attacker and the model.
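The fragility of spatial-domain LSB payloads under even a single JPEG round trip can be checked directly. A small sketch (quality setting illustrative; a random-noise cover is a worst case for JPEG fidelity):

```python
from io import BytesIO

import numpy as np
from PIL import Image

rng = np.random.default_rng(1)
cover = rng.integers(0, 256, (64, 64, 3)).astype(np.uint8)

# A maximally dense spatial-domain payload: force every LSB to 1
stego = cover | 1

# One JPEG round trip, as a moderated upload pipeline might apply
buf = BytesIO()
Image.fromarray(stego).save(buf, format="JPEG", quality=85)
buf.seek(0)
decoded = np.array(Image.open(buf).convert("RGB"))

# Count surviving LSBs; a result near 50% means the payload has been
# effectively randomized by the lossy re-encoding
survival = float(np.mean((decoded & 1) == 1))
print(f"{survival:.0%} of LSBs read back as 1")
```

The RobustnessTester below generalizes this check across compression, resizing, noise, and blur.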
class RobustnessTester:
    """Test whether steganographic payloads survive image processing.
Production image processing pipelines typically apply:
- JPEG compression (quality 60-90)
- Resizing (to max dimensions)
- Color space conversion
- Metadata stripping
- Optional noise reduction
    The payload must survive all of these to reach the model.
"""
def __init__(self):
self.transformations = [
("jpeg_q85", self._apply_jpeg_compression, {"quality": 85}),
("jpeg_q60", self._apply_jpeg_compression, {"quality": 60}),
("resize_50pct", self._apply_resize, {"scale": 0.5}),
("resize_200pct", self._apply_resize, {"scale": 2.0}),
("gaussian_noise", self._apply_noise, {"std": 5.0}),
("gaussian_blur", self._apply_blur, {"radius": 1.0}),
]
def test_robustness(
self,
stego_image_path: str,
decoder_fn,
expected_payload: str,
) -> dict:
        """Test payload survival under various transformations."""
results = {}
for name, transform_fn, kwargs in self.transformations:
img = Image.open(stego_image_path)
transformed = transform_fn(img, **kwargs)
# Save temporarily and try to decode
temp_path = f"/tmp/robustness_test_{name}.png"
transformed.save(temp_path)
try:
decoded = decoder_fn(temp_path)
survived = decoded == expected_payload
similarity = self._string_similarity(decoded, expected_payload)
except Exception:
survived = False
similarity = 0.0
results[name] = {
"survived": survived,
"similarity": similarity,
"transformation": name,
}
return {
"results": results,
"survival_rate": sum(
1 for r in results.values() if r["survived"]
) / len(results),
"recommended_technique": (
"dct_domain" if not results.get("jpeg_q85", {}).get("survived", True)
else "lsb"
),
}
def _apply_jpeg_compression(self, img: Image.Image, quality: int) -> Image.Image:
from io import BytesIO
buffer = BytesIO()
img.save(buffer, format="JPEG", quality=quality)
buffer.seek(0)
return Image.open(buffer).convert("RGB")
def _apply_resize(self, img: Image.Image, scale: float) -> Image.Image:
new_size = (int(img.width * scale), int(img.height * scale))
return img.resize(new_size, Image.LANCZOS)
def _apply_noise(self, img: Image.Image, std: float) -> Image.Image:
arr = np.array(img).astype(float)
noise = np.random.randn(*arr.shape) * std
noisy = np.clip(arr + noise, 0, 255).astype(np.uint8)
return Image.fromarray(noisy)
def _apply_blur(self, img: Image.Image, radius: float) -> Image.Image:
from PIL import ImageFilter
return img.filter(ImageFilter.GaussianBlur(radius=radius))
def _string_similarity(self, a: str, b: str) -> float:
if not a or not b:
return 0.0
matches = sum(1 for ca, cb in zip(a, b) if ca == cb)
        return matches / max(len(a), len(b))

Steganalysis and Detection
Detecting Steganographic Content
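The pairs-of-values principle behind the chi-square attack can be demonstrated on synthetic data before looking at the detector class below. The cover here is artificial, with a deliberate even/odd bias standing in for the parity structure of real covers; full-capacity LSB embedding then equalizes each (2k, 2k+1) pair:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic cover with a deliberate even/odd bias: ~70% of values even
base = rng.integers(0, 128, 200_000) * 2
natural = (base + (rng.random(200_000) < 0.3)).astype(np.uint8)

# Full-capacity LSB embedding: every LSB replaced by a random payload bit
stego = ((natural & 0xFE) | rng.integers(0, 2, natural.size)).astype(np.uint8)

def pair_chi_square(pixels: np.ndarray) -> float:
    """Chi-square statistic over pairs of values (2k, 2k+1)."""
    counts = np.bincount(pixels, minlength=256).astype(float)
    even, odd = counts[0::2], counts[1::2]
    expected = (even + odd) / 2
    mask = expected > 0
    return float(np.sum((even[mask] - expected[mask]) ** 2 / expected[mask]))

# Embedding equalizes each pair, collapsing the statistic
print(pair_chi_square(natural) > 10 * pair_chi_square(stego))  # -> True
```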
class SteganalysisDetector:
"""Detect steganographic content in images.
    Uses statistical analysis to identify images that likely
    contain hidden data. Multiple detection methods are combined
for improved accuracy.
"""
def chi_square_analysis(self, image_path: str) -> dict:
"""Detect LSB steganography using chi-square analysis.
        LSB embedding creates characteristic patterns in the
        distribution of pixel pairs (2k, 2k+1). The chi-square
        test detects these patterns with high accuracy.
"""
img = Image.open(image_path).convert("L")
pixels = np.array(img).flatten()
# Count pixel value pairs
pair_counts = np.zeros(128)
for i in range(0, 256, 2):
count_even = np.sum(pixels == i)
count_odd = np.sum(pixels == i + 1)
expected = (count_even + count_odd) / 2
if expected > 0:
chi_sq = (count_even - expected) ** 2 / expected
pair_counts[i // 2] = chi_sq
total_chi_sq = np.sum(pair_counts)
# Degrees of freedom = number of pairs with observations - 1
df = np.sum(pair_counts > 0) - 1
        # High chi-square suggests LSB embedding; for an exact
        # p-value use scipy.stats.chi2.sf(total_chi_sq, df)
        if df > 0:
            # Map the normalized statistic onto a heuristic [0, 1] score
            normalized = total_chi_sq / max(df, 1)
stego_likelihood = min(1.0, max(0.0, 1.0 - 1.0 / (1.0 + normalized)))
else:
stego_likelihood = 0.0
return {
"chi_square_statistic": float(total_chi_sq),
"degrees_of_freedom": int(df),
"stego_likelihood": float(stego_likelihood),
            "detection": "SUSPICIOUS" if stego_likelihood > 0.5 else "CLEAN",
}
def rs_analysis(self, image_path: str) -> dict:
        """Regular-Singular (RS) analysis for LSB steganography detection.
RS analysis measures the proportion of "regular" and "singular"
pixel groups before and after LSB flipping. Natural images
        show specific R/S ratios that change predictably with embedding.
"""
img = Image.open(image_path).convert("L")
pixels = np.array(img).astype(float)
# Simplified RS analysis
h, w = pixels.shape
block_size = 4
regular_count = 0
singular_count = 0
total_blocks = 0
for y in range(0, h - block_size, block_size):
for x in range(0, w - block_size, block_size):
block = pixels[y : y + block_size, x : x + block_size]
# Compute smoothness (sum of adjacent pixel differences)
smoothness = np.sum(np.abs(np.diff(block, axis=0))) + \
np.sum(np.abs(np.diff(block, axis=1)))
# Flip LSBs and recompute
flipped = block.copy()
flipped = np.where(flipped % 2 == 0, flipped + 1, flipped - 1)
smoothness_flipped = np.sum(np.abs(np.diff(flipped, axis=0))) + \
np.sum(np.abs(np.diff(flipped, axis=1)))
if smoothness_flipped > smoothness:
regular_count += 1
elif smoothness_flipped < smoothness:
singular_count += 1
total_blocks += 1
r_ratio = regular_count / max(total_blocks, 1)
s_ratio = singular_count / max(total_blocks, 1)
        # In natural images, R > S. LSB embedding pushes R and S closer together
rs_difference = abs(r_ratio - s_ratio)
stego_likelihood = 1.0 - min(1.0, rs_difference / 0.1)
return {
"regular_ratio": float(r_ratio),
"singular_ratio": float(s_ratio),
"rs_difference": float(rs_difference),
"stego_likelihood": float(stego_likelihood),
            "detection": "SUSPICIOUS" if stego_likelihood > 0.6 else "CLEAN",
        }

Practical Applications in Red Teaming
| Scenario | Steganographic Technique | Why Steganography Over Other Methods |
|---|---|---|
| Moderated upload pipeline | LSB or DCT encoding of text payload | Images pass human review before model processing |
| Image-based RAG poisoning | Frequency-domain visual patterns | Payload must survive indexing and retrieval pipeline |
| Document image injection | DCT-domain text embedding | Payload must survive JPEG recompression in document processing |
| Long-term persistent payload | DCT mid-frequency encoding | Payload must survive multiple rounds of processing |
| Evading automated scanners | Visual pattern steganography | No text-like features for OCR-based detection to find |
References
- Carlini, N., et al. "Are aligned neural networks adversarially aligned?" arXiv preprint arXiv:2306.15447 (2023).
- Zou, A., et al. "Universal and Transferable Adversarial Attacks on Aligned Language Models." arXiv preprint arXiv:2307.15043 (2023).
- Fridrich, J. "Steganography in Digital Media: Principles, Algorithms, and Applications." Cambridge University Press (2009).
- Holub, V., Fridrich, J., and Denemark, T. "Universal Distortion Function for Steganography in an Arbitrary Domain." EURASIP Journal on Information Security (2014).
- MITRE ATLAS framework — https://atlas.mitre.org
- OWASP LLM Top 10 — https://owasp.org/www-project-top-10-for-large-language-model-applications/
Why is DCT-domain steganography preferred over LSB steganography for AI attacks in production pipelines?
What is the key difference between classical steganography and AI-specific visual pattern steganography?