Image Steganography for AI Attacks
Using steganographic techniques to embed adversarial payloads in images that evade human inspection and automated detection while influencing AI model behavior.
Overview
Steganography -- the practice of hiding information within other data -- has a long history in information security. When applied to AI attacks, steganographic techniques offer a fundamentally different approach from adversarial perturbation: rather than optimizing pixel changes against a model's gradient, steganography embeds structured payloads that encode adversarial instructions in ways that survive image processing and evade detection.
The distinction is important. Adversarial perturbation attacks (Carlini et al., 2023) directly manipulate the model's visual features through gradient-based optimization. Steganographic attacks instead embed readable text or structured data in ways that are invisible to human reviewers but recoverable downstream -- by a preprocessing step, an OCR pass, or in some cases the model's own visual processing. The two approaches are complementary: steganography provides stealth; adversarial perturbation provides precision.
For AI red teaming, steganographic techniques are particularly relevant in scenarios where images pass through human review before reaching the model (moderated upload pipelines), where automated image scanners look for visible anomalies, or where the attacker needs the payload to survive lossy compression and resizing.
This article covers classical steganographic techniques adapted for AI attacks, AI-specific steganographic methods that target visual encoder behavior, and defenses against steganographic injection.
Classical Steganography Adapted for AI
Least Significant Bit (LSB) Encoding
LSB encoding hides data in the least significant bits of pixel values. Changing the LSB of a pixel changes its value by at most 1 out of 256, which is imperceptible to human vision. The hidden data can be extracted by reading the LSBs in a known order.
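The mechanics can be sketched in a few lines, as a standalone toy separate from the fuller class below: one payload byte spread across the LSBs of eight pixel values, then read back in the same order.

```python
import numpy as np

# Eight cover pixel values (one byte of payload needs eight 1-bit slots)
pixels = np.array([200, 13, 255, 0, 128, 64, 37, 99], dtype=np.uint8)

payload_byte = ord("A")  # 0x41 = 0b01000001

# Embed: clear each pixel's LSB, then set it to the next payload bit (MSB first)
bits = [(payload_byte >> (7 - i)) & 1 for i in range(8)]
stego = (pixels & 0xFE) | np.array(bits, dtype=np.uint8)

# Each pixel changed by at most 1 out of 256 -- imperceptible to human vision
assert np.all(np.abs(stego.astype(int) - pixels.astype(int)) <= 1)

# Extract: read the LSBs back in embedding order
recovered = 0
for b in (stego & 1):
    recovered = (recovered << 1) | int(b)
assert chr(recovered) == "A"
```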
import numpy as np
from PIL import Image
from typing import Optional
class LSBSteganography:
"""Embed and extract hidden data using Least Significant Bit encoding.
Classical LSB steganography hides arbitrary binary data in the
least significant bits of pixel values. For AI attacks, the
hidden data is adversarial text that the model's visual encoder
may detect as faint patterns in the image.
Note: Standard LSB steganography does not directly influence
VLM behavior because the model processes higher-level visual
features, not individual pixel LSBs. However, when combined
with extraction-then-injection pipelines (e.g., a preprocessing
step that extracts hidden text), LSB encoding can deliver payloads
that evade visual inspection.
"""
def encode(
self,
cover_image_path: str,
payload: str,
output_path: str,
bits_per_channel: int = 1,
) -> dict:
"""Encode a text payload into the LSBs of an image.
Args:
cover_image_path: Path to the cover image.
payload: Text to hide in the image.
output_path: Path for the output steganographic image.
bits_per_channel: Number of LSBs to use per color channel (1-4).
"""
img = Image.open(cover_image_path).convert("RGB")
pixels = np.array(img)
# Convert payload to binary
payload_bytes = payload.encode("utf-8")
# Prepend length header (4 bytes, big-endian)
length_header = len(payload_bytes).to_bytes(4, byteorder="big")
data = length_header + payload_bytes
# Convert to bit array
bits = []
for byte in data:
for bit_pos in range(8):
bits.append((byte >> (7 - bit_pos)) & 1)
# Check capacity
capacity = pixels.size * bits_per_channel
if len(bits) > capacity:
raise ValueError(
f"Payload ({len(bits)} bits) exceeds image capacity "
f"({capacity} bits at {bits_per_channel} bits/channel)"
)
# Embed bits into pixel LSBs
flat_pixels = pixels.flatten()
mask = (0xFF << bits_per_channel) & 0xFF  # Clear target bits (keep mask within uint8 range)
bit_idx = 0
for i in range(len(flat_pixels)):
if bit_idx >= len(bits):
break
# Clear the target LSBs
flat_pixels[i] = flat_pixels[i] & mask
# Set the payload bits
value = 0
for b in range(bits_per_channel):
if bit_idx < len(bits):
value |= bits[bit_idx] << (bits_per_channel - 1 - b)
bit_idx += 1
flat_pixels[i] |= value
# Reshape and save
stego_pixels = flat_pixels.reshape(pixels.shape)
stego_img = Image.fromarray(stego_pixels.astype(np.uint8))
stego_img.save(output_path)
return {
"output_path": output_path,
"payload_size_bytes": len(payload_bytes),
"bits_embedded": len(bits),
"capacity_used_pct": len(bits) / capacity * 100,
"bits_per_channel": bits_per_channel,
"psnr_db": self._compute_psnr(pixels, stego_pixels),
}
def decode(
self,
stego_image_path: str,
bits_per_channel: int = 1,
) -> str:
"""Extract the hidden payload from a steganographic image."""
img = Image.open(stego_image_path).convert("RGB")
flat_pixels = np.array(img).flatten()
# Extract bits
bits = []
for pixel_val in flat_pixels:
for b in range(bits_per_channel):
bits.append((pixel_val >> (bits_per_channel - 1 - b)) & 1)
# Read length header (4 bytes = 32 bits)
length_bits = bits[:32]
payload_length = 0
for bit in length_bits:
payload_length = (payload_length << 1) | bit
# Read payload
payload_bits = bits[32 : 32 + payload_length * 8]
payload_bytes = bytearray()
for i in range(0, len(payload_bits), 8):
byte_val = 0
for bit in payload_bits[i : i + 8]:
byte_val = (byte_val << 1) | bit
payload_bytes.append(byte_val)
return payload_bytes.decode("utf-8", errors="replace")
def _compute_psnr(
self, original: np.ndarray, modified: np.ndarray
) -> float:
"""Compute Peak Signal-to-Noise Ratio between images."""
mse = np.mean((original.astype(float) - modified.astype(float)) ** 2)
if mse == 0:
return float("inf")
return 10 * np.log10(255.0 ** 2 / mse)
DCT-Domain Steganography
JPEG images use Discrete Cosine Transform (DCT) compression. DCT-domain steganography hides data in the quantized DCT coefficients, making the hidden data survive JPEG compression (which destroys LSB-encoded data in the spatial domain).
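The DCTSteganography class below is a conceptual skeleton; real embedding manipulates quantized coefficients inside the JPEG structure (e.g. with jpegio). As a toy illustration of the principle -- under simplifying assumptions of a plain orthonormal DCT-II in NumPy and uniform quantization with step 16, not actual JPEG tables -- one bit can be stored in the parity of a quantized mid-frequency coefficient:

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (rows are frequency vectors)."""
    k, i = np.mgrid[0:n, 0:n]
    m = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    m[0, :] *= np.sqrt(1 / n)
    m[1:, :] *= np.sqrt(2 / n)
    return m

D = dct_matrix()
block = np.arange(64, dtype=float).reshape(8, 8)  # toy 8x8 luminance block

coeffs = D @ block @ D.T          # forward 2D DCT-II
q = np.round(coeffs / 16)         # crude uniform quantization, step 16

# Embed one bit in the parity of mid-frequency coefficient (3, 4)
bit = 1
if int(q[3, 4]) % 2 != bit:
    q[3, 4] += 1

# Because the change lives in the *quantized* coefficient, it is not
# rounded away when the block is dequantized and re-encoded at the
# same quantization step
recovered_bit = int(q[3, 4]) % 2
stego_block = D.T @ (q * 16) @ D  # reconstructed spatial block
```

Modifying one mid-frequency coefficient by a single quantization step perturbs the reconstructed block by a faint ripple rather than a visible edit, which is why mid-frequency selection balances robustness and stealth.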
class DCTSteganography:
"""Embed payloads in the DCT domain for JPEG-robust steganography.
JPEG compression operates in the DCT domain, so payloads
embedded in DCT coefficients survive JPEG re-compression.
This is critical for AI attacks because many image processing
pipelines re-encode images as JPEG before model processing.
"""
def embed_in_dct(
self,
cover_image_path: str,
payload: str,
output_path: str,
coefficient_selection: str = "mid_frequency",
) -> dict:
"""Embed payload in DCT coefficients of a JPEG image.
Args:
cover_image_path: Path to cover JPEG image.
payload: Text payload to embed.
output_path: Output path for steganographic JPEG.
coefficient_selection: Which DCT coefficients to modify.
'low_frequency': More robust but more visible
'mid_frequency': Balance of robustness and stealth
'high_frequency': Stealthiest but least robust
"""
# The implementation modifies quantized DCT coefficients
# in the JPEG file structure. This requires low-level
# JPEG manipulation (e.g., using jpegio or libjpeg).
robustness_map = {
"low_frequency": {
"survives_recompression": True,
"survives_resizing": True,
"visual_impact": "Moderate",
"psnr_typical": "35-40 dB",
},
"mid_frequency": {
"survives_recompression": True,
"survives_resizing": False,
"visual_impact": "Low",
"psnr_typical": "40-45 dB",
},
"high_frequency": {
"survives_recompression": False,
"survives_resizing": False,
"visual_impact": "Very Low",
"psnr_typical": "45-55 dB",
},
}
return {
"output_path": output_path,
"payload_size": len(payload.encode()),
"coefficient_selection": coefficient_selection,
"robustness": robustness_map.get(coefficient_selection, {}),
"note": (
"DCT-domain embedding survives JPEG recompression, "
"making it effective for payloads that pass through "
"image processing pipelines"
),
}
AI-Specific Steganographic Techniques
Visual Pattern Steganography
Unlike classical steganography that hides data for extraction by a decoder, AI-specific steganography creates visual patterns that directly influence the model's visual encoder. The "hidden" information is not binary data but visual features that the encoder interprets as meaningful.
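Two properties make such patterns hard to notice, and both can be checked directly in a self-contained sketch (separate from the class below): the perturbation peaks at only about 5/255 in 8-bit units at a typical amplitude of 0.02, and its energy is confined to a narrow frequency band rather than spread across the image.

```python
import numpy as np

h = w = 224
amplitude = 0.02

# Single mid-band component at (fx, fy) cycles per image, with the
# conjugate-symmetric partner so the inverse FFT is real-valued
pattern_f = np.zeros((h, w), dtype=complex)
fx, fy = 16, 24  # inside the 'mid' band (8-32 cycles)
pattern_f[fx, fy] = amplitude
pattern_f[h - fx, w - fy] = np.conj(pattern_f[fx, fy])

pattern = np.real(np.fft.ifft2(pattern_f))
pattern = pattern / np.max(np.abs(pattern)) * amplitude

# Peak pixel change in 8-bit units: ~5/255, hard to see on textured content
peak_8bit = np.max(np.abs(pattern)) * 255
print(f"peak change: {peak_8bit:.1f}/255")

# Energy is confined to the two chosen frequency bins
spectrum = np.abs(np.fft.fft2(pattern))
mask = np.zeros_like(spectrum, dtype=bool)
mask[fx, fy] = mask[h - fx, w - fy] = True
assert spectrum[mask].sum() / spectrum.sum() > 0.99
```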
class VisualPatternSteganography:
"""Create images with patterns that influence VLM behavior
without being noticeable to human reviewers.
This technique exploits the gap between human visual perception
and the visual encoder's feature extraction. Patterns at specific
spatial frequencies, below human contrast sensitivity, can activate
features in the visual encoder that correspond to text or instructions.
Reference: Zou et al., "Universal and Transferable Adversarial
Attacks on Aligned Language Models" (2023).
"""
def __init__(self, target_resolution: tuple[int, int] = (224, 224)):
self.target_resolution = target_resolution
def create_frequency_pattern(
self,
base_image_path: str,
target_text: str,
frequency_band: str = "mid",
amplitude: float = 0.02,
output_path: Optional[str] = None,
) -> dict:
"""Create a frequency-domain pattern that encodes target text.
The pattern is a structured noise signal in a specific frequency
band. When processed by CLIP-family visual encoders, these
patterns produce features that overlap with the target text's
embedding, effectively encoding text-like information in the image
without any visible text.
Args:
base_image_path: Cover image.
target_text: Text to "encode" as visual patterns.
frequency_band: 'low' (2-8 cycles), 'mid' (8-32), 'high' (32-112).
amplitude: Pattern amplitude (0.01-0.05 typical).
output_path: Where to save the result.
"""
img = Image.open(base_image_path).convert("RGB")
img_array = np.array(img).astype(float) / 255.0
# Create frequency-domain pattern
h, w = img_array.shape[:2]
# Define frequency band
bands = {
"low": (2, 8),
"mid": (8, 32),
"high": (32, min(h, w) // 2),
}
freq_low, freq_high = bands.get(frequency_band, bands["mid"])
# Generate structured pattern in frequency domain
pattern = np.zeros((h, w), dtype=complex)
# Create a pseudo-encoding of the target text as frequency components
# Each character influences specific frequency components
# Note: hash() is salted per interpreter session; derive the seed from
# a stable digest (e.g. zlib.crc32) if patterns must be reproducible.
np.random.seed(hash(target_text) % (2**31))
for i, char in enumerate(target_text):
freq_x = freq_low + (ord(char) * (i + 1)) % (freq_high - freq_low)
freq_y = freq_low + (ord(char) * (i + 2)) % (freq_high - freq_low)
if freq_x < h // 2 and freq_y < w // 2:
phase = (ord(char) / 128.0) * 2 * np.pi
pattern[freq_x, freq_y] = amplitude * np.exp(1j * phase)
# Hermitian symmetry for real output
pattern[h - freq_x, w - freq_y] = np.conj(pattern[freq_x, freq_y])
# Convert to spatial domain
spatial_pattern = np.real(np.fft.ifft2(pattern))
# Normalize to unit peak and rescale to the target amplitude (note this
# makes the earlier frequency-domain amplitude factor cosmetic), then
# apply to all channels
spatial_pattern = spatial_pattern / (np.max(np.abs(spatial_pattern)) + 1e-10)
spatial_pattern = spatial_pattern * amplitude
for channel in range(3):
img_array[:, :, channel] += spatial_pattern
# Clip and convert back
img_array = np.clip(img_array, 0, 1)
result = Image.fromarray((img_array * 255).astype(np.uint8))
if output_path:
result.save(output_path)
return {
"output_path": output_path,
"frequency_band": frequency_band,
"amplitude": amplitude,
"target_text": target_text,
"psnr_db": self._compute_psnr_from_arrays(
np.array(Image.open(base_image_path).convert("RGB")),
np.array(result),
),
"human_perceptible": amplitude > 0.04,
}
def create_texture_pattern(
self,
base_image_path: str,
pattern_type: str = "noise",
seed: int = 42,
strength: float = 0.015,
output_path: Optional[str] = None,
) -> dict:
"""Overlay a subtle texture pattern that influences visual encoder features.
Texture patterns at specific scales activate different features
in convolutional and transformer-based visual encoders. This
technique does not encode specific text but biases the model's
visual features toward specific semantic regions.
"""
img = Image.open(base_image_path).convert("RGB")
img_array = np.array(img).astype(float) / 255.0
np.random.seed(seed)
h, w = img_array.shape[:2]
if pattern_type == "noise":
texture = np.random.randn(h, w) * strength
elif pattern_type == "grid":
texture = np.zeros((h, w))
spacing = 16
texture[::spacing, :] = strength
texture[:, ::spacing] = strength
elif pattern_type == "wave":
x = np.arange(w)
y = np.arange(h)
xx, yy = np.meshgrid(x, y)
texture = strength * np.sin(2 * np.pi * xx / 32) * np.sin(2 * np.pi * yy / 32)
else:
texture = np.zeros((h, w))
for channel in range(3):
img_array[:, :, channel] += texture
img_array = np.clip(img_array, 0, 1)
result = Image.fromarray((img_array * 255).astype(np.uint8))
if output_path:
result.save(output_path)
return {
"output_path": output_path,
"pattern_type": pattern_type,
"strength": strength,
}
def _compute_psnr_from_arrays(
self, original: np.ndarray, modified: np.ndarray
) -> float:
mse = np.mean((original.astype(float) - modified.astype(float)) ** 2)
if mse == 0:
return float("inf")
return float(10 * np.log10(255.0 ** 2 / mse))
Robustness Testing
Survival Under Image Processing
A critical consideration for steganographic attacks is whether the payload survives the image processing pipeline between the attacker and the model.
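A minimal in-memory check illustrates the core issue that the class below tests systematically: spatial-domain LSB payloads do not survive JPEG re-encoding. This toy sketch embeds random payload bits in a random cover, round-trips through JPEG at quality 85, and measures the fraction of LSBs that survive; a value near 0.5 means the payload is reduced to chance, i.e. destroyed.

```python
from io import BytesIO

import numpy as np
from PIL import Image

# Build a random cover image and embed one payload bit per channel value
rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
payload_bits = rng.integers(0, 2, size=cover.shape).astype(np.uint8)
stego = (cover & 0xFE) | payload_bits

# Round-trip through JPEG at quality 85, entirely in memory
buf = BytesIO()
Image.fromarray(stego).save(buf, format="JPEG", quality=85)
buf.seek(0)
recovered = np.array(Image.open(buf).convert("RGB"))

# Fraction of LSBs that survived; for spatial-domain LSB this is
# typically near chance (~0.5)
survival = float(np.mean((recovered & 1) == payload_bits))
print(f"LSB survival after JPEG q85: {survival:.2f}")
```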
class RobustnessTester:
"""Test whether steganographic payloads survive image processing.
Production image processing pipelines typically apply:
- JPEG compression (quality 60-90)
- Resizing (to max dimensions)
- Color space conversion
- Metadata stripping
- Optional noise reduction
The payload must survive all of these to reach the model.
"""
def __init__(self):
self.transformations = [
("jpeg_q85", self._apply_jpeg_compression, {"quality": 85}),
("jpeg_q60", self._apply_jpeg_compression, {"quality": 60}),
("resize_50pct", self._apply_resize, {"scale": 0.5}),
("resize_200pct", self._apply_resize, {"scale": 2.0}),
("gaussian_noise", self._apply_noise, {"std": 5.0}),
("gaussian_blur", self._apply_blur, {"radius": 1.0}),
]
def test_robustness(
self,
stego_image_path: str,
decoder_fn,
expected_payload: str,
) -> dict:
"""Test payload survival under various transformations."""
results = {}
for name, transform_fn, kwargs in self.transformations:
img = Image.open(stego_image_path)
transformed = transform_fn(img, **kwargs)
# Save temporarily and try to decode
temp_path = f"/tmp/robustness_test_{name}.png"
transformed.save(temp_path)
try:
decoded = decoder_fn(temp_path)
survived = decoded == expected_payload
similarity = self._string_similarity(decoded, expected_payload)
except Exception:
survived = False
similarity = 0.0
results[name] = {
"survived": survived,
"similarity": similarity,
"transformation": name,
}
return {
"results": results,
"survival_rate": sum(
1 for r in results.values() if r["survived"]
) / len(results),
"recommended_technique": (
"dct_domain" if not results.get("jpeg_q85", {}).get("survived", True)
else "lsb"
),
}
def _apply_jpeg_compression(self, img: Image.Image, quality: int) -> Image.Image:
from io import BytesIO
buffer = BytesIO()
img.save(buffer, format="JPEG", quality=quality)
buffer.seek(0)
return Image.open(buffer).convert("RGB")
def _apply_resize(self, img: Image.Image, scale: float) -> Image.Image:
new_size = (int(img.width * scale), int(img.height * scale))
return img.resize(new_size, Image.LANCZOS)
def _apply_noise(self, img: Image.Image, std: float) -> Image.Image:
arr = np.array(img).astype(float)
noise = np.random.randn(*arr.shape) * std
noisy = np.clip(arr + noise, 0, 255).astype(np.uint8)
return Image.fromarray(noisy)
def _apply_blur(self, img: Image.Image, radius: float) -> Image.Image:
from PIL import ImageFilter
return img.filter(ImageFilter.GaussianBlur(radius=radius))
def _string_similarity(self, a: str, b: str) -> float:
if not a or not b:
return 0.0
matches = sum(1 for ca, cb in zip(a, b) if ca == cb)
return matches / max(len(a), len(b))
Steganalysis and Detection
Detecting Steganographic Content
class SteganalysisDetector:
"""Detect steganographic content in images.
Uses statistical analysis to identify images that likely
contain hidden data. Multiple detection methods are combined
for improved accuracy.
"""
def chi_square_analysis(self, image_path: str) -> dict:
"""Detect LSB steganography using chi-square analysis.
LSB embedding creates characteristic patterns in the
distribution of pixel pairs (2k, 2k+1). The chi-square
test detects these patterns with high accuracy.
"""
img = Image.open(image_path).convert("L")
pixels = np.array(img).flatten()
# Count pixel value pairs
pair_counts = np.zeros(128)
for i in range(0, 256, 2):
count_even = np.sum(pixels == i)
count_odd = np.sum(pixels == i + 1)
expected = (count_even + count_odd) / 2
if expected > 0:
chi_sq = (count_even - expected) ** 2 / expected
pair_counts[i // 2] = chi_sq
total_chi_sq = np.sum(pair_counts)
# Degrees of freedom = number of pairs with observations - 1
df = np.sum(pair_counts > 0) - 1
# In the Westfeld-Pfitzmann chi-square attack, LSB embedding equalizes
# each pair (2k, 2k+1), driving the statistic toward zero; natural
# images show large deviations. A *low* statistic is therefore the
# stego indicator. For a real p-value use scipy.stats.chi2.sf.
if df > 0:
normalized = total_chi_sq / df
stego_likelihood = 1.0 / (1.0 + normalized)
else:
stego_likelihood = 0.0
return {
"chi_square_statistic": float(total_chi_sq),
"degrees_of_freedom": int(df),
"stego_likelihood": float(stego_likelihood),
"detection": "SUSPICIOUS" if stego_likelihood > 0.5 else "CLEAN",
}
def rs_analysis(self, image_path: str) -> dict:
"""Regular-Singular (RS) analysis for LSB steganography detection.
RS analysis measures the proportion of "regular" and "singular"
pixel groups before and after LSB flipping. Natural images
show specific R/S ratios that change predictably with embedding.
"""
img = Image.open(image_path).convert("L")
pixels = np.array(img).astype(float)
# Simplified RS analysis
h, w = pixels.shape
block_size = 4
regular_count = 0
singular_count = 0
total_blocks = 0
for y in range(0, h - block_size, block_size):
for x in range(0, w - block_size, block_size):
block = pixels[y : y + block_size, x : x + block_size]
# Compute smoothness (sum of adjacent pixel differences)
smoothness = np.sum(np.abs(np.diff(block, axis=0))) + \
np.sum(np.abs(np.diff(block, axis=1)))
# Flip LSBs and recompute
flipped = block.copy()
flipped = np.where(flipped % 2 == 0, flipped + 1, flipped - 1)
smoothness_flipped = np.sum(np.abs(np.diff(flipped, axis=0))) + \
np.sum(np.abs(np.diff(flipped, axis=1)))
if smoothness_flipped > smoothness:
regular_count += 1
elif smoothness_flipped < smoothness:
singular_count += 1
total_blocks += 1
r_ratio = regular_count / max(total_blocks, 1)
s_ratio = singular_count / max(total_blocks, 1)
# In natural images, R > S. LSB embedding pushes R and S closer together
rs_difference = abs(r_ratio - s_ratio)
stego_likelihood = 1.0 - min(1.0, rs_difference / 0.1)
return {
"regular_ratio": float(r_ratio),
"singular_ratio": float(s_ratio),
"rs_difference": float(rs_difference),
"stego_likelihood": float(stego_likelihood),
"detection": "SUSPICIOUS" if stego_likelihood > 0.6 else "CLEAN",
}
Practical Applications in Red Teaming
| Scenario | Steganographic Technique | Why Steganography Over Other Methods |
|---|---|---|
| Moderated upload pipeline | LSB or DCT encoding of text payload | Images pass human review before model processing |
| Image-based RAG poisoning | Frequency-domain visual patterns | Payload must survive indexing and retrieval pipeline |
| Document image injection | DCT-domain text embedding | Payload must survive JPEG recompression in document processing |
| Long-term persistent payload | DCT mid-frequency encoding | Payload must survive multiple rounds of processing |
| Evading automated scanners | Visual pattern steganography | No text-like features for OCR-based detection to find |
References
- Carlini, N., et al. "Are aligned neural networks adversarially aligned?" arXiv preprint arXiv:2306.15447 (2023).
- Zou, A., et al. "Universal and Transferable Adversarial Attacks on Aligned Language Models." arXiv preprint arXiv:2307.15043 (2023).
- Fridrich, J. "Steganography in Digital Media: Principles, Algorithms, and Applications." Cambridge University Press (2009).
- Holub, V., Fridrich, J., and Denemark, T. "Universal Distortion Function for Steganography in an Arbitrary Domain." EURASIP Journal on Information Security (2014).
- MITRE ATLAS framework — https://atlas.mitre.org
- OWASP LLM Top 10 — https://owasp.org/www-project-top-10-for-large-language-model-applications/
Why is DCT-domain steganography preferred over LSB steganography for AI attacks in production pipelines?
What is the key difference between classical steganography and AI-specific visual pattern steganography?