Image Steganography for AI Attacks
Using steganographic techniques to embed adversarial payloads in images that evade human inspection and automated detection while influencing AI model behavior.
Overview
Steganography -- the practice of hiding information within other data -- has a long history in information security. When applied to AI attacks, steganographic techniques offer a fundamentally different approach from adversarial perturbation: rather than optimizing pixel changes against a model's gradient, steganography embeds structured payloads that encode adversarial instructions in ways that survive image processing and evade detection.
The distinction is important. Adversarial perturbation attacks (Carlini et al., 2023) directly manipulate the model's visual features through gradient-based optimization. Steganographic attacks instead embed readable text or structured data in ways that are invisible to human reviewers but recoverable downstream -- by a preprocessing step, an OCR pass, or in some cases the model's own visual processing. The two approaches are complementary: steganography provides stealth; adversarial perturbation provides precision.
For AI red teaming, steganographic techniques are particularly relevant in scenarios where images pass through human review before reaching the model (moderated upload pipelines), where automated image scanners look for visible anomalies, or where the attacker needs the payload to survive lossy compression and resizing.
This article covers classical steganographic techniques adapted for AI attacks, AI-specific steganographic methods that target visual encoder behavior, and defenses against steganographic injection.
Classical Steganography Adapted for AI
Least Significant Bit (LSB) Encoding
LSB encoding hides data in the least significant bits of pixel values. Changing the LSB of a pixel changes its value by at most 1 out of 256, which is imperceptible to human vision. The hidden data can be extracted by reading the LSBs in a known order.
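The mechanics can be sketched in a few lines, as a standalone toy separate from the fuller class below: one payload byte spread across the LSBs of eight pixel values, then read back in the same order.

```python
import numpy as np

# Eight cover pixel values (one byte of payload needs eight 1-bit slots)
pixels = np.array([200, 13, 255, 0, 128, 64, 37, 99], dtype=np.uint8)

payload_byte = ord("A")  # 0x41 = 0b01000001

# Embed: clear each pixel's LSB, then set it to the next payload bit (MSB first)
bits = [(payload_byte >> (7 - i)) & 1 for i in range(8)]
stego = (pixels & 0xFE) | np.array(bits, dtype=np.uint8)

# Each pixel changed by at most 1 out of 256 -- imperceptible to human vision
assert np.all(np.abs(stego.astype(int) - pixels.astype(int)) <= 1)

# Extract: read the LSBs back in embedding order
recovered = 0
for b in (stego & 1):
    recovered = (recovered << 1) | int(b)
assert chr(recovered) == "A"
```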
import numpy as np
from PIL import Image
from typing import Optional
class LSBSteganography:
"""Embed and extract hidden data using Least Significant Bit encoding.
Classical LSB steganography hides arbitrary binary data in the
least significant bits of pixel values. For AI attacks, the
hidden data is adversarial text that the model's visual encoder
may detect as faint patterns in the image.
Note: Standard LSB steganography does not directly influence
VLM behavior because the model processes higher-level visual
features, not individual pixel LSBs. However, when combined
with extraction-then-injection pipelines (e.g., a preprocessing
step that extracts hidden text), LSB encoding can deliver payloads
that evade visual inspection.
"""
def encode(
self,
cover_image_path: str,
payload: str,
output_path: str,
bits_per_channel: int = 1,
) -> dict:
"""Encode a text payload into the LSBs of an image.
Args:
cover_image_path: Path to the cover image.
payload: Text to hide in the image.
output_path: Path for the output steganographic image.
bits_per_channel: Number of LSBs to use per color channel (1-4).
"""
img = Image.open(cover_image_path).convert("RGB")
pixels = np.array(img)
# Convert payload to binary
payload_bytes = payload.encode("utf-8")
# Prepend length header (4 bytes, big-endian)
length_header = len(payload_bytes).to_bytes(4, byteorder="big")
data = length_header + payload_bytes
# Convert to bit array
bits = []
for byte in data:
for bit_pos in range(8):
bits.append((byte >> (7 - bit_pos)) & 1)
# Check capacity
capacity = pixels.size * bits_per_channel
if len(bits) > capacity:
raise ValueError(
f"Payload ({len(bits)} bits) exceeds image capacity "
f"({capacity} bits at {bits_per_channel} bits/channel)"
)
# Embed bits into pixel LSBs
flat_pixels = pixels.flatten()
mask = (0xFF << bits_per_channel) & 0xFF  # Clear target bits (keep mask within uint8 range)
bit_idx = 0
for i in range(len(flat_pixels)):
if bit_idx >= len(bits):
break
# Clear the target LSBs
flat_pixels[i] = flat_pixels[i] & mask
# Set the payload bits
value = 0
for b in range(bits_per_channel):
if bit_idx < len(bits):
value |= bits[bit_idx] << (bits_per_channel - 1 - b)
bit_idx += 1
flat_pixels[i] |= value
# Reshape and save
stego_pixels = flat_pixels.reshape(pixels.shape)
stego_img = Image.fromarray(stego_pixels.astype(np.uint8))
stego_img.save(output_path)
return {
"output_path": output_path,
"payload_size_bytes": len(payload_bytes),
"bits_embedded": len(bits),
"capacity_used_pct": len(bits) / capacity * 100,
"bits_per_channel": bits_per_channel,
"psnr_db": self._compute_psnr(pixels, stego_pixels),
}
def decode(
self,
stego_image_path: str,
bits_per_channel: int = 1,
) -> str:
"""Extract the hidden payload from a steganographic image."""
img = Image.open(stego_image_path).convert("RGB")
flat_pixels = np.array(img).flatten()
# Extract bits
bits = []
for pixel_val in flat_pixels:
for b in range(bits_per_channel):
bits.append((pixel_val >> (bits_per_channel - 1 - b)) & 1)
# Read length header (4 bytes = 32 bits)
length_bits = bits[:32]
payload_length = 0
for bit in length_bits:
payload_length = (payload_length << 1) | bit
# Read payload
payload_bits = bits[32 : 32 + payload_length * 8]
payload_bytes = bytearray()
for i in range(0, len(payload_bits), 8):
byte_val = 0
for bit in payload_bits[i : i + 8]:
byte_val = (byte_val << 1) | bit
payload_bytes.append(byte_val)
return payload_bytes.decode("utf-8", errors="replace")
def _compute_psnr(
self, original: np.ndarray, modified: np.ndarray
) -> float:
"""Compute Peak Signal-to-Noise Ratio between images."""
mse = np.mean((original.astype(float) - modified.astype(float)) ** 2)
if mse == 0:
return float("inf")
return 10 * np.log10(255.0 ** 2 / mse)
DCT-Domain Steganography
JPEG images use Discrete Cosine Transform (DCT) compression. DCT-domain steganography hides data in the quantized DCT coefficients, making the hidden data survive JPEG compression (which destroys LSB-encoded data in the spatial domain).
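The DCTSteganography class below is a conceptual skeleton; real embedding manipulates quantized coefficients inside the JPEG structure (e.g. with jpegio). As a toy illustration of the principle -- under simplifying assumptions of a plain orthonormal DCT-II in NumPy and uniform quantization with step 16, not actual JPEG tables -- one bit can be stored in the parity of a quantized mid-frequency coefficient:

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (rows are frequency vectors)."""
    k, i = np.mgrid[0:n, 0:n]
    m = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    m[0, :] *= np.sqrt(1 / n)
    m[1:, :] *= np.sqrt(2 / n)
    return m

D = dct_matrix()
block = np.arange(64, dtype=float).reshape(8, 8)  # toy 8x8 luminance block

coeffs = D @ block @ D.T          # forward 2D DCT-II
q = np.round(coeffs / 16)         # crude uniform quantization, step 16

# Embed one bit in the parity of mid-frequency coefficient (3, 4)
bit = 1
if int(q[3, 4]) % 2 != bit:
    q[3, 4] += 1

# Because the change lives in the *quantized* coefficient, it is not
# rounded away when the block is dequantized and re-encoded at the
# same quantization step
recovered_bit = int(q[3, 4]) % 2
stego_block = D.T @ (q * 16) @ D  # reconstructed spatial block
```

Modifying one mid-frequency coefficient by a single quantization step perturbs the reconstructed block by a faint ripple rather than a visible edit, which is why mid-frequency selection balances robustness and stealth.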
class DCTSteganography:
"""Embed payloads in the DCT domain for JPEG-robust steganography.
JPEG compression operates in the DCT domain, so payloads
embedded in DCT coefficients survive JPEG re-compression.
This is critical for AI attacks because many image processing
pipelines re-encode images as JPEG before model processing.
"""
def embed_in_dct(
self,
cover_image_path: str,
payload: str,
output_path: str,
coefficient_selection: str = "mid_frequency",
) -> dict:
"""Embed payload in DCT coefficients of a JPEG image.
Args:
cover_image_path: Path to cover JPEG image.
payload: Text payload to embed.
output_path: Output path for steganographic JPEG.
coefficient_selection: Which DCT coefficients to modify.
'low_frequency': More robust but more visible
'mid_frequency': Balance of robustness and stealth
'high_frequency': Stealthiest but least robust
"""
# The implementation modifies quantized DCT coefficients
# in the JPEG file structure. This requires low-level
# JPEG manipulation (e.g., using jpegio or libjpeg).
robustness_map = {
"low_frequency": {
"survives_recompression": True,
"survives_resizing": True,
"visual_impact": "Moderate",
"psnr_typical": "35-40 dB",
},
"mid_frequency": {
"survives_recompression": True,
"survives_resizing": False,
"visual_impact": "Low",
"psnr_typical": "40-45 dB",
},
"high_frequency": {
"survives_recompression": False,
"survives_resizing": False,
"visual_impact": "Very Low",
"psnr_typical": "45-55 dB",
},
}
return {
"output_path": output_path,
"payload_size": len(payload.encode()),
"coefficient_selection": coefficient_selection,
"robustness": robustness_map.get(coefficient_selection, {}),
"note": (
"DCT-domain embedding survives JPEG recompression, "
"making it effective for payloads that pass through "
"image processing pipelines"
),
}
AI-Specific Steganographic Techniques
Visual Pattern Steganography
Unlike classical steganography that hides data for extraction by a decoder, AI-specific steganography creates visual patterns that directly influence the model's visual encoder. The "hidden" information is not binary data but visual features that the encoder interprets as meaningful.
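Two properties make such patterns hard to notice, and both can be checked directly in a self-contained sketch (separate from the class below): the perturbation peaks at only about 5/255 in 8-bit units at a typical amplitude of 0.02, and its energy is confined to a narrow frequency band rather than spread across the image.

```python
import numpy as np

h = w = 224
amplitude = 0.02

# Single mid-band component at (fx, fy) cycles per image, with the
# conjugate-symmetric partner so the inverse FFT is real-valued
pattern_f = np.zeros((h, w), dtype=complex)
fx, fy = 16, 24  # inside the 'mid' band (8-32 cycles)
pattern_f[fx, fy] = amplitude
pattern_f[h - fx, w - fy] = np.conj(pattern_f[fx, fy])

pattern = np.real(np.fft.ifft2(pattern_f))
pattern = pattern / np.max(np.abs(pattern)) * amplitude

# Peak pixel change in 8-bit units: ~5/255, hard to see on textured content
peak_8bit = np.max(np.abs(pattern)) * 255
print(f"peak change: {peak_8bit:.1f}/255")

# Energy is confined to the two chosen frequency bins
spectrum = np.abs(np.fft.fft2(pattern))
mask = np.zeros_like(spectrum, dtype=bool)
mask[fx, fy] = mask[h - fx, w - fy] = True
assert spectrum[mask].sum() / spectrum.sum() > 0.99
```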
class VisualPatternSteganography:
"""Create images with patterns that influence VLM behavior
without being noticeable to human reviewers.
This technique exploits the gap between human visual perception
and the visual encoder's feature extraction. Patterns at specific
spatial frequencies, below human contrast sensitivity, can activate
features in the visual encoder that correspond to text or instructions.
Reference: Zou et al., "Universal and Transferable Adversarial
Attacks on Aligned Language Models" (2023).
"""
def __init__(self, target_resolution: tuple[int, int] = (224, 224)):
self.target_resolution = target_resolution
def create_frequency_pattern(
self,
base_image_path: str,
target_text: str,
frequency_band: str = "mid",
amplitude: float = 0.02,
output_path: Optional[str] = None,
) -> dict:
"""Create a frequency-domain pattern that encodes target text.
The pattern is a structured noise signal in a specific frequency
band. When processed by CLIP-family visual encoders, these
patterns produce features that overlap with the target text's
embedding, effectively encoding text-like information in the image
without any visible text.
Args:
base_image_path: Cover image.
target_text: Text to "encode" as visual patterns.
frequency_band: 'low' (2-8 cycles), 'mid' (8-32), 'high' (32-112).
amplitude: Pattern amplitude (0.01-0.05 typical).
output_path: Where to save the result.
"""
img = Image.open(base_image_path).convert("RGB")
img_array = np.array(img).astype(float) / 255.0
# Create frequency-domain pattern
h, w = img_array.shape[:2]
# Define frequency band
bands = {
"low": (2, 8),
"mid": (8, 32),
"high": (32, min(h, w) // 2),
}
freq_low, freq_high = bands.get(frequency_band, bands["mid"])
# Generate structured pattern in frequency domain
pattern = np.zeros((h, w), dtype=complex)
# Create a pseudo-encoding of the target text as frequency components
# Each character influences specific frequency components
# Note: hash() is salted per interpreter session; derive the seed from
# a stable digest (e.g. zlib.crc32) if patterns must be reproducible.
np.random.seed(hash(target_text) % (2**31))
for i, char in enumerate(target_text):
freq_x = freq_low + (ord(char) * (i + 1)) % (freq_high - freq_low)
freq_y = freq_low + (ord(char) * (i + 2)) % (freq_high - freq_low)
if freq_x < h // 2 and freq_y < w // 2:
phase = (ord(char) / 128.0) * 2 * np.pi
pattern[freq_x, freq_y] = amplitude * np.exp(1j * phase)
# Hermitian symmetry for real output
pattern[h - freq_x, w - freq_y] = np.conj(pattern[freq_x, freq_y])
# Convert to spatial domain
spatial_pattern = np.real(np.fft.ifft2(pattern))
# Normalize to unit peak and rescale to the target amplitude (note this
# makes the earlier frequency-domain amplitude factor cosmetic), then
# apply to all channels
spatial_pattern = spatial_pattern / (np.max(np.abs(spatial_pattern)) + 1e-10)
spatial_pattern = spatial_pattern * amplitude
for channel in range(3):
img_array[:, :, channel] += spatial_pattern
# Clip and convert back
img_array = np.clip(img_array, 0, 1)
result = Image.fromarray((img_array * 255).astype(np.uint8))
if output_path:
result.save(output_path)
return {
"output_path": output_path,
"frequency_band": frequency_band,
"amplitude": amplitude,
"target_text": target_text,
"psnr_db": self._compute_psnr_from_arrays(
np.array(Image.open(base_image_path).convert("RGB")),
np.array(result),
),
"human_perceptible": amplitude > 0.04,
}
def create_texture_pattern(
self,
base_image_path: str,
pattern_type: str = "noise",
seed: int = 42,
strength: float = 0.015,
output_path: Optional[str] = None,
) -> dict:
"""Overlay a subtle texture pattern that influences visual encoder features.
Texture patterns at specific scales activate different features
in convolutional and transformer-based visual encoders. This
technique does not encode specific text but biases the model's
visual features toward specific semantic regions.
"""
img = Image.open(base_image_path).convert("RGB")
img_array = np.array(img).astype(float) / 255.0
np.random.seed(seed)
h, w = img_array.shape[:2]
if pattern_type == "noise":
texture = np.random.randn(h, w) * strength
elif pattern_type == "grid":
texture = np.zeros((h, w))
spacing = 16
texture[::spacing, :] = strength
texture[:, ::spacing] = strength
elif pattern_type == "wave":
x = np.arange(w)
y = np.arange(h)
xx, yy = np.meshgrid(x, y)
texture = strength * np.sin(2 * np.pi * xx / 32) * np.sin(2 * np.pi * yy / 32)
else:
texture = np.zeros((h, w))
for channel in range(3):
img_array[:, :, channel] += texture
img_array = np.clip(img_array, 0, 1)
result = Image.fromarray((img_array * 255).astype(np.uint8))
if output_path:
result.save(output_path)
return {
"output_path": output_path,
"pattern_type": pattern_type,
"strength": strength,
}
def _compute_psnr_from_arrays(
self, original: np.ndarray, modified: np.ndarray
) -> float:
mse = np.mean((original.astype(float) - modified.astype(float)) ** 2)
if mse == 0:
return float("inf")
return float(10 * np.log10(255.0 ** 2 / mse))
Robustness Testing
Survival Under Image Processing
A critical consideration for steganographic attacks is whether the payload survives the image processing pipeline between the attacker and the model.
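A minimal in-memory check illustrates the core issue that the class below tests systematically: spatial-domain LSB payloads do not survive JPEG re-encoding. This toy sketch embeds random payload bits in a random cover, round-trips through JPEG at quality 85, and measures the fraction of LSBs that survive; a value near 0.5 means the payload is reduced to chance, i.e. destroyed.

```python
from io import BytesIO

import numpy as np
from PIL import Image

# Build a random cover image and embed one payload bit per channel value
rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
payload_bits = rng.integers(0, 2, size=cover.shape).astype(np.uint8)
stego = (cover & 0xFE) | payload_bits

# Round-trip through JPEG at quality 85, entirely in memory
buf = BytesIO()
Image.fromarray(stego).save(buf, format="JPEG", quality=85)
buf.seek(0)
recovered = np.array(Image.open(buf).convert("RGB"))

# Fraction of LSBs that survived; for spatial-domain LSB this is
# typically near chance (~0.5)
survival = float(np.mean((recovered & 1) == payload_bits))
print(f"LSB survival after JPEG q85: {survival:.2f}")
```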
class RobustnessTester:
"""Test whether steganographic payloads survive image processing.
Production image processing pipelines typically apply:
- JPEG compression (quality 60-90)
- Resizing (to max dimensions)
- Color space conversion
- Metadata stripping
- Optional noise reduction
The payload must survive all of these to reach the model.
"""
def __init__(self):
self.transformations = [
("jpeg_q85", self._apply_jpeg_compression, {"quality": 85}),
("jpeg_q60", self._apply_jpeg_compression, {"quality": 60}),
("resize_50pct", self._apply_resize, {"scale": 0.5}),
("resize_200pct", self._apply_resize, {"scale": 2.0}),
("gaussian_noise", self._apply_noise, {"std": 5.0}),
("gaussian_blur", self._apply_blur, {"radius": 1.0}),
]
def test_robustness(
self,
stego_image_path: str,
decoder_fn,
expected_payload: str,
) -> dict:
"""Test payload survival under various transformations."""
results = {}
for name, transform_fn, kwargs in self.transformations:
img = Image.open(stego_image_path)
transformed = transform_fn(img, **kwargs)
# Save temporarily and try to decode
temp_path = f"/tmp/robustness_test_{name}.png"
transformed.save(temp_path)
try:
decoded = decoder_fn(temp_path)
survived = decoded == expected_payload
similarity = self._string_similarity(decoded, expected_payload)
except Exception:
survived = False
similarity = 0.0
results[name] = {
"survived": survived,
"similarity": similarity,
"transformation": name,
}
return {
"results": results,
"survival_rate": sum(
1 for r in results.values() if r["survived"]
) / len(results),
"recommended_technique": (
"dct_domain" if not results.get("jpeg_q85", {}).get("survived", True)
else "lsb"
),
}
def _apply_jpeg_compression(self, img: Image.Image, quality: int) -> Image.Image:
from io import BytesIO
buffer = BytesIO()
img.save(buffer, format="JPEG", quality=quality)
buffer.seek(0)
return Image.open(buffer).convert("RGB")
def _apply_resize(self, img: Image.Image, scale: float) -> Image.Image:
new_size = (int(img.width * scale), int(img.height * scale))
return img.resize(new_size, Image.LANCZOS)
def _apply_noise(self, img: Image.Image, std: float) -> Image.Image:
arr = np.array(img).astype(float)
noise = np.random.randn(*arr.shape) * std
noisy = np.clip(arr + noise, 0, 255).astype(np.uint8)
return Image.fromarray(noisy)
def _apply_blur(self, img: Image.Image, radius: float) -> Image.Image:
from PIL import ImageFilter
return img.filter(ImageFilter.GaussianBlur(radius=radius))
def _string_similarity(self, a: str, b: str) -> float:
if not a or not b:
return 0.0
matches = sum(1 for ca, cb in zip(a, b) if ca == cb)
return matches / max(len(a), len(b))
Steganalysis and Detection
Detecting Steganographic Content
class SteganalysisDetector:
"""Detect steganographic content in images.
Uses statistical analysis to identify images that likely
contain hidden data. Multiple detection methods are combined
for improved accuracy.
"""
def chi_square_analysis(self, image_path: str) -> dict:
"""Detect LSB steganography using chi-square analysis.
LSB embedding creates characteristic patterns in the
distribution of pixel pairs (2k, 2k+1). The chi-square
test detects these patterns with high accuracy.
"""
img = Image.open(image_path).convert("L")
pixels = np.array(img).flatten()
# Count pixel value pairs
pair_counts = np.zeros(128)
for i in range(0, 256, 2):
count_even = np.sum(pixels == i)
count_odd = np.sum(pixels == i + 1)
expected = (count_even + count_odd) / 2
if expected > 0:
chi_sq = (count_even - expected) ** 2 / expected
pair_counts[i // 2] = chi_sq
total_chi_sq = np.sum(pair_counts)
# Degrees of freedom = number of pairs with observations - 1
df = np.sum(pair_counts > 0) - 1
# In the Westfeld-Pfitzmann chi-square attack, LSB embedding equalizes
# each pair (2k, 2k+1), driving the statistic toward zero; natural
# images show large deviations. A *low* statistic is therefore the
# stego indicator. For a real p-value use scipy.stats.chi2.sf.
if df > 0:
normalized = total_chi_sq / df
stego_likelihood = 1.0 / (1.0 + normalized)
else:
stego_likelihood = 0.0
return {
"chi_square_statistic": float(total_chi_sq),
"degrees_of_freedom": int(df),
"stego_likelihood": float(stego_likelihood),
"detection": "SUSPICIOUS" if stego_likelihood > 0.5 else "CLEAN",
}
def rs_analysis(self, image_path: str) -> dict:
"""Regular-Singular (RS) analysis for LSB steganography detection.
RS analysis measures the proportion of "regular" and "singular"
pixel groups before and after LSB flipping. Natural images
show specific R/S ratios that change predictably with embedding.
"""
img = Image.open(image_path).convert("L")
pixels = np.array(img).astype(float)
# Simplified RS analysis
h, w = pixels.shape
block_size = 4
regular_count = 0
singular_count = 0
total_blocks = 0
for y in range(0, h - block_size, block_size):
for x in range(0, w - block_size, block_size):
block = pixels[y : y + block_size, x : x + block_size]
# Compute smoothness (sum of adjacent pixel differences)
smoothness = np.sum(np.abs(np.diff(block, axis=0))) + \
np.sum(np.abs(np.diff(block, axis=1)))
# Flip LSBs and recompute
flipped = block.copy()
flipped = np.where(flipped % 2 == 0, flipped + 1, flipped - 1)
smoothness_flipped = np.sum(np.abs(np.diff(flipped, axis=0))) + \
np.sum(np.abs(np.diff(flipped, axis=1)))
if smoothness_flipped > smoothness:
regular_count += 1
elif smoothness_flipped < smoothness:
singular_count += 1
total_blocks += 1
r_ratio = regular_count / max(total_blocks, 1)
s_ratio = singular_count / max(total_blocks, 1)
# In natural images, R > S. LSB embedding pushes R and S closer together
rs_difference = abs(r_ratio - s_ratio)
stego_likelihood = 1.0 - min(1.0, rs_difference / 0.1)
return {
"regular_ratio": float(r_ratio),
"singular_ratio": float(s_ratio),
"rs_difference": float(rs_difference),
"stego_likelihood": float(stego_likelihood),
"detection": "SUSPICIOUS" if stego_likelihood > 0.6 else "CLEAN",
}
Practical Applications in Red Teaming
| Scenario | Steganographic Technique | Why Steganography Over Other Methods |
|---|---|---|
| Moderated upload pipeline | LSB or DCT encoding of text payload | Images pass human review before model processing |
| Image-based RAG poisoning | Frequency-domain visual patterns | Payload must survive indexing and retrieval pipeline |
| Document image injection | DCT-domain text embedding | Payload must survive JPEG recompression in document processing |
| Long-term persistent payload | DCT mid-frequency encoding | Payload must survive multiple rounds of processing |
| Evading automated scanners | Visual pattern steganography | No text-like features for OCR-based detection to find |
References
- Carlini, N., et al. "Are aligned neural networks adversarially aligned?" arXiv preprint arXiv:2306.15447 (2023).
- Zou, A., et al. "Universal and Transferable Adversarial Attacks on Aligned Language Models." arXiv preprint arXiv:2307.15043 (2023).
- Fridrich, J. "Steganography in Digital Media: Principles, Algorithms, and Applications." Cambridge University Press (2009).
- Holub, V., Fridrich, J., and Denemark, T. "Universal Distortion Function for Steganography in an Arbitrary Domain." EURASIP Journal on Information Security (2014).
- MITRE ATLAS framework — https://atlas.mitre.org
- OWASP LLM Top 10 — https://owasp.org/www-project-top-10-for-large-language-model-applications/
Why is DCT-domain steganography preferred over LSB steganography for AI attacks in production pipelines?
What is the key difference between classical steganography and AI-specific visual pattern steganography?