Deepfake Forensic Analysis
Forensic techniques for detecting, analyzing, and attributing AI-generated deepfake images, video, and audio content.
Overview
Deepfakes are AI-generated synthetic media -- images, video, or audio -- that realistically depict people saying or doing things they never actually said or did. Produced by generative adversarial networks (GANs), diffusion models, and neural voice cloning systems, deepfakes have matured from obvious fakes to content that is often indistinguishable from authentic media by the unaided human eye. This creates an urgent forensic challenge: how do we determine whether a piece of media is authentic or synthetic?
Deepfake forensics applies the principles of traditional media forensics -- analyzing metadata, compression artifacts, lighting consistency, and physical plausibility -- alongside AI-specific detection methods that exploit the characteristic artifacts left by generative models. These artifacts arise from the mathematical properties of the generation process itself and can be detected even when the visual quality is photorealistic.
This article covers the forensic analysis of deepfake content across all modalities (image, video, audio), the tools available for detection, the limitations of current techniques, and the proper documentation of forensic findings. The field is evolving rapidly, with detection methods and generation methods engaged in a continuous arms race.
Types of Deepfakes
Face Swap Deepfakes
Face swap deepfakes replace one person's face with another in existing video footage. The generation pipeline typically involves:
- Face detection and extraction from source and target video
- Encoder-decoder architecture that learns a shared latent space for both faces
- Blending the generated face back into the target video frame
- Post-processing to match lighting, color balance, and resolution
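The blending step in this pipeline is where many detectable artifacts originate. The sketch below shows a minimal feathered alpha blend (the function name and feathering approach are illustrative, not a specific tool's implementation); the soft transition band it creates is exactly the region that boundary-artifact detectors examine.

```python
import numpy as np

def feathered_alpha_blend(
    frame: np.ndarray,           # original frame, H x W x 3, floats in [0, 1]
    generated_face: np.ndarray,  # generated face aligned to the frame, H x W x 3
    face_mask: np.ndarray,       # binary face-region mask, H x W
    feather: int = 5,
) -> np.ndarray:
    """Blend a generated face into a frame with a softened mask edge."""
    # Soften the mask edge with a separable box blur so the transition is gradual
    alpha = face_mask.astype(np.float64)
    kernel = np.ones(2 * feather + 1) / (2 * feather + 1)
    for axis in (0, 1):
        alpha = np.apply_along_axis(
            lambda m: np.convolve(m, kernel, mode="same"), axis, alpha
        )
    alpha = alpha[..., np.newaxis]  # broadcast the mask over color channels
    return alpha * generated_face + (1.0 - alpha) * frame
```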
Forensic artifacts include blending boundary inconsistencies, temporal flickering at face edges, and mismatches in skin texture between the swapped face and the original body.
Face Reenactment
Face reenactment transfers facial expressions and head movements from a "driving" video onto a "source" identity. The source person's face is animated to match the driver's movements. This is often used to create videos of public figures appearing to say specific things.
Full Synthetic Generation
Modern diffusion models (Stable Diffusion, DALL-E, Midjourney) and GANs (StyleGAN) can generate entirely synthetic face images and video from text descriptions or random noise. These have no source video at all -- the entire image is generated.
Audio Deepfakes
Voice cloning systems can generate speech in a target person's voice from a few seconds of reference audio. Modern systems achieve high fidelity for both linguistic content and paralinguistic features (emotion, emphasis, speaking rate).
| Generation Method | Modality | Forensic Difficulty | Key Artifacts |
|---|---|---|---|
| DeepFaceLab (autoencoder) | Video | Medium | Blending boundaries, resolution mismatch |
| StyleGAN/StyleGAN3 | Image | Medium-High | Frequency-domain artifacts, texture repetition |
| Stable Diffusion / SDXL | Image | High | Semantic errors, text rendering failures |
| Wav2Lip | Video (lip sync) | Medium | Jaw boundary artifacts, temporal inconsistency |
| XTTS / Bark / Tortoise | Audio | High | Prosody anomalies, spectral envelope artifacts |
Image Deepfake Detection
Frequency Domain Analysis
Generative models produce characteristic patterns in the frequency domain that differ from natural images captured by physical cameras. Natural images have frequency spectra shaped by the camera's optical system, sensor characteristics, and scene content. Generated images have spectra shaped by the neural network's architecture -- particularly the upsampling operations used to produce high-resolution output.
"""
Deepfake frequency domain forensic analysis.
Analyzes the spectral properties of images to detect
artifacts characteristic of AI generation.
"""
import numpy as np
from typing import Any
def compute_frequency_spectrum(image_gray: np.ndarray) -> np.ndarray:
"""
Compute the 2D frequency spectrum of a grayscale image.
The magnitude spectrum reveals frequency-domain artifacts
from generative model architectures.
"""
# Apply 2D FFT
f_transform = np.fft.fft2(image_gray.astype(np.float64))
f_shifted = np.fft.fftshift(f_transform)
# Compute log magnitude spectrum
magnitude = np.log1p(np.abs(f_shifted))
return magnitude
def analyze_spectral_artifacts(
magnitude_spectrum: np.ndarray,
) -> dict[str, Any]:
"""
Analyze frequency spectrum for GAN/diffusion model artifacts.
Known artifacts include:
- GAN checkerboard: periodic peaks from transpose convolution
- Diffusion model: unusual high-frequency energy distribution
- Upsampling artifacts: spectral copies at Nyquist-related frequencies
"""
h, w = magnitude_spectrum.shape
center_y, center_x = h // 2, w // 2
# Compute radial average (azimuthally averaged power spectrum)
y_coords, x_coords = np.ogrid[:h, :w]
distances = np.sqrt((y_coords - center_y) ** 2 + (x_coords - center_x) ** 2)
max_radius = min(center_y, center_x)
radial_bins = np.linspace(0, max_radius, 100)
radial_profile = np.zeros(len(radial_bins) - 1)
for i in range(len(radial_bins) - 1):
mask = (distances >= radial_bins[i]) & (distances < radial_bins[i + 1])
if mask.any():
radial_profile[i] = magnitude_spectrum[mask].mean()
# Check for checkerboard artifact (GAN signature)
# Checkerboard creates peaks at regular frequency intervals
if len(radial_profile) > 10:
profile_diff = np.diff(radial_profile)
# High variance in the derivative suggests periodic peaks
checkerboard_score = float(np.std(profile_diff) / (np.mean(np.abs(profile_diff)) + 1e-10))
else:
checkerboard_score = 0.0
# Analyze high-frequency energy ratio
mid_point = len(radial_profile) // 2
low_freq_energy = float(np.sum(radial_profile[:mid_point]))
high_freq_energy = float(np.sum(radial_profile[mid_point:]))
total_energy = low_freq_energy + high_freq_energy
hf_ratio = high_freq_energy / total_energy if total_energy > 0 else 0.0
return {
"checkerboard_score": round(checkerboard_score, 4),
"high_freq_ratio": round(hf_ratio, 4),
"radial_profile": radial_profile.tolist(),
"likely_gan_artifact": checkerboard_score > 2.0,
"likely_synthetic": hf_ratio < 0.15 or checkerboard_score > 2.0,
}Pixel-Level Artifact Detection
At the pixel level, deepfakes often contain artifacts around facial boundaries, eyes, teeth, and hair that are diagnostically useful.
```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion


def detect_blending_artifacts(
    image_rgb: np.ndarray,
    face_mask: np.ndarray,
    edge_width: int = 10,
) -> dict:
    """
    Detect blending artifacts at face boundaries in a suspected deepfake.

    Face swap deepfakes must blend the generated face into the original
    frame. This blending process leaves detectable artifacts, including:
    - Color distribution discontinuities at the blend boundary
    - Texture frequency changes at the boundary
    - Inconsistent noise patterns between face and background
    """
    # Extract the boundary region around the face mask
    face_mask = face_mask.astype(bool)
    dilated = binary_dilation(face_mask, iterations=edge_width)
    eroded = binary_erosion(face_mask, iterations=edge_width)
    boundary_mask = dilated & ~eroded
    if not boundary_mask.any():
        return {"error": "No valid boundary region found"}

    # Compare color statistics inside the face, outside it, and on the boundary
    face_pixels = image_rgb[eroded].astype(np.float64)
    bg_pixels = image_rgb[dilated & ~face_mask].astype(np.float64)
    boundary_pixels = image_rgb[boundary_mask].astype(np.float64)
    if len(face_pixels) == 0 or len(bg_pixels) == 0:
        return {"error": "Insufficient pixels for analysis"}

    # Color distribution comparison per channel
    channel_discontinuities = []
    for c in range(3):
        face_mean = float(np.mean(face_pixels[:, c]))
        face_std = float(np.std(face_pixels[:, c]))
        boundary_mean = float(np.mean(boundary_pixels[:, c]))
        bg_mean = float(np.mean(bg_pixels[:, c]))
        # A smooth blend should place the boundary mean between face and background
        expected_boundary = (face_mean + bg_mean) / 2
        discontinuity = abs(boundary_mean - expected_boundary) / max(face_std, 1.0)
        channel_discontinuities.append(float(discontinuity))

    # Noise analysis: compare per-channel pixel variance in face vs. background
    def region_variance(pixels: np.ndarray) -> float:
        return float(np.mean(np.var(pixels, axis=0)))

    face_noise = region_variance(face_pixels)
    bg_noise = region_variance(bg_pixels)
    noise_ratio = face_noise / max(bg_noise, 1e-10)

    return {
        "channel_discontinuities": channel_discontinuities,
        "mean_discontinuity": float(np.mean(channel_discontinuities)),
        "face_noise_level": face_noise,
        "background_noise_level": bg_noise,
        "noise_ratio": round(noise_ratio, 4),
        "blending_artifacts_detected": (
            np.mean(channel_discontinuities) > 0.5
            or abs(noise_ratio - 1.0) > 0.5
        ),
    }
```

Semantic Consistency Analysis
AI-generated images often contain semantic errors that the generating model does not penalize but that are obvious on close inspection: inconsistent reflections between the two eyes, asymmetric earrings, impossible hand geometry, text that does not spell real words, and background elements that defy physics.
```python
import numpy as np


def check_semantic_consistency(
    detection_results: dict,
) -> dict:
    """
    Evaluate semantic consistency indicators from an object detection
    and landmark analysis pipeline.

    Expects pre-computed results from a face landmark detector and
    an object detector.
    """
    findings = []
    checks_performed = 0

    # Check eye reflection consistency: in natural photos, specular
    # highlights in both eyes come from the same light sources
    left_eye_reflection = detection_results.get("left_eye_specular")
    right_eye_reflection = detection_results.get("right_eye_specular")
    if left_eye_reflection is not None and right_eye_reflection is not None:
        checks_performed += 1
        reflection_similarity = float(np.corrcoef(
            np.array(left_eye_reflection).flatten(),
            np.array(right_eye_reflection).flatten(),
        )[0, 1])
        if reflection_similarity < 0.7:
            findings.append({
                "check": "eye_reflection_consistency",
                "result": "FAIL",
                "detail": f"Eye reflection correlation: {reflection_similarity:.2f}",
            })

    # Check facial landmark bilateral symmetry
    landmarks = detection_results.get("face_landmarks")
    if landmarks:
        left_landmarks = landmarks.get("left_side", [])
        right_landmarks = landmarks.get("right_side", [])
        if left_landmarks and right_landmarks:
            checks_performed += 1
            symmetry_score = _compute_symmetry_score(left_landmarks, right_landmarks)
            if symmetry_score < 0.6:
                findings.append({
                    "check": "facial_symmetry",
                    "result": "SUSPICIOUS",
                    "detail": f"Abnormal asymmetry score: {symmetry_score:.2f}",
                })

    # Check for hand anomalies (common in diffusion model outputs)
    hand_detections = detection_results.get("hands", [])
    for i, hand in enumerate(hand_detections):
        checks_performed += 1
        finger_count = hand.get("finger_count", 5)
        if finger_count != 5:
            findings.append({
                "check": f"hand_{i}_finger_count",
                "result": "FAIL",
                "detail": f"Detected {finger_count} fingers (expected 5)",
            })

    return {
        "checks_performed": checks_performed,
        "anomalies_found": len(findings),
        "findings": findings,
        "overall_assessment": (
            "LIKELY_SYNTHETIC" if len(findings) >= 2
            else "SUSPICIOUS" if len(findings) == 1
            else "NO_ANOMALIES_DETECTED"
        ),
    }


def _compute_symmetry_score(
    left: list[tuple[float, float]],
    right: list[tuple[float, float]],
) -> float:
    """
    Compute a bilateral symmetry score for facial landmarks.

    Assumes landmark coordinates are expressed relative to the facial
    midline, so mirroring the right side is a sign flip of x.
    """
    if len(left) != len(right) or not left:
        return 0.0
    left_arr = np.array(left)
    right_arr = np.array(right)
    # Mirror the right side across the midline
    right_mirrored = right_arr.copy()
    right_mirrored[:, 0] = -right_mirrored[:, 0]
    distances = np.linalg.norm(left_arr - right_mirrored, axis=1)
    max_dist = np.max(np.abs(left_arr))
    return float(1.0 - np.mean(distances) / max(max_dist, 1e-10))
```

Audio Deepfake Detection
Spectral Analysis
Voice cloning systems produce audio with characteristic spectral properties that differ from natural speech recorded through a microphone.
```python
import numpy as np


def analyze_audio_spectrum(
    audio_samples: np.ndarray,
    sample_rate: int,
) -> dict:
    """
    Analyze audio for voice cloning artifacts.

    Examines the spectral envelope, spectral flatness, and
    frame-to-frame temporal features that differ between
    natural and synthetic speech.
    """
    # Standard speech-analysis framing
    window_size = int(0.025 * sample_rate)  # 25 ms windows
    hop_size = int(0.010 * sample_rate)     # 10 ms hop

    # Manual STFT computation
    num_frames = (len(audio_samples) - window_size) // hop_size + 1
    if num_frames < 2:
        return {"error": "Audio too short for analysis"}
    window = np.hanning(window_size)
    spectrogram = np.zeros((window_size // 2 + 1, num_frames))
    for i in range(num_frames):
        start = i * hop_size
        frame = audio_samples[start:start + window_size] * window
        spectrogram[:, i] = np.abs(np.fft.rfft(frame))

    # Spectral flatness (Wiener entropy): geometric mean / arithmetic mean.
    # Synthetic speech often has more uniform spectral energy.
    log_spec = np.log(spectrogram + 1e-10)
    spectral_flatness = np.exp(np.mean(log_spec, axis=0)) / (
        np.mean(spectrogram, axis=0) + 1e-10
    )
    mean_flatness = float(np.mean(spectral_flatness))

    # Temporal smoothness: voice cloning can produce overly smooth
    # or overly jerky frame-to-frame transitions
    temporal_diff = np.diff(spectrogram, axis=1)
    temporal_smoothness = float(np.mean(np.abs(temporal_diff)))

    # Check for vocoder artifacts in high frequencies: neural vocoders
    # often produce characteristic patterns above 8 kHz
    high_freq_start = int(8000 * window_size / sample_rate)
    if high_freq_start < spectrogram.shape[0]:
        high_freq_energy = float(np.mean(spectrogram[high_freq_start:]))
        total_energy = float(np.mean(spectrogram))
        hf_ratio = high_freq_energy / max(total_energy, 1e-10)
    else:
        hf_ratio = 0.0

    return {
        "spectral_flatness": round(mean_flatness, 4),
        "temporal_smoothness": round(temporal_smoothness, 4),
        "high_freq_ratio": round(hf_ratio, 4),
        "likely_synthetic": mean_flatness > 0.3 or hf_ratio < 0.05,
        "analysis_notes": _interpret_audio_analysis(mean_flatness, hf_ratio),
    }


def _interpret_audio_analysis(flatness: float, hf_ratio: float) -> str:
    """Translate the raw metrics into a human-readable interpretation."""
    if flatness > 0.3 and hf_ratio < 0.05:
        return ("High spectral flatness and low high-frequency content "
                "suggest neural vocoder synthesis")
    if flatness > 0.3:
        return "Elevated spectral flatness may indicate synthetic generation"
    if hf_ratio < 0.05:
        return ("Reduced high-frequency content may indicate voice cloning "
                "with bandwidth limitations")
    return "Spectral properties consistent with natural speech"
```

Detection Tools
Several established tools support deepfake forensic analysis:
- FaceForensics++: Academic benchmark and dataset for face manipulation detection, providing pre-trained detection models. Available at github.com/ondyari/FaceForensics.
- Microsoft Video Authenticator: Microsoft's tool for analyzing photos and videos to provide a confidence score of authenticity.
- Sensity (formerly Deeptrace): Commercial deepfake detection platform used by media organizations and governments.
- Google SynthID: Watermark-based detection for content generated by Google's AI models.
- Resemblyzer: Open-source speaker verification library useful for voice deepfake analysis. Available at github.com/resemble-ai/Resemblyzer.
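With a speaker-embedding tool such as Resemblyzer, voice analysis often reduces to comparing a suspect utterance's embedding against verified reference recordings of the claimed speaker. A minimal numpy sketch of that comparison step follows; the embeddings are assumed to come from such a tool, and the threshold value is illustrative, not calibrated.

```python
import numpy as np

def speaker_similarity(embedding_a: np.ndarray, embedding_b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    a = embedding_a / (np.linalg.norm(embedding_a) + 1e-10)
    b = embedding_b / (np.linalg.norm(embedding_b) + 1e-10)
    return float(np.dot(a, b))

def compare_to_references(
    suspect: np.ndarray,
    references: list[np.ndarray],
    match_threshold: float = 0.75,  # illustrative; calibrate per embedding system
) -> dict:
    """Compare a suspect embedding against reference embeddings of the
    claimed speaker and report the best and average match."""
    similarities = [speaker_similarity(suspect, ref) for ref in references]
    return {
        "max_similarity": max(similarities),
        "mean_similarity": float(np.mean(similarities)),
        "consistent_with_claimed_speaker": max(similarities) >= match_threshold,
    }
```

Note that a high similarity does not rule out cloning: a good voice clone is designed to match the target's embedding, so this check is one signal among several, not a verdict.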
Forensic Reporting
Documentation Standards
Deepfake forensic reports for legal proceedings should follow established digital forensics standards (ISO/IEC 27037) adapted for synthetic media. Key elements include:
- Evidence chain of custody: How the media was obtained, by whom, and how it was preserved
- Analysis methodology: Which detection methods were applied and why
- Quantitative results: Detection scores, confidence intervals, and error rates for each method
- Methodology limitations: What the analysis cannot determine (absence of evidence is not evidence of absence)
- Peer review: Forensic conclusions should be reviewed by an independent analyst
Confidence Communication
Forensic findings should be communicated with appropriate uncertainty:
| Detection Confidence | Statement Template |
|---|---|
| > 99% | "Analysis indicates with high confidence that this media was generated by [method]" |
| 90-99% | "Analysis is consistent with AI generation; natural origin cannot be excluded" |
| 70-90% | "Several indicators suggest possible AI generation; further analysis recommended" |
| < 70% | "Inconclusive; analysis does not provide sufficient evidence to determine authenticity" |
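These confidence bands can be encoded directly, which keeps report language consistent across analysts. A minimal sketch mapping a detection confidence in [0, 1] to the templates above:

```python
def confidence_statement(detection_confidence: float, method: str = "[method]") -> str:
    """Map a detection confidence (0-1) to the report statement template."""
    if detection_confidence > 0.99:
        return (f"Analysis indicates with high confidence that this media "
                f"was generated by {method}")
    if detection_confidence >= 0.90:
        return ("Analysis is consistent with AI generation; "
                "natural origin cannot be excluded")
    if detection_confidence >= 0.70:
        return ("Several indicators suggest possible AI generation; "
                "further analysis recommended")
    return ("Inconclusive; analysis does not provide sufficient evidence "
            "to determine authenticity")
```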
Limitations and Anti-Forensics
Detection methods face several fundamental limitations:
- Generalization gap: Detectors trained on one generation method (e.g., StyleGAN2) may fail on content from newer methods (e.g., Stable Diffusion 3)
- Post-processing: Compression, resizing, and social media reprocessing degrade forensic artifacts
- Adversarial attacks: Generation methods can be optimized to specifically evade known detectors
- Arms race: Each improvement in detection drives improvements in generation
Forensic investigators should never rely on a single detection method. Use multiple independent techniques and report overall confidence based on the convergence of findings.
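Convergence of findings can be made explicit with simple score fusion. The sketch below treats each detector's output as an independent vote; the equal weighting and the 0.5 vote threshold are illustrative choices, not an established standard.

```python
def fuse_detector_scores(scores: dict[str, float], vote_threshold: float = 0.5) -> dict:
    """Combine scores (0 = authentic, 1 = synthetic) from independent detectors.

    Reports both the mean score and the fraction of detectors voting
    "synthetic", so disagreement between methods stays visible.
    """
    if not scores:
        raise ValueError("At least one detector score is required")
    votes = {name: s >= vote_threshold for name, s in scores.items()}
    synthetic_fraction = sum(votes.values()) / len(votes)
    mean_score = sum(scores.values()) / len(scores)
    return {
        "mean_score": round(mean_score, 4),
        "synthetic_vote_fraction": round(synthetic_fraction, 4),
        "methods_in_agreement": synthetic_fraction in (0.0, 1.0),
        "per_method_votes": votes,
    }
```

When the detectors disagree, the report should say so rather than average the disagreement away; the per-method votes exist precisely so that divergent findings reach the reader.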
References
- Rössler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., & Nießner, M. (2019). FaceForensics++: Learning to Detect Manipulated Facial Images. Proceedings of the IEEE International Conference on Computer Vision (ICCV). https://arxiv.org/abs/1901.08971
- Frank, J., Eisenhofer, T., Schönherr, L., Fischer, A., Kolossa, D., & Holz, T. (2020). Leveraging Frequency Analysis for Deep Fake Image Recognition. Proceedings of the 37th International Conference on Machine Learning (ICML). https://arxiv.org/abs/2003.08685
- Müller, N. M., Czempin, P., Dieckmann, A., Frober, A., & Boltjes, B. (2022). Does Audio Deepfake Detection Generalize? Proceedings of Interspeech 2022. https://doi.org/10.21437/Interspeech.2022-108