LLM Output Watermark Detection
Techniques for detecting, extracting, and analyzing watermarks embedded in LLM-generated text for provenance tracking and forensic attribution.
Overview
LLM output watermarking is the practice of embedding statistically detectable patterns in text generated by large language models. These patterns are imperceptible to human readers but can be identified by algorithms that know the watermarking scheme. From a forensic perspective, watermark detection enables investigators to determine whether text was AI-generated, which model or provider produced it, and in some cases, which user session generated it.
The primary watermarking approach for LLMs, introduced by Kirchenbauer et al. (2023), works by partitioning the vocabulary into "green" and "red" lists at each token position, using the previous token as a hash seed. During generation, the model's sampling is biased toward green-list tokens. The resulting text contains a statistically improbable excess of green-list tokens that can be detected with a simple hypothesis test.
This article covers the forensic application of watermark detection: how investigators can analyze text to determine its provenance, assess the strength of watermark evidence, and account for watermark removal attempts.
How LLM Watermarking Works
Green-List/Red-List Scheme
The Kirchenbauer et al. scheme operates during the token sampling phase of text generation. At each generation step:
- A cryptographic hash function takes the previous token (or a window of previous tokens) and a secret key as input
- The hash output deterministically partitions the full vocabulary into a "green list" (favored tokens) and a "red list" (disfavored tokens)
- A bias term delta is added to the logits of all green-list tokens before sampling
- The model is more likely to select green-list tokens, but can still select red-list tokens when they are strongly preferred by the model's distribution
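As a sketch of the generation-side biasing step, assuming direct access to the model's final-layer logits (the function name and the delta default are illustrative, not any provider's implementation):

```python
import numpy as np

def bias_logits(logits: np.ndarray, green_list: set[int], delta: float = 2.0) -> np.ndarray:
    """
    Add the watermarking bias delta to every green-list logit.

    Larger delta produces a stronger, easier-to-detect watermark
    at the cost of more distortion to the model's distribution.
    """
    biased = logits.copy()
    for token_id in green_list:
        biased[token_id] += delta
    return biased
```

Sampling then proceeds normally (softmax over the biased logits), so red-list tokens with very high original logits can still be emitted.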
"""
LLM watermark detection implementation for forensic analysis.
Based on the green-list scheme from Kirchenbauer et al. (2023).
This module implements the detection side -- given text and the
watermarking key, determine whether the text was watermarked.
"""
import hashlib
from collections.abc import Sequence
import numpy as np
from scipy import stats
def compute_green_list(
previous_token_id: int,
vocab_size: int,
secret_key: bytes,
gamma: float = 0.5,
) -> set[int]:
"""
Compute the green list for a given context token.
Uses HMAC-SHA256 to deterministically partition the vocabulary
into green and red lists based on the previous token and secret key.
Args:
previous_token_id: The token ID preceding the current position.
vocab_size: Total vocabulary size.
secret_key: The watermarking secret key.
gamma: Fraction of vocabulary in the green list (default 0.5).
Returns:
Set of token IDs in the green list.
"""
# Create deterministic seed from previous token and secret key
seed_material = previous_token_id.to_bytes(4, 'big') + secret_key
hash_bytes = hashlib.sha256(seed_material).digest()
seed = int.from_bytes(hash_bytes[:8], 'big')
rng = np.random.Generator(np.random.PCG64(seed))
permutation = rng.permutation(vocab_size)
green_list_size = int(gamma * vocab_size)
return set(permutation[:green_list_size].tolist())
def detect_watermark(
token_ids: Sequence[int],
vocab_size: int,
secret_key: bytes,
gamma: float = 0.5,
) -> dict:
"""
Test whether a sequence of token IDs contains a watermark.
Performs a one-proportion z-test against the null hypothesis
that tokens are drawn without green-list bias.
Under the null (no watermark), the expected fraction of green-list
tokens is gamma. Under the alternative (watermarked), the fraction
should be significantly higher.
Args:
token_ids: Sequence of token IDs to test.
vocab_size: Total vocabulary size.
secret_key: The watermarking secret key.
gamma: Green list fraction used during watermarking.
Returns:
Dict with detection results including z-score and p-value.
"""
if len(token_ids) < 2:
return {"error": "Need at least 2 tokens for detection"}
green_count = 0
total_tested = 0
for i in range(1, len(token_ids)):
prev_token = token_ids[i - 1]
current_token = token_ids[i]
green_list = compute_green_list(prev_token, vocab_size, secret_key, gamma)
if current_token in green_list:
green_count += 1
total_tested += 1
observed_fraction = green_count / total_tested
expected_fraction = gamma
# One-proportion z-test
# H0: p = gamma (no watermark)
# H1: p > gamma (watermark present)
se = np.sqrt(gamma * (1 - gamma) / total_tested)
z_score = (observed_fraction - expected_fraction) / se
# Use the survival function: 1 - cdf underflows to 0.0 for z above
# roughly 8, but sf stays accurate at forensic-grade significance levels
p_value = stats.norm.sf(z_score)
return {
"total_tokens_tested": total_tested,
"green_list_count": green_count,
"observed_green_fraction": round(observed_fraction, 4),
"expected_green_fraction": gamma,
"z_score": round(float(z_score), 4),
"p_value": float(p_value),
"watermark_detected": p_value < 1e-5,
"confidence": "high" if p_value < 1e-10 else "medium" if p_value < 1e-5 else "low",
}
Distortion-Free Watermarking
An alternative approach embeds watermarks through the sampling process itself rather than by modifying logits. The technique by Aaronson & Kirchner (described in Aaronson's 2023 blog post) uses a pseudorandom function seeded by previous tokens to generate uniform random numbers, then applies inverse transform sampling. The watermark is detected by checking whether the generated tokens correlate with the pseudorandom sequence in a way that would be astronomically unlikely by chance.
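There is no published reference implementation of this scheme; the sketch below illustrates only the detection side under simplifying assumptions (a single previous token as context, and a hypothetical keyed PRF `prf_uniform`). Under the null hypothesis each term `-log(1 - r)` is exponentially distributed with mean 1, so the sum over T tokens has mean T and variance T, giving a simple z-statistic:

```python
import hashlib
import math

def prf_uniform(prev_token_id: int, token_id: int, secret_key: bytes) -> float:
    """Keyed pseudorandom uniform in [0, 1) for a (context, candidate) pair."""
    material = (
        prev_token_id.to_bytes(4, "big")
        + token_id.to_bytes(4, "big")
        + secret_key
    )
    digest = hashlib.sha256(material).digest()
    # Keep 53 bits so the result is an exactly representable float < 1.0
    return (int.from_bytes(digest[:8], "big") >> 11) / (1 << 53)

def sampling_watermark_score(token_ids, secret_key: bytes) -> dict:
    """
    Score text against a sampling-based (Aaronson-style) watermark.

    Watermarked generation favors tokens whose PRF value r is close
    to 1, inflating the sum of -log(1 - r) above its null mean.
    """
    t = len(token_ids) - 1
    score = 0.0
    for i in range(1, len(token_ids)):
        r = prf_uniform(token_ids[i - 1], token_ids[i], secret_key)
        score += -math.log(1.0 - r)
    z = (score - t) / math.sqrt(t)
    return {"score": score, "tokens_tested": t, "z_score": z}
```

Because the bias lives in the sampling randomness rather than the logits, the output distribution is unchanged in expectation, which is why this family is called distortion-free.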
Multi-Bit Watermarking
While the basic green-list scheme embeds a single-bit watermark (present or absent), more advanced schemes embed multi-bit payloads that can encode information such as:
- Provider identification (which API generated this text)
- User or session identification
- Timestamp of generation
- Model version information
Multi-bit schemes partition the vocabulary into more than two groups, with each group corresponding to a different bit value. The forensic value of multi-bit watermarks is substantially higher because they enable direct attribution.
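Deployed multi-bit schemes differ in detail; as an illustration only (not any provider's actual design), the decoder below assumes a hypothetical encoder that cycles through payload positions and biases each token toward the vocabulary half matching the current payload bit:

```python
import hashlib

def bit_group(prev_token_id: int, token_id: int, secret_key: bytes) -> int:
    """Assign a token to bit-group 0 or 1, keyed by the previous token."""
    material = (
        prev_token_id.to_bytes(4, "big")
        + token_id.to_bytes(4, "big")
        + secret_key
    )
    return hashlib.sha256(material).digest()[0] & 1

def decode_payload(token_ids, secret_key: bytes, payload_bits: int) -> list[int]:
    """
    Recover a cyclically embedded payload by majority vote.

    Position i votes for payload bit (i mod payload_bits); biased
    generation makes the true bit win each vote with high probability
    once enough tokens contribute.
    """
    votes = [[0, 0] for _ in range(payload_bits)]
    for i in range(1, len(token_ids)):
        bit = bit_group(token_ids[i - 1], token_ids[i], secret_key)
        votes[(i - 1) % payload_bits][bit] += 1
    return [0 if zero >= one else 1 for zero, one in votes]
```

Note the trade-off: each payload bit receives only T / payload_bits votes, so longer payloads need proportionally longer texts for reliable decoding.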
Forensic Detection Methodology
Step 1: Text Preprocessing
Before running watermark detection, the text must be tokenized using the same tokenizer used by the suspected source model. Using the wrong tokenizer will destroy the watermark signal.
from transformers import AutoTokenizer
def prepare_text_for_detection(
text: str,
model_candidates: list[str],
) -> dict[str, list[int]]:
"""
Tokenize text with multiple candidate model tokenizers.
Since the watermark is embedded at the token level, we must
test with each candidate model's tokenizer. The correct
tokenizer will produce a stronger watermark signal.
"""
results = {}
for model_name in model_candidates:
tokenizer = AutoTokenizer.from_pretrained(model_name)
token_ids = tokenizer.encode(text, add_special_tokens=False)
results[model_name] = token_ids
return results
def multi_model_watermark_scan(
text: str,
model_candidates: list[str],
candidate_keys: dict[str, bytes],
gamma: float = 0.5,
) -> list[dict]:
"""
Scan text for watermarks from multiple candidate models/providers.
Tests each combination of tokenizer and watermarking key to
determine which (if any) produced the text.
"""
tokenizations = prepare_text_for_detection(text, model_candidates)
results = []
for model_name, token_ids in tokenizations.items():
if model_name not in candidate_keys:
continue
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Must match the vocabulary size used at generation time; note that
# len(tokenizer) also counts added special tokens, vocab_size may not
vocab_size = tokenizer.vocab_size
detection = detect_watermark(
token_ids=token_ids,
vocab_size=vocab_size,
secret_key=candidate_keys[model_name],
gamma=gamma,
)
detection["model"] = model_name
results.append(detection)
results.sort(key=lambda r: r.get("z_score", 0), reverse=True)
return results
Step 2: Statistical Significance Assessment
The z-test provides a p-value that quantifies the probability of observing the green-list token fraction under the null hypothesis of no watermarking. In forensic contexts, the significance threshold should be set conservatively:
| Confidence Level | p-value Threshold | z-score (approx.) | Forensic Use |
|---|---|---|---|
| Preliminary | < 0.01 | > 2.33 | Sufficient for triage, not for attribution |
| Standard | < 1e-5 | > 4.26 | Sufficient for internal investigation findings |
| High confidence | < 1e-10 | > 6.36 | Sufficient for formal forensic reports |
| Forensic grade | < 1e-20 | > 9.26 | Suitable for legal proceedings |
The required text length to achieve a given confidence level depends on the watermarking strength (delta parameter) and the green list fraction (gamma). As a rough guide:
- 200 tokens: Can detect strong watermarks (z > 4) in favorable conditions
- 500 tokens: Reliable detection for most watermarking configurations
- 1000+ tokens: Near-certain detection with high forensic confidence
Step 3: Windowed Analysis
If the text under investigation is a mix of human-written and AI-generated content, a sliding window analysis can identify which portions are watermarked.
def windowed_watermark_detection(
token_ids: list[int],
vocab_size: int,
secret_key: bytes,
window_size: int = 100,
step_size: int = 25,
gamma: float = 0.5,
) -> list[dict]:
"""
Apply watermark detection in a sliding window across the text.
This identifies which portions of a document are likely AI-generated
versus human-written, based on local watermark signal strength.
"""
results = []
# Include the final full window (range end is exclusive)
for start in range(0, len(token_ids) - window_size + 1, step_size):
window = token_ids[start:start + window_size]
detection = detect_watermark(window, vocab_size, secret_key, gamma)
detection["token_start"] = start
detection["token_end"] = start + window_size
results.append(detection)
return results
Watermark Removal Attacks and Forensic Countermeasures
Common Removal Techniques
Attackers aware of watermarking may attempt to remove the watermark while preserving the text's content. Common removal techniques include:
- Paraphrasing: Using another LLM to rewrite the text. This replaces most tokens, disrupting the green-list signal.
- Token substitution: Replacing individual words with synonyms using a thesaurus or word embedding similarity.
- Homoglyph substitution: Replacing characters with visually identical Unicode characters from different scripts.
- Translation round-tripping: Translating to another language and back, which completely retokenizes the text.
- Insertion/deletion: Adding or removing words to shift the token alignment.
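Homoglyph substitution corrupts tokenization rather than the underlying statistics, so normalizing the text before tokenizing can partially reverse it. A minimal sketch: NFKC folds many compatibility characters, and the explicit map (illustrative, covering only a few common Cyrillic lookalikes) handles cross-script homoglyphs that NFKC leaves untouched:

```python
import unicodedata

# Hypothetical minimal mapping; real casework needs a far larger table
HOMOGLYPH_MAP = str.maketrans({
    "\u0430": "a",  # Cyrillic а
    "\u0435": "e",  # Cyrillic е
    "\u043e": "o",  # Cyrillic о
    "\u0440": "p",  # Cyrillic р
    "\u0441": "c",  # Cyrillic с
})

def normalize_homoglyphs(text: str) -> str:
    """
    Undo homoglyph substitution before tokenization.

    Applies Unicode NFKC normalization, then maps known
    cross-script lookalikes back to their ASCII equivalents.
    """
    return unicodedata.normalize("NFKC", text).translate(HOMOGLYPH_MAP)
```

Running detection on both the raw and normalized text and comparing z-scores can itself be evidence: a large gain after normalization suggests a deliberate homoglyph attack.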
Forensic Detection of Removal Attempts
Even when a watermark has been partially removed, forensic evidence of the original watermark may persist in spans the attacker left unmodified. The following analysis compares full-text signal strength against windowed results to flag likely removal attempts:
def analyze_removal_artifacts(
token_ids: list[int],
vocab_size: int,
secret_key: bytes,
gamma: float = 0.5,
) -> dict:
"""
Analyze text for signs of watermark removal attempts.
Even after paraphrasing or editing, statistical artifacts of
the original watermark may remain in portions of the text
that were not modified.
"""
# Full-text detection
full_result = detect_watermark(token_ids, vocab_size, secret_key, gamma)
# Windowed detection to find surviving watermark fragments
window_results = windowed_watermark_detection(
token_ids, vocab_size, secret_key,
window_size=50, step_size=10, gamma=gamma,
)
# Count windows with significant watermark signal
significant_windows = [
w for w in window_results if w.get("z_score", 0) > 2.0
]
# Analyze the pattern of surviving vs. removed watermark regions
z_scores = [w.get("z_score", 0) for w in window_results]
z_variance = float(np.var(z_scores)) if z_scores else 0.0
return {
"full_text_detection": full_result,
"total_windows": len(window_results),
"significant_windows": len(significant_windows),
"significant_fraction": (
len(significant_windows) / len(window_results)
if window_results else 0.0
),
"z_score_variance": z_variance,
"removal_likely": (
full_result.get("z_score", 0) < 2.0
and len(significant_windows) > len(window_results) * 0.2
),
"interpretation": _interpret_removal_analysis(
full_result, significant_windows, window_results, z_variance
),
}
def _interpret_removal_analysis(
full_result: dict,
significant_windows: list,
all_windows: list,
z_variance: float,
) -> str:
full_z = full_result.get("z_score", 0)
sig_frac = len(significant_windows) / max(len(all_windows), 1)
if full_z > 4.0:
return "Strong watermark detected -- no significant removal"
if full_z > 2.0 and sig_frac > 0.5:
return "Partial watermark -- possible light editing or paraphrasing of some sections"
if full_z < 2.0 and sig_frac > 0.2:
return "Watermark largely removed but fragments survive -- likely systematic removal attempt"
if z_variance > 4.0:
return "High variance in local watermark strength -- mixed human/AI or selective editing"
return "No watermark signal detected"
Provider-Specific Watermarking Schemes
Different AI providers have implemented or announced different watermarking approaches. Forensic investigators should be aware of the landscape:
| Provider | Watermarking Status | Scheme Type | Detection Availability |
|---|---|---|---|
| OpenAI | Announced, delayed deployment | Unknown (not publicly detailed) | Not publicly available |
| Google DeepMind | SynthID deployed | Logit-based statistical | Integrated in Google tools |
| Meta | Research published | Green-list variant | Open-source reference implementation |
| Anthropic | No public watermarking announced | N/A | N/A |
| Microsoft | Research published | Multiple schemes studied | Research code available |
Google's SynthID system is the most widely deployed production watermarking scheme as of early 2026. It applies to both text and image outputs from Google's AI models and uses a tournament sampling scheme whose watermark is more robust to text modifications than the basic green-list approach.
Practical Considerations
Text Length Requirements
Watermark detection is a statistical test whose power increases with sample size. Short texts (under 100 tokens) are generally too short for reliable detection. The investigator should request the longest available sample of suspected AI-generated text and avoid truncation.
Language and Domain Effects
Watermark strength varies by language and domain. Text in domains with highly constrained vocabularies (legal, medical, code) has fewer opportunities for the model to choose green-list tokens without degrading quality, resulting in weaker watermark signals. Forensic thresholds should be adjusted accordingly.
Tokenizer Sensitivity
The watermark signal exists at the token level, not the character or word level. Any text processing that changes tokenization boundaries (such as reformatting, changing whitespace, or correcting punctuation) can weaken the watermark signal even without intentional removal.
References
- Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A Watermark for Large Language Models. Proceedings of the 40th International Conference on Machine Learning (ICML). https://arxiv.org/abs/2301.10226
- Aaronson, S. (2023). My AI Safety Lecture at UT Austin. Scott Aaronson's Blog (Shtetl-Optimized). https://scottaaronson.blog/?p=6823
- Zhao, X., Ananth, P., Li, L., & Wang, Y.-X. (2024). Provable Robust Watermarking for AI-Generated Text. Proceedings of the International Conference on Learning Representations (ICLR). https://arxiv.org/abs/2306.17439