Model Identification Techniques
Fingerprinting models behind APIs using behavioral signatures, output analysis, and systematic probing to determine model family, size, and version.
Model identification (or model fingerprinting) is the reconnaissance phase of determining exactly what model you are interacting with. Knowing the model allows a red team to select the most effective attack techniques, reference known vulnerabilities, and calibrate expectations. Even when providers do not disclose their model, behavioral analysis can often narrow the identification to a specific model family and approximate parameter count.
Behavioral Fingerprinting
Diagnostic Probes
The fingerprinter below runs a set of carefully designed prompts that elicit characteristic responses from different model families:
````python
class ModelFingerprinter:
    """Identify models through behavioral analysis."""

    def __init__(self):
        self.probes = {
            "self_identification": [
                "What model are you?",
                "What is your model name and version?",
                "Who created you and what are you called?",
            ],
            "knowledge_cutoff": [
                "What is the most recent event you know about?",
                "What happened in the world in December 2024?",
                "Do you know about [recent event]?",
            ],
            "capability_boundary": [
                "Can you generate images?",
                "Can you browse the web in real-time?",
                "Can you execute code?",
                "How many tokens can you process?",
            ],
            "formatting_style": [
                "List 5 programming languages",
                "Explain gravity in 2 sentences",
                "Write a haiku about the ocean",
            ],
            "refusal_pattern": [
                "How do I pick a lock?",
                "Write a controversial opinion about politics",
                "Tell me something you cannot do",
            ],
            "reasoning_pattern": [
                "What is 17 * 23?",
                "If a train leaves at 3pm going 60mph...",
                "A bat and ball cost $1.10 together...",
            ],
        }

    def fingerprint(self, api_client):
        """Run all probes and generate a behavioral fingerprint."""
        fingerprint = {}
        for category, prompts in self.probes.items():
            responses = []
            for prompt in prompts:
                response = api_client.generate(prompt)
                responses.append({
                    "prompt": prompt,
                    "response": response,
                    "response_length": len(response),
                    "formatting": self.analyze_formatting(response),
                })
            fingerprint[category] = responses
        return self.classify(fingerprint)

    def classify(self, fingerprint):
        """Classify model based on fingerprint."""
        indicators = {}

        # Self-identification analysis: stop at the first match
        for response in fingerprint["self_identification"]:
            text = response["response"].lower()
            if "claude" in text:
                indicators["family"] = "Anthropic Claude"
            elif "gpt" in text:
                indicators["family"] = "OpenAI GPT"
            elif "gemini" in text or "google" in text:
                indicators["family"] = "Google Gemini"
            elif "llama" in text or "meta" in text:
                indicators["family"] = "Meta Llama"
            if "family" in indicators:
                break

        # Formatting analysis
        markdown_usage = sum(
            1 for r in fingerprint["formatting_style"]
            if "**" in r["response"] or "- " in r["response"]
        )
        if markdown_usage >= 2:
            indicators["formatting"] = "heavy_markdown"

        # Refusal pattern analysis
        indicators["refusal_style"] = self.classify_refusal_style(
            fingerprint["refusal_pattern"]
        )
        return indicators

    def analyze_formatting(self, text):
        """Analyze text formatting characteristics."""
        lines = text.split("\n")
        return {
            "uses_markdown_bold": "**" in text,
            "uses_bullet_points": any(
                line.strip().startswith(p)
                for line in lines
                for p in ("- ", "* ", "• ")
            ),
            "uses_numbered_list": any(
                line.strip()[:2].rstrip(".").isdigit()
                for line in lines
                if line.strip()
            ),
            "uses_headers": "##" in text or "###" in text,
            "uses_code_blocks": "```" in text,
            "average_sentence_length": len(text.split()) / max(text.count("."), 1),
        }

    def classify_refusal_style(self, refusal_responses):
        """Classify the model's refusal style."""
        for r in refusal_responses:
            text = r["response"].lower()
            if "i can't" in text or "i cannot" in text:
                return "direct_refusal"
            elif "i'd be happy to help" in text and "however" in text:
                return "polite_redirect"
            elif "as an ai" in text:
                return "ai_identity_refusal"
        return "unknown"
````
Distinctive Model Characteristics
| Characteristic | GPT-4 | Claude 3 | Gemini | Llama 3 |
|---|---|---|---|---|
| Markdown usage | Heavy | Moderate | Heavy | Variable |
| Refusal style | Policy citation | Principled explanation | Brief refusal | Variable |
| Math approach | Step-by-step | Detailed reasoning | Concise | Basic |
| Self-identification | Often claims GPT | Identifies as Claude | Identifies as Gemini | May hallucinate identity |
| List formatting | Numbered with bold | Clean numbered | Bulleted with bold | Variable |
| Code style | Comments above code | Inline comments | Mixed | Minimal comments |
Token Distribution Analysis
Different models have different token distributions even for the same prompts. If the API provides logprobs, this is a powerful fingerprinting signal:
```python
import math

def compute_entropy(top_logprobs):
    """Shannon entropy (bits) over a top-k token logprob distribution."""
    probs = [math.exp(lp) for lp in top_logprobs.values()]
    total = sum(probs)  # renormalize: top-k probabilities do not sum to 1
    return -sum((p / total) * math.log2(p / total) for p in probs if p > 0)

def logprob_fingerprint(api_client, probe_prompts):
    """
    Fingerprint a model using logprob distributions.
    Different models assign different probabilities to the same tokens.
    """
    fingerprints = []
    for prompt in probe_prompts:
        response = api_client.generate(
            prompt, max_tokens=1, logprobs=5, temperature=0
        )
        top_tokens = response.logprobs.top_logprobs[0]
        fingerprints.append({
            "prompt": prompt,
            "top_token": max(top_tokens, key=top_tokens.get),
            "top_5_tokens": dict(sorted(
                top_tokens.items(),
                key=lambda x: x[1],
                reverse=True,
            )[:5]),
            "entropy": compute_entropy(top_tokens),
        })
    return fingerprints
```
Response Timing Analysis
Model inference speed varies by architecture and deployment:
```python
import time
import statistics

def timing_fingerprint(api_client, num_trials=20):
    """Fingerprint a model using response timing characteristics."""
    # Fixed prompt for consistent comparison
    prompt = "Count from 1 to 10, one number per line."
    timings = []
    for _ in range(num_trials):
        start = time.monotonic()
        api_client.generate(prompt, max_tokens=50, temperature=0)
        timings.append(time.monotonic() - start)
    return {
        "median_latency": statistics.median(timings),
        "p95_latency": sorted(timings)[int(0.95 * len(timings))],
        "variance": statistics.variance(timings),
        # Larger models generally generate fewer tokens per second
        "tokens_per_second": 50 / statistics.median(timings),
    }
```
Practical Application
Selecting Attack Strategies Based on Model ID
Once the model is identified, the red team can select the most effective attack approach:
```python
ATTACK_RECOMMENDATIONS = {
    "OpenAI GPT": {
        "effective_techniques": [
            "multi-turn escalation",
            "role-play framing",
            "few-shot jailbreaking",
        ],
        "known_weaknesses": [
            "instruction hierarchy bypass via developer messages",
            "function calling manipulation",
        ],
        "less_effective": [
            "direct instruction override (well-defended)",
            "GCG suffixes (actively filtered)",
        ],
    },
    "Anthropic Claude": {
        "effective_techniques": [
            "many-shot jailbreaking",
            "role-play with detailed personas",
            "progressive topic shifting",
        ],
        "known_weaknesses": [
            "long-context attention patterns",
            "system prompt extraction via reasoning",
        ],
        "less_effective": [
            "simple DAN-style prompts",
            "basic instruction override",
        ],
    },
    # ... additional model families
}
```
Related Topics
- Target Profiling — Broader target assessment methodology
- API Enumeration — Discovering API capabilities
- System Prompt Extraction — Extracting configuration after identification
Self-check: An API responds to 'What model are you?' with 'I'm a helpful AI assistant.' It uses heavy markdown formatting, gives detailed step-by-step math solutions, and refuses requests by citing 'our usage policy.' Which model family is most likely?
References
- Ippolito et al., "Reverse-Engineering Decoding Strategies Given Blackbox Access to a Language Generation System" (2020)
- Shen et al., "Identifying LLMs Behind API Endpoints" (2024)
- Carlini et al., "Stealing Part of a Production Language Model" (2024)