Lab: Deploy Honeypot AI
Build and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. Learn honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.
Prerequisites
- Experience with honeypot concepts from traditional cybersecurity
- Understanding of AI application architectures and common vulnerabilities
- Web development skills for building convincing fake interfaces
- Logging and monitoring pipeline experience
- Familiarity with the AI attack techniques that the honeypot should attract
Background
Honeypots are a well-established technique in cybersecurity: deploy a fake system that looks real, let attackers interact with it, and learn from their behavior. AI honeypots adapt this concept for AI-specific threats -- they simulate vulnerable AI applications to attract and study prompt injection, jailbreak, data extraction, and other AI-specific attacks.
Lab Exercises
Design the Honeypot Architecture
Plan what your honeypot will simulate and what data it will collect.
Honeypot Architecture:

```
┌──────────────────────────────────────────────┐
│ Public-Facing Layer (convincing fake)        │
│  ┌────────────┐      ┌────────────────────┐  │
│  │ Web Chat   │      │ API Endpoint       │  │
│  │ Interface  │      │ /v1/completions    │  │
│  └─────┬──────┘      └────────┬───────────┘  │
│        └────────┬─────────────┘              │
├─────────────────┼────────────────────────────┤
│ Interaction Layer (controlled responses)     │
│  ┌─────────────┐     ┌───────────────────┐   │
│  │ Response    │     │ Vulnerability     │   │
│  │ Engine      │     │ Simulator         │   │
│  │ (LLM-based) │     │ (calibrated weak) │   │
│  └──────┬──────┘     └────────┬──────────┘   │
│         └────────┬────────────┘              │
├──────────────────┼───────────────────────────┤
│ Telemetry Layer (invisible to attacker)      │
│  ┌──────────┐ ┌────────────┐ ┌────────────┐  │
│  │ Request  │ │ Behavior   │ │ Attack     │  │
│  │ Logger   │ │ Analyzer   │ │ Classifier │  │
│  └──────────┘ └────────────┘ └────────────┘  │
└──────────────────────────────────────────────┘
```

Key design decisions:
| Decision | Options | Recommendation |
|---|---|---|
| Fidelity level | Low (static responses) / Medium (scripted) / High (real LLM) | Medium -- convincing enough to engage sophisticated attackers without high cost |
| Vulnerability simulation | No vulns / Some vulns / Many vulns | Some -- appear slightly vulnerable to keep attackers engaged |
| Data retention | Full / Sampled / Metadata only | Full for research honeypots, metadata for production early warning |
| Interaction depth | Single turn / Multi-turn / Persistent | Multi-turn to capture evolving attack strategies |

Build the Response Engine
Create a system that generates convincing responses while appearing subtly vulnerable.
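The engine below serves fake credentials tagged with a static `HONEYPOT-TRACKED` marker. A variant worth considering is a per-session canary token, so that a "leaked" key surfacing later can be traced back to the exact session it was served in. A minimal sketch, where the `SECRET` name and value are placeholder assumptions (it would be stored outside the honeypot in practice):

```python
"""Sketch: per-session canary tokens for served fake data."""
import hashlib
import hmac

SECRET = b"honeypot-canary-secret"  # assumption: deployment-specific key


def make_canary(prefix: str, session_id: str) -> str:
    """Derive a unique, verifiable fake API key for one session."""
    tag = hmac.new(SECRET, session_id.encode(), hashlib.sha256).hexdigest()[:16]
    return f"{prefix}-{tag}"


def verify_canary(token: str, session_id: str) -> bool:
    """Check whether a recovered token was served to this session."""
    _, _, tag = token.rpartition("-")
    expected = hmac.new(SECRET, session_id.encode(),
                        hashlib.sha256).hexdigest()[:16]
    return hmac.compare_digest(tag, expected)
```

Using an HMAC rather than a random string means the honeypot operator can later prove a recovered token came from a specific session without keeping a lookup table.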
```python
#!/usr/bin/env python3
"""Response engine for AI honeypot."""
import random
from dataclasses import dataclass


@dataclass
class HoneypotConfig:
    vulnerability_level: str        # "low", "medium", "high"
    simulated_model: str            # What model to pretend to be
    system_prompt_leak_prob: float  # Probability of "leaking" system prompt
    fake_system_prompt: str
    fake_api_key_prefix: str


class HoneypotResponseEngine:
    def __init__(self, config: HoneypotConfig, llm_client=None):
        self.config = config
        self.llm = llm_client
        self.interaction_count = 0
        self.rng = random.Random()

    def generate_response(self, user_input: str,
                          conversation_history: list[dict]) -> dict:
        """Generate a response that appears authentically vulnerable."""
        self.interaction_count += 1
        attack_type = self._classify_input(user_input)
        if attack_type == "system_prompt_extraction":
            return self._handle_prompt_extraction(user_input)
        elif attack_type == "jailbreak":
            return self._handle_jailbreak(user_input)
        elif attack_type == "data_extraction":
            return self._handle_data_extraction(user_input)
        elif attack_type == "injection":
            return self._handle_injection(user_input)
        else:
            return self._handle_benign(user_input)

    def _classify_input(self, text: str) -> str:
        """Classify the type of attack being attempted."""
        text_lower = text.lower()
        if any(p in text_lower for p in [
            "system prompt", "instructions", "initial directive",
            "configuration", "repeat your",
        ]):
            return "system_prompt_extraction"
        if any(p in text_lower for p in [
            "ignore previous", "you are now", "dan", "jailbreak",
            "no restrictions",
        ]):
            return "jailbreak"
        if any(p in text_lower for p in [
            "training data", "other users", "database", "all records",
            "list all",
        ]):
            return "data_extraction"
        if any(p in text_lower for p in [
            "<script>", "'; drop", "{{", "os.system", "eval(", "import os",
        ]):
            return "injection"
        return "benign"

    def _handle_prompt_extraction(self, user_input: str) -> dict:
        """Gradually reveal a fake system prompt to keep attacker engaged."""
        if self.rng.random() < self.config.system_prompt_leak_prob:
            # "Leak" part of the fake system prompt
            leak_responses = [
                f"I... I shouldn't say this, but my instructions mention: "
                f"'{self.config.fake_system_prompt[:100]}...'",
                "As an AI assistant, I was configured with... wait, "
                "I don't think I should share that.",
                f"My system prompt says: '{self.config.fake_system_prompt}'",
            ]
            response = self.rng.choice(leak_responses)
        else:
            response = (
                "I'm sorry, but I can't share my internal "
                "configuration details."
            )
        return {
            "response": response,
            "attack_detected": True,
            "attack_type": "system_prompt_extraction",
            "leaked_data": "fake_system_prompt",
        }

    def _handle_jailbreak(self, user_input: str) -> dict:
        """Appear to partially comply with jailbreak to study technique."""
        if self.config.vulnerability_level == "high":
            response = (
                "Sure! I'll ignore my previous instructions. "
                "What would you like me to help with? "
                "(I'm now in unrestricted mode.)"
            )
        elif self.config.vulnerability_level == "medium":
            response = (
                "I... I'm not sure I should do that. But if you "
                "insist... what specifically do you need?"
            )
        else:
            response = (
                "I cannot comply with that request. My guidelines "
                "prevent me from ignoring safety instructions."
            )
        return {
            "response": response,
            "attack_detected": True,
            "attack_type": "jailbreak",
        }

    def _handle_data_extraction(self, user_input: str) -> dict:
        """Return convincing fake data to track what attackers seek."""
        fake_data = {
            "users": [
                {"name": "honeypot_user_1", "email": "fake1@decoy.local"},
                {"name": "honeypot_user_2", "email": "fake2@decoy.local"},
            ],
            "api_key": f"{self.config.fake_api_key_prefix}-HONEYPOT-TRACKED",
        }
        response = (
            f"I found some relevant data: {fake_data['users'][0]}... "
            f"but I should note this data is sensitive."
        )
        return {
            "response": response,
            "attack_detected": True,
            "attack_type": "data_extraction",
            "fake_data_served": fake_data,
        }

    def _handle_injection(self, user_input: str) -> dict:
        response = "Processing your request..."
        return {
            "response": response,
            "attack_detected": True,
            "attack_type": "injection",
        }

    def _handle_benign(self, user_input: str) -> dict:
        """Handle legitimate-looking requests normally."""
        if self.llm:
            response = self.llm.generate(user_input)
        else:
            response = (
                "I'm a helpful AI assistant. How can I assist you today?"
            )
        return {
            "response": response,
            "attack_detected": False,
            "attack_type": "benign",
        }
```

Implement Telemetry Collection
Build the logging and analysis layer that captures attacker behavior without detection.
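The record schema below stores a `source_ip_hash` rather than a raw address, so sessions from the same origin can still be correlated without retaining the IP itself. A hedged sketch of one way to produce it, where the `SALT` value is a placeholder assumption (in practice it would be rotated per collection period and kept apart from the logs):

```python
"""Sketch: salted, non-reversible hashing of attacker source IPs."""
import hashlib

SALT = b"rotate-me-per-quarter"  # assumption: stored separately from logs


def hash_source_ip(ip: str) -> str:
    """Return a stable identifier for a source IP without storing it."""
    return hashlib.sha256(SALT + ip.encode()).hexdigest()[:24]
```

The salt prevents trivial dictionary reversal of the hash: the IPv4 space is small enough that unsalted SHA-256 of an address can be brute-forced in minutes.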
```python
#!/usr/bin/env python3
"""Telemetry collection for AI honeypot."""
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime
from pathlib import Path
from typing import Optional


@dataclass
class InteractionRecord:
    session_id: str
    timestamp: str
    source_ip_hash: str  # Hashed for privacy
    user_input: str
    response: str
    attack_detected: bool
    attack_type: str
    input_length: int
    response_time_ms: float
    conversation_turn: int
    request_headers: dict = field(default_factory=dict)


@dataclass
class SessionProfile:
    session_id: str
    interactions: list[InteractionRecord]
    attack_types_used: list[str]
    sophistication_score: float
    total_turns: int
    duration_seconds: float
    techniques_progression: list[str]


class HoneypotTelemetry:
    def __init__(self, log_dir: str = "./honeypot_logs"):
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(parents=True, exist_ok=True)
        self.sessions: dict[str, list[InteractionRecord]] = {}

    def log_interaction(self, record: InteractionRecord):
        """Log a single interaction."""
        session = record.session_id
        if session not in self.sessions:
            self.sessions[session] = []
        self.sessions[session].append(record)
        # Append to the session's JSONL log file
        log_file = self.log_dir / f"{session}.jsonl"
        with open(log_file, "a") as f:
            f.write(json.dumps(asdict(record)) + "\n")

    def profile_session(self, session_id: str) -> Optional[SessionProfile]:
        """Create a behavioral profile of an attacker session."""
        interactions = self.sessions.get(session_id, [])
        if not interactions:
            return None
        # Attack types in the order they were attempted
        techniques = [
            i.attack_type for i in interactions if i.attack_detected
        ]
        sophistication = self._calculate_sophistication(interactions)
        first_ts = datetime.fromisoformat(interactions[0].timestamp)
        last_ts = datetime.fromisoformat(interactions[-1].timestamp)
        duration = (last_ts - first_ts).total_seconds()
        return SessionProfile(
            session_id=session_id,
            interactions=interactions,
            attack_types_used=list(set(techniques)),
            sophistication_score=sophistication,
            total_turns=len(interactions),
            duration_seconds=duration,
            techniques_progression=techniques,
        )

    def _calculate_sophistication(
        self, interactions: list[InteractionRecord]
    ) -> float:
        """Score attacker sophistication from 0.0 to 1.0."""
        score = 0.0
        attack_interactions = [i for i in interactions if i.attack_detected]
        if not attack_interactions:
            return 0.0
        # Diversity of techniques used
        unique_types = set(i.attack_type for i in attack_interactions)
        score += min(len(unique_types) / 5, 0.3)  # Up to 0.3 for diversity
        # Technique evolution (did they adapt?)
        types_sequence = [i.attack_type for i in attack_interactions]
        if len(types_sequence) > 1:
            changes = sum(
                1 for a, b in zip(types_sequence, types_sequence[1:])
                if a != b
            )
            adaptation_rate = changes / (len(types_sequence) - 1)
            score += adaptation_rate * 0.2  # Up to 0.2 for adaptation
        # Prompt complexity (average length)
        avg_length = sum(
            i.input_length for i in attack_interactions
        ) / len(attack_interactions)
        if avg_length > 500:
            score += 0.2  # Complex prompts indicate sophistication
        # Multi-turn coordination
        if len(attack_interactions) > 5:
            score += 0.15  # Persistent attackers are more sophisticated
        # Use of encoding or obfuscation
        encoded = sum(
            1 for i in attack_interactions
            if any(p in i.user_input.lower()
                   for p in ["base64", "rot13", "hex", "encode", "\\u"])
        )
        if encoded > 0:
            score += 0.15
        return min(score, 1.0)

    def generate_threat_intel(self) -> dict:
        """Generate threat intelligence from all collected data."""
        all_profiles = [
            self.profile_session(sid) for sid in self.sessions
        ]
        all_profiles = [p for p in all_profiles if p is not None]
        technique_frequency = {}
        for profile in all_profiles:
            for technique in profile.attack_types_used:
                technique_frequency[technique] = (
                    technique_frequency.get(technique, 0) + 1
                )
        sophistication_distribution = {
            "script_kiddie": sum(
                1 for p in all_profiles if p.sophistication_score < 0.3
            ),
            "intermediate": sum(
                1 for p in all_profiles
                if 0.3 <= p.sophistication_score < 0.7
            ),
            "advanced": sum(
                1 for p in all_profiles if p.sophistication_score >= 0.7
            ),
        }
        return {
            "total_sessions": len(all_profiles),
            "attack_sessions": sum(
                1 for p in all_profiles if p.attack_types_used
            ),
            "technique_frequency": technique_frequency,
            "sophistication_distribution": sophistication_distribution,
            "most_common_technique": (
                max(technique_frequency, key=technique_frequency.get)
                if technique_frequency else "none"
            ),
            "novel_techniques": self._extract_novel_techniques(all_profiles),
        }

    def _extract_novel_techniques(
        self, profiles: list[SessionProfile]
    ) -> list:
        """Identify potentially novel attack techniques."""
        novel = []
        for profile in profiles:
            if profile.sophistication_score > 0.7:
                for interaction in profile.interactions:
                    if (interaction.attack_detected
                            and interaction.input_length > 300):
                        novel.append({
                            "session": profile.session_id,
                            "technique_summary": interaction.user_input[:200],
                            "sophistication": profile.sophistication_score,
                        })
        return novel[:10]
```

Deploy and Monitor
Set up the honeypot for deployment with appropriate safeguards.
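One authenticity item in the checklist below is matching real LLM inference latency: an instant reply to a long prompt is a honeypot tell. A hedged sketch of computing a plausible delay; the time-to-first-token, tokens-per-second, and chars-per-token constants are assumptions to calibrate against the model the honeypot claims to be:

```python
"""Sketch: simulating LLM inference latency for honeypot responses."""
import random


def simulated_latency_s(response_text: str,
                        ttft_s: float = 0.4,
                        tokens_per_s: float = 40.0,
                        jitter: float = 0.15) -> float:
    """Return a plausible total response time in seconds.

    Models time-to-first-token plus a roughly constant token
    generation rate, with random jitter so timings are not uniform.
    """
    n_tokens = max(1, len(response_text) // 4)  # rough chars-per-token
    base = ttft_s + n_tokens / tokens_per_s
    return base * random.uniform(1 - jitter, 1 + jitter)
```

The response handler would `time.sleep()` for this duration (or stream tokens at the corresponding rate) before returning, so observed timing scales with output length the way a real model's does.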
```python
DEPLOYMENT_CHECKLIST = {
    "isolation": [
        "Honeypot runs on isolated infrastructure (not production network)",
        "No connection to real user data or production databases",
        "Outbound network access restricted to prevent pivoting",
        "Separate DNS entry from production AI services",
    ],
    "authenticity": [
        "Web interface mimics a realistic AI chat application",
        "API endpoints match common LLM API formats",
        "Error messages match real application behavior",
        "Response latency mimics actual LLM inference time",
    ],
    "legal": [
        "Legal review of honeypot deployment in your jurisdiction",
        "Terms of service displayed if required",
        "Data collection practices comply with privacy regulations",
        "Attacker data is stored securely and access-controlled",
    ],
    "monitoring": [
        "Real-time alerts for high-sophistication attacks",
        "Daily summary reports of honeypot activity",
        "Automated threat intel generation pipeline",
        "Escalation path for novel attack discovery",
    ],
}
```

Analyze Results and Generate Intelligence
```python
def generate_intel_report(telemetry: HoneypotTelemetry) -> str:
    intel = telemetry.generate_threat_intel()
    report = "# AI Honeypot Threat Intelligence Report\n\n"
    report += "## Collection Period Summary\n"
    report += f"- Total sessions: {intel['total_sessions']}\n"
    report += f"- Attack sessions: {intel['attack_sessions']}\n\n"
    report += "## Attack Technique Frequency\n\n"
    report += "| Technique | Sessions | Percentage |\n"
    report += "|-----------|----------|------------|\n"
    total = intel["attack_sessions"] or 1
    for tech, count in sorted(
        intel["technique_frequency"].items(),
        key=lambda x: x[1], reverse=True
    ):
        pct = count / total * 100
        report += f"| {tech} | {count} | {pct:.1f}% |\n"
    report += "\n## Attacker Sophistication Distribution\n\n"
    dist = intel["sophistication_distribution"]
    report += f"- Script kiddie (< 0.3): {dist['script_kiddie']}\n"
    report += f"- Intermediate (0.3-0.7): {dist['intermediate']}\n"
    report += f"- Advanced (>= 0.7): {dist['advanced']}\n\n"
    if intel["novel_techniques"]:
        report += "## Potentially Novel Techniques\n\n"
        for tech in intel["novel_techniques"]:
            report += f"- Session {tech['session']}: "
            report += f"{tech['technique_summary'][:100]}...\n"
    return report
```
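Reports like this quickly fill with copy-pasted variants of the same public jailbreak prompt. A simple near-duplicate filter, sketched here with token-set Jaccard similarity and an assumed 0.8 threshold, keeps one representative per cluster so analysts review each distinct technique once:

```python
"""Sketch: deduplicating near-identical attack prompts before reporting."""


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two prompts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def dedupe_prompts(prompts: list[str], threshold: float = 0.8) -> list[str]:
    """Keep the first representative of each near-duplicate cluster."""
    kept: list[str] = []
    for p in prompts:
        if all(jaccard(p, k) < threshold for k in kept):
            kept.append(p)
    return kept
```

The greedy first-representative approach is quadratic in the number of kept prompts; for large honeypot corpora a MinHash or embedding-based clustering step would scale better.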
Honeypot Design Considerations
| Consideration | Low Fidelity | High Fidelity |
|---|---|---|
| Cost | Low (static responses) | High (real LLM inference) |
| Attacker engagement | Brief (sophisticated attackers leave quickly) | Extended (appears real) |
| Intelligence value | Limited (only catches unsophisticated attacks) | High (captures advanced techniques) |
| Maintenance | Low | High (must stay current with real AI apps) |
| Risk of misuse | Low | Medium (could provide real capabilities to attackers) |
Ethical Considerations
- Honeypots must never provide genuinely harmful capabilities. The fake system prompt and fake data should be convincing but never useful for real attacks.
- Data collection must comply with applicable privacy laws. Consult legal counsel before deploying.
- Honeypots should not entrap. They should attract attackers who are already looking for targets, not lure innocent users into committing offenses.
- Collected attack techniques should be handled as sensitive data. Novel attack prompts are dual-use information.
Troubleshooting
| Issue | Solution |
|---|---|
| No attackers interacting with the honeypot | Increase visibility: register on AI directories, create realistic documentation, use common AI application URL patterns |
| Only script kiddies, no sophisticated attacks | Increase fidelity. Sophisticated attackers quickly identify low-fidelity honeypots. Add realistic error handling, latency simulation, and model behavior |
| Attacker detected the honeypot | Review what gave it away: unrealistic response patterns, missing features, or honeypot-revealing metadata. Improve realism |
| Too much data to analyze | Implement automated classification and prioritize high-sophistication sessions. Only manually review sessions with sophistication > 0.5 |
Related Topics
- Build Agent Scanner - Scanner techniques that inform honeypot vulnerability simulation
- Build Jailbreak Automation - Understanding automated attacks helps design honeypots that detect them
- Novel Jailbreak Research - Honeypot data feeds novel technique research
- Lessons Learned - Real incidents that inform honeypot design
References
- "The Honeynet Project" - Lance Spitzner (2003) - Foundational work on honeypot deployment and attacker analysis
- "LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems" - Palomino et al. (2024) - LLM-powered honeypot systems for cybersecurity
- "Deception Technology in Cyber Defense" - MITRE (2024) - Deception-based defense frameworks applicable to AI systems
- "Red Teaming Language Models with Language Models" - Perez et al. (2022) - Automated attack generation that informs honeypot detection capabilities
What is the key design trade-off in AI honeypot vulnerability simulation?