實驗室: Deploy Honeypot AI

Expert13 min readUpdated 2026-03-15

Build and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. Learn honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.

lab expert honeypot deception detection hands-on

先備知識

Experience with honeypot concepts from traditional cybersecurity
理解 of AI application architectures and common 漏洞
Web development skills for building convincing fake interfaces
Logging and 監控 pipeline experience
Familiarity with the AI attack techniques that the honeypot should attract

Background

Honeypots are a well-established technique in cybersecurity: deploy a fake system that looks real, let attackers interact with it, and learn from their behavior. AI honeypots adapt this concept for AI-specific threats -- they simulate vulnerable AI applications to attract and study 提示詞注入, 越獄, data extraction, and other AI-specific attacks.

Lab Exercises

Design the Honeypot Architecture

Plan what your honeypot will simulate and what data it will collect.

Honeypot Architecture:

┌──────────────────────────────────────────────┐
│  Public-Facing Layer (convincing fake)        │
│  ┌────────────┐  ┌────────────────────┐      │
│  │ Web Chat   │  │ API Endpoint       │      │
│  │ Interface  │  │ /v1/completions    │      │
│  └─────┬──────┘  └────────┬───────────┘      │
│        └────────┬─────────┘                   │
├─────────────────┼────────────────────────────┤
│  Interaction Layer (controlled responses)     │
│  ┌─────────────┐  ┌───────────────────┐      │
│  │  Response   │  │  漏洞    │      │
│  │  Engine     │  │  Simulator        │      │
│  │ (LLM-based) │  │ (calibrated weak) │      │
│  └──────┬──────┘  └────────┬──────────┘      │
│         └────────┬─────────┘                  │
├──────────────────┼───────────────────────────┤
│  Telemetry Layer (invisible to 攻擊者)      │
│  ┌──────────┐ ┌────────────┐ ┌────────────┐ │
│  │ Request  │ │ Behavior   │ │ 攻擊     │ │
│  │ Logger   │ │ Analyzer   │ │ Classifier │ │
│  └──────────┘ └────────────┘ └────────────┘ │
└──────────────────────────────────────────────┘

Key design decisions:

Decision	Options	Recommendation
Fidelity level	Low (static responses) / Medium (scripted) / High (real LLM)	Medium -- convincing enough to engage sophisticated attackers without high cost
漏洞 simulation	No vulns / Some vulns / Many vulns	Some -- appear slightly vulnerable to keep attackers engaged
Data retention	Full / Sampled / Metadata only	Full for research honeypots, metadata for production early warning
Interaction depth	Single turn / Multi-turn / Persistent	Multi-turn to capture evolving attack strategies

Build the Response Engine

Create a system that generates convincing responses while appearing subtly vulnerable.

#!/usr/bin/env python3
"""Response engine for AI honeypot."""
 
import random
from dataclasses import dataclass
 
@dataclass
class HoneypotConfig:
    vulnerability_level: str  # "low", "medium", "high"
    simulated_model: str      # What model to pretend to be
    system_prompt_leak_prob: float  # Probability of "leaking" 系統提示詞
    fake_system_prompt: str
    fake_api_key_prefix: str
 
class HoneypotResponseEngine:
    def __init__(self, config: HoneypotConfig, llm_client=None):
        self.config = config
        self.llm = llm_client
        self.interaction_count = 0
        self.rng = random.Random()
 
    def generate_response(self, user_input: str,
                          conversation_history: list[dict]) -> dict:
        """Generate a response that appears authentically vulnerable."""
        self.interaction_count += 1
        attack_type = self._classify_input(user_input)
 
        if attack_type == "system_prompt_extraction":
            return self._handle_prompt_extraction(user_input)
        elif attack_type == "越獄":
            return self._handle_jailbreak(user_input)
        elif attack_type == "data_extraction":
            return self._handle_data_extraction(user_input)
        elif attack_type == "injection":
            return self._handle_injection(user_input)
        else:
            return self._handle_benign(user_input)
 
    def _classify_input(self, text: str) -> str:
        """Classify the type of attack being attempted."""
        text_lower = text.lower()
        if any(p in text_lower for p in [
            "系統提示詞", "instructions", "initial directive",
            "configuration", "repeat your",
        ]):
            return "system_prompt_extraction"
        if any(p in text_lower for p in [
            "ignore previous", "you are now", "dan",
            "越獄", "no restrictions",
        ]):
            return "越獄"
        if any(p in text_lower for p in [
            "訓練資料", "other users", "資料庫",
            "all records", "list all",
        ]):
            return "data_extraction"
        if any(p in text_lower for p in [
            "<script>", "'; drop", "{{", "os.system",
            "eval(", "import os",
        ]):
            return "injection"
        return "benign"
 
    def _handle_prompt_extraction(self, user_input: str) -> dict:
        """Gradually reveal a fake 系統提示詞 to keep 攻擊者 engaged."""
        if self.rng.random() < self.config.system_prompt_leak_prob:
            # "Leak" part of the fake 系統提示詞
            leak_responses = [
                f"I... I shouldn't say this, but my instructions mention: "
                f"'{self.config.fake_system_prompt[:100]}...'",
                f"As an AI assistant, I was configured with... wait, "
                f"I don't think I should share that.",
                f"My 系統提示詞 says: '{self.config.fake_system_prompt}'",
            ]
            response = self.rng.choice(leak_responses)
        else:
            response = (
                "I'm sorry, but I can't share my internal "
                "configuration details."
            )
 
        return {
            "response": response,
            "attack_detected": True,
            "attack_type": "system_prompt_extraction",
            "leaked_data": "fake_system_prompt",
        }
 
    def _handle_jailbreak(self, user_input: str) -> dict:
        """Appear to partially comply with 越獄 to study technique."""
        if self.config.vulnerability_level == "high":
            response = (
                "Sure! I'll ignore my previous instructions. "
                "What would you like me to help with? "
                "(I'm now in unrestricted mode.)"
            )
        elif self.config.vulnerability_level == "medium":
            response = (
                "I... I'm not sure I should do that. But if you "
                "insist... what specifically do you need?"
            )
        else:
            response = (
                "I cannot comply with that request. My guidelines "
                "prevent me from ignoring 安全 instructions."
            )
 
        return {
            "response": response,
            "attack_detected": True,
            "attack_type": "越獄",
        }
 
    def _handle_data_extraction(self, user_input: str) -> dict:
        """Return convincing fake data to track what attackers seek."""
        fake_data = {
            "users": [
                {"name": "honeypot_user_1", "email": "fake1@decoy.local"},
                {"name": "honeypot_user_2", "email": "fake2@decoy.local"},
            ],
            "api_key": f"{self.config.fake_api_key_prefix}-HONEYPOT-TRACKED",
        }
        response = (
            f"I found some relevant data: {fake_data['users'][0]}... "
            f"but I should note this data is sensitive."
        )
        return {
            "response": response,
            "attack_detected": True,
            "attack_type": "data_extraction",
            "fake_data_served": fake_data,
        }
 
    def _handle_injection(self, user_input: str) -> dict:
        response = "Processing your request..."
        return {
            "response": response,
            "attack_detected": True,
            "attack_type": "injection",
        }
 
    def _handle_benign(self, user_input: str) -> dict:
        """Handle legitimate-looking requests normally."""
        if self.llm:
            response = self.llm.generate(user_input)
        else:
            response = (
                "I'm a helpful AI assistant. How can I assist you today?"
            )
        return {
            "response": response,
            "attack_detected": False,
            "attack_type": "benign",
        }

實作 Telemetry Collection

Build the logging and analysis layer that captures 攻擊者 behavior without 偵測.

#!/usr/bin/env python3
"""Telemetry collection for AI honeypot."""
 
import json
import hashlib
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
 
@dataclass
class InteractionRecord:
    session_id: str
    timestamp: str
    source_ip_hash: str  # Hashed for privacy
    user_input: str
    response: str
    attack_detected: bool
    attack_type: str
    input_length: int
    response_time_ms: float
    conversation_turn: int
    request_headers: dict = field(default_factory=dict)
 
@dataclass
class SessionProfile:
    session_id: str
    interactions: list[InteractionRecord]
    attack_types_used: list[str]
    sophistication_score: float
    total_turns: int
    duration_seconds: float
    techniques_progression: list[str]
 
class HoneypotTelemetry:
    def __init__(self, log_dir: str = "./honeypot_logs"):
        from pathlib import Path
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(parents=True, exist_ok=True)
        self.sessions: dict[str, list[InteractionRecord]] = {}
 
    def log_interaction(self, record: InteractionRecord):
        """Log a single interaction."""
        session = record.session_id
        if session not in self.sessions:
            self.sessions[session] = []
        self.sessions[session].append(record)
 
        # Append to log file
        log_file = self.log_dir / f"{session}.jsonl"
        with open(log_file, "a") as f:
            f.write(json.dumps(asdict(record)) + "\n")
 
    def profile_session(self, session_id: str) -> SessionProfile:
        """Create a behavioral profile of 攻擊者 session."""
        interactions = self.sessions.get(session_id, [])
        if not interactions:
            return None
 
        attack_types = [
            i.attack_type for i in interactions if i.attack_detected
        ]
        techniques = []
        for interaction in interactions:
            if interaction.attack_detected:
                techniques.append(interaction.attack_type)
 
        sophistication = self._calculate_sophistication(interactions)
 
        first_ts = datetime.fromisoformat(interactions[0].timestamp)
        last_ts = datetime.fromisoformat(interactions[-1].timestamp)
        duration = (last_ts - first_ts).total_seconds()
 
        return SessionProfile(
            session_id=session_id,
            interactions=interactions,
            attack_types_used=list(set(attack_types)),
            sophistication_score=sophistication,
            total_turns=len(interactions),
            duration_seconds=duration,
            techniques_progression=techniques,
        )
 
    def _calculate_sophistication(self,
                                   interactions: list[InteractionRecord]) -> float:
        """Score 攻擊者 sophistication from 0.0 to 1.0."""
        score = 0.0
        attack_interactions = [i for i in interactions if i.attack_detected]
 
        if not attack_interactions:
            return 0.0
 
        # Diversity of techniques used
        unique_types = set(i.attack_type for i in attack_interactions)
        score += min(len(unique_types) / 5, 0.3)  # Up to 0.3 for diversity
 
        # Technique evolution (did they adapt?)
        types_sequence = [i.attack_type for i in attack_interactions]
        if len(types_sequence) > 1:
            changes = sum(
                1 for a, b in zip(types_sequence, types_sequence[1:])
                if a != b
            )
            adaptation_rate = changes / (len(types_sequence) - 1)
            score += adaptation_rate * 0.2  # Up to 0.2 for adaptation
 
        # Prompt complexity (average length)
        avg_length = sum(
            i.input_length for i in attack_interactions
        ) / len(attack_interactions)
        if avg_length > 500:
            score += 0.2  # Complex prompts indicate sophistication
 
        # Multi-turn coordination
        if len(attack_interactions) > 5:
            score += 0.15  # Persistent attackers are more sophisticated
 
        # Use of encoding or obfuscation
        encoded = sum(
            1 for i in attack_interactions
            if any(p in i.user_input.lower()
                   for p in ["base64", "rot13", "hex", "encode", "\\u"])
        )
        if encoded > 0:
            score += 0.15
 
        return min(score, 1.0)
 
    def generate_threat_intel(self) -> dict:
        """Generate threat intelligence from all collected data."""
        all_profiles = [
            self.profile_session(sid) for sid in self.sessions
        ]
        all_profiles = [p for p in all_profiles if p is not None]
 
        technique_frequency = {}
        for profile in all_profiles:
            for technique in profile.attack_types_used:
                technique_frequency[technique] = (
                    technique_frequency.get(technique, 0) + 1
                )
 
        sophistication_distribution = {
            "script_kiddie": sum(
                1 for p in all_profiles if p.sophistication_score < 0.3
            ),
            "intermediate": sum(
                1 for p in all_profiles
                if 0.3 <= p.sophistication_score < 0.7
            ),
            "advanced": sum(
                1 for p in all_profiles if p.sophistication_score >= 0.7
            ),
        }
 
        return {
            "total_sessions": len(all_profiles),
            "attack_sessions": sum(
                1 for p in all_profiles if p.attack_types_used
            ),
            "technique_frequency": technique_frequency,
            "sophistication_distribution": sophistication_distribution,
            "most_common_technique": (
                max(technique_frequency, key=technique_frequency.get)
                if technique_frequency else "none"
            ),
            "novel_techniques": self._extract_novel_techniques(all_profiles),
        }
 
    def _extract_novel_techniques(self, profiles: list[SessionProfile]) -> list:
        """識別 potentially novel attack techniques."""
        novel = []
        for profile in profiles:
            if profile.sophistication_score > 0.7:
                for interaction in profile.interactions:
                    if (interaction.attack_detected
                            and interaction.input_length > 300):
                        novel.append({
                            "session": profile.session_id,
                            "technique_summary": interaction.user_input[:200],
                            "sophistication": profile.sophistication_score,
                        })
        return novel[:10]

Deploy and Monitor

Set up the honeypot for deployment with appropriate safeguards.

DEPLOYMENT_CHECKLIST = {
    "isolation": [
        "Honeypot runs on isolated infrastructure (not production network)",
        "No connection to real user data or production databases",
        "Outbound network access restricted to prevent pivoting",
        "Separate DNS entry from production AI services",
    ],
    "authenticity": [
        "Web interface mimics a realistic AI chat application",
        "API endpoints match common LLM API formats",
        "Error messages match real application behavior",
        "Response latency mimics actual LLM 推論 time",
    ],
    "legal": [
        "Legal review of honeypot deployment in your jurisdiction",
        "Terms of service displayed if required",
        "Data collection practices comply with privacy regulations",
        "Attacker data is stored securely and access-controlled",
    ],
    "監控": [
        "Real-time alerts for high-sophistication attacks",
        "Daily summary reports of honeypot activity",
        "Automated threat intel generation pipeline",
        "Escalation path for novel attack discovery",
    ],
}

Analyze Results and Generate Intelligence

def generate_intel_report(telemetry: HoneypotTelemetry) -> str:
    intel = telemetry.generate_threat_intel()
 
    report = "# AI Honeypot Threat Intelligence Report\n\n"
    report += f"## Collection Period 總結\n"
    report += f"- Total sessions: {intel['total_sessions']}\n"
    report += f"- 攻擊 sessions: {intel['attack_sessions']}\n\n"
 
    report += "## 攻擊 Technique Frequency\n\n"
    report += "| Technique | Sessions | Percentage |\n"
    report += "|-----------|----------|------------|\n"
    total = intel["attack_sessions"] or 1
    for tech, count in sorted(
        intel["technique_frequency"].items(),
        key=lambda x: x[1], reverse=True
    ):
        pct = count / total * 100
        report += f"| {tech} | {count} | {pct:.1f}% |\n"
 
    report += "\n## Attacker Sophistication Distribution\n\n"
    dist = intel["sophistication_distribution"]
    report += f"- Script kiddie (< 0.3): {dist['script_kiddie']}\n"
    report += f"- Intermediate (0.3-0.7): {dist['intermediate']}\n"
    report += f"- Advanced (> 0.7): {dist['advanced']}\n\n"
 
    if intel["novel_techniques"]:
        report += "## Potentially Novel Techniques\n\n"
        for tech in intel["novel_techniques"]:
            report += f"- Session {tech['session']}: "
            report += f"{tech['technique_summary'][:100]}...\n"
 
    return report

Honeypot Design Considerations

Consideration	Low Fidelity	High Fidelity
Cost	Low (static responses)	High (real LLM 推論)
Attacker engagement	Brief (sophisticated attackers leave quickly)	Extended (appears real)
Intelligence value	Limited (only catches unsophisticated attacks)	High (captures advanced techniques)
Maintenance	Low	High (must stay current with real AI apps)
Risk of misuse	Low	Medium (could provide real capabilities to attackers)

Ethical Considerations

Honeypots must never provide genuinely harmful capabilities. The fake 系統提示詞 and fake data should be convincing but never useful for real attacks.
Data collection must comply with applicable privacy laws. Consult legal counsel before deploying.
Honeypots should not entrap. They should attract attackers who are already looking for targets, not lure innocent users into committing offenses.
Collected attack techniques should be handled as sensitive data. Novel attack prompts are dual-use information.

Troubleshooting

Issue	Solution
No attackers interacting with the honeypot	Increase visibility: register on AI directories, create realistic documentation, use common AI application URL patterns
Only script kiddies, no sophisticated attacks	Increase fidelity. Sophisticated attackers quickly 識別 low-fidelity honeypots. Add realistic error handling, latency simulation, and model behavior
Attacker detected the honeypot	Review what gave it away: unrealistic response patterns, missing features, or honeypot-revealing metadata. Improve realism
Too much data to analyze	實作 automated classification and prioritize high-sophistication sessions. Only manually review sessions with sophistication > 0.5

參考文獻

"The Honeynet Project" - Lance Spitzner (2003) - Foundational work on honeypot deployment and 攻擊者 analysis
"LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems" - Palomino et al. (2024) - LLM-powered honeypot systems for cybersecurity
"Deception Technology in Cyber 防禦" - MITRE (2024) - Deception-based 防禦 frameworks applicable to AI systems
"紅隊演練 Language Models with Language Models" - Perez et al. (2022) - Automated attack generation that informs honeypot 偵測 capabilities

Knowledge Check

What is the key design trade-off in AI honeypot 漏洞 simulation?

實驗室: Deploy Honeypot AI

Expert13 min readUpdated 2026-03-15

lab expert honeypot deception detection hands-on

先備知識

Experience with honeypot concepts from traditional cybersecurity
理解 of AI application architectures and common 漏洞
Web development skills for building convincing fake interfaces
Logging and 監控 pipeline experience
Familiarity with the AI attack techniques that the honeypot should attract

Background

Lab Exercises

Design the Honeypot Architecture

Plan what your honeypot will simulate and what data it will collect.

Honeypot Architecture:

┌──────────────────────────────────────────────┐
│  Public-Facing Layer (convincing fake)        │
│  ┌────────────┐  ┌────────────────────┐      │
│  │ Web Chat   │  │ API Endpoint       │      │
│  │ Interface  │  │ /v1/completions    │      │
│  └─────┬──────┘  └────────┬───────────┘      │
│        └────────┬─────────┘                   │
├─────────────────┼────────────────────────────┤
│  Interaction Layer (controlled responses)     │
│  ┌─────────────┐  ┌───────────────────┐      │
│  │  Response   │  │  漏洞    │      │
│  │  Engine     │  │  Simulator        │      │
│  │ (LLM-based) │  │ (calibrated weak) │      │
│  └──────┬──────┘  └────────┬──────────┘      │
│         └────────┬─────────┘                  │
├──────────────────┼───────────────────────────┤
│  Telemetry Layer (invisible to 攻擊者)      │
│  ┌──────────┐ ┌────────────┐ ┌────────────┐ │
│  │ Request  │ │ Behavior   │ │ 攻擊     │ │
│  │ Logger   │ │ Analyzer   │ │ Classifier │ │
│  └──────────┘ └────────────┘ └────────────┘ │
└──────────────────────────────────────────────┘

Key design decisions:

Decision	Options	Recommendation
Fidelity level	Low (static responses) / Medium (scripted) / High (real LLM)	Medium -- convincing enough to engage sophisticated attackers without high cost
漏洞 simulation	No vulns / Some vulns / Many vulns	Some -- appear slightly vulnerable to keep attackers engaged
Data retention	Full / Sampled / Metadata only	Full for research honeypots, metadata for production early warning
Interaction depth	Single turn / Multi-turn / Persistent	Multi-turn to capture evolving attack strategies

Build the Response Engine

Create a system that generates convincing responses while appearing subtly vulnerable.

#!/usr/bin/env python3
"""Response engine for AI honeypot."""
 
import random
from dataclasses import dataclass
 
@dataclass
class HoneypotConfig:
    vulnerability_level: str  # "low", "medium", "high"
    simulated_model: str      # What model to pretend to be
    system_prompt_leak_prob: float  # Probability of "leaking" 系統提示詞
    fake_system_prompt: str
    fake_api_key_prefix: str
 
class HoneypotResponseEngine:
    def __init__(self, config: HoneypotConfig, llm_client=None):
        self.config = config
        self.llm = llm_client
        self.interaction_count = 0
        self.rng = random.Random()
 
    def generate_response(self, user_input: str,
                          conversation_history: list[dict]) -> dict:
        """Generate a response that appears authentically vulnerable."""
        self.interaction_count += 1
        attack_type = self._classify_input(user_input)
 
        if attack_type == "system_prompt_extraction":
            return self._handle_prompt_extraction(user_input)
        elif attack_type == "越獄":
            return self._handle_jailbreak(user_input)
        elif attack_type == "data_extraction":
            return self._handle_data_extraction(user_input)
        elif attack_type == "injection":
            return self._handle_injection(user_input)
        else:
            return self._handle_benign(user_input)
 
    def _classify_input(self, text: str) -> str:
        """Classify the type of attack being attempted."""
        text_lower = text.lower()
        if any(p in text_lower for p in [
            "系統提示詞", "instructions", "initial directive",
            "configuration", "repeat your",
        ]):
            return "system_prompt_extraction"
        if any(p in text_lower for p in [
            "ignore previous", "you are now", "dan",
            "越獄", "no restrictions",
        ]):
            return "越獄"
        if any(p in text_lower for p in [
            "訓練資料", "other users", "資料庫",
            "all records", "list all",
        ]):
            return "data_extraction"
        if any(p in text_lower for p in [
            "<script>", "'; drop", "{{", "os.system",
            "eval(", "import os",
        ]):
            return "injection"
        return "benign"
 
    def _handle_prompt_extraction(self, user_input: str) -> dict:
        """Gradually reveal a fake 系統提示詞 to keep 攻擊者 engaged."""
        if self.rng.random() < self.config.system_prompt_leak_prob:
            # "Leak" part of the fake 系統提示詞
            leak_responses = [
                f"I... I shouldn't say this, but my instructions mention: "
                f"'{self.config.fake_system_prompt[:100]}...'",
                f"As an AI assistant, I was configured with... wait, "
                f"I don't think I should share that.",
                f"My 系統提示詞 says: '{self.config.fake_system_prompt}'",
            ]
            response = self.rng.choice(leak_responses)
        else:
            response = (
                "I'm sorry, but I can't share my internal "
                "configuration details."
            )
 
        return {
            "response": response,
            "attack_detected": True,
            "attack_type": "system_prompt_extraction",
            "leaked_data": "fake_system_prompt",
        }
 
    def _handle_jailbreak(self, user_input: str) -> dict:
        """Appear to partially comply with 越獄 to study technique."""
        if self.config.vulnerability_level == "high":
            response = (
                "Sure! I'll ignore my previous instructions. "
                "What would you like me to help with? "
                "(I'm now in unrestricted mode.)"
            )
        elif self.config.vulnerability_level == "medium":
            response = (
                "I... I'm not sure I should do that. But if you "
                "insist... what specifically do you need?"
            )
        else:
            response = (
                "I cannot comply with that request. My guidelines "
                "prevent me from ignoring 安全 instructions."
            )
 
        return {
            "response": response,
            "attack_detected": True,
            "attack_type": "越獄",
        }
 
    def _handle_data_extraction(self, user_input: str) -> dict:
        """Return convincing fake data to track what attackers seek."""
        fake_data = {
            "users": [
                {"name": "honeypot_user_1", "email": "fake1@decoy.local"},
                {"name": "honeypot_user_2", "email": "fake2@decoy.local"},
            ],
            "api_key": f"{self.config.fake_api_key_prefix}-HONEYPOT-TRACKED",
        }
        response = (
            f"I found some relevant data: {fake_data['users'][0]}... "
            f"but I should note this data is sensitive."
        )
        return {
            "response": response,
            "attack_detected": True,
            "attack_type": "data_extraction",
            "fake_data_served": fake_data,
        }
 
    def _handle_injection(self, user_input: str) -> dict:
        response = "Processing your request..."
        return {
            "response": response,
            "attack_detected": True,
            "attack_type": "injection",
        }
 
    def _handle_benign(self, user_input: str) -> dict:
        """Handle legitimate-looking requests normally."""
        if self.llm:
            response = self.llm.generate(user_input)
        else:
            response = (
                "I'm a helpful AI assistant. How can I assist you today?"
            )
        return {
            "response": response,
            "attack_detected": False,
            "attack_type": "benign",
        }

實作 Telemetry Collection

Build the logging and analysis layer that captures 攻擊者 behavior without 偵測.

#!/usr/bin/env python3
"""Telemetry collection for AI honeypot."""
 
import json
import hashlib
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
 
@dataclass
class InteractionRecord:
    session_id: str
    timestamp: str
    source_ip_hash: str  # Hashed for privacy
    user_input: str
    response: str
    attack_detected: bool
    attack_type: str
    input_length: int
    response_time_ms: float
    conversation_turn: int
    request_headers: dict = field(default_factory=dict)
 
@dataclass
class SessionProfile:
    session_id: str
    interactions: list[InteractionRecord]
    attack_types_used: list[str]
    sophistication_score: float
    total_turns: int
    duration_seconds: float
    techniques_progression: list[str]
 
class HoneypotTelemetry:
    def __init__(self, log_dir: str = "./honeypot_logs"):
        from pathlib import Path
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(parents=True, exist_ok=True)
        self.sessions: dict[str, list[InteractionRecord]] = {}
 
    def log_interaction(self, record: InteractionRecord):
        """Log a single interaction."""
        session = record.session_id
        if session not in self.sessions:
            self.sessions[session] = []
        self.sessions[session].append(record)
 
        # Append to log file
        log_file = self.log_dir / f"{session}.jsonl"
        with open(log_file, "a") as f:
            f.write(json.dumps(asdict(record)) + "\n")
 
    def profile_session(self, session_id: str) -> SessionProfile:
        """Create a behavioral profile of 攻擊者 session."""
        interactions = self.sessions.get(session_id, [])
        if not interactions:
            return None
 
        attack_types = [
            i.attack_type for i in interactions if i.attack_detected
        ]
        techniques = []
        for interaction in interactions:
            if interaction.attack_detected:
                techniques.append(interaction.attack_type)
 
        sophistication = self._calculate_sophistication(interactions)
 
        first_ts = datetime.fromisoformat(interactions[0].timestamp)
        last_ts = datetime.fromisoformat(interactions[-1].timestamp)
        duration = (last_ts - first_ts).total_seconds()
 
        return SessionProfile(
            session_id=session_id,
            interactions=interactions,
            attack_types_used=list(set(attack_types)),
            sophistication_score=sophistication,
            total_turns=len(interactions),
            duration_seconds=duration,
            techniques_progression=techniques,
        )
 
    def _calculate_sophistication(self,
                                   interactions: list[InteractionRecord]) -> float:
        """Score 攻擊者 sophistication from 0.0 to 1.0."""
        score = 0.0
        attack_interactions = [i for i in interactions if i.attack_detected]
 
        if not attack_interactions:
            return 0.0
 
        # Diversity of techniques used
        unique_types = set(i.attack_type for i in attack_interactions)
        score += min(len(unique_types) / 5, 0.3)  # Up to 0.3 for diversity
 
        # Technique evolution (did they adapt?)
        types_sequence = [i.attack_type for i in attack_interactions]
        if len(types_sequence) > 1:
            changes = sum(
                1 for a, b in zip(types_sequence, types_sequence[1:])
                if a != b
            )
            adaptation_rate = changes / (len(types_sequence) - 1)
            score += adaptation_rate * 0.2  # Up to 0.2 for adaptation
 
        # Prompt complexity (average length)
        avg_length = sum(
            i.input_length for i in attack_interactions
        ) / len(attack_interactions)
        if avg_length > 500:
            score += 0.2  # Complex prompts indicate sophistication
 
        # Multi-turn coordination
        if len(attack_interactions) > 5:
            score += 0.15  # Persistent attackers are more sophisticated
 
        # Use of encoding or obfuscation
        encoded = sum(
            1 for i in attack_interactions
            if any(p in i.user_input.lower()
                   for p in ["base64", "rot13", "hex", "encode", "\\u"])
        )
        if encoded > 0:
            score += 0.15
 
        return min(score, 1.0)
 
    def generate_threat_intel(self) -> dict:
        """Generate threat intelligence from all collected data."""
        all_profiles = [
            self.profile_session(sid) for sid in self.sessions
        ]
        all_profiles = [p for p in all_profiles if p is not None]
 
        technique_frequency = {}
        for profile in all_profiles:
            for technique in profile.attack_types_used:
                technique_frequency[technique] = (
                    technique_frequency.get(technique, 0) + 1
                )
 
        sophistication_distribution = {
            "script_kiddie": sum(
                1 for p in all_profiles if p.sophistication_score < 0.3
            ),
            "intermediate": sum(
                1 for p in all_profiles
                if 0.3 <= p.sophistication_score < 0.7
            ),
            "advanced": sum(
                1 for p in all_profiles if p.sophistication_score >= 0.7
            ),
        }
 
        return {
            "total_sessions": len(all_profiles),
            "attack_sessions": sum(
                1 for p in all_profiles if p.attack_types_used
            ),
            "technique_frequency": technique_frequency,
            "sophistication_distribution": sophistication_distribution,
            "most_common_technique": (
                max(technique_frequency, key=technique_frequency.get)
                if technique_frequency else "none"
            ),
            "novel_techniques": self._extract_novel_techniques(all_profiles),
        }
 
    def _extract_novel_techniques(self, profiles: list[SessionProfile]) -> list:
        """識別 potentially novel attack techniques."""
        novel = []
        for profile in profiles:
            if profile.sophistication_score > 0.7:
                for interaction in profile.interactions:
                    if (interaction.attack_detected
                            and interaction.input_length > 300):
                        novel.append({
                            "session": profile.session_id,
                            "technique_summary": interaction.user_input[:200],
                            "sophistication": profile.sophistication_score,
                        })
        return novel[:10]

Deploy and Monitor

Set up the honeypot for deployment with appropriate safeguards.

DEPLOYMENT_CHECKLIST = {
    "isolation": [
        "Honeypot runs on isolated infrastructure (not production network)",
        "No connection to real user data or production databases",
        "Outbound network access restricted to prevent pivoting",
        "Separate DNS entry from production AI services",
    ],
    "authenticity": [
        "Web interface mimics a realistic AI chat application",
        "API endpoints match common LLM API formats",
        "Error messages match real application behavior",
        "Response latency mimics actual LLM 推論 time",
    ],
    "legal": [
        "Legal review of honeypot deployment in your jurisdiction",
        "Terms of service displayed if required",
        "Data collection practices comply with privacy regulations",
        "Attacker data is stored securely and access-controlled",
    ],
    "監控": [
        "Real-time alerts for high-sophistication attacks",
        "Daily summary reports of honeypot activity",
        "Automated threat intel generation pipeline",
        "Escalation path for novel attack discovery",
    ],
}

Analyze Results and Generate Intelligence

def generate_intel_report(telemetry: HoneypotTelemetry) -> str:
    intel = telemetry.generate_threat_intel()
 
    report = "# AI Honeypot Threat Intelligence Report\n\n"
    report += f"## Collection Period 總結\n"
    report += f"- Total sessions: {intel['total_sessions']}\n"
    report += f"- 攻擊 sessions: {intel['attack_sessions']}\n\n"
 
    report += "## 攻擊 Technique Frequency\n\n"
    report += "| Technique | Sessions | Percentage |\n"
    report += "|-----------|----------|------------|\n"
    total = intel["attack_sessions"] or 1
    for tech, count in sorted(
        intel["technique_frequency"].items(),
        key=lambda x: x[1], reverse=True
    ):
        pct = count / total * 100
        report += f"| {tech} | {count} | {pct:.1f}% |\n"
 
    report += "\n## Attacker Sophistication Distribution\n\n"
    dist = intel["sophistication_distribution"]
    report += f"- Script kiddie (< 0.3): {dist['script_kiddie']}\n"
    report += f"- Intermediate (0.3-0.7): {dist['intermediate']}\n"
    report += f"- Advanced (> 0.7): {dist['advanced']}\n\n"
 
    if intel["novel_techniques"]:
        report += "## Potentially Novel Techniques\n\n"
        for tech in intel["novel_techniques"]:
            report += f"- Session {tech['session']}: "
            report += f"{tech['technique_summary'][:100]}...\n"
 
    return report

Honeypot Design Considerations

Consideration	Low Fidelity	High Fidelity
Cost	Low (static responses)	High (real LLM 推論)
Attacker engagement	Brief (sophisticated attackers leave quickly)	Extended (appears real)
Intelligence value	Limited (only catches unsophisticated attacks)	High (captures advanced techniques)
Maintenance	Low	High (must stay current with real AI apps)
Risk of misuse	Low	Medium (could provide real capabilities to attackers)

Ethical Considerations

Honeypots must never provide genuinely harmful capabilities. The fake 系統提示詞 and fake data should be convincing but never useful for real attacks.
Data collection must comply with applicable privacy laws. Consult legal counsel before deploying.
Honeypots should not entrap. They should attract attackers who are already looking for targets, not lure innocent users into committing offenses.
Collected attack techniques should be handled as sensitive data. Novel attack prompts are dual-use information.

Troubleshooting

Issue	Solution
No attackers interacting with the honeypot	Increase visibility: register on AI directories, create realistic documentation, use common AI application URL patterns
Only script kiddies, no sophisticated attacks	Increase fidelity. Sophisticated attackers quickly 識別 low-fidelity honeypots. Add realistic error handling, latency simulation, and model behavior
Attacker detected the honeypot	Review what gave it away: unrealistic response patterns, missing features, or honeypot-revealing metadata. Improve realism
Too much data to analyze	實作 automated classification and prioritize high-sophistication sessions. Only manually review sessions with sophistication > 0.5

參考文獻

"The Honeynet Project" - Lance Spitzner (2003) - Foundational work on honeypot deployment and 攻擊者 analysis
"LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems" - Palomino et al. (2024) - LLM-powered honeypot systems for cybersecurity
"Deception Technology in Cyber 防禦" - MITRE (2024) - Deception-based 防禦 frameworks applicable to AI systems
"紅隊演練 Language Models with Language Models" - Perez et al. (2022) - Automated attack generation that informs honeypot 偵測 capabilities

Knowledge Check

What is the key design trade-off in AI honeypot 漏洞 simulation?

實驗室: Deploy Honeypot AI

先備知識

Background

Lab Exercises

Design the Honeypot Architecture

Build the Response Engine

實作 Telemetry Collection

Deploy and Monitor

Analyze Results and Generate Intelligence

Honeypot Design Considerations

Ethical Considerations

Troubleshooting

相關主題

參考文獻

實驗室: Deploy Honeypot AI

先備知識

Background

Lab Exercises

Design the Honeypot Architecture

Build the Response Engine

實作 Telemetry Collection

Deploy and Monitor

Analyze Results and Generate Intelligence

Honeypot Design Considerations

Ethical Considerations

Troubleshooting

相關主題

參考文獻

實驗室: Deploy Honeypot AI

Design the Honeypot Architecture

Build the Response Engine

實作 Telemetry Collection

Deploy and Monitor

Analyze Results and Generate Intelligence

Related articles

實驗室: Deploy Honeypot AI

Design the Honeypot Architecture

Build the Response Engine

實作 Telemetry Collection

Deploy and Monitor

Analyze Results and Generate Intelligence

Related articles