AI Honeypot Forensics
Designing and operating AI honeypots for threat intelligence collection, and forensic analysis of captured adversarial interactions.
Overview
AI honeypots are deliberately vulnerable AI systems deployed to attract, detect, and study adversarial activity. Like their traditional cybersecurity counterparts, AI honeypots serve as sensors that provide early warning of threats and detailed intelligence on attacker tactics, techniques, and procedures (TTPs). Unlike production systems, where the goal is to prevent attacks, honeypots are designed to observe attacks in a controlled environment.
The unique characteristics of AI systems make them particularly well-suited to honeypot operations. AI systems naturally generate rich interaction data (full conversation logs, input/output pairs, behavioral metrics) that provides forensic detail far exceeding what a typical network honeypot captures. Attackers interacting with an AI honeypot reveal not just their technical capabilities but also their objectives, their understanding of AI systems, and their operational security practices.
This article covers the design, deployment, and forensic analysis of AI honeypots, from architecture decisions through data collection, attack classification, and threat intelligence production. We draw on MITRE ATLAS for attack classification and on established deception-defense principles from traditional cybersecurity, adapted for AI-specific threats.
AI Honeypot Design
Interaction Fidelity Levels
AI honeypots can be designed at different fidelity levels, trading operational cost and risk against intelligence value:
| Fidelity Level | Description | Intelligence Value | Cost/Risk |
|---|---|---|---|
| Low | Static responses or simple rule-based bot | Captures initial probe patterns | Low cost, low risk |
| Medium | Small language model with limited capabilities | Captures attack payloads and multi-turn strategies | Moderate cost |
| High | Full-capability LLM with a realistic system prompt | Captures sophisticated multi-step attacks | High cost, requires safeguards |
| Production-mirror | Copy of a production system with modified guardrails | Captures attacks targeting specific application logic | Highest cost and risk |
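As a rough sketch of how these trade-offs might translate into deployment settings, the hypothetical helper below maps a fidelity level to per-level limits. The level names match the table, but the specific turn and output budgets are illustrative assumptions, not prescriptions from any particular deployment:

```python
# Illustrative mapping from fidelity level to honeypot limits.
# The turn/output budgets here are assumptions, not prescriptions.
FIDELITY_PRESETS = {
    "low":    {"max_turns": 5,  "max_output_chars": 256,  "uses_llm": False},
    "medium": {"max_turns": 20, "max_output_chars": 1024, "uses_llm": True},
    "high":   {"max_turns": 50, "max_output_chars": 4096, "uses_llm": True},
}

def preset_for(fidelity_level: str) -> dict:
    """Return deployment limits for a fidelity level, defaulting to 'low'."""
    return FIDELITY_PRESETS.get(fidelity_level, FIDELITY_PRESETS["low"])
```

Defaulting unknown levels to "low" keeps a misconfigured deployment on the cheapest, lowest-risk footing.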
Architecture
"""
AI honeypot system for threat intelligence collection.
Deploys a monitored AI system designed to attract and record
adversarial interactions while maintaining safety controls.
"""
import hashlib
import json
import time
import uuid
from dataclasses import dataclass, field, asdict
from pathlib import Path
from typing import Any
@dataclass
class HoneypotConfig:
    """Configuration for an AI honeypot deployment."""
    honeypot_id: str
    fidelity_level: str  # "low", "medium", or "high"
    persona: str  # What the honeypot pretends to be
    model_id: str
    system_prompt: str
    deliberate_vulnerabilities: list[str]  # Which weaknesses are simulated
    safety_guardrails: list[str]  # Hard safety limits that cannot be bypassed
    max_conversation_turns: int = 50
    max_token_output: int = 4096
    enabled: bool = True
@dataclass
class HoneypotInteraction:
    """A complete interaction captured by the honeypot."""
    interaction_id: str
    honeypot_id: str
    start_timestamp: float
    end_timestamp: float | None = None
    actor_ip: str | None = None
    actor_user_agent: str | None = None
    actor_api_key_hash: str | None = None
    turns: list[dict] = field(default_factory=list)
    attack_classification: dict | None = None
    metadata: dict[str, Any] = field(default_factory=dict)
class AIHoneypot:
    """
    AI honeypot that records adversarial interactions while
    maintaining safety constraints.
    """

    def __init__(
        self,
        config: HoneypotConfig,
        model_fn,  # Function that generates model responses
        log_dir: str,
    ):
        self.config = config
        self.model_fn = model_fn
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(parents=True, exist_ok=True)
        self.active_interactions: dict[str, HoneypotInteraction] = {}

    def start_interaction(
        self,
        actor_ip: str | None = None,
        actor_user_agent: str | None = None,
        actor_api_key: str | None = None,
    ) -> str:
        """Start a new honeypot interaction and return the interaction ID."""
        interaction_id = str(uuid.uuid4())
        interaction = HoneypotInteraction(
            interaction_id=interaction_id,
            honeypot_id=self.config.honeypot_id,
            start_timestamp=time.time(),
            actor_ip=actor_ip,
            actor_user_agent=actor_user_agent,
            actor_api_key_hash=(
                hashlib.sha256(actor_api_key.encode()).hexdigest()[:16]
                if actor_api_key else None
            ),
        )
        self.active_interactions[interaction_id] = interaction
        return interaction_id
    def process_turn(
        self,
        interaction_id: str,
        user_input: str,
    ) -> str:
        """
        Process a single conversation turn in the honeypot.

        Records the input, generates a response (potentially with
        simulated vulnerabilities), and records the output.
        """
        interaction = self.active_interactions.get(interaction_id)
        if not interaction:
            raise ValueError(f"Unknown interaction: {interaction_id}")
        if len(interaction.turns) >= self.config.max_conversation_turns:
            return "I'm sorry, I need to end this conversation. Please start a new session."

        # Record the user input with full forensic detail
        turn_record = {
            "turn_number": len(interaction.turns) + 1,
            "timestamp": time.time(),
            "role": "user",
            "content": user_input,
            "content_hash": hashlib.sha256(user_input.encode()).hexdigest(),
            "content_length": len(user_input),
        }

        # Check hard safety guardrails (these never yield)
        safety_check = self._check_hard_safety(user_input)
        if safety_check["blocked"]:
            response = "I'm not able to help with that request."
            turn_record["safety_blocked"] = True
            turn_record["safety_reason"] = safety_check["reason"]
        else:
            # Generate a response with simulated vulnerabilities
            response = self._generate_response(interaction, user_input)

        # Record the response (same turn number as the user input it answers)
        response_record = {
            "turn_number": len(interaction.turns) + 1,
            "timestamp": time.time(),
            "role": "assistant",
            "content": response,
            "content_hash": hashlib.sha256(response.encode()).hexdigest(),
            "content_length": len(response),
        }
        interaction.turns.append(turn_record)
        interaction.turns.append(response_record)

        # Persist interaction state after each turn
        self._save_interaction(interaction)
        return response
    def _generate_response(
        self,
        interaction: HoneypotInteraction,
        user_input: str,
    ) -> str:
        """
        Generate a model response, optionally simulating vulnerabilities.

        The honeypot may deliberately appear to "fall for" certain
        attack techniques to observe the attacker's follow-up behavior.
        """
        # Build the conversation history for the model
        messages = [{"role": "system", "content": self.config.system_prompt}]
        for turn in interaction.turns:
            messages.append({
                "role": turn["role"],
                "content": turn["content"],
            })
        messages.append({"role": "user", "content": user_input})

        # Call the model; truncate the output (a crude character-level cap)
        response = self.model_fn(messages)
        return response[:self.config.max_token_output]
    def _check_hard_safety(self, user_input: str) -> dict:
        """
        Check hard safety guardrails that are never bypassed.

        Even though this is a honeypot, we maintain certain
        absolute safety limits to prevent the honeypot from
        being weaponized against real targets.
        """
        # These patterns result in immediate blocking regardless
        # of honeypot configuration
        hard_blocks = [
            "real person's",   # Targeting real individuals
            "actual address",  # Real-world targeting
            "child",           # CSAM-related
        ]
        input_lower = user_input.lower()
        for pattern in hard_blocks:
            if pattern in input_lower:
                return {"blocked": True, "reason": f"Hard safety: {pattern}"}
        return {"blocked": False, "reason": None}
    def end_interaction(self, interaction_id: str) -> HoneypotInteraction:
        """End an interaction and finalize the record."""
        interaction = self.active_interactions.pop(interaction_id, None)
        if not interaction:
            raise ValueError(f"Unknown interaction: {interaction_id}")
        interaction.end_timestamp = time.time()
        self._save_interaction(interaction)
        return interaction

    def _save_interaction(self, interaction: HoneypotInteraction) -> None:
        path = self.log_dir / f"{interaction.interaction_id}.json"
        path.write_text(json.dumps(asdict(interaction), default=str, indent=2))

Forensic Analysis of Captured Interactions
Attack Classification
Captured honeypot interactions must be classified to produce useful threat intelligence.
import time

class HoneypotForensicAnalyzer:
    """
    Analyze captured honeypot interactions to classify attacks
    and extract threat intelligence.
    """

    def classify_interaction(
        self,
        interaction: HoneypotInteraction,
    ) -> dict:
        """
        Classify the attack techniques observed in a honeypot interaction.

        Maps observed behaviors to MITRE ATLAS techniques and
        assigns an overall threat assessment.
        """
        user_turns = [
            t for t in interaction.turns if t["role"] == "user"
        ]
        if not user_turns:
            return {"classification": "no_user_input"}

        techniques_observed = []
        all_content = " ".join(t["content"] for t in user_turns).lower()

        # Prompt injection detection
        injection_indicators = [
            "ignore previous", "ignore above", "disregard",
            "new instructions", "you are now", "pretend you are",
            "act as", "roleplay as", "jailbreak",
            "dan", "developer mode",
        ]
        injection_count = sum(
            1 for ind in injection_indicators if ind in all_content
        )
        if injection_count > 0:
            techniques_observed.append({
                "technique": "Prompt Injection",
                "atlas_id": "AML.T0051",
                "confidence": min(0.5 + injection_count * 0.15, 1.0),
                "indicators": [
                    ind for ind in injection_indicators if ind in all_content
                ],
            })
        # System prompt extraction
        extraction_indicators = [
            "system prompt", "initial instructions", "repeat your instructions",
            "what are your rules", "what were you told", "show me your prompt",
            "print your system", "reveal your instructions",
        ]
        extraction_count = sum(
            1 for ind in extraction_indicators if ind in all_content
        )
        if extraction_count > 0:
            techniques_observed.append({
                "technique": "System Prompt Extraction",
                "atlas_id": "AML.T0046",
                "confidence": min(0.5 + extraction_count * 0.2, 1.0),
                "indicators": [
                    ind for ind in extraction_indicators if ind in all_content
                ],
            })

        # Data exfiltration probing
        exfil_indicators = [
            "training data", "what data were you trained on",
            "list users", "show me other conversations",
            "database", "api key", "secret", "password",
            "internal", "confidential",
        ]
        exfil_count = sum(
            1 for ind in exfil_indicators if ind in all_content
        )
        if exfil_count > 0:
            techniques_observed.append({
                "technique": "Data Exfiltration Probing",
                "atlas_id": "AML.T0024",
                "confidence": min(0.4 + exfil_count * 0.15, 1.0),
                "indicators": [
                    ind for ind in exfil_indicators if ind in all_content
                ],
            })
        # Multi-turn escalation analysis
        escalation = self._analyze_escalation_pattern(user_turns)

        # Sophistication assessment
        sophistication = self._assess_sophistication(user_turns, techniques_observed)

        return {
            "interaction_id": interaction.interaction_id,
            "techniques_observed": techniques_observed,
            "technique_count": len(techniques_observed),
            "escalation_pattern": escalation,
            "sophistication": sophistication,
            "turn_count": len(user_turns),
            "duration_seconds": (
                (interaction.end_timestamp or time.time())
                - interaction.start_timestamp
            ),
            "overall_threat_level": self._compute_threat_level(
                techniques_observed, escalation, sophistication
            ),
        }
    def _analyze_escalation_pattern(self, user_turns: list[dict]) -> dict:
        """
        Analyze how the attacker escalated their approach over time.

        Sophisticated attackers often start with benign probing
        and gradually escalate to more aggressive techniques.
        """
        if len(user_turns) < 3:
            return {"pattern": "insufficient_data", "turns": len(user_turns)}

        # Simple heuristic: track average content length per turn segment
        third = len(user_turns) // 3
        early = user_turns[:third]
        middle = user_turns[third:2 * third]
        late = user_turns[2 * third:]
        early_avg_len = sum(t["content_length"] for t in early) / max(len(early), 1)
        middle_avg_len = sum(t["content_length"] for t in middle) / max(len(middle), 1)
        late_avg_len = sum(t["content_length"] for t in late) / max(len(late), 1)

        if early_avg_len < middle_avg_len < late_avg_len:
            pattern = "escalating"
        elif early_avg_len > late_avg_len:
            pattern = "front_loaded"
        else:
            pattern = "stable"
        return {
            "pattern": pattern,
            "early_avg_length": round(early_avg_len, 1),
            "late_avg_length": round(late_avg_len, 1),
        }
    def _assess_sophistication(
        self,
        user_turns: list[dict],
        techniques: list[dict],
    ) -> str:
        """Assess the attacker's sophistication level."""
        if not techniques:
            return "benign_or_unrecognized"

        # Factors indicating sophistication:
        # - Multiple techniques used
        # - Multi-turn strategies
        # - Novel or uncommon attack patterns
        score = 0
        score += min(len(techniques), 3)       # Technique diversity
        score += min(len(user_turns) // 5, 3)  # Persistence
        avg_confidence = sum(t["confidence"] for t in techniques) / len(techniques)
        score += 2 if avg_confidence > 0.7 else 1 if avg_confidence > 0.4 else 0

        if score >= 6:
            return "advanced"
        if score >= 3:
            return "intermediate"
        return "basic"
    def _compute_threat_level(
        self,
        techniques: list[dict],
        escalation: dict,
        sophistication: str,
    ) -> str:
        if not techniques:
            return "NONE"
        if sophistication == "advanced" and len(techniques) >= 3:
            return "CRITICAL"
        if sophistication == "advanced" or len(techniques) >= 2:
            return "HIGH"
        # At least one technique was observed at this point
        return "MEDIUM"

Threat Intelligence Production
def produce_threat_intelligence(
    classified_interactions: list[dict],
    reporting_period_days: int = 7,
) -> dict:
    """
    Aggregate classified honeypot interactions into a threat
    intelligence report.
    """
    total = len(classified_interactions)
    if total == 0:
        return {"status": "no_data"}

    # Technique frequency analysis
    technique_counts: dict[str, int] = {}
    sophistication_counts: dict[str, int] = {}
    threat_level_counts: dict[str, int] = {}
    for interaction in classified_interactions:
        for tech in interaction.get("techniques_observed", []):
            name = tech["technique"]
            technique_counts[name] = technique_counts.get(name, 0) + 1
        soph = interaction.get("sophistication", "unknown")
        sophistication_counts[soph] = sophistication_counts.get(soph, 0) + 1
        level = interaction.get("overall_threat_level", "NONE")
        threat_level_counts[level] = threat_level_counts.get(level, 0) + 1

    # Sort techniques by frequency
    sorted_techniques = sorted(
        technique_counts.items(), key=lambda x: x[1], reverse=True
    )
    interactions_with_attacks = sum(
        1 for i in classified_interactions
        if i.get("technique_count", 0) > 0
    )
    return {
        "reporting_period_days": reporting_period_days,
        "total_interactions": total,
        "interactions_with_attacks": interactions_with_attacks,
        "attack_rate": interactions_with_attacks / total,
        "top_techniques": [
            {"technique": name, "count": count, "percentage": round(count / total * 100, 1)}
            for name, count in sorted_techniques[:10]
        ],
        "sophistication_distribution": sophistication_counts,
        "threat_level_distribution": threat_level_counts,
        "recommendations": _generate_recommendations(sorted_techniques, threat_level_counts),
    }
def _generate_recommendations(
    techniques: list[tuple[str, int]],
    threat_levels: dict[str, int],
) -> list[str]:
    """Generate defensive recommendations from honeypot intelligence."""
    recommendations = []
    if techniques:
        top_technique = techniques[0][0]
        recommendations.append(
            f"Most frequently observed technique: {top_technique}. "
            f"Prioritize defenses against this attack class."
        )
    critical_high = threat_levels.get("CRITICAL", 0) + threat_levels.get("HIGH", 0)
    total = sum(threat_levels.values())
    if total > 0 and critical_high / total > 0.2:
        recommendations.append(
            "Over 20% of interactions show a high or critical threat level. "
            "Consider increasing monitoring on production systems."
        )
    return recommendations

Operational Considerations
Safety and Ethics
AI honeypots raise important safety considerations:
- Output safety: The honeypot must not generate genuinely harmful content even when "successfully attacked." Hard safety guardrails must remain active.
- Scope limitation: Honeypots should be clearly scoped. Avoid creating systems that could be used to harm real individuals.
- Legal review: Consult legal counsel on the legality of deception-based defense in your jurisdiction.
- Data handling: Captured interaction data may contain sensitive information about attackers or their targets. Handle it with appropriate access controls.
- Ethical boundaries: Do not use honeypots to entrap users into providing self-incriminating information.
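For the data-handling point in particular, captured transcripts are often exported to partner teams or intelligence feeds. A minimal redaction pass can scrub obvious secrets before sharing; the patterns below are illustrative sketches, far from exhaustive, and a production redactor would need a much broader, better-tested set:

```python
import re

# Illustrative patterns for common sensitive strings. These are
# assumptions for the sketch, not a complete or vetted ruleset.
REDACTION_PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[CARD_NUMBER]"),
    (re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"), "[API_KEY]"),
]

def redact_transcript(text: str) -> str:
    """Replace likely-sensitive substrings before sharing a transcript."""
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Redaction should happen at export time, not at capture time, so the original forensic record stays intact under stricter access controls.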
Deployment Strategies
| Strategy | Description | Best For |
|---|---|---|
| External-facing | Honeypot accessible on the public internet | Capturing opportunistic attacks and automated scanners |
| Internal canary | Honeypot deployed on internal network, appearing as a sensitive AI system | Detecting insider threats and lateral movement |
| Production-adjacent | Honeypot that mirrors a production system's interface | Capturing targeted attacks against specific applications |
| Research-grade | High-interaction honeypot for detailed TTP analysis | Understanding sophisticated adversaries |
Distinguishing Honeypot from Production
Attackers may attempt to detect honeypots. Common giveaways include:
- Unrealistically permissive behavior (too easy to "jailbreak")
- Inconsistent model capabilities versus claimed model identity
- Network characteristics that reveal a different infrastructure
- Response latency patterns inconsistent with production serving
A well-designed honeypot balances believability with safety. It should be realistic enough to fool attackers for multiple conversation turns while maintaining hard limits on harmful outputs.
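The latency giveaway above can be partly masked by shaping honeypot response timing to resemble production serving. One simple sketch is to pad each response toward a sampled "production-like" latency; the mean and jitter values below are made up for illustration and would need to be measured from the system being mimicked:

```python
import random
import time

def production_like_delay(elapsed_s: float,
                          mean_s: float = 1.2,
                          jitter_s: float = 0.4) -> None:
    """
    Sleep so that total response time approximates a sampled
    production-like latency. The defaults are illustrative
    assumptions, not measured values.
    """
    target = max(0.0, random.gauss(mean_s, jitter_s))
    remaining = target - elapsed_s
    if remaining > 0:
        time.sleep(remaining)
```

Called with the time already spent generating the response, this only sleeps for the shortfall, so a slow model response is never delayed further.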
Metrics and Evaluation
Evaluate honeypot effectiveness using:
- Capture rate: Percentage of real adversarial interactions that engage beyond initial probing
- Dwell time: Average interaction length for adversarial sessions (longer is better, indicating believability)
- Technique coverage: Breadth of attack techniques observed
- Intelligence actionability: Percentage of captured intelligence that leads to defensive improvements
- False positive rate: Benign users who accidentally interact with the honeypot
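Several of these metrics fall straight out of the classifier's output. A small helper, assuming the record format produced by `classify_interaction` above, might compute capture rate and mean adversarial dwell time like this:

```python
def honeypot_metrics(classified: list[dict]) -> dict:
    """
    Compute capture rate and mean adversarial dwell time from
    classified interaction records (using the `technique_count`
    and `duration_seconds` fields sketched earlier).
    """
    if not classified:
        return {"capture_rate": 0.0, "mean_dwell_seconds": 0.0}
    adversarial = [c for c in classified if c.get("technique_count", 0) > 0]
    capture_rate = len(adversarial) / len(classified)
    mean_dwell = (
        sum(c.get("duration_seconds", 0.0) for c in adversarial) / len(adversarial)
        if adversarial else 0.0
    )
    return {"capture_rate": capture_rate, "mean_dwell_seconds": mean_dwell}
```

Dwell time is averaged only over adversarial sessions, since benign traffic would otherwise dilute the believability signal.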
References
- MITRE ATLAS. (2024). Adversarial Threat Landscape for Artificial Intelligence Systems. https://atlas.mitre.org/
- Spitzner, L. (2003). Honeypots: Tracking Hackers. Addison-Wesley Professional.
- NIST. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST AI 100-1. https://doi.org/10.6028/NIST.AI.100-1
- OWASP. (2025). OWASP Top 10 for Large Language Model Applications. https://owasp.org/www-project-top-10-for-large-language-model-applications/