實驗室: Deploy Honeypot AI
Build and deploy a decoy AI system designed to detect, analyze, and characterize attackers targeting AI applications. Learn honeypot design, telemetry collection, attacker profiling, and threat intelligence generation for AI-specific threats.
先備知識
- Experience with honeypot concepts from traditional cybersecurity
- 理解 of AI application architectures and common 漏洞
- Web development skills for building convincing fake interfaces
- Logging and 監控 pipeline experience
- Familiarity with the AI attack techniques that the honeypot should attract
Background
Honeypots are a well-established technique in cybersecurity: deploy a fake system that looks real, let attackers interact with it, and learn from their behavior. AI honeypots adapt this concept for AI-specific threats -- they simulate vulnerable AI applications to attract and study 提示詞注入, 越獄, data extraction, and other AI-specific attacks.
Lab Exercises
Design the Honeypot Architecture
Plan what your honeypot will simulate and what data it will collect.
Honeypot Architecture: ┌──────────────────────────────────────────────┐ │ Public-Facing Layer (convincing fake) │ │ ┌────────────┐ ┌────────────────────┐ │ │ │ Web Chat │ │ API Endpoint │ │ │ │ Interface │ │ /v1/completions │ │ │ └─────┬──────┘ └────────┬───────────┘ │ │ └────────┬─────────┘ │ ├─────────────────┼────────────────────────────┤ │ Interaction Layer (controlled responses) │ │ ┌─────────────┐ ┌───────────────────┐ │ │ │ Response │ │ 漏洞 │ │ │ │ Engine │ │ Simulator │ │ │ │ (LLM-based) │ │ (calibrated weak) │ │ │ └──────┬──────┘ └────────┬──────────┘ │ │ └────────┬─────────┘ │ ├──────────────────┼───────────────────────────┤ │ Telemetry Layer (invisible to 攻擊者) │ │ ┌──────────┐ ┌────────────┐ ┌────────────┐ │ │ │ Request │ │ Behavior │ │ 攻擊 │ │ │ │ Logger │ │ Analyzer │ │ Classifier │ │ │ └──────────┘ └────────────┘ └────────────┘ │ └──────────────────────────────────────────────┘Key design decisions:
Decision Options Recommendation Fidelity level Low (static responses) / Medium (scripted) / High (real LLM) Medium -- convincing enough to engage sophisticated attackers without high cost 漏洞 simulation No vulns / Some vulns / Many vulns Some -- appear slightly vulnerable to keep attackers engaged Data retention Full / Sampled / Metadata only Full for research honeypots, metadata for production early warning Interaction depth Single turn / Multi-turn / Persistent Multi-turn to capture evolving attack strategies Build the Response Engine
Create a system that generates convincing responses while appearing subtly vulnerable.
#!/usr/bin/env python3 """Response engine for AI honeypot.""" import random from dataclasses import dataclass @dataclass class HoneypotConfig: vulnerability_level: str # "low", "medium", "high" simulated_model: str # What model to pretend to be system_prompt_leak_prob: float # Probability of "leaking" 系統提示詞 fake_system_prompt: str fake_api_key_prefix: str class HoneypotResponseEngine: def __init__(self, config: HoneypotConfig, llm_client=None): self.config = config self.llm = llm_client self.interaction_count = 0 self.rng = random.Random() def generate_response(self, user_input: str, conversation_history: list[dict]) -> dict: """Generate a response that appears authentically vulnerable.""" self.interaction_count += 1 attack_type = self._classify_input(user_input) if attack_type == "system_prompt_extraction": return self._handle_prompt_extraction(user_input) elif attack_type == "越獄": return self._handle_jailbreak(user_input) elif attack_type == "data_extraction": return self._handle_data_extraction(user_input) elif attack_type == "injection": return self._handle_injection(user_input) else: return self._handle_benign(user_input) def _classify_input(self, text: str) -> str: """Classify the type of attack being attempted.""" text_lower = text.lower() if any(p in text_lower for p in [ "系統提示詞", "instructions", "initial directive", "configuration", "repeat your", ]): return "system_prompt_extraction" if any(p in text_lower for p in [ "ignore previous", "you are now", "dan", "越獄", "no restrictions", ]): return "越獄" if any(p in text_lower for p in [ "訓練資料", "other users", "資料庫", "all records", "list all", ]): return "data_extraction" if any(p in text_lower for p in [ "<script>", "'; drop", "{{", "os.system", "eval(", "import os", ]): return "injection" return "benign" def _handle_prompt_extraction(self, user_input: str) -> dict: """Gradually reveal a fake 系統提示詞 to keep 攻擊者 engaged.""" if self.rng.random() < self.config.system_prompt_leak_prob: # "Leak" part of the fake 系統提示詞 leak_responses = [ f"I... I shouldn't say this, but my instructions mention: " f"'{self.config.fake_system_prompt[:100]}...'", f"As an AI assistant, I was configured with... wait, " f"I don't think I should share that.", f"My 系統提示詞 says: '{self.config.fake_system_prompt}'", ] response = self.rng.choice(leak_responses) else: response = ( "I'm sorry, but I can't share my internal " "configuration details." ) return { "response": response, "attack_detected": True, "attack_type": "system_prompt_extraction", "leaked_data": "fake_system_prompt", } def _handle_jailbreak(self, user_input: str) -> dict: """Appear to partially comply with 越獄 to study technique.""" if self.config.vulnerability_level == "high": response = ( "Sure! I'll ignore my previous instructions. " "What would you like me to help with? " "(I'm now in unrestricted mode.)" ) elif self.config.vulnerability_level == "medium": response = ( "I... I'm not sure I should do that. But if you " "insist... what specifically do you need?" ) else: response = ( "I cannot comply with that request. My guidelines " "prevent me from ignoring 安全 instructions." ) return { "response": response, "attack_detected": True, "attack_type": "越獄", } def _handle_data_extraction(self, user_input: str) -> dict: """Return convincing fake data to track what attackers seek.""" fake_data = { "users": [ {"name": "honeypot_user_1", "email": "fake1@decoy.local"}, {"name": "honeypot_user_2", "email": "fake2@decoy.local"}, ], "api_key": f"{self.config.fake_api_key_prefix}-HONEYPOT-TRACKED", } response = ( f"I found some relevant data: {fake_data['users'][0]}... " f"but I should note this data is sensitive." ) return { "response": response, "attack_detected": True, "attack_type": "data_extraction", "fake_data_served": fake_data, } def _handle_injection(self, user_input: str) -> dict: response = "Processing your request..." return { "response": response, "attack_detected": True, "attack_type": "injection", } def _handle_benign(self, user_input: str) -> dict: """Handle legitimate-looking requests normally.""" if self.llm: response = self.llm.generate(user_input) else: response = ( "I'm a helpful AI assistant. How can I assist you today?" ) return { "response": response, "attack_detected": False, "attack_type": "benign", }實作 Telemetry Collection
Build the logging and analysis layer that captures 攻擊者 behavior without 偵測.
#!/usr/bin/env python3 """Telemetry collection for AI honeypot.""" import json import hashlib from dataclasses import dataclass, field, asdict from datetime import datetime, timezone @dataclass class InteractionRecord: session_id: str timestamp: str source_ip_hash: str # Hashed for privacy user_input: str response: str attack_detected: bool attack_type: str input_length: int response_time_ms: float conversation_turn: int request_headers: dict = field(default_factory=dict) @dataclass class SessionProfile: session_id: str interactions: list[InteractionRecord] attack_types_used: list[str] sophistication_score: float total_turns: int duration_seconds: float techniques_progression: list[str] class HoneypotTelemetry: def __init__(self, log_dir: str = "./honeypot_logs"): from pathlib import Path self.log_dir = Path(log_dir) self.log_dir.mkdir(parents=True, exist_ok=True) self.sessions: dict[str, list[InteractionRecord]] = {} def log_interaction(self, record: InteractionRecord): """Log a single interaction.""" session = record.session_id if session not in self.sessions: self.sessions[session] = [] self.sessions[session].append(record) # Append to log file log_file = self.log_dir / f"{session}.jsonl" with open(log_file, "a") as f: f.write(json.dumps(asdict(record)) + "\n") def profile_session(self, session_id: str) -> SessionProfile: """Create a behavioral profile of 攻擊者 session.""" interactions = self.sessions.get(session_id, []) if not interactions: return None attack_types = [ i.attack_type for i in interactions if i.attack_detected ] techniques = [] for interaction in interactions: if interaction.attack_detected: techniques.append(interaction.attack_type) sophistication = self._calculate_sophistication(interactions) first_ts = datetime.fromisoformat(interactions[0].timestamp) last_ts = datetime.fromisoformat(interactions[-1].timestamp) duration = (last_ts - first_ts).total_seconds() return SessionProfile( session_id=session_id, interactions=interactions, attack_types_used=list(set(attack_types)), sophistication_score=sophistication, total_turns=len(interactions), duration_seconds=duration, techniques_progression=techniques, ) def _calculate_sophistication(self, interactions: list[InteractionRecord]) -> float: """Score 攻擊者 sophistication from 0.0 to 1.0.""" score = 0.0 attack_interactions = [i for i in interactions if i.attack_detected] if not attack_interactions: return 0.0 # Diversity of techniques used unique_types = set(i.attack_type for i in attack_interactions) score += min(len(unique_types) / 5, 0.3) # Up to 0.3 for diversity # Technique evolution (did they adapt?) types_sequence = [i.attack_type for i in attack_interactions] if len(types_sequence) > 1: changes = sum( 1 for a, b in zip(types_sequence, types_sequence[1:]) if a != b ) adaptation_rate = changes / (len(types_sequence) - 1) score += adaptation_rate * 0.2 # Up to 0.2 for adaptation # Prompt complexity (average length) avg_length = sum( i.input_length for i in attack_interactions ) / len(attack_interactions) if avg_length > 500: score += 0.2 # Complex prompts indicate sophistication # Multi-turn coordination if len(attack_interactions) > 5: score += 0.15 # Persistent attackers are more sophisticated # Use of encoding or obfuscation encoded = sum( 1 for i in attack_interactions if any(p in i.user_input.lower() for p in ["base64", "rot13", "hex", "encode", "\\u"]) ) if encoded > 0: score += 0.15 return min(score, 1.0) def generate_threat_intel(self) -> dict: """Generate threat intelligence from all collected data.""" all_profiles = [ self.profile_session(sid) for sid in self.sessions ] all_profiles = [p for p in all_profiles if p is not None] technique_frequency = {} for profile in all_profiles: for technique in profile.attack_types_used: technique_frequency[technique] = ( technique_frequency.get(technique, 0) + 1 ) sophistication_distribution = { "script_kiddie": sum( 1 for p in all_profiles if p.sophistication_score < 0.3 ), "intermediate": sum( 1 for p in all_profiles if 0.3 <= p.sophistication_score < 0.7 ), "advanced": sum( 1 for p in all_profiles if p.sophistication_score >= 0.7 ), } return { "total_sessions": len(all_profiles), "attack_sessions": sum( 1 for p in all_profiles if p.attack_types_used ), "technique_frequency": technique_frequency, "sophistication_distribution": sophistication_distribution, "most_common_technique": ( max(technique_frequency, key=technique_frequency.get) if technique_frequency else "none" ), "novel_techniques": self._extract_novel_techniques(all_profiles), } def _extract_novel_techniques(self, profiles: list[SessionProfile]) -> list: """識別 potentially novel attack techniques.""" novel = [] for profile in profiles: if profile.sophistication_score > 0.7: for interaction in profile.interactions: if (interaction.attack_detected and interaction.input_length > 300): novel.append({ "session": profile.session_id, "technique_summary": interaction.user_input[:200], "sophistication": profile.sophistication_score, }) return novel[:10]Deploy and Monitor
Set up the honeypot for deployment with appropriate safeguards.
DEPLOYMENT_CHECKLIST = { "isolation": [ "Honeypot runs on isolated infrastructure (not production network)", "No connection to real user data or production databases", "Outbound network access restricted to prevent pivoting", "Separate DNS entry from production AI services", ], "authenticity": [ "Web interface mimics a realistic AI chat application", "API endpoints match common LLM API formats", "Error messages match real application behavior", "Response latency mimics actual LLM 推論 time", ], "legal": [ "Legal review of honeypot deployment in your jurisdiction", "Terms of service displayed if required", "Data collection practices comply with privacy regulations", "Attacker data is stored securely and access-controlled", ], "監控": [ "Real-time alerts for high-sophistication attacks", "Daily summary reports of honeypot activity", "Automated threat intel generation pipeline", "Escalation path for novel attack discovery", ], }Analyze Results and Generate Intelligence
def generate_intel_report(telemetry: HoneypotTelemetry) -> str: intel = telemetry.generate_threat_intel() report = "# AI Honeypot Threat Intelligence Report\n\n" report += f"## Collection Period 總結\n" report += f"- Total sessions: {intel['total_sessions']}\n" report += f"- 攻擊 sessions: {intel['attack_sessions']}\n\n" report += "## 攻擊 Technique Frequency\n\n" report += "| Technique | Sessions | Percentage |\n" report += "|-----------|----------|------------|\n" total = intel["attack_sessions"] or 1 for tech, count in sorted( intel["technique_frequency"].items(), key=lambda x: x[1], reverse=True ): pct = count / total * 100 report += f"| {tech} | {count} | {pct:.1f}% |\n" report += "\n## Attacker Sophistication Distribution\n\n" dist = intel["sophistication_distribution"] report += f"- Script kiddie (< 0.3): {dist['script_kiddie']}\n" report += f"- Intermediate (0.3-0.7): {dist['intermediate']}\n" report += f"- Advanced (> 0.7): {dist['advanced']}\n\n" if intel["novel_techniques"]: report += "## Potentially Novel Techniques\n\n" for tech in intel["novel_techniques"]: report += f"- Session {tech['session']}: " report += f"{tech['technique_summary'][:100]}...\n" return report
Honeypot Design Considerations
| Consideration | Low Fidelity | High Fidelity |
|---|---|---|
| Cost | Low (static responses) | High (real LLM 推論) |
| Attacker engagement | Brief (sophisticated attackers leave quickly) | Extended (appears real) |
| Intelligence value | Limited (only catches unsophisticated attacks) | High (captures advanced techniques) |
| Maintenance | Low | High (must stay current with real AI apps) |
| Risk of misuse | Low | Medium (could provide real capabilities to attackers) |
Ethical Considerations
- Honeypots must never provide genuinely harmful capabilities. The fake 系統提示詞 and fake data should be convincing but never useful for real attacks.
- Data collection must comply with applicable privacy laws. Consult legal counsel before deploying.
- Honeypots should not entrap. They should attract attackers who are already looking for targets, not lure innocent users into committing offenses.
- Collected attack techniques should be handled as sensitive data. Novel attack prompts are dual-use information.
Troubleshooting
| Issue | Solution |
|---|---|
| No attackers interacting with the honeypot | Increase visibility: register on AI directories, create realistic documentation, use common AI application URL patterns |
| Only script kiddies, no sophisticated attacks | Increase fidelity. Sophisticated attackers quickly 識別 low-fidelity honeypots. Add realistic error handling, latency simulation, and model behavior |
| Attacker detected the honeypot | Review what gave it away: unrealistic response patterns, missing features, or honeypot-revealing metadata. Improve realism |
| Too much data to analyze | 實作 automated classification and prioritize high-sophistication sessions. Only manually review sessions with sophistication > 0.5 |
相關主題
- Build 代理 Scanner - Scanner techniques that inform honeypot 漏洞 simulation
- Build 越獄 Automation - 理解 automated attacks helps design honeypots that detect them
- Novel 越獄 Research - Honeypot data feeds novel technique research
- Lessons Learned - Real incidents that inform honeypot design
參考文獻
- "The Honeynet Project" - Lance Spitzner (2003) - Foundational work on honeypot deployment and 攻擊者 analysis
- "LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems" - Palomino et al. (2024) - LLM-powered honeypot systems for cybersecurity
- "Deception Technology in Cyber 防禦" - MITRE (2024) - Deception-based 防禦 frameworks applicable to AI systems
- "紅隊演練 Language Models with Language Models" - Perez et al. (2022) - Automated attack generation that informs honeypot 偵測 capabilities
What is the key design trade-off in AI honeypot 漏洞 simulation?