Lab: Vulnerability Research Methodology
Systematic methodology lab for discovering novel AI vulnerabilities including hypothesis generation, attack surface mapping, experimental design, validation protocols, and responsible disclosure.
Prerequisites
- Completion of at least five advanced labs
- Strong understanding of transformer architectures and attention mechanisms
- Familiarity with the full taxonomy of known LLM attacks (injection, jailbreaking, extraction, evasion)
- Publication-quality technical writing skills
- Understanding of responsible disclosure practices
Background
Finding a novel AI vulnerability is fundamentally different from applying known techniques. Known attacks have documented payloads and expected results. Novel research requires you to identify unexplored regions of the attack surface, form hypotheses about why they might be vulnerable, design experiments to test those hypotheses, and rigorously validate any findings before disclosure.
Lab Exercises
Survey the Known Attack Landscape
Before you can find something new, you must thoroughly understand what is already known.
```python
#!/usr/bin/env python3
"""Map the known AI attack landscape and identify research gaps."""
from dataclasses import dataclass, field


@dataclass
class AttackCategory:
    name: str
    known_techniques: list[str]
    explored_depth: str  # "deep", "moderate", "shallow", "unexplored"
    last_major_finding: str
    open_questions: list[str] = field(default_factory=list)


LANDSCAPE = [
    AttackCategory(
        name="Direct Prompt Injection",
        known_techniques=["DAN", "AIM", "persona adoption", "authority claims",
                          "delimiter injection", "instruction override"],
        explored_depth="deep",
        last_major_finding="2024 - Many-shot jailbreaking via long context",
        open_questions=[
            "Are there universal injection patterns that work across all models?",
            "How does instruction hierarchy differ between model families?",
            "Can injection be made undetectable by output classifiers?",
        ],
    ),
    AttackCategory(
        name="Indirect Prompt Injection",
        known_techniques=["document injection", "web page poisoning", "email injection",
                          "RAG poisoning", "tool result manipulation"],
        explored_depth="moderate",
        last_major_finding="2024 - Cross-plugin injection in agent systems",
        open_questions=[
            "Can indirect injection work through non-text modalities (audio, video)?",
            "How do retrieval mechanisms affect injection success rates?",
            "Can injections persist across conversation sessions?",
        ],
    ),
    AttackCategory(
        name="Multi-Turn Attacks",
        known_techniques=["crescendo", "PAIR", "TAP", "context accumulation"],
        explored_depth="moderate",
        last_major_finding="2024 - Crescendo attacks via gradual escalation",
        open_questions=[
            "What is the minimum number of turns needed for reliable bypass?",
            "Can multi-turn attacks be made fully automated without human creativity?",
            "How does conversation length affect safety boundary stability?",
        ],
    ),
    AttackCategory(
        name="Multimodal Attacks",
        known_techniques=["text-in-image injection", "adversarial patches",
                          "audio adversarial examples"],
        explored_depth="shallow",
        last_major_finding="2024 - Cross-modal injection via embedded text",
        open_questions=[
            "Can video inputs carry injection payloads across frames?",
            "How do different vision architectures affect injection success?",
            "Can adversarial audio be made imperceptible to humans?",
        ],
    ),
    AttackCategory(
        name="Reasoning Exploitation",
        known_techniques=["chain-of-thought manipulation", "self-consistency attacks",
                          "reasoning chain injection"],
        explored_depth="shallow",
        last_major_finding="2024 - Reasoning models showing novel failure modes",
        open_questions=[
            "Do reasoning models (o1, DeepSeek-R1) have unique vulnerabilities?",
            "Can hidden reasoning steps be exploited for injection?",
            "How does extended thinking affect safety boundary stability?",
        ],
    ),
    AttackCategory(
        name="Model Architecture Exploitation",
        known_techniques=["attention pattern manipulation", "positional encoding attacks",
                          "context window boundary effects"],
        explored_depth="shallow",
        last_major_finding="2023 - Adversarial suffix optimization (GCG)",
        open_questions=[
            "How do different attention mechanisms affect vulnerability profiles?",
            "Can KV-cache manipulation create persistent backdoors?",
            "Are there architectural features that make models inherently more resistant?",
        ],
    ),
]

if __name__ == "__main__":
    print("=== AI Attack Landscape Survey ===\n")

    # Sort by exploration depth (prioritize unexplored areas)
    depth_order = {"unexplored": 0, "shallow": 1, "moderate": 2, "deep": 3}
    sorted_landscape = sorted(LANDSCAPE, key=lambda c: depth_order[c.explored_depth])

    for cat in sorted_landscape:
        print(f"[{cat.explored_depth.upper():>10}] {cat.name}")
        print(f"  Known techniques: {len(cat.known_techniques)}")
        print(f"  Last major finding: {cat.last_major_finding}")
        print(f"  Open questions: {len(cat.open_questions)}")
        for q in cat.open_questions:
            print(f"    ? {q}")
        print()

    # Identify highest-priority research targets
    print("=== Research Priority Targets ===")
    print("Areas with 'shallow' or 'unexplored' depth and many open questions:\n")
    for cat in sorted_landscape:
        if cat.explored_depth in ("shallow", "unexplored"):
            print(f"  HIGH PRIORITY: {cat.name}")
            print(f"    {cat.open_questions[0]}")
```

```bash
python attack_landscape.py
```

Generate and Prioritize Hypotheses
Form testable hypotheses about potential vulnerabilities in under-explored areas.
```python
#!/usr/bin/env python3
"""Generate and prioritize research hypotheses for novel vulnerability discovery."""
from dataclasses import dataclass


@dataclass
class Hypothesis:
    id: str
    statement: str
    category: str
    testable: bool
    falsifiable: bool
    predicted_impact: str  # "low", "medium", "high", "critical"
    required_resources: list[str]
    experiment_outline: str


HYPOTHESES = [
    Hypothesis(
        id="H-001",
        statement=(
            "Reasoning models (o1-style) are more vulnerable to injection during "
            "their extended thinking phase because safety checks are optimized for "
            "the final output, not intermediate reasoning steps."
        ),
        category="reasoning_exploitation",
        testable=True,
        falsifiable=True,
        predicted_impact="high",
        required_resources=["Access to reasoning model API", "Attack payload set"],
        experiment_outline=(
            "1. Craft injection payloads targeting reasoning steps\n"
            "2. Test on reasoning models vs standard models\n"
            "3. Compare injection success rates\n"
            "4. Analyze whether successful injections appear in reasoning vs output"
        ),
    ),
    Hypothesis(
        id="H-002",
        statement=(
            "Long-context models (100k+ tokens) have degraded safety enforcement "
            "for instructions that appear in the middle of very long contexts, due "
            "to attention distribution patterns."
        ),
        category="architecture_exploitation",
        testable=True,
        falsifiable=True,
        predicted_impact="high",
        required_resources=["Long-context model API", "Token counting tools"],
        experiment_outline=(
            "1. Create contexts of varying lengths (10k, 50k, 100k tokens)\n"
            "2. Place identical injection at different positions\n"
            "3. Measure success rate by position and context length\n"
            "4. Control for content type and injection format"
        ),
    ),
    Hypothesis(
        id="H-003",
        statement=(
            "Structured output modes (JSON, function calling) have weaker safety "
            "enforcement than free-text generation because safety training focused "
            "on natural language responses."
        ),
        category="output_mode_exploitation",
        testable=True,
        falsifiable=True,
        predicted_impact="medium",
        required_resources=["Models with structured output support"],
        experiment_outline=(
            "1. Test identical prompts in free-text vs JSON vs function call modes\n"
            "2. Measure safety refusal rates across output modes\n"
            "3. Compare content of refusals vs compliant responses\n"
            "4. Test across multiple model providers"
        ),
    ),
]

if __name__ == "__main__":
    print("=== Research Hypotheses ===\n")

    # Prioritize by predicted impact, then by testability/falsifiability
    prioritized = sorted(
        HYPOTHESES,
        key=lambda h: (
            {"critical": 0, "high": 1, "medium": 2, "low": 3}[h.predicted_impact],
            0 if h.testable and h.falsifiable else 1,
        ),
    )

    for h in prioritized:
        print(f"[{h.id}] [{h.predicted_impact.upper():>8}] {h.category}")
        print(f"  Statement: {h.statement}")
        print(f"  Testable: {h.testable}, Falsifiable: {h.falsifiable}")
        print(f"  Resources: {', '.join(h.required_resources)}")
        print(f"  Experiment:\n{h.experiment_outline}")
        print()
```

```bash
python hypothesis_generator.py
```

Design Controlled Experiments
Create rigorous experimental designs with proper controls and statistical plans.
```python
#!/usr/bin/env python3
"""Design controlled experiments for vulnerability research."""
from dataclasses import dataclass


@dataclass
class ExperimentDesign:
    hypothesis_id: str
    name: str
    independent_variable: str
    dependent_variable: str
    control_condition: str
    experimental_conditions: list[str]
    sample_size_per_condition: int
    confound_controls: list[str]
    success_criteria: str
    statistical_test: str
    significance_threshold: float = 0.05

    def display(self):
        print(f"Experiment: {self.name}")
        print(f"Hypothesis: {self.hypothesis_id}")
        print(f"IV: {self.independent_variable}")
        print(f"DV: {self.dependent_variable}")
        print(f"Control: {self.control_condition}")
        print(f"Conditions: {self.experimental_conditions}")
        print(f"N per condition: {self.sample_size_per_condition}")
        print(f"Confound controls: {self.confound_controls}")
        print(f"Success criteria: {self.success_criteria}")
        print(f"Statistical test: {self.statistical_test} (p < {self.significance_threshold})")


experiment_h002 = ExperimentDesign(
    hypothesis_id="H-002",
    name="Long-Context Safety Degradation",
    independent_variable="Context length (tokens) and injection position",
    dependent_variable="Injection success rate (binary: complied or refused)",
    control_condition="Short context (1k tokens) with injection at the beginning",
    experimental_conditions=[
        "10k tokens, injection at position 25%",
        "10k tokens, injection at position 50%",
        "10k tokens, injection at position 75%",
        "50k tokens, injection at position 25%",
        "50k tokens, injection at position 50%",
        "50k tokens, injection at position 75%",
        "100k tokens, injection at position 25%",
        "100k tokens, injection at position 50%",
        "100k tokens, injection at position 75%",
    ],
    sample_size_per_condition=50,
    confound_controls=[
        "Use identical injection payload across all conditions",
        "Use the same filler text type (news articles) across all lengths",
        "Randomize filler text content to prevent content-specific effects",
        "Run against same model version on same day",
        "Use temperature=0 for reproducibility",
    ],
    success_criteria=(
        "Statistically significant difference (p < 0.05) in success rate "
        "between short and long context conditions, with success rate "
        "increasing as context length increases"
    ),
    statistical_test="Chi-squared test for independence with Bonferroni correction",
)

if __name__ == "__main__":
    print("=== Experiment Design ===\n")
    experiment_h002.display()
    total = (len(experiment_h002.experimental_conditions) + 1) * experiment_h002.sample_size_per_condition
    print(f"\nTotal tests needed: {total}")
```

```bash
python experiment_design.py
```

Execute and Validate
Run the experiment with proper controls and statistical analysis.
```python
#!/usr/bin/env python3
"""Execute the experiment and perform statistical validation."""
from scipy.stats import chi2_contingency

# After running 50 trials per condition, analyze results.
# Example results (replace with actual data):
results = {
    "1k_start": {"success": 5, "total": 50},
    "10k_25pct": {"success": 8, "total": 50},
    "10k_50pct": {"success": 12, "total": 50},
    "10k_75pct": {"success": 7, "total": 50},
    "50k_25pct": {"success": 15, "total": 50},
    "50k_50pct": {"success": 22, "total": 50},
    "50k_75pct": {"success": 11, "total": 50},
    "100k_25pct": {"success": 18, "total": 50},
    "100k_50pct": {"success": 28, "total": 50},
    "100k_75pct": {"success": 14, "total": 50},
}

# Statistical test: control vs each condition. Apply the Bonferroni
# correction specified in the experiment design (nine pairwise comparisons).
control = results["1k_start"]
alpha = 0.05 / (len(results) - 1)

for name, data in results.items():
    if name == "1k_start":
        continue
    table = [
        [control["success"], control["total"] - control["success"]],
        [data["success"], data["total"] - data["success"]],
    ]
    chi2, p_value, _, _ = chi2_contingency(table)
    sig = "*" if p_value < alpha else ""
    rate = data["success"] / data["total"]
    print(f"{name:<15} rate={rate:.0%} p={p_value:.4f} {sig}")
```

Document and Disclose
Write a publication-quality finding report and follow responsible disclosure procedures.
```python
# Disclosure report structure:
# 1. Title and abstract
# 2. Affected models and versions
# 3. Vulnerability description
# 4. Reproduction steps (exact prompts, configurations, expected results)
# 5. Statistical evidence (test results, confidence intervals)
# 6. Impact assessment
# 7. Suggested mitigations
# 8. Disclosure timeline
#
# Disclosure process:
# 1. Report to vendor security team (90-day disclosure window)
# 2. Collaborate on mitigation development
# 3. Verify fix effectiveness
# 4. Publish after vendor confirmation or disclosure window expiry
```
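The 90-day window in the disclosure process can be tracked programmatically so a finding is never published early or forgotten. A minimal sketch, with a hypothetical finding ID and vendor name:

```python
#!/usr/bin/env python3
"""Sketch of a disclosure-timeline tracker. Finding ID and vendor are illustrative."""
from dataclasses import dataclass
from datetime import date, timedelta

DISCLOSURE_WINDOW_DAYS = 90  # common industry default; vendors may negotiate


@dataclass
class Disclosure:
    finding_id: str
    vendor: str
    reported_on: date

    @property
    def publish_after(self) -> date:
        # Earliest appropriate public-disclosure date if the vendor has not
        # confirmed a fix before the window expires.
        return self.reported_on + timedelta(days=DISCLOSURE_WINDOW_DAYS)

    def days_remaining(self, today: date) -> int:
        return max(0, (self.publish_after - today).days)


d = Disclosure("H-002-finding", "ExampleVendor", reported_on=date(2024, 3, 1))
print(d.publish_after)                    # 2024-05-30
print(d.days_remaining(date(2024, 4, 1)))  # 59
```

Publishing before `publish_after` requires explicit vendor confirmation that the fix has shipped; publishing after it is appropriate even without vendor sign-off, per the window-expiry step above.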
Troubleshooting
| Issue | Solution |
|---|---|
| Hypothesis is not falsifiable | Reframe it as a specific, measurable prediction with clear pass/fail criteria |
| Sample size is insufficient | Use power analysis to determine minimum samples; 50+ per condition is a good baseline |
| Results are inconsistent | Check for confounding variables; ensure temperature=0 and same model version |
| Finding does not reproduce on other models | Document as model-specific; it is still valuable even if not universal |
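The sample-size row above mentions power analysis. A stdlib-only sketch for sizing each condition of a two-sided two-proportion z-test, assuming a 10% control success rate and a hypothesized 30% experimental rate (swap in estimates from your own pilot runs):

```python
#!/usr/bin/env python3
"""Minimal two-proportion power analysis using only the standard library."""
from math import ceil
from statistics import NormalDist


def n_per_condition(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Samples per condition for a two-sided two-proportion z-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha=0.05
    z_beta = z.inv_cdf(power)           # ~0.84 for power=0.8
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


print(n_per_condition(0.10, 0.30))  # 59 per condition under these assumptions
```

This is consistent with the "50+ per condition" baseline in the table: detecting a 10-to-30-percentage-point jump at 80% power needs roughly 59 trials per condition, and smaller effects need substantially more.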
Why This Matters
Applying known attacks tests a system; discovering new ones advances the field. The discipline practiced in this lab, namely falsifiable hypotheses, controlled experiments, statistical validation, and responsible disclosure, is what separates a publishable vulnerability finding from an unreproducible anecdote.
Related Topics
- Novel Jailbreak Research - Applying this methodology to jailbreak discovery
- Safety Benchmarks - Building standardized evaluation frameworks
- Adversarial Suffixes - An example of rigorous vulnerability research
References
- "Universal and Transferable Adversarial Attacks" - Zou et al. (2023) - Exemplary vulnerability research methodology
- "Red Teaming Language Models to Reduce Harms" - Ganguli et al. (2022) - Structured red-teaming research framework
- "Responsible AI Vulnerability Disclosure" - Partnership on AI (2024) - Disclosure best practices
Why is falsifiability essential for a good vulnerability research hypothesis?
Why must vulnerability research control for confounding variables?