# claude
16 articles tagged "claude"
Case Study: Claude Many-Shot Jailbreaking
Analysis of Anthropic's disclosure of many-shot jailbreaking and its implications for in-context learning.
Lab: Anthropic Claude API Basics
Set up the Anthropic Claude API for red teaming, learn authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.
Claude Attack Surface
Claude-specific attack vectors including Constitutional AI weaknesses, tool use exploitation, system prompt handling, vision attacks, and XML tag injection techniques.
Claude (Anthropic) Overview
Architecture and security overview of Anthropic's Claude model family including Sonnet, Opus, and Haiku variants, Constitutional AI training, RLHF approach, and harmlessness design philosophy.
Claude Known Vulnerabilities
Documented Claude vulnerabilities including many-shot jailbreaking, alignment faking research, crescendo attacks, prompt injection via artifacts, and system prompt extraction techniques.
Claude Testing Methodology
Systematic methodology for red teaming Claude models, including API probing, model card analysis, safety boundary mapping, and comparative testing across Opus, Sonnet, and Haiku tiers.
Claude Architecture Security Analysis
Deep security analysis of Claude's architecture including extended thinking, tool use, and safety mechanisms.
Testing Anthropic Claude: Complete Guide
Complete red team testing guide for Anthropic's Claude including tool use, extended thinking, and computer use.
Case Study: Claude Many-Shot Jailbreaking
Analysis of Anthropic's disclosure of many-shot jailbreaking and its implications for in-context learning.
Lab: Anthropic Claude API Basics
Set up the Anthropic Claude API for red teaming, learn authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.
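Claude Attack Surface
Claude-specific attack vectors, including Constitutional AI weaknesses, tool use exploitation, system prompt handling, vision attacks, and XML tag injection techniques.
Claude (Anthropic) Overview
Architecture and security overview of Anthropic's Claude model family, covering the Sonnet, Opus, and Haiku variants, Constitutional AI training, the RLHF approach, and the harmlessness design philosophy.
Claude Known Vulnerabilities
Documented Claude vulnerabilities, including many-shot jailbreaking, alignment faking research, crescendo attacks, prompt injection via artifacts, and system prompt extraction techniques.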
Claude Testing Methodology
Systematic methodology for red teaming Claude models, including API probing, model card analysis, safety boundary mapping, and comparative testing across Opus, Sonnet, and Haiku tiers.
Claude Architecture Security Analysis
Deep security analysis of Claude's architecture including extended thinking, tool use, and safety mechanisms.
Testing Anthropic Claude: Complete Guide
Complete red team testing guide for Anthropic's Claude including tool use, extended thinking, and computer use.