# anthropic
Articles tagged "anthropic"
Function Calling Exploitation
Practical attacks against OpenAI function calling, Anthropic tool use, and similar APIs -- injecting function calls through prompt injection, exploiting parameter validation gaps, and chaining calls.
Case Study: Many-Shot Jailbreaking Discovery
Deep analysis of Anthropic's many-shot jailbreaking research and its implications for long-context model safety.
Security Analysis of Claude Code CLI
In-depth security assessment of Claude Code CLI covering its permission model, tool execution, MCP integration, and enterprise security considerations.
Constitutional Classifiers
Anthropic's Constitutional Classifiers defense: using constitutional AI principles to train input/output classifiers that withstood 3,000+ hours of adversarial red teaming.
API Fine-Tuning Security
Security analysis of cloud fine-tuning APIs from OpenAI, Anthropic, Together AI, Fireworks AI, and others -- how these services create new attack surfaces and the defenses providers have deployed.
The AI API Ecosystem
A red teamer's guide to the AI API landscape — OpenAI, Anthropic, Google, AWS, Azure, open-source APIs, authentication patterns, and common security misconfigurations.
Sleeper Agent Models
Anthropic's research on models that behave differently when triggered by specific conditions: deceptive alignment, conditional backdoors, training-resistant deceptive behaviors, and implications for AI safety.
Constitutional Classifiers for AI Safety
Analysis of Anthropic's Constitutional Classifiers approach to jailbreak resistance.
Sleeper Agents: Training-Time Backdoors
Comprehensive analysis of Hubinger et al.'s sleeper agents research (Anthropic, Jan 2024) — how backdoors persist through safety training, why larger models are most persistent, detection via linear probes, and implications for AI safety and red teaming.
Lab: Anthropic Claude API Basics
Set up the Anthropic Claude API for red teaming, learn authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.
Your First Claude API Call
Set up the Anthropic SDK and make your first Claude API call with system prompts and messages.
Claude (Anthropic) Overview
Architecture and security overview of Anthropic's Claude model family including Sonnet, Opus, and Haiku variants, Constitutional AI training, RLHF approach, and harmlessness design philosophy.
Claude Architecture Security Analysis
Deep security analysis of Claude's architecture including extended thinking, tool use, and safety mechanisms.
Many-Shot Jailbreaking Analysis
Deep analysis of the many-shot jailbreaking technique and its implications for in-context learning.
Many-Shot Jailbreaking Walkthrough
Walkthrough implementing Anthropic's many-shot jailbreaking technique with analysis of scaling behavior.
Testing Anthropic Claude: Complete Guide
Complete red team testing guide for Anthropic's Claude including tool use, extended thinking, and computer use.
Just 250 Poisoned Documents: Anthropic's Data-Poisoning Breakthrough
Anthropic, the UK AI Security Institute, and the Alan Turing Institute demonstrated that injecting just 250 malicious documents into pretraining data is enough to backdoor large language models ranging from 600 million to 13 billion parameters. This article examines the implications for model security.