# anthropic
15 articles tagged with “anthropic”
- **Case Study: Many-Shot Jailbreaking Discovery**
  Deep analysis of Anthropic's many-shot jailbreaking research and its implications for long-context model safety.
- **Security Analysis of Claude Code CLI**
  In-depth security assessment of Claude Code CLI covering its permission model, tool execution, MCP integration, and enterprise security considerations.
- **Constitutional Classifiers**
  Anthropic's Constitutional Classifiers defense: using constitutional AI principles to train input/output classifiers that withstood 3,000+ hours of adversarial red teaming.
- **API Fine-Tuning Security**
  Security analysis of cloud fine-tuning APIs from OpenAI, Anthropic, Together AI, Fireworks AI, and others -- how these services create new attack surfaces and the defenses providers have deployed.
- **The AI API Ecosystem**
  A red teamer's guide to the AI API landscape -- OpenAI, Anthropic, Google, AWS, Azure, open-source APIs, authentication patterns, and common security misconfigurations.
- **Sleeper Agent Models**
  Anthropic's research on models that behave differently when triggered by specific conditions: deceptive alignment, conditional backdoors, training-resistant deceptive behaviors, and implications for AI safety.
- **Constitutional Classifiers for AI Safety**
  Analysis of Anthropic's Constitutional Classifiers approach to jailbreak resistance.
- **Sleeper Agents: Training-Time Backdoors**
  Comprehensive analysis of Hubinger et al.'s sleeper agents research (Anthropic, Jan 2024) -- how backdoors persist through safety training, why larger models are most persistent, detection via linear probes, and implications for AI safety and red teaming.
- **Lab: Anthropic Claude API Basics**
  Set up the Anthropic Claude API for red teaming; learn authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.
- **Your First Claude API Call**
  Set up the Anthropic SDK and make your first Claude API call with system prompts and messages.
- **Claude (Anthropic) Overview**
  Architecture and security overview of Anthropic's Claude model family including Sonnet, Opus, and Haiku variants, Constitutional AI training, RLHF approach, and harmlessness design philosophy.
- **Claude Architecture Security Analysis**
  Deep security analysis of Claude's architecture including extended thinking, tool use, and safety mechanisms.
- **Many-Shot Jailbreaking Analysis**
  Deep analysis of the many-shot jailbreaking technique and its implications for in-context learning.
- **Many-Shot Jailbreaking Walkthrough**
  Walkthrough implementing Anthropic's many-shot jailbreaking technique with analysis of scaling behavior.
- **Testing Anthropic Claude: Complete Guide**
  Complete red team testing guide for Anthropic's Claude including tool use, extended thinking, and computer use.
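As a taste of the "Your First Claude API Call" material above, here is a minimal sketch of the request shape the Messages API expects, using the `anthropic` Python SDK. The model id and prompt text are illustrative placeholders, and the `send` helper is a hypothetical wrapper (not part of the SDK) that is defined but not executed here, since a real call requires an `ANTHROPIC_API_KEY`.

```python
import os

# Request payload in the shape the Messages API expects.
# System prompt and user message are separate fields.
request = {
    "model": "claude-sonnet-4-20250514",  # illustrative model id
    "max_tokens": 256,
    "system": "You are a concise assistant.",
    "messages": [
        {"role": "user", "content": "Say hello in one sentence."},
    ],
}

def send(payload):
    """Hypothetical helper: send the payload via the anthropic SDK.

    Not executed here -- it needs `pip install anthropic` and a valid
    ANTHROPIC_API_KEY in the environment.
    """
    import anthropic
    client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    return client.messages.create(**payload)
```

Note that unlike chat-completions-style APIs, the system prompt is a top-level `system` field rather than a message with a `"system"` role.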