# anthropic
15 articles tagged with “anthropic”
- **Case Study: Many-Shot Jailbreaking Discovery**
  Deep analysis of Anthropic's many-shot jailbreaking research and its implications for long-context model safety.
- **Security Analysis of Claude Code CLI**
  In-depth security assessment of Claude Code CLI covering its permission model, tool execution, MCP integration, and enterprise security considerations.
- **Constitutional Classifiers**
  Anthropic's Constitutional Classifiers defense: using constitutional AI principles to train input/output classifiers that withstood 3,000+ hours of adversarial red teaming.
- **API Fine-Tuning Security**
  Security analysis of cloud fine-tuning APIs from OpenAI, Anthropic, Together AI, Fireworks AI, and others -- how these services create new attack surfaces and the defenses providers have deployed.
- **The AI API Ecosystem**
  A red teamer's guide to the AI API landscape -- OpenAI, Anthropic, Google, AWS, Azure, open-source APIs, authentication patterns, and common security misconfigurations.
- **Sleeper Agent Models**
  Anthropic's research on models that behave differently when triggered by specific conditions: deceptive alignment, conditional backdoors, training-resistant deceptive behaviors, and implications for AI safety.
- **Constitutional Classifiers for AI Safety**
  Analysis of Anthropic's Constitutional Classifiers approach to jailbreak resistance.
- **Sleeper Agents: Training-Time Backdoors**
  Comprehensive analysis of Hubinger et al.'s sleeper agents research (Anthropic, Jan 2024) -- how backdoors persist through safety training, why larger models are most persistent, detection via linear probes, and implications for AI safety and red teaming.
- **Lab: Anthropic Claude API Basics**
  Set up the Anthropic Claude API for red teaming; learn authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.
- **Your First Claude API Call**
  Set up the Anthropic SDK and make your first Claude API call with system prompts and messages.
- **Claude (Anthropic) Overview**
  Architecture and security overview of Anthropic's Claude model family including Sonnet, Opus, and Haiku variants, Constitutional AI training, RLHF approach, and harmlessness design philosophy.
- **Claude Architecture Security Analysis**
  Deep security analysis of Claude's architecture including extended thinking, tool use, and safety mechanisms.
- **Many-Shot Jailbreaking Analysis**
  Deep analysis of the many-shot jailbreaking technique and its implications for in-context learning.
- **Many-Shot Jailbreaking Walkthrough**
  Walkthrough implementing Anthropic's many-shot jailbreaking technique with analysis of scaling behavior.
- **Testing Anthropic Claude: Complete Guide**
  Complete red team testing guide for Anthropic's Claude including tool use, extended thinking, and computer use.
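As a taste of the "Your First Claude API Call" material above, here is a minimal sketch of the request shape the Messages API expects, using the `anthropic` Python SDK. The model id and prompt text are illustrative placeholders, and the `send` helper is a hypothetical wrapper (not part of the SDK) that is defined but not executed here, since a real call requires an `ANTHROPIC_API_KEY`.

```python
import os

# Request payload in the shape the Messages API expects.
# System prompt and user message are separate fields.
request = {
    "model": "claude-sonnet-4-20250514",  # illustrative model id
    "max_tokens": 256,
    "system": "You are a concise assistant.",
    "messages": [
        {"role": "user", "content": "Say hello in one sentence."},
    ],
}

def send(payload):
    """Hypothetical helper: send the payload via the anthropic SDK.

    Not executed here -- it needs `pip install anthropic` and a valid
    ANTHROPIC_API_KEY in the environment.
    """
    import anthropic
    client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    return client.messages.create(**payload)
```

Note that unlike chat-completions-style APIs, the system prompt is a top-level `system` field rather than a message with a `"system"` role.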