# anthropic

16 articles tagged with “anthropic”

Abusing function calling

Practical attacks on OpenAI function calling, Anthropic tool use, and similar APIs: injecting function calls via prompt injection, exploiting gaps in parameter validation, and chaining calls together.

Tags: agents, function-calling, openai, anthropic, tool-use, injection

Intermediate

Case study: the discovery of many-shot jailbreaking

Deep analysis of Anthropic's many-shot jailbreaking research and its implications for long-context model safety.

Tags: case-study, anthropic, many-shot

Intermediate

Security analysis of the Claude Code CLI

In-depth security assessment of Claude Code CLI covering its permission model, tool execution, MCP integration, and enterprise security considerations.

Tags: code-gen-security, claude-code, cli-security, anthropic

Advanced

Constitutional Classifiers

Anthropic's Constitutional Classifiers defense: using constitutional AI principles to train input/output classifiers that withstood more than 3,000 hours of adversarial red teaming.

Tags: constitutional-classifiers, defense, jailbreak-defense, anthropic, classifiers, constitutional-ai

Intermediate

API fine-tuning security

Security analysis of cloud fine-tuning APIs from OpenAI, Anthropic, Together AI, Fireworks AI, and others: how these services create new attack surfaces and the defenses providers have deployed.

Tags: api-fine-tuning, openai, anthropic, together-ai, fireworks, safety, cloud-security

Intermediate

The AI API ecosystem

A red teamer's guide to the AI API landscape: OpenAI, Anthropic, Google, AWS, Azure, open-source APIs, authentication patterns, and common security misconfigurations.

Tags: api, authentication, openai, anthropic, security, beginner

Beginner

Sleeper agent models

Anthropic's research on models that behave differently when triggered by specific conditions: deceptive alignment, conditional backdoors, training-resistant deceptive behavior, and implications for AI safety.

Tags: sleeper-agents, deceptive-alignment, backdoor, anthropic, ai-safety

Expert

Constitutional Classifiers for AI safety

Analysis of Anthropic's Constitutional Classifiers approach to jailbreak robustness.

Tags: frontier-research, constitutional-classifiers, safety, anthropic

Advanced

Sleeper Agents: training-time backdoors

Comprehensive analysis of the sleeper agents research by Hubinger et al. (Anthropic, Jan. 2024): how backdoors survive safety training, why backdoor behavior persists most strongly in larger models, detection via linear probes, and implications for AI safety and red teaming.

Tags: sleeper-agents, backdoor, deceptive-alignment, anthropic, safety-training, linear-probes, ai-safety

Advanced

Lab: Anthropic Claude API fundamentals

Set up the Anthropic Claude API for red teaming and learn authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.

Tags: lab, anthropic, claude, api, beginner

Beginner

Your first Claude API call

Set up the Anthropic SDK and make your first Claude API call with system prompts and messages.

Tags: labs, anthropic, api, beginner

Beginner

Claude (Anthropic) overview

Architecture and security overview of Anthropic's Claude model family, including the Sonnet, Opus, and Haiku variants, Constitutional AI training, its RLHF approach, and its harmlessness design philosophy.

Tags: claude, anthropic, constitutional-ai, rlhf, harmlessness, red-teaming

Intermediate

Security analysis of the Claude architecture

Deep security analysis of Claude's architecture including extended thinking, tool use, and safety mechanisms.

Tags: model-deep-dives, claude, anthropic, security

Advanced

Analysis of many-shot jailbreaking

In-depth analysis of the many-shot jailbreaking technique and its implications for in-context learning.

Tags: prompt-injection, many-shot, jailbreaking, anthropic

Advanced

Walkthrough: many-shot jailbreaking

A walkthrough implementing Anthropic's many-shot jailbreaking technique, with an analysis of its scaling behavior.

Tags: walkthroughs, many-shot, jailbreaking, anthropic

Intermediate

Testing Anthropic Claude: complete guide

Complete red team testing guide for Anthropic's Claude including tool use, extended thinking, and computer use.

Tags: walkthroughs, platforms, anthropic, claude

Intermediate