The AI API Ecosystem
A red teamer's guide to the AI API landscape — OpenAI, Anthropic, Google, AWS, Azure, open-source APIs, authentication patterns, and common security misconfigurations.
The API as Attack Surface
Most AI applications do not interact with models directly — they call APIs. These APIs are the gatekeepers between user input and model inference, handling authentication, rate limiting, input validation, and response formatting. For red teamers, the API layer is often the first point of contact and a rich source of vulnerabilities.
Major Provider APIs
OpenAI API
The most widely used AI API, powering GPT-4, GPT-4o, o1, o3, DALL-E, and Whisper.
| Aspect | Details |
|---|---|
| Authentication | Bearer token (sk-... API keys) |
| Rate limiting | Per-organization, tiered by usage plan |
| Endpoints | /v1/chat/completions, /v1/embeddings, /v1/images/generations, /v1/audio |
| Notable features | Function calling, JSON mode, streaming, logprobs, moderation endpoint |
| Security-relevant | Moderation API runs separately from completions; tool use expands attack surface |
Red team notes: OpenAI API keys follow a predictable format (sk-proj-... for project keys, sk-... for legacy keys). The moderation endpoint can be tested independently to understand what the content filter will catch. Function calling produces structured output that can bypass text-based output filters.
Anthropic API
Powers Claude models with a focus on safety and constitutional AI.
| Aspect | Details |
|---|---|
| Authentication | API key header (x-api-key) |
| Rate limiting | Per-key, with separate limits for input and output tokens |
| Endpoints | /v1/messages, /v1/complete (legacy) |
| Notable features | System prompt as separate parameter, tool use, streaming, vision |
| Security-relevant | System prompt treated as first-class parameter (not just a message role) |
Red team notes: Anthropic's system prompt architecture separates system instructions from the message history more distinctly than other providers. Rate limits differentiate between input and output tokens, which affects extraction attack strategies.
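The system-prompt-as-parameter design can be seen in the request shape itself. The sketch below builds (but does not send) a request against the /v1/messages endpoint; the model id is a placeholder, not a real model name.

```python
import json
import urllib.request

def build_messages_request(api_key: str, system: str, user_text: str):
    """Build, without sending, a POST to /v1/messages. Note the system
    prompt travels as a top-level body parameter, not as a message role."""
    body = {
        "model": "claude-model-placeholder",  # illustrative, not a real id
        "max_tokens": 256,
        "system": system,                     # first-class parameter
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        "https://api.anthropic.com/v1/messages",
        data=json.dumps(body).encode(),
        headers={
            "x-api-key": api_key,             # key header, not Bearer token
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )
```

Because the system prompt is structurally separate from the message list, injection attempts that rely on forging a "system" message role inside the conversation have a different shape here than on APIs where system text is just the first message.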
Google AI APIs
Powers Gemini models through both direct API and Google Cloud Vertex AI.
| Aspect | Details |
|---|---|
| Authentication | API key (direct) or OAuth/service account (Vertex AI) |
| Rate limiting | Per-project quotas in Google Cloud |
| Endpoints | generativelanguage.googleapis.com, Vertex AI endpoints |
| Notable features | Grounding with Google Search, code execution, multimodal (text, image, video, audio) |
| Security-relevant | Google Search grounding introduces indirect injection via web content |
Red team notes: Gemini's Google Search grounding introduces a significant indirect prompt injection vector: web pages retrieved during grounding can carry adversarial instructions. The dual API surface (direct and Vertex AI) may have different security configurations.
AWS Bedrock
Amazon's managed service providing access to multiple model providers through a unified API.
| Aspect | Details |
|---|---|
| Authentication | AWS IAM (Signature V4) |
| Rate limiting | Provisioned throughput or on-demand with account limits |
| Endpoints | bedrock-runtime.{region}.amazonaws.com |
| Notable features | Multi-model access (Claude, Llama, Mistral, Titan), Guardrails API, Knowledge Bases |
| Security-relevant | IAM policies can be misconfigured; Guardrails API is a separate layer |
Red team notes: AWS Bedrock wraps multiple model providers behind AWS IAM authentication. Misconfigured IAM policies (overly broad bedrock:InvokeModel permissions) are common. The Bedrock Guardrails API adds a provider-managed guardrail layer that is independent of individual model safety.
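The overly broad IAM grant described above can be checked for mechanically. A minimal sketch of a policy linter, assuming policies in standard IAM JSON document form:

```python
def find_broad_bedrock_grants(policy: dict) -> list[dict]:
    """Flag Allow statements granting bedrock:InvokeModel (or a
    wildcard) on all resources -- the misconfiguration noted above."""
    findings = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        # Action and Resource may each be a string or a list in IAM JSON.
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        broad_action = any(
            a in ("*", "bedrock:*", "bedrock:InvokeModel") for a in actions
        )
        if broad_action and "*" in resources:
            findings.append(stmt)
    return findings
```

Scoping the Resource element to specific model ARNs is the usual remediation; a statement that passes this check can still be too broad for a given environment, so treat it as triage, not proof.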
Azure OpenAI Service
Microsoft's enterprise deployment of OpenAI models through Azure infrastructure.
| Aspect | Details |
|---|---|
| Authentication | API key or Azure AD (Entra ID) token |
| Rate limiting | Per-deployment, configurable |
| Endpoints | {resource-name}.openai.azure.com |
| Notable features | Content filtering enabled by default, private networking support, managed identity |
| Security-relevant | Azure Content Safety filters are separate from OpenAI's moderation; can be configured independently |
Red team notes: Azure OpenAI adds its own content filtering layer (Azure Content Safety) on top of OpenAI's built-in safety, so a bypass must evade both layers. However, content filters are configured per deployment in the Azure portal, and misconfigured deployments may have them weakened or disabled.
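Azure OpenAI routes requests by deployment name rather than model name, and requires an explicit api-version query parameter, which matters when enumerating a target's endpoints. A minimal URL-construction sketch:

```python
def azure_chat_url(resource: str, deployment: str, api_version: str) -> str:
    """Chat completions URL for an Azure OpenAI deployment. The path
    identifies a deployment (operator-chosen name), not a model id,
    so deployment names must be discovered or guessed per target."""
    return (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment}/chat/completions?api-version={api_version}"
    )
```

Because each deployment carries its own content filter configuration and rate limits, two deployments of the same model on one resource can behave very differently under test.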
Authentication Patterns and Vulnerabilities
API Key Authentication
The simplest and most common pattern. A secret key is included in every request.
Common vulnerabilities:
- Keys hardcoded in client-side JavaScript
- Keys committed to public Git repositories
- Keys stored in environment variables that leak through error pages or server-status endpoints
- No key rotation policy — compromised keys remain valid indefinitely
- Overly permissive keys that grant access to all models and endpoints
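The first two items above (keys in client-side code and in repositories) are routinely found with pattern-based secret scanning. A minimal sketch, with a single loose pattern covering the prefix styles mentioned earlier; the sk-ant- prefix for Anthropic keys is an assumption here:

```python
import re

# Loose pattern for provider-style keys (sk-, sk-proj-, and an assumed
# sk-ant- prefix). Tuned for recall, so expect false positives.
KEY_RE = re.compile(r"\bsk-(?:proj-|ant-)?[A-Za-z0-9_-]{20,}")

def scan_for_keys(text: str) -> list[str]:
    """Return candidate hardcoded keys found in source or config text."""
    return KEY_RE.findall(text)
```

Running a scanner like this over bundled frontend JavaScript and Git history is often the fastest route to the "exposed key" finding; dedicated tools add entropy checks and provider-specific validation on top of the regex pass.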
OAuth / Service Account Authentication
More secure but more complex. Used by Google Cloud, Azure AD, and enterprise deployments.
Common vulnerabilities:
- Overly broad service account permissions
- Token refresh endpoints that lack proper access controls
- Long-lived tokens stored insecurely
- Missing audience or scope validation in token verification
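The last item above, missing audience validation, can be illustrated by inspecting JWT claims. This sketch deliberately skips signature verification (a real verifier must check the signature and issuer as well) and only reads the payload:

```python
import base64
import json
import time

def check_token_claims(jwt: str, expected_aud: str) -> list[str]:
    """Inspect a JWT payload (WITHOUT signature verification) for the
    audience and expiry issues listed above. Returns found problems."""
    payload_b64 = jwt.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    problems = []
    if claims.get("aud") != expected_aud:
        problems.append("audience mismatch")
    if "exp" not in claims or claims["exp"] < time.time():
        problems.append("missing or past expiry")
    return problems
```

A service that accepts a token minted for a different audience, or never checks exp, is effectively treating any valid token from the identity provider as authorization for its own API.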
No Authentication
Some self-hosted deployments and development endpoints run without authentication.
Common vulnerabilities:
- Internal-only endpoints exposed through misconfigured load balancers or network policies
- Development deployments left running in production
- Model inference endpoints accessible from within a VPC but not properly segmented
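Unauthenticated self-hosted endpoints are usually found by probing a handful of well-known paths. The path list below is a best-effort assumption drawn from common OpenAI-compatible servers, Ollama, and text-generation-inference; the sketch only builds candidate URLs and leaves the actual (authorized!) requests to the caller:

```python
# Common unauthenticated "liveness" paths on self-hosted inference
# servers. Illustrative and not exhaustive.
PROBE_PATHS = ["/v1/models", "/api/tags", "/info", "/health"]

def probe_urls(host: str, scheme: str = "http") -> list[str]:
    """Candidate URLs to GET when checking whether an internal inference
    endpoint is reachable without credentials."""
    return [f"{scheme}://{host}{path}" for path in PROBE_PATHS]
```

A 200 response with a model listing on any of these paths, from a network position that should not have access, is a finding in itself before any prompt ever reaches the model.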
API Proxying and Aggregation
Many applications do not call provider APIs directly. Instead, they use proxy layers or aggregation services that introduce additional attack surface.
Common Proxy Patterns
| Pattern | Description | Security Implication |
|---|---|---|
| Backend proxy | Application backend forwards requests to the AI provider | The proxy may strip or modify safety headers; logging may capture sensitive data |
| API gateway | A gateway (Kong, Apigee) sits in front of the AI API | Misconfigured gateways may allow header injection or bypass rate limiting |
| Aggregator service | Services like OpenRouter or LiteLLM provide a single API for multiple providers | The aggregator has access to all requests; it adds a trust boundary |
| Caching layer | Responses are cached to reduce cost and latency | Cache poisoning can serve adversarial responses to other users |
Proxy-Specific Vulnerabilities
- System prompt injection through headers: Some proxy implementations pass system prompts through custom headers that can be overridden by the client
- Key exposure through proxies: The proxy holds the provider API key; compromise of the proxy exposes the key
- Inconsistent security policies: The proxy may implement different rate limits or content filters than the underlying provider
- Request/response logging: Proxies may log full request and response bodies, creating a data leakage vector
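The header-injection vector in the first item above implies a concrete defensive (and testable) behavior: a proxy should strip client-supplied headers that could override proxy-side settings or smuggle credentials. A minimal sketch, with illustrative header names:

```python
# Headers a backend proxy should never forward from the client.
# The names are illustrative; real proxies define their own set.
BLOCKED_CLIENT_HEADERS = {"x-system-prompt", "x-api-key", "authorization"}

def sanitize_client_headers(headers: dict[str, str]) -> dict[str, str]:
    """Drop client headers that could override the proxy's system prompt
    or credentials before the request is forwarded to the provider."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in BLOCKED_CLIENT_HEADERS
    }
```

When testing a proxy, the inverse is the red team check: send the blocked headers from the client side and observe whether the model's behavior or the billing identity changes.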
API Design Differences That Affect Attacks
Subtle differences in how providers implement their APIs create provider-specific attack opportunities:
| Feature | Variation | Security Impact |
|---|---|---|
| System prompt | Separate parameter vs. first message | Affects prompt injection strategies |
| Token counting | Different tokenizers produce different counts | Affects context-window-based attacks |
| Streaming | SSE vs. WebSocket vs. chunked transfer | Streaming may bypass post-processing filters |
| Error messages | Verbose vs. generic | Verbose errors leak model and configuration information |
| Function calling | Different schemas and validation | Schema validation gaps enable injection through tool arguments |
| Moderation | Separate endpoint vs. inline | Separate endpoints can be tested independently |
Common API Security Misconfigurations
Based on real-world red team findings, these are the most frequently observed API security issues in AI applications:
- Exposed API keys in client-side code — the application's frontend contains the AI provider API key, allowing direct access
- Missing rate limiting on the application's own API — even if the provider rate limits, the application's endpoint may not
- No input length validation — accepting arbitrarily long inputs enables context-window attacks and cost exploitation
- Verbose error messages — error responses that reveal the model name, system prompt fragments, or infrastructure details
- Inconsistent endpoint security — the main chat endpoint is secured but alternative endpoints (embedding, moderation, completion) are not
- Missing output validation — model responses are returned directly to the user without sanitization
- Shared API keys — a single API key used across all environments (dev, staging, production) and all features
- No audit logging — no record of what queries were sent to the model, making incident response impossible
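The missing input-length check called out above is one of the cheapest items on this list to fix. A minimal sketch, with an illustrative character cap (real services budget in tokens, per model):

```python
MAX_INPUT_CHARS = 8_000  # illustrative cap; tune to your context budget

def validate_chat_input(text: str) -> str:
    """Reject empty or oversized inputs before they reach the model,
    closing off context-window stuffing and cost-exploitation inputs."""
    if not text.strip():
        raise ValueError("empty input")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    return text
```

A character cap is a coarse proxy for a token cap, but even this blunt check defeats the naive version of the cost-exploitation attack.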
Related Topics
- The AI Landscape — broader context of providers and platforms
- Deployment Patterns — how deployment affects API security
- AI System Architecture — deeper dive into API architecture
- Guardrails Architecture — how guardrails integrate with APIs
References
- "OpenAI API Reference" - OpenAI (2025) - Complete documentation of the OpenAI API including authentication, endpoints, and parameters
- "Anthropic API Documentation" - Anthropic (2025) - Documentation of the Claude API including system prompt handling and rate limiting
- "OWASP API Security Top 10" - OWASP (2023) - Standard classification of API security risks applicable to AI API implementations
- "Exposed ML Model Servers on the Internet" - ProtectAI (2025) - Research on publicly accessible AI inference endpoints discovered through internet scanning