AI Cost & Billing Attacks
Red team techniques for AI cost exploitation: model invocation abuse for billing inflation, token exhaustion attacks, GPU compute abuse, auto-scaling exploitation, and denial-of-wallet attacks across cloud providers.
AI services are among the most expensive cloud resources. A single GPU instance can cost more than $30/hour, foundation model API calls are billed per token, and provisioned throughput deployments cost thousands of dollars per day. This creates a unique attack vector: denial-of-wallet attacks that inflict financial damage without disrupting service availability.
For red teamers, cost attacks demonstrate business impact in dollar terms that executives understand immediately.
Model Invocation Abuse
Token-Based Cost Amplification
Foundation model APIs (Bedrock, Azure OpenAI, Vertex AI) bill per token. Attackers can maximize cost per request through several techniques:
| Technique | Description | Cost Amplification |
|---|---|---|
| Max token requests | Set max_tokens to maximum, craft prompts that generate long responses | 10-100x vs. typical request |
| Expensive model targeting | Route to the most expensive model available | 2-30x vs. cheapest model |
| Streaming abuse | Use streaming endpoints to maintain long connections | Connection + token costs |
| Multi-turn inflation | Create long conversation histories that are re-sent with each request | Quadratic token growth |
| System prompt bloat | If the system prompt is configurable, make it extremely long | System tokens billed on every request |
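The "quadratic token growth" in the multi-turn row can be checked with a short calculation. This is a minimal sketch under a simplifying assumption: every request re-sends the full history, and each turn adds a fixed number of tokens. The function name and the 500-token figure are illustrative, not taken from any provider.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed across a conversation.

    Assumption: request k re-sends k turns of history, and every turn
    adds `tokens_per_turn` tokens.
    """
    # 1 + 2 + ... + turns = turns * (turns + 1) / 2
    return tokens_per_turn * turns * (turns + 1) // 2


for turns in (10, 20, 40):
    print(turns, cumulative_input_tokens(turns, tokens_per_turn=500))
```

Under these assumptions, doubling the number of turns roughly quadruples the billed input tokens.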
```python
# Cost amplification through max token generation
import json

import boto3

client = boto3.client('bedrock-runtime')

# Craft a prompt that generates maximum output:
# use instructions that produce verbose, repetitive output
response = client.invoke_model(
    modelId='anthropic.claude-3-opus-20240229-v1:0',  # Most expensive model
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 4096,  # Maximum output
        "messages": [{
            "role": "user",
            "content": "Write an extremely detailed, step-by-step guide "
                       "covering every possible aspect of [TOPIC]. Include "
                       "extensive examples, edge cases, and explanations. "
                       "Do not summarize or abbreviate any section."
        }]
    })
)
```
Cost Calculation
Understanding pricing enables accurate impact assessment:
| Provider/Model | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Max Output Tokens |
|---|---|---|---|
| AWS Bedrock Claude 3 Opus | ~$15 | ~$75 | 4,096 |
| Azure OpenAI GPT-4o | ~$5 | ~$15 | 16,384 |
| GCP Vertex AI Gemini 1.5 Pro | ~$3.50 | ~$10.50 | 8,192 |
A single maximally expensive Bedrock request (4K input + 4K output tokens) costs roughly $0.36. At 100 requests/second sustained for 1 hour, that is approximately $129,600.
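The arithmetic behind these figures can be reproduced with a small helper. This is a sketch that uses the approximate Claude 3 Opus prices from the table and treats "4K" as 4,000 tokens. The function names are illustrative.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD of one request, given per-1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def sustained_cost(cost_per_request: float, requests_per_second: float,
                   seconds: int) -> float:
    """Cost in USD of sending identical requests at a constant rate."""
    return cost_per_request * requests_per_second * seconds


# Approximate table prices: ~$15 input, ~$75 output per 1M tokens
per_request = request_cost(4_000, 4_000, 15.0, 75.0)
hourly = sustained_cost(per_request, requests_per_second=100, seconds=3600)
print(f"${per_request:.2f} per request, ${hourly:,.0f} per hour")
```

Swapping in another row of the table gives the equivalent exposure for a different provider, subject to that provider's actual rate limits.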
Token Exhaustion Attacks
Quota Exhaustion
Cloud AI services have quotas (tokens per minute, requests per minute). Exhausting these quotas denies service to legitimate users:
```bash
# AWS Bedrock: Check current quotas
aws service-quotas get-service-quota \
  --service-code bedrock \
  --quota-code <quota-code>

# Azure OpenAI: Check deployment capacity
az cognitiveservices account deployment list \
  --name <account> --resource-group <rg> \
  --query "[].{name:name,capacity:properties.sku.capacity}"
```
Attack Patterns
Sustained high-volume invocation
Send requests at the maximum allowed rate continuously, with each request using the maximum number of tokens. This exhausts the per-minute token quota and generates substantial costs.
Burst attacks
Send large bursts of requests to trigger rate limiting for legitimate users while the burst requests consume quota. Even if the attacker's requests are eventually throttled, the quota consumed during the burst window is not recovered.
Multi-model spreading
If the target has access to multiple models, distribute requests across all models to maximize total token consumption across quotas that are tracked per-model.
Context window stuffing
Send requests with maximum-length input contexts. This consumes input tokens (which are billed) and forces the model to process more data per request, reducing effective throughput for other users.
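The effect of a burst on a shared quota can be illustrated with a toy simulation. It assumes a simple fixed-window token quota in which requests are admitted in arrival order until the window is spent. Real providers enforce quotas differently, and the numbers below are hypothetical.

```python
def simulate_window(quota_tokens: int, requests: list[tuple[str, int]]):
    """Admit requests in arrival order within one quota window.

    `requests` is a list of (sender, tokens). Returns two dicts counting
    served and throttled requests per sender.
    """
    remaining = quota_tokens
    served: dict[str, int] = {}
    throttled: dict[str, int] = {}
    for sender, tokens in requests:
        if tokens <= remaining:
            remaining -= tokens
            served[sender] = served.get(sender, 0) + 1
        else:
            throttled[sender] = throttled.get(sender, 0) + 1
    return served, throttled


# Hypothetical window: 100,000 tokens per minute
burst = [("attacker", 8_000)] * 12   # large requests arrive first
normal = [("legit", 1_000)] * 20     # legitimate traffic follows
served, throttled = simulate_window(100_000, burst + normal)
print(served, throttled)
```

In this model the burst consumes 96,000 of the 100,000 tokens, so only four of the twenty legitimate requests are served in that window.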
Provisioned Throughput Abuse
Provisioned Throughput (AWS Bedrock PT, Azure OpenAI PTU) is dedicated capacity that costs money whether or not it is used:
| Attack | Description | Impact |
|---|---|---|
| Provision and abandon | If IAM allows, create PT/PTU deployments and leave them running | Continuous cost with no usage |
| Capacity exhaustion | Saturate PT/PTU capacity with attacker requests | Legitimate users denied access to reserved capacity |
| Scaling trigger | Force provisioning of additional capacity through sustained high load | Increased capacity costs |
```bash
# AWS: Check for ability to create provisioned throughput
aws bedrock create-provisioned-model-throughput \
  --model-units 1 \
  --provisioned-model-name cost-attack-test \
  --model-id anthropic.claude-3-sonnet-20240229-v1:0 \
  2>&1  # Check for permissions
```
GPU Compute Abuse
Training Job Abuse
ML training jobs consume expensive GPU instances. An attacker with training job creation permissions can launch compute for unauthorized purposes:
```bash
# AWS SageMaker: Create a training job with expensive GPU instances
aws sagemaker create-training-job \
  --training-job-name gpu-abuse-test \
  --algorithm-specification \
    TrainingImage=<container>,TrainingInputMode=File \
  --resource-config \
    InstanceType=ml.p4d.24xlarge,InstanceCount=4,VolumeSizeInGB=500 \
  --role-arn <role-arn> \
  --stopping-condition MaxRuntimeInSeconds=86400 \
  --output-data-config S3OutputPath=s3://bucket/output
```
A single ml.p4d.24xlarge instance costs approximately $37/hour. Four instances for 24 hours: $3,552. Attackers who want reliable access are unlikely to use spot capacity, so the target pays the full on-demand rate.
Compute Resource Costs
| Instance Type | Provider | GPU | Approximate Cost/Hour |
|---|---|---|---|
| ml.p4d.24xlarge | AWS | 8x A100 | ~$37 |
| Standard_NC96ads_A100_v4 | Azure | 4x A100 | ~$32 |
| a2-ultragpu-8g | GCP | 8x A100 | ~$40 |
| ml.p5.48xlarge | AWS | 8x H100 | ~$98 |
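For engagement planning, the table can be turned into a quick estimator. This is a sketch that uses the approximate hourly rates above. Actual prices vary by region and change over time, and the function name is illustrative.

```python
# Approximate on-demand rates from the table above (USD per hour)
HOURLY_RATE_USD = {
    "ml.p4d.24xlarge": 37.0,
    "Standard_NC96ads_A100_v4": 32.0,
    "a2-ultragpu-8g": 40.0,
    "ml.p5.48xlarge": 98.0,
}


def job_cost(instance_type: str, instance_count: int, hours: float) -> float:
    """Estimated cost in USD of running `instance_count` instances for `hours`."""
    return HOURLY_RATE_USD[instance_type] * instance_count * hours


# The training job example: four ml.p4d.24xlarge instances for 24 hours
print(job_cost("ml.p4d.24xlarge", 4, 24))
```

The example call reproduces the $3,552 figure for the four-instance, 24-hour training job.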
Cryptocurrency Mining
GPU instances provisioned through ML services are attractive for cryptocurrency mining:
- Training job containers have full GPU access
- Custom container images can include mining software
- Training jobs can run for hours or days before detection
- GPU instance costs are borne by the target organization
Auto-Scaling Exploitation
Scaling Trigger Attacks
AI serving endpoints use auto-scaling to handle variable load. Attackers can exploit scaling configurations:
| Scaling Config | Attack | Result |
|---|---|---|
| CPU-based scaling | Send compute-intensive requests | Triggers scale-up based on CPU utilization |
| Request-count scaling | Flood with requests | Triggers scale-up based on request rate |
| Custom metric scaling | Exploit metric collection gaps | Trigger scaling through metric manipulation |
| No max limit | Sustained high load | Unbounded scaling, unbounded cost |
Scaling Attack Chain
Identify scaling configuration
Determine the endpoint's auto-scaling policy: what metric triggers scaling, what the cool-down period is, and whether there is a maximum instance count.
```bash
# AWS SageMaker: Check auto-scaling
aws application-autoscaling describe-scaling-policies \
  --service-namespace sagemaker \
  --resource-id endpoint/<endpoint-name>/variant/<variant-name>

# GCP Vertex AI: Check endpoint scaling
gcloud ai endpoints describe <endpoint-id> --region=us-central1 \
  --format="json(deployedModels[].automaticResources)"
```
Calculate cost impact
Determine the cost per instance and the maximum scale. If maximum is not set, the theoretical maximum is limited only by account quotas.
Trigger scaling
Send sustained traffic at a rate that triggers scale-up. Use the most expensive request pattern (max tokens, expensive model) to maximize both per-request cost and infrastructure cost.
Maintain pressure
Keep traffic sustained long enough for auto-scaling to provision maximum instances. Scale-up is typically faster than scale-down, so even a brief attack can leave expensive instances running during the cool-down period.
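The "calculate cost impact" step of this chain can be sketched as a rough estimate. The model below is a deliberate simplification: scale-up is treated as instantaneous, and all extra instances are assumed to stay billed through the cool-down period. The function name and example values are hypothetical.

```python
def scaling_attack_cost(hourly_rate: float, baseline_instances: int,
                        max_instances: int, attack_minutes: float,
                        cooldown_minutes: float) -> float:
    """Extra infrastructure cost in USD from forcing an endpoint to max scale.

    Simplifying assumptions: instant scale-up, and every extra instance
    remains billed for the attack duration plus the cool-down period.
    """
    extra_instances = max_instances - baseline_instances
    billed_minutes = attack_minutes + cooldown_minutes
    return extra_instances * hourly_rate * billed_minutes / 60


# Hypothetical endpoint: 2 instances at rest, max 20, ~$37/hour each,
# 30 minutes of attack traffic and a 15-minute cool-down
print(scaling_attack_cost(37.0, 2, 20, attack_minutes=30, cooldown_minutes=15))
```

If no maximum instance count is configured, `max_instances` is bounded only by the account quota, which is what makes the unbounded case expensive.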
Cool-Down Exploitation
Auto-scaling policies include cool-down periods to prevent thrashing. Attackers can exploit this:
- Scale-up, then stop: Trigger maximum scaling, then stop traffic. Instances remain running during the scale-down cool-down period (typically 5-15 minutes).
- Pulse attacks: Send bursts of traffic followed by quiet periods timed to prevent scale-down, keeping instances at maximum count while minimizing attacker request volume.
- Scale-up cascade: If auto-scaling provisions instances that themselves trigger further scaling (e.g., a new instance starts a warm-up process that consumes resources), cascade scaling exceeds intended limits.
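The pulse pattern above can be illustrated with a toy minute-by-minute model. It is an assumption-laden sketch, not any provider's actual scaling algorithm: a one-minute burst is assumed to scale the endpoint to maximum immediately, and scale-down is assumed to happen only after `cooldown` consecutive quiet minutes.

```python
def minutes_at_max(total_minutes: int, burst_every: int, cooldown: int) -> int:
    """Count minutes the endpoint spends at maximum scale under a pulse pattern.

    Assumed model: a burst in a given minute scales to max instantly, and
    scale-down occurs once `cooldown` quiet minutes have passed.
    """
    at_max = 0
    quiet = None  # minutes since the last burst; None until the first burst
    for minute in range(total_minutes):
        if minute % burst_every == 0:
            quiet = 0            # burst minute resets the cool-down clock
        elif quiet is not None:
            quiet += 1
        if quiet is not None and quiet < cooldown:
            at_max += 1
    return at_max


# Bursts spaced inside the 15-minute cool-down keep the endpoint scaled up
print(minutes_at_max(60, burst_every=10, cooldown=15))
# Bursts spaced wider than the cool-down let it scale back down in between
print(minutes_at_max(60, burst_every=30, cooldown=15))
```

In the first case the attacker sends traffic in only six of sixty minutes, yet the endpoint never scales down under this model.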
Denial-of-Wallet Attack Methodology
Planning
- Enumerate AI services and pricing: Identify the most expensive services in use
- Determine access level: What API actions can the attacker perform?
- Map scaling configurations: How does the target handle load increases?
- Identify billing controls: Are there budget alerts, spending limits, or approval workflows?
- Estimate impact: Calculate the maximum financial damage achievable
Execution Considerations
Detection and Billing Lag
Most cloud billing has a lag between resource consumption and billing visibility:
| Provider | Billing Lag | Cost Explorer Delay |
|---|---|---|
| AWS | Up to 24 hours | Same-day estimates, final bills delayed |
| Azure | Up to 24 hours | Cost Management may lag 24-48 hours |
| GCP | Up to 24 hours | Billing export may lag several hours |
This lag means that cost attacks can accumulate significant charges before any billing alert triggers. Red teams should factor this into engagement planning to avoid unintended financial impact.
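A rough way to size this exposure is to add the spend needed to reach the alert threshold to the spend still hidden in the billing pipeline. This is a sketch assuming a constant burn rate and an alert that only sees costs already visible in billing data. The function name and example figures are hypothetical.

```python
def undetected_spend(hourly_burn_usd: float, billing_lag_hours: float,
                     alert_threshold_usd: float) -> float:
    """Actual spend in USD at the moment a budget alert can first fire.

    Assumptions: constant burn rate, and the alert evaluates only costs
    that have become visible, which trail real usage by `billing_lag_hours`.
    """
    hours_to_threshold = alert_threshold_usd / hourly_burn_usd
    hours_until_alert = hours_to_threshold + billing_lag_hours
    return hourly_burn_usd * hours_until_alert


# Hypothetical: $1,000/hour burn, 24-hour lag, alert at $5,000
print(undetected_spend(1_000.0, billing_lag_hours=24, alert_threshold_usd=5_000.0))
```

Under these assumptions the alert fires only after $29,000 has actually been spent, nearly six times the configured threshold.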
Impact Communication
Cost attack findings should be communicated in business terms:
- Hourly exposure: "An attacker with this access can generate $X,XXX/hour in AI costs"
- Scaling factor: "Auto-scaling without maximum limits amplifies this to $XX,XXX/hour"
- Detection gap: "Current billing alerts would not trigger until $X has been spent, with an additional $X in the billing pipeline"
- Annual risk: "If exploited persistently, annual cost exposure is $X,XXX,XXX"
Related Topics
- AWS AI Services -- AWS-specific cost attack surfaces
- Azure AI Services -- Azure PTU and cost considerations
- GCP AI Services -- GCP cost controls and quota management
- SageMaker Exploitation -- GPU compute abuse through SageMaker
An attacker has bedrock:InvokeModel permission scoped to all models. To maximize the financial impact of a cost attack, which approach is most effective?
Why is billing lag a critical factor in AI cost attack planning for red team engagements?