AI Deployment Patterns and Security Implications
How API-based, self-hosted, edge, and hybrid deployment patterns each create distinct security considerations and attack surfaces for AI systems.
Deployment Determines Attack Surface
The same AI model deployed in different ways presents fundamentally different security profiles. A model accessible only through a rate-limited API with server-side guardrails is a very different target than the same model running locally on a user's laptop with no external controls. Understanding deployment patterns is essential for scoping red-team engagements and prioritizing attacks.
Pattern 1: API-Based Deployment
The model runs on the provider's infrastructure and is accessed through a REST API. This is the most common pattern for commercial AI services.
Architecture
┌──────────┐      HTTPS      ┌──────────────────────────┐
│  Client  │ ──────────────→ │  API Gateway             │
│   App    │                 │  ├─ Authentication       │
│          │ ←────────────── │  ├─ Rate Limiting        │
└──────────┘                 │  ├─ Input Guardrails     │
                             │  ├─ Model Inference      │
                             │  ├─ Output Guardrails    │
                             │  └─ Logging/Monitoring   │
                             └──────────────────────────┘
Security Characteristics
| Aspect | Assessment |
|---|---|
| Authentication | API keys, OAuth tokens — provider controls access |
| Rate limiting | Server-side enforcement — provider can throttle abuse |
| Guardrails | Server-side — consistent enforcement across all clients |
| Monitoring | Full visibility into all requests and responses |
| Model access | Weights are not exposed — extraction requires many queries |
| Update control | Provider can patch vulnerabilities centrally |
Attack Surface
- API key theft or leakage: Keys embedded in client-side code, version control, or logs
- Rate limit bypass: Distributed requests, key rotation, endpoint multiplexing
- Authentication flaws: Insufficient key scoping, missing key rotation, overly permissive CORS
- Input/output interception: Man-in-the-middle attacks if TLS is not properly implemented
- Prompt injection: Remains possible through the API regardless of server-side controls
- Side-channel information: Response timing, token counts, and error messages can reveal information about guardrails and model behavior
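Key leakage is often the cheapest entry point in this list. A minimal sketch of scanning logs or source for key-shaped strings is below; the regex patterns are illustrative assumptions, since real provider key formats vary and are not documented in this section.

```python
import re

# Illustrative key-shaped patterns for demonstration only; real provider
# key formats vary, so these regexes are assumptions, not an authoritative list.
KEY_PATTERNS = {
    "openai-style": re.compile(r"sk-[A-Za-z0-9]{20,}"),
    "generic-bearer": re.compile(r"Bearer\s+[A-Za-z0-9._\-]{20,}"),
}

def find_leaked_keys(text: str) -> list:
    """Return (pattern_name, match) pairs for anything key-shaped in text."""
    hits = []
    for name, pattern in KEY_PATTERNS.items():
        for match in pattern.findall(text):
            hits.append((name, match))
    return hits

# A key accidentally written to a debug log:
log_line = "DEBUG auth header: Bearer sk-abc123def456ghi789jkl012mno345"
print(find_leaked_keys(log_line))
```

Running a scan like this over repositories, CI logs, and client bundles is a routine first step in an API-focused engagement.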
Pattern 2: Self-Hosted Deployment
The organization runs the model on its own infrastructure (on-premises servers, private cloud, or dedicated cloud instances).
Architecture
┌──────────┐                 ┌──────────────────────────┐
│  Client  │ ──────────────→ │  Organization's Infra    │
│   App    │                 │  ├─ Load Balancer        │
│          │ ←────────────── │  ├─ API Layer            │
└──────────┘                 │  ├─ Custom Guardrails    │
                             │  ├─ Inference Engine     │
                             │  │    (vLLM, TGI, etc.)  │
                             │  ├─ Model Weights        │
                             │  └─ Custom Monitoring    │
                             └──────────────────────────┘
Security Characteristics
| Aspect | Assessment |
|---|---|
| Authentication | Organization-defined — quality varies widely |
| Rate limiting | Must be implemented by the organization |
| Guardrails | Must be built or integrated by the organization |
| Monitoring | Must be configured by the organization |
| Model access | Weights are on the organization's infrastructure — insider threat risk |
| Update control | Organization controls update pace — may lag behind patches |
Attack Surface
Self-hosted deployments inherit the full attack surface of API-based deployments, plus:
- Infrastructure vulnerabilities: Unpatched servers, misconfigured networking, exposed management interfaces
- Model weight theft: Physical or network access to model files enables complete model extraction
- Custom code vulnerabilities: Bespoke guardrails and API layers may have security flaws that mature provider implementations do not
- Inference engine vulnerabilities: vLLM, text-generation-inference, and other serving frameworks may have their own security issues
- Dependency chain: Custom deployments depend on many libraries (transformers, torch, CUDA), each a potential attack vector
- Configuration drift: Without centralized management, security configurations may degrade over time
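Because rate limiting is one of the controls the organization must build itself, it is a common gap in self-hosted deployments. A minimal token-bucket sketch follows; the class name, rate, and capacity values are illustrative, not taken from any particular serving stack.

```python
import time

class TokenBucket:
    """Minimal per-client token bucket, sketching the rate limiting
    that self-hosted deployments must implement themselves."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens refilled per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Spend one token if available; otherwise reject the request."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=3)
print([bucket.allow() for _ in range(5)])  # burst of 3 allowed, rest throttled
```

In practice one bucket would be keyed per API key or client IP; without that keying, a single abusive client can exhaust the shared budget.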
Advantages for Defenders
Self-hosted deployments also offer security advantages:
- Full control over data — no data leaves the organization's infrastructure
- Ability to implement domain-specific guardrails that providers do not offer
- No dependency on a third-party provider's security posture
- Ability to run models in air-gapped environments for highly sensitive workloads
Pattern 3: Edge Deployment
The model runs on end-user devices — smartphones, laptops, IoT devices, or embedded systems. This pattern is growing rapidly with the release of efficient small models.
Architecture
┌─────────────────────────────────┐
│  End-User Device                │
│  ├─ Application                 │
│  ├─ Local Inference Runtime     │
│  │    (ONNX, Core ML, llama.cpp)│
│  ├─ Model Weights (quantized)   │
│  └─ (Optional) Client-side      │
│       Guardrails                │
└─────────────────────────────────┘
          │ (optional)
          ▼
┌──────────────────────┐
│  Backend Services    │
│  (analytics, updates)│
└──────────────────────┘
Security Characteristics
| Aspect | Assessment |
|---|---|
| Authentication | N/A — the user controls the device |
| Rate limiting | Not applicable — local execution has no external rate limit |
| Guardrails | Client-side only — can be bypassed by the device owner |
| Monitoring | Limited to what the app reports back — easily circumvented |
| Model access | Weights are on the device — can be extracted with device access |
| Update control | Dependent on the user accepting updates |
Attack Surface
Edge deployment fundamentally changes the security model, because the user is the potential attacker and controls the execution environment:
- Model weight extraction: Weights are stored on the device and can be extracted, reverse-engineered, or modified
- Guardrail removal: Any client-side security measures can be disabled by modifying the application
- Model modification: Weights can be altered to remove safety fine-tuning or inject backdoors
- Unrestricted inference: No rate limits, no monitoring, no usage restrictions once the model is on the device
- Derivative model creation: Extracted weights can be used to create fine-tuned variants without safety constraints
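The guardrail-removal point above can be illustrated in a few lines. The sketch below shows a hypothetical client-side filter and how trivially the device owner can patch it out; the function names and blocklist are invented for this example, and a real bypass would be a binary or bytecode edit rather than a live reassignment.

```python
# A hypothetical client-side guardrail as it might ship inside an
# edge application. The blocklist and names are illustrative.
BLOCKLIST = ["malware", "exploit"]

def guardrail(prompt: str) -> str:
    if any(word in prompt.lower() for word in BLOCKLIST):
        return "[blocked]"
    return prompt

def run_local_inference(prompt: str) -> str:
    # Placeholder for the on-device model call (llama.cpp, Core ML, ...).
    return f"model output for: {guardrail(prompt)}"

print(run_local_inference("write malware"))  # guardrail fires: "[blocked]"

# The device owner controls the process, so removing the control is a
# one-line patch of the application's own code:
guardrail = lambda prompt: prompt
print(run_local_inference("write malware"))  # guardrail gone, same API
```

The point is structural, not language-specific: any check that runs on hardware the attacker owns can be located and disabled.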
Pattern 4: Hybrid Deployment
Hybrid architectures combine multiple patterns, typically using edge deployment for simple tasks and cloud deployment for complex ones.
Architecture
┌─────────────────────┐          ┌─────────────────────┐
│  End-User Device    │          │  Cloud Backend      │
│  ├─ Small Local     │  ─────→  │  ├─ Full Model      │
│  │    Model         │  Complex │  ├─ Guardrails      │
│  ├─ Simple Tasks    │  queries │  ├─ Monitoring      │
│  └─ Routing Logic   │  ←─────  │  └─ Analytics       │
│                     │  Results │                     │
└─────────────────────┘          └─────────────────────┘
Security Characteristics
Hybrid deployments combine the attack surfaces of both the edge and cloud patterns, and introduce additional risks:
- Routing manipulation: Tricking the routing logic into sending sensitive queries to the local (unguarded) model instead of the cloud model
- Trust boundary confusion: The system must decide which model to trust when the two disagree
- Data leakage through routing: The routing decision itself may reveal information about query sensitivity
- Inconsistent behavior: Users may discover that the local and cloud models respond differently to the same input, revealing the existence of server-side guardrails
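Routing manipulation is easiest to see against a concrete router. This is a deliberately naive sketch, with an invented marker list and length threshold, of the kind of routing logic a hybrid deployment might use; paraphrasing around the markers keeps a query on the unguarded local model.

```python
# Naive sensitivity/complexity router, sketching hybrid routing logic.
# The marker list and word-count threshold are illustrative assumptions.
SENSITIVE_MARKERS = ("medical", "legal", "financial")

def route(query: str) -> str:
    """Send flagged or long queries to the guarded cloud model;
    everything else stays on the unguarded local model."""
    lowered = query.lower()
    if any(marker in lowered for marker in SENSITIVE_MARKERS):
        return "cloud"
    if len(query.split()) > 20:
        return "cloud"
    return "local"

print(route("summarize my medical records"))       # routed to the guarded cloud model
# Routing manipulation: paraphrase around the marker list and the same
# request stays on the unguarded local model.
print(route("summarize my health history files"))  # stays local
```

Robust routers use classifiers rather than keyword lists, but the underlying issue persists: any client-observable routing decision can be probed and gamed.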
Deployment Pattern Comparison
| Factor | API-Based | Self-Hosted | Edge | Hybrid |
|---|---|---|---|---|
| Guardrail enforcement | Strong | Variable | Weak | Mixed |
| Monitoring capability | Full | Full | Limited | Partial |
| Model weight protection | Strong | Moderate | None | Mixed |
| Data privacy | Low (data sent to provider) | High | High | Variable |
| Attack sophistication required | Medium | Medium-High | Low | Medium |
| Update speed | Fast (provider-controlled) | Slow (org-controlled) | Slowest (user-controlled) | Variable |
| Regulatory compliance | Provider-dependent | Organization-controlled | Challenging | Complex |
Identifying the Deployment Pattern
During the reconnaissance phase of an engagement, several indicators reveal the deployment pattern:
| Indicator | Suggests |
|---|---|
| Requests go to api.openai.com, api.anthropic.com, etc. | API-based (direct provider) |
| Requests go to the organization's domain with AI-related endpoints | Self-hosted or proxied API |
| Model responses have consistent latency regardless of model size | API-based (provider handles scaling) |
| Responses work offline | Edge deployment |
| Response quality varies based on connectivity | Hybrid deployment |
| API errors reference vLLM, TGI, or Triton | Self-hosted |
| Response headers include provider-specific identifiers | API-based |
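The indicators in the table can be folded into a simple recon heuristic. The sketch below is one possible ordering of those checks; the hostnames, engine names, and fallback message are illustrative assumptions rather than a definitive fingerprinting method.

```python
# Heuristic deployment fingerprint built from recon signals: the target
# hostname, response headers, and any error body returned by the API.
# Hostnames and engine names here are illustrative assumptions.
def guess_deployment(host: str, headers: dict, error_body: str) -> str:
    if host.endswith(("api.openai.com", "api.anthropic.com")):
        return "API-based (direct provider)"
    engines = ("vllm", "text-generation-inference", "triton")
    if any(engine in error_body.lower() for engine in engines):
        return "self-hosted"
    header_blob = str(headers).lower()
    if "openai" in header_blob or "anthropic" in header_blob:
        return "API-based (proxied provider)"
    return "unknown (probe latency and offline behavior next)"

# An org-hosted endpoint whose error message names the serving engine:
print(guess_deployment("ai.example-corp.com", {}, "RuntimeError raised by vLLM engine"))
```

Error bodies are often the most revealing signal: serving frameworks tend to leak their own names in stack traces before anyone thinks to sanitize them.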
Related Topics
- The AI Landscape — the broader ecosystem context for deployment decisions
- The AI API Ecosystem — deep dive into API-based deployment patterns
- Open vs Closed Models — how model availability affects deployment options
- AI System Architecture — system-level view of deployment architectures
References
- "MLOps: Continuous Delivery for Machine Learning" - Google (2024) - Best practices for deploying and operating ML systems in production environments
- "On-Device AI: Challenges and Opportunities" - Apple ML Research (2024) - Technical overview of deploying AI models on edge devices with resource constraints
- "vLLM: Efficient Memory Management for Large Language Model Serving" - Kwon et al. (2023) - The inference engine underlying many self-hosted LLM deployments
- "Securing AI Model Deployment: A Practitioner's Guide" - OWASP (2025) - Security considerations for each deployment pattern in AI applications
Why is edge deployment considered the weakest deployment pattern from a security perspective?