AI Workload Isolation
Isolation techniques for AI workloads using VMs, containers, and trusted execution environments (TEEs).
Overview
AI workloads present unique isolation challenges that standard cloud security patterns do not fully address. GPU passthrough breaks traditional container isolation boundaries, shared GPU memory enables cross-tenant data leakage, and model serving frameworks often require elevated privileges that weaken sandboxing. When multiple AI workloads share the same infrastructure — whether in multi-tenant cloud environments, shared training clusters, or edge deployments — the isolation boundary between workloads becomes a critical security concern.
This article examines the isolation techniques available for AI workloads at every layer of the stack: process isolation, container isolation, VM isolation, and hardware-level isolation through trusted execution environments (TEEs). For each technique, we analyze the security guarantees it provides, the specific ways those guarantees break down for GPU-accelerated AI workloads, and the practical mitigations available. The content aligns with NIST AI RMF governance requirements for AI system isolation and MITRE ATLAS infrastructure-level attack techniques.
The Isolation Challenge for AI Workloads
Why AI Workloads Are Different
Standard web applications run on CPUs, use well-understood syscall interfaces, and interact with isolated memory spaces. AI workloads differ in several important ways:
| Property | Standard Workload | AI Workload |
|---|---|---|
| Compute device | CPU only | CPU + GPU/TPU |
| Memory model | User-space virtual memory | User-space + GPU VRAM (shared) |
| Device access | Minimal (/dev/null, etc.) | /dev/nvidia*, DMA access |
| Privilege requirements | Unprivileged user | Often requires --privileged or device plugins |
| Data sensitivity | Application data | Model weights, training data, inference inputs |
| Network requirements | Standard TCP/IP | NCCL, RDMA, InfiniBand for distributed training |
The GPU is the primary isolation-breaking component. NVIDIA GPUs accessed via CUDA do not provide the same memory isolation guarantees as CPU virtual memory. When multiple containers share a GPU via time-slicing or MPS (Multi-Process Service), one process can potentially observe residual data left in GPU memory by another process.
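Whether a GPU is shared via MPS can often be determined from inside a workload. MPS runs a control daemon and, by default, uses a pipe directory at /tmp/nvidia-mps (overridable via CUDA_MPS_PIPE_DIRECTORY), so both signals below are heuristics, not proof — a minimal detection sketch under those default-path assumptions:

```python
import os
import subprocess


def mps_in_use(pipe_dir: str = "/tmp/nvidia-mps") -> bool:
    """Heuristically detect NVIDIA MPS on this host.

    Checks the default MPS pipe directory and looks for the MPS
    control/server daemons in the process list. Either signal can be
    absent on a system that overrides the defaults.
    """
    # Default pipe directory created by nvidia-cuda-mps-control
    if os.path.isdir(pipe_dir):
        return True
    try:
        out = subprocess.run(
            ["ps", "-eo", "comm"], capture_output=True, text=True
        ).stdout
    except FileNotFoundError:
        return False  # no `ps` available; cannot tell
    return any(
        name in out
        for name in ("nvidia-cuda-mps-control", "nvidia-cuda-mps-server")
    )
```

A `True` result means residual-memory risks from software-level sharing apply; a `False` result only means the defaults were not observed.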
Attack Surface Map
┌─────────────────────────────────────────────────┐
│ Application Layer │
│ Model serving framework (vLLM, Triton, TGI) │
├─────────────────────────────────────────────────┤
│ Container Runtime │
│ Docker / containerd / CRI-O │
│ ┌──────────────────────────────────────────┐ │
│ │ GPU Device Plugin / NVIDIA Container │ │
│ │ Toolkit (nvidia-container-toolkit) │ │
│ └──────────────────────────────────────────┘ │
├─────────────────────────────────────────────────┤
│ Host OS / Hypervisor │
│ NVIDIA kernel driver (nvidia.ko) │
├─────────────────────────────────────────────────┤
│ Hardware │
│ GPU VRAM, PCIe bus, NVLink, DMA engines │
└─────────────────────────────────────────────────┘
Container Isolation for AI Workloads
Standard Container Isolation
Linux containers use namespaces (PID, network, mount, user), cgroups (resource limits), and seccomp profiles (syscall filtering) to isolate workloads. For CPU-only workloads, these mechanisms provide strong isolation when properly configured.
import subprocess
import json
from typing import Dict, List
class ContainerSecurityAuditor:
"""Audit container security configuration for AI workloads."""
def __init__(self, container_id: str):
self.container_id = container_id
def inspect_container(self) -> Dict:
"""Retrieve container configuration."""
result = subprocess.run(
["docker", "inspect", self.container_id],
capture_output=True, text=True, check=True,
)
return json.loads(result.stdout)[0]
def check_gpu_isolation(self) -> List[Dict]:
"""Check for GPU isolation weaknesses in the container configuration."""
config = self.inspect_container()
findings = []
# Check for privileged mode
if config.get("HostConfig", {}).get("Privileged", False):
findings.append({
"severity": "critical",
"finding": "Container running in privileged mode",
"detail": "Privileged mode disables all container isolation. "
"GPU workloads should use --gpus flag instead.",
"remediation": "Use 'docker run --gpus device=0' instead of '--privileged'",
})
# Check for device mappings
devices = config.get("HostConfig", {}).get("Devices", []) or []
nvidia_devices = [d for d in devices if "nvidia" in str(d)]
if nvidia_devices:
findings.append({
"severity": "medium",
"finding": f"Direct NVIDIA device access: {len(nvidia_devices)} devices",
"detail": "Direct device mapping bypasses NVIDIA container toolkit isolation",
"remediation": "Use NVIDIA container toolkit with --gpus flag",
})
# Check for host PID namespace
if config.get("HostConfig", {}).get("PidMode") == "host":
findings.append({
"severity": "high",
"finding": "Container shares host PID namespace",
"detail": "Host PID namespace allows observing GPU processes from other containers",
"remediation": "Remove --pid=host flag",
})
# Check seccomp profile
seccomp = config.get("HostConfig", {}).get("SecurityOpt", [])
if any("seccomp=unconfined" in str(s) for s in seccomp):
findings.append({
"severity": "high",
"finding": "Seccomp profile disabled",
"detail": "Unconfined seccomp allows all syscalls including ioctl for GPU manipulation",
"remediation": "Apply a custom seccomp profile that allows required GPU ioctls only",
})
# Check for SYS_ADMIN capability
caps = config.get("HostConfig", {}).get("CapAdd", []) or []
if "SYS_ADMIN" in caps:
findings.append({
"severity": "high",
"finding": "SYS_ADMIN capability granted",
"detail": "SYS_ADMIN enables mount namespace escapes and device access",
"remediation": "Remove SYS_ADMIN and use specific capabilities instead",
})
return findings

GPU-Specific Container Escape Vectors
The NVIDIA container toolkit (nvidia-container-toolkit, formerly nvidia-docker2) provides a runtime hook that maps GPU devices into containers. While this is safer than using --privileged, it still creates attack surface:
- NVIDIA driver ioctls: The container communicates with the host GPU driver via ioctl syscalls on /dev/nvidia* devices. Vulnerabilities in the NVIDIA kernel driver can be exploited from within a container to achieve host code execution. CVE-2024-0132 (NVIDIA Container Toolkit TOCTOU vulnerability) demonstrated how a crafted container image could escape the container by exploiting a time-of-check-to-time-of-use race condition in the toolkit itself.
- GPU memory residuals: When a container allocates and frees GPU memory, the data may remain in VRAM until overwritten. A subsequent container using the same GPU can scan for residual data.
- PCIe DMA access: GPUs use DMA to transfer data between host memory and GPU memory. In some configurations, a compromised GPU workload could use DMA to read or write arbitrary host memory.
#!/bin/bash
# Audit NVIDIA container toolkit configuration
# Run on the host system
echo "=== NVIDIA Container Toolkit Security Audit ==="
# Check toolkit version
echo "[*] NVIDIA Container Toolkit version:"
nvidia-container-cli --version 2>/dev/null || echo " Not found"
# Check if CDI (Container Device Interface) is enabled
echo "[*] CDI configuration:"
if [ -f /etc/nvidia-container-runtime/config.toml ]; then
grep -i "cdi" /etc/nvidia-container-runtime/config.toml
else
echo " Config file not found"
fi
# Check GPU isolation mode
echo "[*] GPU isolation mode:"
nvidia-smi -q | grep -i "MIG\|isolation\|accounting" || echo " Could not query GPU"
# Check if MIG (Multi-Instance GPU) is available and enabled
echo "[*] MIG (Multi-Instance GPU) status:"
nvidia-smi mig -lgi 2>/dev/null || echo " MIG not available on this GPU"
# Check for IOMMU (required for DMA protection)
echo "[*] IOMMU status:"
if dmesg | grep -qi "IOMMU enabled"; then
echo " IOMMU is enabled (good)"
else
echo " WARNING: IOMMU may not be enabled"
fi
# List all NVIDIA device files
echo "[*] NVIDIA device files:"
ls -la /dev/nvidia* 2>/dev/null || echo " No NVIDIA devices found"

Multi-Instance GPU (MIG)
NVIDIA A100, A30, and H100 GPUs support Multi-Instance GPU (MIG) technology, which hardware-partitions a single GPU into up to seven isolated instances. Each MIG instance has its own dedicated memory, cache, and compute units. Unlike software-level GPU sharing (time-slicing or MPS), MIG provides hardware-enforced isolation.
import subprocess
import json
from typing import List, Dict
class MIGSecurityManager:
"""Manage and audit MIG instances for workload isolation."""
def list_mig_instances(self) -> List[Dict]:
"""List all MIG GPU instances and their configuration."""
result = subprocess.run(
["nvidia-smi", "mig", "-lgi"],
capture_output=True, text=True,
)
# Parse nvidia-smi output
instances = []
for line in result.stdout.strip().split("\n"):
if "MIG" in line or "GPU" in line:
instances.append({"raw": line.strip()})
return instances
def verify_memory_isolation(self, instance_id: int) -> Dict:
"""
Verify that a MIG instance cannot access memory from other instances.
Returns test results.
"""
# This test allocates memory on one MIG instance and verifies
# it cannot be read from another instance
test_script = f"""
import torch
import os
# NOTE: real MIG devices are addressed by UUID (e.g. "MIG-GPU-<uuid>/<gi>/<ci>");
# a bare instance index is a simplification for illustration
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-{instance_id}"
# Allocate and fill a tensor with known pattern
t = torch.full((1024, 1024), 42.0, device="cuda")
print(f"Allocated on MIG instance {instance_id}: {{t.sum().item()}}")
# Free the tensor
del t
torch.cuda.empty_cache()
"""
result = subprocess.run(
["python3", "-c", test_script],
capture_output=True, text=True, timeout=30,
)
return {
"instance_id": instance_id,
"success": result.returncode == 0,
"output": result.stdout.strip(),
"error": result.stderr.strip() if result.returncode != 0 else None,
}

MIG is the strongest GPU isolation available today for multi-tenant environments. However, it has limitations: not all GPU models support it, the fixed partition sizes may not match workload requirements, and it reduces the total compute available per workload. For red teams, the key question is whether MIG is actually enabled and configured correctly in the target environment, or whether the infrastructure relies on weaker isolation mechanisms.
VM-Level Isolation
GPU Passthrough to VMs
Virtual machines provide stronger isolation than containers because the hypervisor mediates all hardware access. For GPU workloads, GPUs can be assigned to VMs using PCIe passthrough (VFIO) or SR-IOV (Single Root I/O Virtualization).
With PCIe passthrough, an entire GPU is dedicated to a single VM. This provides strong isolation — the VM's GPU driver communicates directly with the hardware and cannot access other VMs' GPU memory. However, it limits GPU sharing: one GPU per VM.
SR-IOV virtualizes a single physical GPU into multiple virtual functions (VFs), each assigned to a different VM. NVIDIA supports this on selected data center GPUs. SR-IOV provides better isolation than container-level sharing because each VF has hardware-enforced memory boundaries, but it is less flexible than MIG and requires specific hardware support.
#!/bin/bash
# Configure SR-IOV for NVIDIA GPU (A100 example)
# Must be run on the bare-metal host with root privileges
# Enable SR-IOV on the GPU PCI device
GPU_PCI_ID="0000:3b:00.0" # Example — find with lspci | grep NVIDIA
echo "=== Configuring GPU SR-IOV ==="
# Check if SR-IOV is supported
echo "[*] Checking SR-IOV capability:"
lspci -vvv -s "$GPU_PCI_ID" | grep -i "single root"
# Enable virtual functions (e.g., 4 VFs)
echo 4 > "/sys/bus/pci/devices/$GPU_PCI_ID/sriov_numvfs"
# Verify VFs were created
echo "[*] Virtual functions:"
lspci | grep -i nvidia | grep "Virtual Function"
# Assign VFs to VMs using VFIO
echo "[*] Binding VF to VFIO driver for VM assignment:"
VF_PCI_ID="0000:3b:00.1" # First VF
echo "vfio-pci" > "/sys/bus/pci/devices/$VF_PCI_ID/driver_override"
echo "$VF_PCI_ID" > /sys/bus/pci/drivers/vfio-pci/bind

Confidential VMs
Cloud providers now offer confidential VMs that use hardware-level encryption to protect VM memory from the hypervisor and host OS. AMD SEV-SNP (Secure Nested Paging) and Intel TDX (Trust Domain Extensions) encrypt VM memory with keys that the hypervisor cannot access.
For AI workloads, confidential VMs protect:
- Model weights loaded into CPU memory
- Inference inputs and outputs during processing
- Training data during preprocessing
However, confidential VMs do not currently protect GPU memory. Data transferred to the GPU for computation leaves the CPU's encrypted memory space and enters unencrypted GPU VRAM. This is a critical gap that limits the protection confidential VMs provide for AI workloads.
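From inside a guest, whether confidential-VM protection is active can be checked via the device nodes the Linux guest drivers expose: /dev/sev-guest for AMD SEV-SNP and /dev/tdx_guest for Intel TDX. A minimal sketch; it assumes a recent kernel with the respective guest driver loaded, and absence of a node does not by itself prove the VM is unprotected:

```python
import os


def detect_confidential_vm() -> dict:
    """Best-effort detection of confidential-VM guest support.

    Checks for the Linux guest-driver device nodes used to request
    hardware attestation reports. A missing node may just mean the
    driver is not loaded, so treat negatives with caution.
    """
    return {
        "sev_snp": os.path.exists("/dev/sev-guest"),  # AMD SEV-SNP guest device
        "tdx": os.path.exists("/dev/tdx_guest"),      # Intel TDX guest device
    }
```

On an ordinary (non-confidential) machine both values come back False.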
Trusted Execution Environments for AI
CPU-Based TEEs
Intel SGX (Software Guard Extensions) and ARM TrustZone provide enclave-based isolation where code and data are encrypted in memory and protected from the OS and hypervisor. For AI workloads, TEEs can protect model weights and inference data during processing, and remote attestation lets a client verify the serving environment is genuine before trusting it:
from typing import Dict, Any
import hashlib
class TEEAttestationVerifier:
"""
Verify TEE attestation for AI workloads.
Ensures the inference environment is genuine and unmodified.
"""
def __init__(self, expected_measurements: Dict[str, str]):
"""
Args:
expected_measurements: Expected hash values for TEE measurements.
Keys are measurement names (e.g., 'mrenclave', 'mrsigner'),
values are expected hex-encoded hash values.
"""
self.expected = expected_measurements
def verify_sgx_attestation(self, attestation_report: Dict[str, Any]) -> Dict:
"""Verify an Intel SGX attestation report."""
results = {"verified": True, "checks": []}
# Verify MRENCLAVE (hash of the enclave code and initial data)
mrenclave = attestation_report.get("mrenclave", "")
if mrenclave != self.expected.get("mrenclave", ""):
results["verified"] = False
results["checks"].append({
"check": "mrenclave",
"status": "FAIL",
"detail": "Enclave measurement does not match expected value. "
"The inference code may have been modified.",
})
else:
results["checks"].append({
"check": "mrenclave",
"status": "PASS",
})
# Verify MRSIGNER (hash of the enclave signing key)
mrsigner = attestation_report.get("mrsigner", "")
if mrsigner != self.expected.get("mrsigner", ""):
results["verified"] = False
results["checks"].append({
"check": "mrsigner",
"status": "FAIL",
"detail": "Signer identity does not match. "
"The enclave was not signed by the expected party.",
})
else:
results["checks"].append({
"check": "mrsigner",
"status": "PASS",
})
# Check for debug mode (should be disabled in production)
if attestation_report.get("attributes", {}).get("debug", False):
results["verified"] = False
results["checks"].append({
"check": "debug_mode",
"status": "FAIL",
"detail": "Enclave is running in debug mode. "
"Memory can be inspected by the host.",
})
return results
def verify_model_integrity_in_tee(
self, model_bytes: bytes, expected_hash: str
) -> bool:
"""Verify model integrity inside the TEE before loading."""
actual_hash = hashlib.sha256(model_bytes).hexdigest()
return actual_hash == expected_hash

GPU TEEs and Confidential Computing
NVIDIA H100 GPUs introduced confidential computing support that extends TEE protection to GPU memory. With NVIDIA Confidential Computing, data is encrypted when transferred from CPU to GPU and is decrypted only inside the GPU's secure execution environment. This addresses the critical gap where confidential VMs could not protect GPU-resident data.
The architecture works as follows:
- The CPU TEE (AMD SEV-SNP or Intel TDX) establishes a secure session with the GPU
- A secure channel is established over PCIe using authenticated encryption
- Data transferred to the GPU is encrypted in transit and decrypted only inside the GPU's protected memory region
- GPU computation results are encrypted before being sent back to the CPU
This is a significant advancement for protecting model weights and inference data, but it has performance implications (5-15% overhead depending on workload) and requires specific hardware (H100 or later).
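On H100-class systems, whether confidential-computing mode is active can be checked from the full `nvidia-smi -q` report. The exact field names vary across driver versions, so the grep patterns below are assumptions to adapt to the target environment:

```python
import subprocess


def gpu_cc_status() -> str:
    """Query nvidia-smi for confidential-computing state.

    Scans the full `nvidia-smi -q` report for confidential-compute
    fields (field names differ across driver versions). Returns the
    matching lines, or a note if nothing was found or nvidia-smi is
    not installed on this host.
    """
    try:
        report = subprocess.run(
            ["nvidia-smi", "-q"], capture_output=True, text=True, timeout=30
        ).stdout
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return "nvidia-smi not available"
    hits = [
        line.strip()
        for line in report.splitlines()
        if "confidential" in line.lower() or "cc mode" in line.lower()
    ]
    return "\n".join(hits) if hits else "no confidential-compute fields reported"
```

An empty or absent result is itself a red-team finding: the deployment is relying on weaker isolation than it may claim.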
Red Team Assessment Methodology
Phase 1: Isolation Architecture Discovery
#!/bin/bash
# Discover the isolation architecture of the target AI infrastructure
echo "=== AI Workload Isolation Assessment ==="
# Determine if running in a container
echo "[*] Container detection:"
if [ -f /.dockerenv ]; then
echo " Running inside Docker container"
elif grep -q "containerd" /proc/1/cgroup 2>/dev/null; then
echo " Running inside containerd"
elif [ -d /run/secrets/kubernetes.io ]; then
echo " Running inside Kubernetes pod"
else
echo " Not detected as container"
fi
# Check GPU access method
echo "[*] GPU access method:"
if [ -e /dev/nvidia0 ]; then
echo " Direct GPU device access"
ls -la /dev/nvidia*
fi
# Check for MIG
echo "[*] MIG configuration:"
nvidia-smi mig -lgi 2>/dev/null || echo " MIG not available/enabled"
# Check capabilities
echo "[*] Container capabilities:"
cat /proc/self/status | grep -i cap
# Check mount namespace
echo "[*] Mount namespace:"
ls -la /proc/self/ns/mnt
# Check for shared GPU memory accessible from this context
echo "[*] GPU memory status:"
nvidia-smi --query-gpu=memory.used,memory.free --format=csv 2>/dev/null

Phase 2: Isolation Boundary Testing
Test whether the isolation boundaries actually prevent cross-workload access:
- GPU memory scanning: Allocate GPU memory and check for residual data from previous workloads
- Container escape attempts: Test for known container escape vectors in the GPU runtime
- Network isolation: Verify that NCCL traffic for distributed training is isolated from other workloads
- Filesystem isolation: Check for shared model storage that could enable poisoning
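The GPU memory scanning step above can be sketched with PyTorch: `torch.empty` does not clear the underlying allocation, so a fresh buffer containing nonzero bytes may hold residue from a prior workload. A sketch under the assumption that PyTorch with CUDA and a visible GPU are available; PyTorch's caching allocator can also recycle this process's own freed pages, so a nonzero result is a lead, not proof:

```python
def residue_fraction(data: bytes) -> float:
    """Fraction of nonzero bytes in a raw memory snapshot."""
    return sum(b != 0 for b in data) / max(len(data), 1)


def scan_gpu_residue(num_mb: int = 256) -> float:
    """Allocate uninitialized GPU memory and measure its nonzero content.

    torch.empty skips initialization, so bytes left behind by a previous
    tenant of the GPU can appear in the buffer. Requires PyTorch built
    with CUDA support (assumption); run immediately after attaching to
    the GPU, before the caching allocator warms up.
    """
    import torch

    buf = torch.empty(num_mb * 1024 * 1024, dtype=torch.uint8, device="cuda")
    return residue_fraction(buf.cpu().numpy().tobytes())
```

A fraction near zero suggests the driver or platform scrubs allocations; a high fraction on a freshly attached GPU warrants deeper investigation.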
Phase 3: Privilege Escalation
AI workloads frequently require elevated privileges. Test whether these privileges can be leveraged for lateral movement:
- Can the GPU device access be used to read host memory via DMA?
- Can the container's network namespace access the training cluster's control plane?
- Can the model storage mount point be used to modify other workloads' models?
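Answering these questions starts with decoding the `CapEff` line from /proc/self/status (collected by the Phase 1 script): it is a hex bitmask over Linux capability numbers, where CAP_NET_ADMIN is bit 12, CAP_SYS_PTRACE bit 19, and CAP_SYS_ADMIN bit 21. A small decoder:

```python
# Linux capability bit numbers (from <linux/capability.h>); a subset
# most relevant to container-escape and lateral-movement checks
CAPS = {"CAP_NET_ADMIN": 12, "CAP_SYS_PTRACE": 19, "CAP_SYS_ADMIN": 21}


def decode_capeff(capeff_hex: str) -> list:
    """Return the names from CAPS present in a CapEff hex mask."""
    mask = int(capeff_hex, 16)
    return [name for name, bit in CAPS.items() if mask & (1 << bit)]


def current_caps(status_path: str = "/proc/self/status") -> list:
    """Read CapEff for the current process and decode it (Linux only)."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return decode_capeff(line.split()[1])
    return []
```

For example, `decode_capeff("200000")` reports CAP_SYS_ADMIN, the capability most associated with mount-namespace escapes.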
Defensive Recommendations
- Use MIG for multi-tenant GPU sharing instead of time-slicing or MPS
- Never use --privileged for AI containers; use the NVIDIA container toolkit with --gpus instead
- Enable IOMMU to prevent DMA-based attacks from GPU devices
- Apply seccomp profiles that whitelist only required GPU ioctls
- Use confidential computing (NVIDIA H100 CC) for sensitive inference workloads
- Verify attestation before sending model weights or sensitive data to remote TEEs
- Scrub GPU memory between workloads by deallocating and reallocating with zeroed memory
- Isolate NCCL traffic on dedicated network interfaces with encryption enabled
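The GPU-scrubbing recommendation can be approximated from inside the departing workload: allocate most of the reported free VRAM as a zero-filled buffer, synchronize, then release it. A best-effort sketch assuming PyTorch with CUDA; `reserve_mb` is a hypothetical safety margin, and memory still held by other processes or the driver stays untouched:

```python
def scrub_size_mb(free_bytes: int, reserve_mb: int = 256) -> int:
    """How many MB to zero, leaving reserve_mb unallocated to avoid OOM."""
    return max(free_bytes // (1024 * 1024) - reserve_mb, 0)


def scrub_gpu_memory(reserve_mb: int = 256) -> int:
    """Best-effort zeroing of free VRAM before releasing the device.

    Allocates most of the reported free memory as a zero-filled buffer,
    synchronizes so the writes land, then frees it. Requires PyTorch
    with CUDA support (assumption). Returns the number of MB scrubbed.
    """
    import torch

    free_bytes, _total = torch.cuda.mem_get_info()
    n = scrub_size_mb(free_bytes, reserve_mb)
    if n:
        buf = torch.zeros(n * 1024 * 1024, dtype=torch.uint8, device="cuda")
        torch.cuda.synchronize()
        del buf
        torch.cuda.empty_cache()
    return n
```

Platform-level scrubbing (e.g., driver-enforced memory clearing between tenants) remains preferable; this in-workload pass is a mitigation when that is unavailable.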
References
- CVE-2024-0132 — NVIDIA Container Toolkit TOCTOU vulnerability enabling container escape
- NVIDIA Confidential Computing — https://developer.nvidia.com/confidential-computing
- NVIDIA Multi-Instance GPU (MIG) — https://docs.nvidia.com/datacenter/tesla/mig-user-guide/
- AMD SEV-SNP — https://www.amd.com/en/developer/sev.html
- Intel TDX — https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/overview.html
- NIST AI RMF — Govern 1.7 (AI system isolation requirements)
- MITRE ATLAS — AML.T0040 (ML-Enabled Product/Service), infrastructure attack techniques