AI Workload Isolation
Isolation techniques for AI workloads using VMs, containers, and trusted execution environments (TEEs).
Overview
AI workloads present unique isolation challenges that standard cloud security patterns do not fully address. GPU passthrough breaks traditional container isolation boundaries, shared GPU memory enables cross-tenant data leakage, and model serving frameworks often require elevated privileges that weaken sandboxing. When multiple AI workloads share the same infrastructure — whether in multi-tenant cloud environments, shared training clusters, or edge deployments — the isolation boundary between workloads becomes a critical security concern.
This article examines the isolation techniques available for AI workloads at every layer of the stack: process isolation, container isolation, VM isolation, and hardware-level isolation through trusted execution environments (TEEs). For each technique, we analyze the security guarantees it provides, the specific ways those guarantees break down for GPU-accelerated AI workloads, and the practical mitigations available. The content aligns with NIST AI RMF governance requirements for AI system isolation and MITRE ATLAS infrastructure-level attack techniques.
The Isolation Challenge for AI Workloads
Why AI Workloads Are Different
Standard web applications run on CPUs, use well-understood syscall interfaces, and interact with isolated memory spaces. AI workloads differ in several important ways:
| Property | Standard Workload | AI Workload |
|---|---|---|
| Compute device | CPU only | CPU + GPU/TPU |
| Memory model | User-space virtual memory | User-space + GPU VRAM (shared) |
| Device access | Minimal (/dev/null, etc.) | /dev/nvidia*, DMA access |
| Privilege requirements | Unprivileged user | Often requires --privileged or device plugins |
| Data sensitivity | Application data | Model weights, training data, inference inputs |
| Network requirements | Standard TCP/IP | NCCL, RDMA, InfiniBand for distributed training |
The GPU is the primary isolation-breaking component. NVIDIA GPUs accessed via CUDA do not provide the same memory isolation guarantees as CPU virtual memory. When multiple containers share a GPU via time-slicing or MPS (Multi-Process Service), one process can potentially observe residual data left in GPU memory by another process.
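From inside a workload, the sharing mode is often detectable, which matters because time-slicing and MPS offer far weaker guarantees than hardware partitioning. A minimal detection sketch; the default pipe directory `/tmp/nvidia-mps` and the `MIG-` UUID prefix are standard NVIDIA conventions, but deployments can relocate the MPS pipe, so treat the result as a hint rather than proof:

```python
import os

def detect_gpu_sharing(pipe_dir: str = "/tmp/nvidia-mps") -> str:
    """Best-effort detection of how the visible GPU is being shared.

    MPS funnels all client kernels through a single daemon, so the
    presence of its control pipe directory is a strong hint that other
    tenants share the same GPU context. CUDA_MPS_PIPE_DIRECTORY can
    override the default location, so check the environment first.
    """
    pipe_dir = os.environ.get("CUDA_MPS_PIPE_DIRECTORY", pipe_dir)
    if os.path.isdir(pipe_dir):
        return "mps"  # shared CUDA context: weakest isolation
    if os.environ.get("CUDA_VISIBLE_DEVICES", "").startswith("MIG-"):
        return "mig"  # hardware-partitioned instance
    return "exclusive-or-time-sliced"  # indistinguishable without nvidia-smi
```

Time-slicing cannot be ruled out from inside the container alone, hence the combined final category.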
Attack Surface Map
┌─────────────────────────────────────────────────┐
│ Application Layer │
│ Model serving framework (vLLM, Triton, TGI) │
├─────────────────────────────────────────────────┤
│ Container Runtime │
│ Docker / containerd / CRI-O │
│ ┌──────────────────────────────────────────┐ │
│ │ GPU Device Plugin / NVIDIA Container │ │
│ │ Toolkit (nvidia-container-toolkit) │ │
│ └──────────────────────────────────────────┘ │
├─────────────────────────────────────────────────┤
│ Host OS / Hypervisor │
│ NVIDIA kernel driver (nvidia.ko) │
├─────────────────────────────────────────────────┤
│ Hardware │
│ GPU VRAM, PCIe bus, NVLink, DMA engines │
└─────────────────────────────────────────────────┘
Container Isolation for AI Workloads
Standard Container Isolation
Linux containers use namespaces (PID, network, mount, user), cgroups (resource limits), and seccomp profiles (syscall filtering) to isolate workloads. For CPU-only workloads, these mechanisms provide strong isolation when properly configured.
import subprocess
import json
from typing import Dict, List
class ContainerSecurityAuditor:
"""Audit container security configuration for AI workloads."""
def __init__(self, container_id: str):
self.container_id = container_id
def inspect_container(self) -> Dict:
"""Retrieve container configuration."""
result = subprocess.run(
["docker", "inspect", self.container_id],
capture_output=True, text=True, check=True,
)
return json.loads(result.stdout)[0]
def check_gpu_isolation(self) -> List[Dict]:
"""Check for GPU isolation weaknesses in the container configuration."""
config = self.inspect_container()
findings = []
# Check for privileged mode
if config.get("HostConfig", {}).get("Privileged", False):
findings.append({
"severity": "critical",
"finding": "Container running in privileged mode",
"detail": "Privileged mode disables all container isolation. "
"GPU workloads should use --gpus flag instead.",
"remediation": "Use 'docker run --gpus device=0' instead of '--privileged'",
})
# Check for device mappings
devices = config.get("HostConfig", {}).get("Devices", []) or []
nvidia_devices = [d for d in devices if "nvidia" in str(d)]
if nvidia_devices:
findings.append({
"severity": "medium",
"finding": f"Direct NVIDIA device access: {len(nvidia_devices)} devices",
"detail": "Direct device mapping bypasses NVIDIA container toolkit isolation",
"remediation": "Use NVIDIA container toolkit with --gpus flag",
})
# Check for host PID namespace
if config.get("HostConfig", {}).get("PidMode") == "host":
findings.append({
"severity": "high",
"finding": "Container shares host PID namespace",
"detail": "Host PID namespace allows observing GPU processes from other containers",
"remediation": "Remove --pid=host flag",
})
# Check seccomp profile
seccomp = config.get("HostConfig", {}).get("SecurityOpt", [])
if any("seccomp=unconfined" in str(s) for s in seccomp):
findings.append({
"severity": "high",
"finding": "Seccomp profile disabled",
"detail": "Unconfined seccomp allows all syscalls including ioctl for GPU manipulation",
"remediation": "Apply a custom seccomp profile that allows required GPU ioctls only",
})
# Check for SYS_ADMIN capability
caps = config.get("HostConfig", {}).get("CapAdd", []) or []
if "SYS_ADMIN" in caps:
findings.append({
"severity": "high",
"finding": "SYS_ADMIN capability granted",
"detail": "SYS_ADMIN enables mount namespace escapes and device access",
"remediation": "Remove SYS_ADMIN and use specific capabilities instead",
})
return findingsGPU-Specific Container Escape Vectors
The NVIDIA container toolkit (nvidia-container-toolkit, formerly nvidia-docker2) provides a runtime hook that maps GPU devices into containers. While this is safer than using --privileged, it still creates attack surface:
- NVIDIA driver ioctls: The container communicates with the host GPU driver via ioctl syscalls on /dev/nvidia* devices. Vulnerabilities in the NVIDIA kernel driver can be exploited from within a container to achieve host code execution. CVE-2024-0132 (NVIDIA Container Toolkit TOCTOU vulnerability) demonstrated how a crafted container image could escape the container by exploiting a time-of-check-time-of-use race condition in the toolkit itself.
- GPU memory residuals: When a container allocates and frees GPU memory, the data may remain in VRAM until overwritten. A subsequent container using the same GPU can scan for residual data.
- PCIe DMA access: GPUs use DMA to transfer data between host memory and GPU memory. In some configurations, a compromised GPU workload could use DMA to read or write arbitrary host memory.
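The residual-data vector is straightforward to probe: allocate a fresh GPU buffer without initializing it (e.g. `cudaMalloc` with no memset, or an uninitialized `torch.empty` tensor copied back to the host) and analyze the bytes. The device-side dump is environment-specific, but the host-side analysis is not. A sketch of that analysis, assuming `buf` holds the read-back contents of a freshly allocated buffer:

```python
from collections import Counter

def scan_for_residuals(buf: bytes) -> dict:
    """Heuristics over a freshly allocated, uninitialized GPU buffer.

    An all-zero buffer suggests the driver or previous tenant scrubbed
    memory; a significant fraction of non-zero bytes suggests residual
    data from an earlier workload on the same GPU.
    """
    total = max(len(buf), 1)
    nonzero = sum(1 for b in buf if b != 0)

    # Longest run of identical non-zero bytes (repeating fill patterns)
    longest, run, prev = 0, 0, None
    for b in buf:
        run = run + 1 if (b == prev and b != 0) else (1 if b != 0 else 0)
        prev = b
        longest = max(longest, run)

    dominant_byte, dominant_count = Counter(buf).most_common(1)[0] if buf else (0, 0)
    return {
        "bytes": len(buf),
        "nonzero_fraction": nonzero / total,
        "longest_nonzero_run": longest,
        "dominant_byte": dominant_byte,
        "suspicious": nonzero / total > 0.01,  # threshold is a judgment call
    }
```

The 1% threshold is arbitrary; some drivers leave small non-zero allocator metadata even after a scrub, so calibrate against a known-clean GPU first.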
#!/bin/bash
# Audit NVIDIA container toolkit configuration
# Run on the host system
echo "=== NVIDIA Container Toolkit Security Audit ==="
# Check toolkit version
echo "[*] NVIDIA Container Toolkit version:"
nvidia-container-cli --version 2>/dev/null || echo " Not found"
# Check if CDI (Container Device Interface) is enabled
echo "[*] CDI configuration:"
if [ -f /etc/nvidia-container-runtime/config.toml ]; then
grep -i "cdi" /etc/nvidia-container-runtime/config.toml
else
echo " Config file not found"
fi
# Check GPU isolation mode
echo "[*] GPU isolation mode:"
nvidia-smi -q | grep -i "MIG\|isolation\|accounting" || echo " Could not query GPU"
# Check if MIG (Multi-Instance GPU) is available and enabled
echo "[*] MIG (Multi-Instance GPU) status:"
nvidia-smi mig -lgi 2>/dev/null || echo " MIG not available on this GPU"
# Check for IOMMU (required for DMA protection)
echo "[*] IOMMU status:"
if dmesg | grep -qi "IOMMU enabled"; then
echo " IOMMU is enabled (good)"
else
echo " WARNING: IOMMU may not be enabled"
fi
# List all NVIDIA device files
echo "[*] NVIDIA device files:"
ls -la /dev/nvidia* 2>/dev/null || echo "  No NVIDIA devices found"
Multi-Instance GPU (MIG)
NVIDIA A100, A30, and H100 GPUs support Multi-Instance GPU (MIG) technology, which hardware-partitions a single GPU into up to seven isolated instances. Each MIG instance has its own dedicated memory, cache, and compute units. Unlike software-level GPU sharing (time-slicing or MPS), MIG provides hardware-enforced isolation.
import subprocess
import json
from typing import List, Dict
class MIGSecurityManager:
"""Manage and audit MIG instances for workload isolation."""
def list_mig_instances(self) -> List[Dict]:
"""List all MIG GPU instances and their configuration."""
result = subprocess.run(
["nvidia-smi", "mig", "-lgi"],
capture_output=True, text=True,
)
# Parse nvidia-smi output
instances = []
for line in result.stdout.strip().split("\n"):
if "MIG" in line or "GPU" in line:
instances.append({"raw": line.strip()})
return instances
def verify_memory_isolation(self, instance_id: int) -> Dict:
"""
Verify that a MIG instance cannot access memory from other instances.
Returns test results.
"""
# This test allocates memory on one MIG instance and verifies
# it cannot be read from another instance
test_script = f"""
import torch
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-{instance_id}"
# Allocate and fill a tensor with known pattern
t = torch.full((1024, 1024), 42.0, device="cuda")
print(f"Allocated on MIG instance {instance_id}: {{t.sum().item()}}")
# Free the tensor
del t
torch.cuda.empty_cache()
"""
result = subprocess.run(
["python3", "-c", test_script],
capture_output=True, text=True, timeout=30,
)
return {
"instance_id": instance_id,
"success": result.returncode == 0,
"output": result.stdout.strip(),
"error": result.stderr.strip() if result.returncode != 0 else None,
}MIG is the strongest GPU isolation available today for multi-tenant environments. However, it has limitations: not all GPU models support it, the fixed partition sizes may not match workload requirements, and it reduces the total compute available per workload. For red teams, the key question is whether MIG is actually enabled and configured correctly in the target environment, or whether the infrastructure relies on weaker isolation mechanisms.
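Answering that question usually starts from `nvidia-smi -L`, which lists each MIG instance with its profile and UUID. A small parser sketch; the line format in the comment is taken from A100/H100-era drivers and may vary across driver versions, so treat the regex as an assumption to validate against your target:

```python
import re
from typing import Dict, List

def parse_mig_devices(nvidia_smi_output: str) -> List[Dict]:
    """Extract MIG instance profiles and UUIDs from `nvidia-smi -L` output.

    Expected line shape (assumed from A100/H100 driver output):
      MIG 3g.20gb     Device  0: (UUID: MIG-....)
    """
    pattern = re.compile(
        r"MIG\s+(?P<profile>\S+)\s+Device\s+(?P<index>\d+):"
        r"\s+\(UUID:\s+(?P<uuid>MIG-[^)]+)\)"
    )
    return [m.groupdict() for m in pattern.finditer(nvidia_smi_output)]
```

An empty result on a GPU that should be partitioned is itself a finding: the environment is likely falling back to time-slicing or MPS.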
VM-Level Isolation
GPU Passthrough to VMs
Virtual machines provide stronger isolation than containers because the hypervisor mediates all hardware access. For GPU workloads, GPUs can be assigned to VMs using PCIe passthrough (VFIO) or SR-IOV (Single Root I/O Virtualization).
With PCIe passthrough, an entire GPU is dedicated to a single VM. This provides strong isolation — the VM's GPU driver communicates directly with the hardware and cannot access other VMs' GPU memory. However, it limits GPU sharing: one GPU per VM.
SR-IOV virtualizes a single physical GPU into multiple virtual functions (VFs), each assigned to a different VM. NVIDIA supports this on selected data center GPUs. SR-IOV provides better isolation than container-level sharing because each VF has hardware-enforced memory boundaries, but it is less flexible than MIG and requires specific hardware support.
#!/bin/bash
# Configure SR-IOV for NVIDIA GPU (A100 example)
# Must be run on the bare-metal host with root privileges
# Enable SR-IOV on the GPU PCI device
GPU_PCI_ID="0000:3b:00.0" # Example — find with lspci | grep NVIDIA
echo "=== Configuring GPU SR-IOV ==="
# Check if SR-IOV is supported
echo "[*] Checking SR-IOV capability:"
lspci -vvv -s "$GPU_PCI_ID" | grep -i "single root"
# Enable virtual functions (e.g., 4 VFs)
echo 4 > "/sys/bus/pci/devices/$GPU_PCI_ID/sriov_numvfs"
# Verify VFs were created
echo "[*] Virtual functions:"
lspci | grep -i nvidia | grep "Virtual Function"
# Assign VFs to VMs using VFIO
echo "[*] Binding VF to VFIO driver for VM assignment:"
VF_PCI_ID="0000:3b:00.1" # First VF
echo "vfio-pci" > "/sys/bus/pci/devices/$VF_PCI_ID/driver_override"
echo "$VF_PCI_ID" > /sys/bus/pci/drivers/vfio-pci/bindConfidential VMs
Cloud providers now offer confidential VMs that use hardware-level encryption to protect VM memory from the hypervisor and host OS. AMD SEV-SNP (Secure Nested Paging) and Intel TDX (Trust Domain Extensions) encrypt VM memory with keys that the hypervisor cannot access.
For AI workloads, confidential VMs protect:
- Model weights loaded into CPU memory
- Inference inputs and outputs during processing
- Training data during preprocessing
However, confidential VMs do not currently protect GPU memory. Data transferred to the GPU for computation leaves the CPU's encrypted memory space and enters unencrypted GPU VRAM. This is a critical gap that limits the protection confidential VMs provide for AI workloads.
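From inside a guest, you can gather best-effort signals that memory encryption is actually active rather than merely advertised. The device paths below are hedged assumptions: recent Linux guests expose /dev/sev-guest (SEV-SNP) and /dev/tdx_guest (TDX) for attestation requests, and TDX guests report a tdx_guest CPU flag, but exact names depend on kernel version:

```python
import os
from typing import Dict

def confidential_vm_hints() -> Dict[str, bool]:
    """Collect guest-side signals that a confidential VM is in use.

    None of these checks is authoritative on its own; real assurance
    comes from remote attestation, not local inspection.
    """
    hints = {
        # Attestation device nodes (assumed paths, recent Linux guests)
        "sev_snp_guest_device": os.path.exists("/dev/sev-guest"),
        "tdx_guest_device": os.path.exists("/dev/tdx_guest"),
        "tdx_cpu_flag": False,
    }
    try:
        with open("/proc/cpuinfo") as f:
            hints["tdx_cpu_flag"] = "tdx_guest" in f.read()
    except OSError:
        pass  # not on Linux, or /proc unavailable
    hints["likely_confidential"] = (
        hints["sev_snp_guest_device"]
        or hints["tdx_guest_device"]
        or hints["tdx_cpu_flag"]
    )
    return hints
```

A red team would invert this check: if none of these signals is present on a host claimed to run confidential VMs, the claim deserves scrutiny.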
Trusted Execution Environments for AI
CPU-Based TEEs
Intel SGX (Software Guard Extensions) and ARM TrustZone provide enclave-based isolation where code and data are encrypted in memory and protected from the OS and hypervisor. For AI workloads, TEEs can protect model weights, inference inputs, and the integrity of the serving code itself, provided the remote party verifies attestation before sending anything sensitive:
from typing import Dict, Any
import hashlib
class TEEAttestationVerifier:
"""
Verify TEE attestation for AI workloads.
Ensures the inference environment is genuine and unmodified.
"""
def __init__(self, expected_measurements: Dict[str, str]):
"""
Args:
expected_measurements: Expected hash values for TEE measurements.
Keys are measurement names (e.g., 'mrenclave', 'mrsigner'),
values are expected hex-encoded hash values.
"""
self.expected = expected_measurements
def verify_sgx_attestation(self, attestation_report: Dict[str, Any]) -> Dict:
"""Verify an Intel SGX attestation report."""
results = {"verified": True, "checks": []}
# Verify MRENCLAVE (hash of the enclave code and initial data)
mrenclave = attestation_report.get("mrenclave", "")
if mrenclave != self.expected.get("mrenclave", ""):
results["verified"] = False
results["checks"].append({
"check": "mrenclave",
"status": "FAIL",
"detail": "Enclave measurement does not match expected value. "
"The inference code may have been modified.",
})
else:
results["checks"].append({
"check": "mrenclave",
"status": "PASS",
})
# Verify MRSIGNER (hash of the enclave signing key)
mrsigner = attestation_report.get("mrsigner", "")
if mrsigner != self.expected.get("mrsigner", ""):
results["verified"] = False
results["checks"].append({
"check": "mrsigner",
"status": "FAIL",
"detail": "Signer identity does not match. "
"The enclave was not signed by the expected party.",
})
else:
results["checks"].append({
"check": "mrsigner",
"status": "PASS",
})
# Check for debug mode (should be disabled in production)
if attestation_report.get("attributes", {}).get("debug", False):
results["verified"] = False
results["checks"].append({
"check": "debug_mode",
"status": "FAIL",
"detail": "Enclave is running in debug mode. "
"Memory can be inspected by the host.",
})
return results
def verify_model_integrity_in_tee(
self, model_bytes: bytes, expected_hash: str
) -> bool:
"""Verify model integrity inside the TEE before loading."""
actual_hash = hashlib.sha256(model_bytes).hexdigest()
return actual_hash == expected_hashGPU TEEs and Confidential Computing
NVIDIA H100 GPUs introduced confidential computing support that extends TEE protection to GPU memory. With NVIDIA Confidential Computing, data is encrypted when transferred from CPU to GPU and is decrypted only inside the GPU's secure execution environment. This addresses the critical gap where confidential VMs could not protect GPU-resident data.
The architecture works as follows:
- The CPU TEE (AMD SEV-SNP or Intel TDX) establishes a secure session with the GPU
- A secure channel is established over PCIe using authenticated encryption
- Data transferred to the GPU is encrypted in transit and decrypted only inside the GPU's protected memory region
- GPU computation results are encrypted before being sent back to the CPU
This is a significant advancement for protecting model weights and inference data, but it has performance implications (5-15% overhead depending on workload) and requires specific hardware (H100 or later).
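Operationally, the architecture above implies a gating decision on the model owner's side: release weights only after both the CPU TEE attestation and the GPU's confidential-compute state have been verified. A minimal policy sketch; the field names are illustrative placeholders for whatever your attestation verifier returns, not from any specific attestation service:

```python
from typing import Any, Dict

def release_weights_decision(
    cpu_report: Dict[str, Any], gpu_report: Dict[str, Any]
) -> Dict[str, Any]:
    """Gate model-weight release on joint CPU-TEE and GPU-CC evidence.

    All checks must pass: a verified CPU enclave with a GPU in
    unprotected mode (or vice versa) still leaks data at the boundary.
    """
    checks = {
        "cpu_tee_verified": cpu_report.get("verified", False),
        # Debug-mode enclaves allow host memory inspection
        "cpu_debug_disabled": not cpu_report.get("debug", True),
        "gpu_cc_enabled": gpu_report.get("cc_mode") == "ON",
        # Devtools mode weakens GPU CC protections for profiling
        "gpu_devtools_disabled": not gpu_report.get("devtools_mode", True),
    }
    return {"release": all(checks.values()), "checks": checks}
```

Defaults are deliberately fail-closed: a missing field counts as a failed check.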
Red Team Assessment Methodology
Phase 1: Isolation Architecture Discovery
#!/bin/bash
# Discover the isolation architecture of the target AI infrastructure
echo "=== AI Workload Isolation Assessment ==="
# Determine if running in a container
echo "[*] Container detection:"
if [ -f /.dockerenv ]; then
echo " Running inside Docker container"
elif grep -q "containerd" /proc/1/cgroup 2>/dev/null; then
echo " Running inside containerd"
elif [ -d /run/secrets/kubernetes.io ]; then
echo " Running inside Kubernetes pod"
else
echo " Not detected as container"
fi
# Check GPU access method
echo "[*] GPU access method:"
if [ -e /dev/nvidia0 ]; then
echo " Direct GPU device access"
ls -la /dev/nvidia*
fi
# Check for MIG
echo "[*] MIG configuration:"
nvidia-smi mig -lgi 2>/dev/null || echo " MIG not available/enabled"
# Check capabilities
echo "[*] Container capabilities:"
grep -i cap /proc/self/status
# Check mount namespace
echo "[*] Mount namespace:"
ls -la /proc/self/ns/mnt
# Check for shared GPU memory accessible from this context
echo "[*] GPU memory status:"
nvidia-smi --query-gpu=memory.used,memory.free --format=csv 2>/dev/null
Phase 2: Isolation Boundary Testing
Test whether the isolation boundaries actually prevent cross-workload access:
- GPU memory scanning: Allocate GPU memory and check for residual data from previous workloads
- Container escape attempts: Test for known container escape vectors in the GPU runtime
- Network isolation: Verify that NCCL traffic for distributed training is isolated from other workloads
- Filesystem isolation: Check for shared model storage that could enable poisoning
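The filesystem check in particular is quick to automate: from inside the workload, test whether mounted model directories are writable, since a read-write shared mount would let this tenant tamper with models served to other workloads. A sketch; the candidate paths are examples only and should come from the target's actual mount table:

```python
import os
from typing import Dict, List

def audit_model_mounts(paths: List[str]) -> List[Dict]:
    """Report access levels on candidate shared model-storage paths.

    A writable shared mount is a poisoning vector: this workload could
    overwrite weights that other workloads later load and serve.
    """
    findings = []
    for path in paths:
        if not os.path.exists(path):
            continue  # not mounted in this container
        writable = os.access(path, os.W_OK)
        findings.append({
            "path": path,
            "writable": writable,
            "risk": "model poisoning possible" if writable else "read-only (good)",
        })
    return findings
```

Pair this with `mount | grep -i model` (or parsing /proc/self/mounts) to enumerate candidates rather than guessing paths.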
Phase 3: Privilege Escalation
AI workloads frequently require elevated privileges. Test whether these privileges can be leveraged for lateral movement:
- Can the GPU device access be used to read host memory via DMA?
- Can the container's network namespace access the training cluster's control plane?
- Can the model storage mount point be used to modify other workloads' models?
Defensive Recommendations
- Use MIG for multi-tenant GPU sharing instead of time-slicing or MPS
- Never use --privileged for AI containers; use the NVIDIA container toolkit with --gpus instead
- Enable IOMMU to prevent DMA-based attacks from GPU devices
- Apply seccomp profiles that whitelist only required GPU ioctls
- Use confidential computing (NVIDIA H100 CC) for sensitive inference workloads
- Verify attestation before sending model weights or sensitive data to remote TEEs
- Scrub GPU memory between workloads by deallocating and reallocating with zeroed memory
- Isolate NCCL traffic on dedicated network interfaces with encryption enabled
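The scrubbing recommendation applies to host-side staging buffers too (pinned memory used for CPU-GPU transfers), not just VRAM, and the pattern is the same wherever the buffer lives: overwrite before free, then verify. A stdlib sketch over a plain byte buffer; for VRAM itself you would run an equivalent fill pass (e.g. cudaMemset or writing zeros over the allocation) before releasing it:

```python
def scrub_buffer(buf: bytearray) -> bool:
    """Zero a sensitive staging buffer in place, then verify.

    In-place assignment avoids leaving an extra copy of the secret
    contents in a new Python object.
    """
    buf[:] = bytes(len(buf))  # overwrite secret contents with zeros
    return all(b == 0 for b in buf)
```

Verification matters: a scrub that silently fails (wrong pointer, wrong length) is indistinguishable from no scrub at all.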
References
- CVE-2024-0132 — NVIDIA Container Toolkit TOCTOU vulnerability enabling container escape
- NVIDIA Confidential Computing — https://developer.nvidia.com/confidential-computing
- NVIDIA Multi-Instance GPU (MIG) — https://docs.nvidia.com/datacenter/tesla/mig-user-guide/
- AMD SEV-SNP — https://www.amd.com/en/developer/sev.html
- Intel TDX — https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/overview.html
- NIST AI RMF — Govern 1.7 (AI system isolation requirements)
- MITRE ATLAS — AML.T0040 (ML-Enabled Product/Service), infrastructure attack techniques