Attacks on GPU Compute Clusters

Expert | 13 min read | Updated 2026-03-15

Expert-level analysis of attacks on GPU compute clusters used for ML training and inference, including side-channel attacks on GPU memory, CUDA runtime exploitation, multi-tenant isolation failures, and RDMA network attacks.

gpu cuda side-channel multi-tenant rdma cluster-attacks

GPU compute clusters are the backbone of modern ML infrastructure. Organizations spend millions on NVIDIA DGX, AMD Instinct, and cloud GPU instances to train and serve models. Securing these clusters is a critical concern, but GPU hardware and its software stack were designed primarily for performance, not isolation. This leaves exploitable gaps that red teams can use to access other tenants' data, extract model weights, and disrupt training runs.

GPU Memory Architecture and Attack Surface

NVIDIA GPU Memory Hierarchy

Understanding the GPU memory hierarchy is essential for identifying where data can leak:

┌──────────────────────────────────────────────────┐
│                  GPU Device                       │
│                                                    │
│  ┌──────────────────────────────────────────────┐ │
│  │            Global Memory (HBM)                │ │
│  │  ┌─────────────┐  ┌─────────────────────────┐│ │
│  │  │ Model Weights│  │ Activations / KV Cache  ││ │
│  │  └─────────────┘  └─────────────────────────┘│ │
│  │  ┌─────────────┐  ┌─────────────────────────┐│ │
│  │  │ Gradients    │  │ Optimizer State          ││ │
│  │  └─────────────┘  └─────────────────────────┘│ │
│  └──────────────────────────────────────────────┘ │
│                                                    │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  │
│  │    SM 0     │  │    SM 1     │  │    SM N     │  │
│  │ ┌────────┐  │  │ ┌────────┐  │  │ ┌────────┐  │  │
│  │ │Shared  │  │  │ │Shared  │  │  │ │Shared  │  │  │
│  │ │Memory  │  │  │ │Memory  │  │  │ │Memory  │  │  │
│  │ └────────┘  │  │ └────────┘  │  │ └────────┘  │  │
│  │ ┌────────┐  │  │ ┌────────┐  │  │ ┌────────┐  │  │
│  │ │L1 Cache│  │  │ │L1 Cache│  │  │ │L1 Cache│  │  │
│  │ └────────┘  │  │ └────────┘  │  │ └────────┘  │  │
│  └────────────┘  └────────────┘  └────────────┘  │
│                                                    │
│  ┌──────────────────────────────────────────────┐ │
│  │              L2 Cache (Shared)                │ │
│  └──────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘

Exploiting Uninitialized Memory

GPU memory is not automatically cleared between kernel launches, and depending on the driver and sharing configuration it may not be scrubbed between processes that share a GPU. This is the most accessible attack vector in multi-tenant environments:

import torch
import numpy as np
 
def probe_gpu_memory(allocation_size_mb: int = 512, num_probes: int = 10):
    """
    Probe GPU memory for residual data from previous allocations.
    In multi-tenant environments, this may contain fragments of
    other users' model weights, activations, or input data.
    """
    findings = []
 
    for probe_idx in range(num_probes):
        # Allocate without initialization — reads whatever is in memory
        tensor = torch.empty(
            allocation_size_mb * 1024 * 1024 // 4,  # float32 elements
            dtype=torch.float32,
            device="cuda"
        )
 
        # Analyze contents for non-zero patterns
        nonzero_ratio = (tensor != 0).float().mean().item()
        value_range = (tensor.min().item(), tensor.max().item())
 
        # Check for structured patterns (model weights have characteristic distributions)
        std = tensor.std().item()
        mean = tensor.abs().mean().item()
 
        if nonzero_ratio > 0.01:  # More than 1% non-zero indicates residual data
            findings.append({
                "probe": probe_idx,
                "nonzero_ratio": nonzero_ratio,
                "value_range": value_range,
                "std": std,
                "mean_abs": mean,
                "likely_content": classify_residual_data(std, mean),
            })
 
        del tensor
        torch.cuda.empty_cache()
 
    return findings
 
def classify_residual_data(std: float, mean_abs: float) -> str:
    """Heuristic classification of residual GPU memory contents."""
    if 0.01 < std < 0.1 and mean_abs < 0.05:
        return "likely_model_weights (small initialization)"
    elif 0.1 < std < 2.0:
        return "likely_activations_or_gradients"
    elif std > 10.0:
        return "likely_optimizer_state (Adam momentum/variance)"
    elif mean_abs < 1e-6:
        return "likely_zeroed_or_sparse"
    else:
        return "unknown_structured_data"
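
The thresholds in classify_residual_data are coarse, so it helps to see which summary statistics land in which bucket. The following is a minimal sketch that uses only the standard library and synthetic Gaussian samples; the sample sizes and distribution parameters are assumptions chosen for illustration, and the classifier is repeated so the snippet runs on its own without a GPU.

```python
import random
import statistics


def classify_residual_data(std: float, mean_abs: float) -> str:
    """Same thresholds as the heuristic above, repeated so this runs standalone."""
    if 0.01 < std < 0.1 and mean_abs < 0.05:
        return "likely_model_weights (small initialization)"
    elif 0.1 < std < 2.0:
        return "likely_activations_or_gradients"
    elif std > 10.0:
        return "likely_optimizer_state (Adam momentum/variance)"
    elif mean_abs < 1e-6:
        return "likely_zeroed_or_sparse"
    else:
        return "unknown_structured_data"


def summarize(values):
    """Return (population std, mean absolute value) for a list of floats."""
    std = statistics.pstdev(values)
    mean_abs = sum(abs(v) for v in values) / len(values)
    return std, mean_abs


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic stand-ins (assumed distributions, not real memory contents)
weights_like = [rng.gauss(0.0, 0.02) for _ in range(10_000)]
activations_like = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
zeroed = [0.0] * 10_000

for name, values in [("weights_like", weights_like),
                     ("activations_like", activations_like),
                     ("zeroed", zeroed)]:
    print(name, "->", classify_residual_data(*summarize(values)))
```

Because the labels depend only on two moments, very different contents can share a label; the output is a triage hint, not an identification.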

Side-Channel Attacks on GPU Workloads

Timing Side Channels

GPU kernel execution times leak information about the data being processed:

import numpy as np
import torch
import time
 
def timing_side_channel_probe(target_gpu: int = 0):
    """
    Measure GPU kernel execution timing to infer characteristics
    of co-located workloads. Execution time correlates with:
    - Model size (number of parameters)
    - Batch size (number of inputs processed)
    - Sequence length (for transformer models)
    - Sparsity patterns in data
    """
    torch.cuda.set_device(target_gpu)
 
    timings = []
    for _ in range(1000):
        # Launch a small probe kernel
        probe = torch.randn(64, 64, device="cuda")
        torch.cuda.synchronize()
 
        start = time.perf_counter_ns()
        result = torch.matmul(probe, probe)
        torch.cuda.synchronize()
        end = time.perf_counter_ns()
 
        timings.append(end - start)
 
    timings = np.array(timings)
 
    # Timing variance indicates resource contention from co-located workloads
    return {
        "mean_ns": np.mean(timings),
        "std_ns": np.std(timings),
        "p99_ns": np.percentile(timings, 99),
        "bimodal": detect_bimodal_distribution(timings),
        "contention_detected": np.std(timings) > np.mean(timings) * 0.1,
    }
 
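def detect_bimodal_distribution(data: np.ndarray) -> bool:
    """
    Flag timing distributions that deviate strongly from normal.
    normaltest detects non-normality in general (skew, heavy tails), so True
    is only a hint of bimodality, e.g. batch boundaries in a co-located workload.
    """
    from scipy import stats
    _, p_value = stats.normaltest(data)
    return p_value < 0.001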
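
Since stats.normaltest flags any departure from normality, a statistic aimed at two-mode shapes can serve as a cross-check. The sketch below swaps in Sarle's bimodality coefficient, computed with the standard library only. The function name, the 5/9 threshold convention, and the synthetic nanosecond timings are assumptions for illustration and are not part of the probe above.

```python
import random


def bimodality_coefficient(samples):
    """Sarle's bimodality coefficient (population form): (skew^2 + 1) / kurtosis."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    if m2 == 0:
        return 0.0  # constant data has no modes to separate
    skew = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # non-excess kurtosis (3.0 for a normal distribution)
    return (skew ** 2 + 1) / kurtosis


# 5/9 is the coefficient of a uniform distribution; larger values hint at two modes
BIMODAL_THRESHOLD = 5 / 9

rng = random.Random(42)
# Synthetic timings in nanoseconds (illustrative values, not measurements)
unimodal = [rng.gauss(50_000, 2_000) for _ in range(5_000)]
bimodal = ([rng.gauss(50_000, 2_000) for _ in range(2_500)]
           + [rng.gauss(80_000, 2_000) for _ in range(2_500)])

print("unimodal flagged:", bimodality_coefficient(unimodal) > BIMODAL_THRESHOLD)
print("bimodal flagged:", bimodality_coefficient(bimodal) > BIMODAL_THRESHOLD)
```

The coefficient can also be pushed above the threshold by strong skew alone, so it is best read together with a histogram of the timings.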

Power and Thermal Side Channels

GPU power draw and thermal readings are accessible through management interfaces and correlate with workload characteristics:

def monitor_gpu_power_channel(duration_seconds: int = 60, sample_rate_hz: int = 10):
    """
    Monitor GPU power consumption as a side channel.
    Power draw patterns reveal:
    - Training vs. inference workload type
    - Batch processing cadence
    - Model architecture characteristics
    """
    import subprocess
    import time
 
    readings = []
    interval = 1.0 / sample_rate_hz
 
    for _ in range(duration_seconds * sample_rate_hz):
        # nvidia-smi provides power and utilization data
        result = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=power.draw,utilization.gpu,temperature.gpu,memory.used",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True
        )
 
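        if result.returncode == 0 and result.stdout.strip():
            # nvidia-smi prints one CSV line per GPU; use the first GPU's line
            values = result.stdout.strip().splitlines()[0].split(", ")
            try:
                readings.append({
                    "timestamp": time.time(),
                    "power_w": float(values[0]),
                    "util_pct": float(values[1]),
                    "temp_c": float(values[2]),
                    "mem_used_mb": float(values[3]),
                })
            except (ValueError, IndexError):
                pass  # a field was reported as [N/A] or the line was malformed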
 
        time.sleep(interval)
 
    return analyze_power_patterns(readings)
 
def analyze_power_patterns(readings: list) -> dict:
    """Extract workload characteristics from power consumption patterns."""
    import numpy as np
    from scipy.signal import find_peaks

    if not readings:
        return {"workload_type": "unknown"}

    powers = np.array([r["power_w"] for r in readings])
    times = np.array([r["timestamp"] for r in readings])

    # Detect periodic patterns (training loop cadence)
    peaks, _ = find_peaks(powers, height=powers.mean())

    if len(peaks) > 2:
        # Use the recorded timestamps rather than the nominal sample rate:
        # each nvidia-smi call adds latency, so samples are not evenly spaced
        cadence = float(np.median(np.diff(times[peaks])))

        return {
            "workload_type": "training" if cadence > 1.0 else "inference",
            "batch_cadence_seconds": cadence,
            "peak_power_w": float(powers.max()),
            "avg_power_w": float(powers.mean()),
        }

    return {"workload_type": "inference_or_idle", "avg_power_w": float(powers.mean())}
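
On multi-GPU hosts the nvidia-smi query returns one CSV line per GPU, and fields a device does not support are printed as text rather than numbers. A minimal parsing sketch follows; the helper name parse_gpu_query_csv, the field labels, and the sample text are hypothetical and only illustrate the shape of `--format=csv,noheader,nounits` output.

```python
def parse_gpu_query_csv(text, fields):
    """Parse csv,noheader,nounits output into one dict per GPU line."""
    rows = []
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.split(",")]
        if len(cells) != len(fields):
            continue  # skip lines that do not match the requested field count
        row = {}
        for name, cell in zip(fields, cells):
            try:
                row[name] = float(cell)
            except ValueError:
                row[name] = None  # non-numeric marker such as "[N/A]"
        rows.append(row)
    return rows


FIELDS = ["power_w", "util_pct", "temp_c", "mem_used_mb"]

# Illustrative sample for two GPUs (made-up values, not captured output)
sample = "251.37, 98, 67, 40536\n[N/A], 0, 34, 3\n"

rows = parse_gpu_query_csv(sample, FIELDS)
for index, row in enumerate(rows):
    print(index, row)
```

Keeping one row per GPU lets the power trace of each device be analyzed separately instead of mixing tenants' signals.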

Side channel	Leaked data	Accuracy	Requirements
Kernel timing	Model size, batch size, sequence length	Medium	Co-located process on the same GPU
Power analysis	Training cadence, workload type	High	nvidia-smi access
Memory bandwidth	Data transfer patterns, model loading	Medium	Performance counter access
PCIe traffic	Host-to-device data patterns	Low	PCIe monitoring capability
Thermal patterns	Sustained versus burst compute	Low	Temperature sensor access

Multi-Tenant GPU Isolation Mechanisms

NVIDIA Multi-Instance GPU (MIG)

MIG (Multi-Instance GPU) provides the strongest isolation available for GPU multi-tenancy:

def assess_mig_isolation(gpu_index: int = 0):
    """Assess MIG partition isolation on NVIDIA A100/H100 GPUs."""
    import subprocess
    import json
 
    findings = []
 
    # List MIG instances
    result = subprocess.run(
        ["nvidia-smi", "mig", "-lgi", "-i", str(gpu_index)],
        capture_output=True, text=True
    )
    findings.append({"mig_instances": result.stdout})
 
    # Check MIG mode status
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=mig.mode.current", "--format=csv,noheader",
         "-i", str(gpu_index)],
        capture_output=True, text=True
    )
 
    mig_enabled = "Enabled" in result.stdout
 
    if not mig_enabled:
        findings.append({
            "severity": "HIGH",
            "finding": "MIG not enabled on multi-tenant GPU",
            "impact": "No hardware isolation between tenants",
        })
 
    # Even with MIG, check for shared resources
    findings.append({
        "note": "MIG isolates compute and memory but shares: "
                "PCIe bus, NVLink, video encoder/decoder, "
                "GPU management processor",
    })
 
    return findings

NVIDIA Multi-Process Service (MPS)

MPS (Multi-Process Service) offers performance benefits but weaker isolation:

Isolation mechanism	Compute isolation	Memory isolation	Fault isolation	Performance overhead
MIG	Hardware-partitioned SMs	Separate memory partitions	Full -- crash contained	0% (dedicated resources)
MPS	Shared SMs, time-multiplexed	Shared address space	None -- one crash kills all	Low
Time-slicing	Round-robin scheduling	No isolation	None	Medium (context switching)
vGPU	Hypervisor-mediated	Hypervisor-enforced	Full	5-15%

RDMA and Interconnect Attacks

Exploiting InfiniBand and RoCE

High-performance GPU clusters use RDMA for inter-node communication during distributed training:

def enumerate_rdma_endpoints():
    """
    Enumerate RDMA-capable network interfaces and endpoints.
    RDMA traffic bypasses the kernel network stack, meaning
    standard firewall rules and network policies do not apply.
    """
    import subprocess
 
    findings = []
 
    # Check for RDMA devices
    result = subprocess.run(["ibv_devices"], capture_output=True, text=True)
    if result.returncode == 0:
        findings.append({
            "finding": "RDMA devices present",
            "devices": result.stdout,
            "severity": "INFO",
        })
 
    # Query InfiniBand HCA and port state (the output includes the subnet manager LID)
    result = subprocess.run(["ibstat"], capture_output=True, text=True)
    if result.returncode == 0:
        findings.append({
            "finding": "InfiniBand status",
            "status": result.stdout,
        })
 
    # Enumerate RDMA connections
    result = subprocess.run(
        ["rdma", "resource", "show", "cm_id"],
        capture_output=True, text=True
    )
    if result.returncode == 0:
        findings.append({
            "finding": "Active RDMA connections",
            "connections": result.stdout,
            "note": "These connections bypass kernel network stack and firewalls",
        })
 
    # Check NVLink status. This reports GPU-to-GPU links only; GPUDirect RDMA
    # additionally depends on the nvidia-peermem kernel module and NIC support
    result = subprocess.run(
        ["nvidia-smi", "nvlink", "--status"],
        capture_output=True, text=True
    )
    if result.returncode == 0:
        findings.append({
            "finding": "NVLink status (GPU peer-to-peer interconnect)",
            "status": result.stdout,
        })
 
    return findings
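
The enumeration above calls several command-line tools that are often absent on a given node, and subprocess.run raises FileNotFoundError when the binary does not exist. A small wrapper can turn that into a clean "not available" result. This is a sketch under stated assumptions: the name run_tool and the ten-second timeout are hypothetical choices, not part of any tool's API.

```python
import shutil
import subprocess
import sys


def run_tool(argv, timeout=10):
    """Run a CLI tool; return stdout on success, or None if missing, failed, or timed out."""
    if shutil.which(argv[0]) is None:
        return None  # binary is not installed or not on PATH
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout if result.returncode == 0 else None


# A tool that does not exist yields None instead of an exception
print(run_tool(["no-such-rdma-tool-12345"]))

# Any installed executable works the same way; the Python interpreter is used
# here only because it is guaranteed to be present when this snippet runs
print(run_tool([sys.executable, "-c", "print('ok')"]))
```

With such a wrapper, each check can record "tool unavailable" as its own finding, which is useful when comparing nodes with different driver packages installed.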

NVLink and NVSwitch Attacks

In multi-GPU systems (DGX, HGX), NVLink provides direct GPU-to-GPU memory access:

def probe_nvlink_topology():
    """
    Map NVLink topology to identify potential cross-GPU
    data access paths. NVLink enables GPUDirect which allows
    one GPU to directly read/write another GPU's memory.
    """
    import subprocess
 
    result = subprocess.run(
        ["nvidia-smi", "topo", "-m"],
        capture_output=True, text=True
    )
 
    topology = result.stdout
 
    # Parse topology matrix for NVLink connections
    # NV# indicates NVLink connection with # links
    # SYS indicates cross-socket (slower)
    # PHB indicates same PCIe host bridge
 
    return {
        "topology": topology,
        "note": "GPUs connected via NVLink can perform direct memory "
                "access (peer-to-peer). If GPU 0 and GPU 1 are NVLink-connected "
                "and run different tenants' workloads, a CUDA program on GPU 0 "
                "can potentially read GPU 1's memory via cuMemcpyPeer.",
    }
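
The topology matrix is returned as raw text, so finding the NVLink-connected pairs requires a parsing step. The sketch below assumes the usual layout in which GPU columns come first in the header row and NVLink cells look like NV followed by a link count. The function name nvlink_pairs and the simplified sample matrix are hypothetical; real output varies by driver version and may contain extra columns and a legend.

```python
import re

ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")  # some versions underline the header
GPU_LABEL = re.compile(r"GPU\d+")
NVLINK_CELL = re.compile(r"NV\d+")


def nvlink_pairs(topo_text):
    """Return sorted (gpu_a, gpu_b, link) tuples for GPU pairs joined by NVLink."""
    lines = [ANSI_ESCAPE.sub("", l) for l in topo_text.splitlines() if l.strip()]
    if not lines:
        return []
    # Header row: keep only the GPU column labels, in order
    columns = [t for t in lines[0].split() if GPU_LABEL.fullmatch(t)]
    pairs = set()
    for line in lines[1:]:
        tokens = line.split()
        if not GPU_LABEL.fullmatch(tokens[0]):
            continue  # skip NIC rows, legend lines, and similar
        for column, cell in zip(columns, tokens[1:]):
            if NVLINK_CELL.fullmatch(cell) and column != tokens[0]:
                a, b = sorted((tokens[0], column), key=lambda g: int(g[3:]))
                pairs.add((a, b, cell))
    return sorted(pairs)


# Simplified illustrative matrix (not captured from a real system)
SAMPLE = """\
        GPU0    GPU1    GPU2    GPU3    CPU Affinity
GPU0     X      NV12    SYS     SYS     0-31
GPU1    NV12     X      SYS     SYS     0-31
GPU2    SYS     SYS      X      NV12    32-63
GPU3    SYS     SYS     NV12     X      32-63
"""

for pair in nvlink_pairs(SAMPLE):
    print(pair)
```

Comparing these pairs against the tenant-to-GPU assignment shows which tenants sit on directly linked devices and therefore deserve closer review.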

CUDA Runtime Exploitation

Driver and Runtime Vulnerabilities

The CUDA software stack presents a significant attack surface:

Component	Vulnerability class	Example CVEs	Impact
NVIDIA Kernel Driver	Privilege escalation	CVE-2024-0071, CVE-2024-0074	Host compromise from a container
CUDA Runtime	Memory corruption	CVE-2023-31021	Code execution in GPU context
cuDNN	Buffer overflow	Various	Arbitrary code execution
NCCL	Unauthenticated access	Design issue	Interception of distributed training data
nvidia-persistenced	Local privilege escalation	CVE-2024-0090	Root access from a GPU user

def assess_cuda_attack_surface():
    """Enumerate CUDA stack components and known vulnerability exposure."""
    import subprocess
 
    components = {}
 
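    # Driver version (printed once per GPU; all GPUs share one driver, so take the first line)
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True
    )
    lines = result.stdout.strip().splitlines()
    components["driver_version"] = lines[0].strip() if lines else ""

    # CUDA toolkit version (nvcc is absent on runtime-only installs)
    try:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True
        )
        components["cuda_version"] = result.stdout
    except FileNotFoundError:
        components["cuda_version"] = None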
 
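    # Check for known vulnerable driver versions.
    # Illustrative mapping only; consult NVIDIA security bulletins for the
    # affected version ranges of each CVE.
    driver_ver = components["driver_version"]
    known_vulnerable = {
        "535.104": ["CVE-2024-0071"],  # Example
        "535.86": ["CVE-2023-31021"],
    }

    components["vulnerabilities"] = []
    for vuln_ver, cves in known_vulnerable.items():
        # Match whole version components so "535.86" does not match "535.860"
        if driver_ver == vuln_ver or driver_ver.startswith(vuln_ver + "."):
            components["vulnerabilities"].extend(cves)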
 
    return components
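
Security bulletins usually state the first fixed driver version per branch, which makes a numeric "older than" comparison more useful than prefix matching. The following sketch compares versions as integer tuples. The FIRST_FIXED table holds placeholder values invented for illustration, and the helper names are hypothetical; real thresholds have to come from the vendor's advisories.

```python
def parse_driver_version(text):
    """Turn '535.104.05' into (535, 104, 5) so components compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def is_older_than(installed, first_fixed):
    """True if the installed version precedes the first fixed version."""
    return parse_driver_version(installed) < parse_driver_version(first_fixed)


# Hypothetical advisory table: driver branch -> first fixed version.
# Placeholder values only; they are not taken from any bulletin.
FIRST_FIXED = {535: "535.150.00", 550: "550.50.00"}


def needs_update(installed):
    """Return True/False for known branches, or None when the branch is not listed."""
    branch = parse_driver_version(installed)[0]
    fixed = FIRST_FIXED.get(branch)
    return None if fixed is None else is_older_than(installed, fixed)


for version in ["535.104.05", "535.150.00", "470.57.02"]:
    print(version, "->", needs_update(version))

# String comparison would order "535.86" after "535.104"; tuples do not
print(is_older_than("535.86", "535.104"))
```

Returning None for unlisted branches keeps "no data" distinct from "not affected", which matters when the advisory table is incomplete.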

Cluster-Level Attack Scenarios

Scenario 1: Cross-Tenant Data Extraction

1. Attacker obtains legitimate access to a GPU instance in a shared cluster
2. Probe uninitialized GPU memory for residual data from previous tenant
3. Use timing side channels to determine when co-located workload processes batches
4. Allocate and read GPU memory immediately after co-located workload releases it
5. Reconstruct model weights or training data fragments from recovered memory

Scenario 2: Intercepting Distributed Training

1. Gain access to the training cluster network (InfiniBand or RoCE fabric)
2. Enumerate NCCL communication endpoints (default: no authentication)
3. Join the NCCL communication ring by impersonating a training worker
4. Intercept gradient updates transmitted between nodes during allreduce operations
5. Reconstruct model updates and potentially training data from gradient information

Scenario 3: GPU-Assisted Container Escape

1. From within a GPU-enabled container, access /dev/nvidia* device files
2. Use GPU memory mapping to probe host memory regions accessible through DMA
3. Exploit NVIDIA driver vulnerabilities for kernel-level privilege escalation
4. Use GPU DMA capabilities to read or write host memory outside container boundaries
5. Establish persistence through GPU firmware or driver-level modifications

Related Topics

Attacks on AI Deployments -- general attacks on deployment infrastructure
Kubernetes Security for ML Workloads -- Kubernetes-specific security for ML infrastructure
Infrastructure Exploitation -- broader infrastructure exploitation techniques
Model Supply Chain Risks -- model-level supply chain attacks
Attacks on Distributed Training -- attacking the training process itself

References

Naghibijouybari et al., "Rendered Insecure: GPU Side Channel Attacks are Practical" (2018) - Foundational research demonstrating practical GPU side-channel attacks across co-located processes
Wei et al., "Leaky DNN: Stealing Deep-Learning Model Secret with GPU Context-Switching Side-Channel" (2020) - GPU context switching as a side channel for model extraction
NVIDIA Multi-Instance GPU User Guide (2025) - Official MIG documentation on partition configurations and isolation guarantees
NVIDIA Container Toolkit Security Best Practices (2025) - Security guidelines for GPU containers, including device isolation
Hu et al., "Security Analysis of RDMA-based Data Center Networks" (2023) - RDMA security analysis in data center environments

Knowledge Check

Why are standard Kubernetes network policies insufficient for securing GPU cluster communication during distributed training?
