Trusted Execution Environments for AI Workloads
Security analysis of Intel SGX, AMD SEV, and ARM TrustZone for protecting AI model inference and training in untrusted environments
Overview
Trusted Execution Environments (TEEs) represent one of the most promising hardware-based approaches to protecting AI models and training data in untrusted or multi-tenant environments. As organizations increasingly deploy sensitive AI models to cloud infrastructure they do not fully control, the need to protect model weights, inference inputs, and training data from privileged attackers — including cloud administrators and compromised hypervisors — has become critical.
The three dominant TEE technologies in the AI space are Intel Software Guard Extensions (SGX), AMD Secure Encrypted Virtualization (SEV), and ARM TrustZone. Each provides a different abstraction for isolating sensitive computation: SGX offers process-level enclaves with a small trusted computing base, AMD SEV encrypts entire virtual machines with minimal guest modifications, and ARM TrustZone partitions the processor into secure and normal worlds, primarily targeting edge and mobile AI deployments. Major cloud providers now offer confidential computing instances — Azure Confidential Computing with SGX and SEV-SNP, Google Cloud Confidential VMs with SEV, and AWS Nitro Enclaves with a related but distinct isolation model.
However, TEEs are not a silver bullet. Red teamers must understand that these technologies have been repeatedly shown to be vulnerable to side-channel attacks, controlled-channel attacks, and architectural flaws that can leak sensitive information including model weights and inference data. The gap between the theoretical security guarantees and practical deployment realities is substantial, and this article examines both the attack surface and the defensive posture that organizations should adopt when relying on confidential computing for AI workloads.
TEE Architecture for AI Workloads
Intel SGX: Enclave-Based Model Protection
Intel SGX creates isolated memory regions called enclaves that are encrypted in DRAM and decrypted only inside the CPU package. The operating system, hypervisor, and other processes cannot read or modify enclave memory. For AI workloads, SGX enclaves can protect model weights during inference: the model is loaded into the enclave, encrypted inputs are passed in, and only encrypted outputs leave.
The SGX threat model explicitly excludes the OS and hypervisor from the trusted computing base. This is powerful for AI deployments where you want to run inference on a cloud provider's hardware without trusting the cloud provider to not inspect your model. However, SGX has significant limitations for AI:
- Memory constraints: SGX enclaves are limited to the Enclave Page Cache (EPC), which was 128MB in earlier generations and up to 512GB in 4th Gen Xeon Scalable processors. Large language models can exceed these limits, requiring paging that introduces performance overhead and side-channel risks.
- No GPU support: SGX enclaves run on the CPU only. GPU-accelerated inference must either happen outside the enclave (losing confidentiality) or use software-based inference inside the enclave at significantly reduced performance.
- Attestation complexity: Remote attestation proves to a verifier that code is running inside a genuine SGX enclave, but the attestation infrastructure itself (Intel's provisioning service) represents a centralized trust dependency.
# Example: Loading a model into an SGX enclave using Gramine (formerly Graphene)
# gramine-manifest.toml configuration for PyTorch inference
"""
Gramine manifest for running PyTorch model inference inside SGX enclave.
This configuration specifies trusted files (model weights) and
enclave memory settings.
"""
import subprocess
import json
from pathlib import Path
def generate_gramine_manifest(model_path: str, enclave_size_mb: int = 4096) -> str:
"""Generate a Gramine manifest for SGX-protected model inference."""
manifest = f"""
[loader]
entrypoint = "file:{{{{ gramine.libos }}}}"
log_level = "warning"
[libos]
entrypoint = "/usr/bin/python3"
[loader.env]
LD_LIBRARY_PATH = "/lib:/usr/lib:/usr/local/lib"
PYTHONPATH = "/usr/local/lib/python3.11/site-packages"
[loader.argv]
argv0 = "python3"
argv1 = "inference_server.py"
[sgx]
# SGX debug mode - must be false in production
debug = false
# Enclave memory size (power of two) - must accommodate model + runtime
enclave_size = "{enclave_size_mb}M"
# Number of threads inside the enclave
thread_num = 8
# Remote attestation settings (EPID; leave empty when using DCAP)
ra_client_spid = ""
ra_client_linkable = false
# Trusted files - model weights are measured at load time
[[sgx.trusted_files]]
uri = "file:{model_path}"
[[sgx.trusted_files]]
uri = "file:inference_server.py"
# Allowed (but unmeasured) files
[[sgx.allowed_files]]
uri = "file:/tmp/inference_input"
[[sgx.allowed_files]]
uri = "file:/tmp/inference_output"
"""
return manifest
class SecurityError(Exception):
    """Raised when enclave attestation verification fails."""

def verify_sgx_attestation(quote: bytes, expected_mrenclave: str) -> bool:
"""
Verify an SGX attestation quote against expected enclave measurement.
In production, this would contact Intel's Attestation Service (IAS)
or use DCAP for local verification.
"""
# Extract MRENCLAVE from quote (offset 112, 32 bytes in SGX quote structure)
mrenclave_offset = 112
mrenclave = quote[mrenclave_offset:mrenclave_offset + 32].hex()
if mrenclave != expected_mrenclave:
raise SecurityError(
f"MRENCLAVE mismatch: got {mrenclave}, "
f"expected {expected_mrenclave}"
)
# In production: verify quote signature via IAS or DCAP
# result = dcap_verify(quote)
# if not result.is_valid:
# raise SecurityError("Quote signature verification failed")
    return True

AMD SEV: VM-Level Encryption for Training
AMD SEV and its successors (SEV-ES, SEV-SNP) encrypt entire virtual machine memory with per-VM keys managed by the AMD Secure Processor. Unlike SGX's process-level isolation, SEV protects a full VM, making it more natural for running unmodified AI training workloads that span multiple processes.
SEV-SNP (Secure Nested Paging) adds integrity protection and attestation capabilities that earlier SEV versions lacked. For AI workloads, SEV-SNP is the minimum viable option because earlier versions were vulnerable to memory remapping attacks where a malicious hypervisor could swap encrypted memory pages between VMs or replay old pages.
Key architectural properties for AI red teaming:
- Full VM encryption: Model weights, training data, and intermediate activations are all encrypted in DRAM. The hypervisor sees only ciphertext.
- GPU considerations: SEV encrypts CPU memory but GPU memory (VRAM) is outside the SEV boundary. NVIDIA's Hopper architecture H100 introduces Confidential Computing mode that extends encryption to GPU memory via a secure channel between CPU and GPU, but this is a recent development with limited deployment.
- Performance overhead: SEV-SNP adds approximately 2-5% overhead for compute-bound workloads, which is significantly less than SGX paging overhead for large models.
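Because earlier SEV generations lack SNP's integrity guarantees, a deployment audit should confirm which variant is actually active inside the guest. One documented signal is the AMD64 SEV status MSR (0xC0010131), whose low bits report SEV, SEV-ES, and SEV-SNP; the sketch below decodes it (reading the MSR requires root and the `msr` kernel module, and the helper names are illustrative):

```python
import os
import struct

# MSR_AMD64_SEV (0xC0010131): bit 0 = SEV, bit 1 = SEV-ES, bit 2 = SEV-SNP
MSR_AMD64_SEV = 0xC0010131

def decode_sev_status(msr_value: int) -> dict:
    """Decode the SEV status MSR into feature flags."""
    return {
        "sev": bool(msr_value & 0b001),
        "sev_es": bool(msr_value & 0b010),
        "sev_snp": bool(msr_value & 0b100),
    }

def read_sev_status(cpu: int = 0) -> dict:
    """Read the MSR from inside the guest (needs root and the msr module)."""
    with open(f"/dev/cpu/{cpu}/msr", "rb", buffering=0) as f:
        f.seek(MSR_AMD64_SEV)
        (value,) = struct.unpack("<Q", f.read(8))
    return decode_sev_status(value)

if __name__ == "__main__" and os.path.exists("/dev/cpu/0/msr"):
    status = read_sev_status()
    if not status["sev_snp"]:
        print("WARNING: SEV-SNP is not active; no memory integrity protection.")
```

A guest reporting only bit 0 is running plain SEV and remains exposed to the remapping and replay attacks described above.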
ARM TrustZone: Edge AI Protection
ARM TrustZone partitions the processor into a Secure World and a Normal World, with hardware-enforced isolation between them. This is the dominant TEE for edge AI deployments on mobile devices, IoT sensors, and embedded systems running inference on small models.
TrustZone differs fundamentally from SGX and SEV in that it provides a single secure partition (not multiple isolated enclaves/VMs). The Secure World runs a Trusted OS (such as OP-TEE) that hosts Trusted Applications (TAs), which can include AI inference engines.
For red teaming edge AI with TrustZone:
- Shared memory attacks: Data must be explicitly shared between Normal and Secure worlds through shared memory buffers. Improper sanitization of these buffers is a common vulnerability.
- Trusted Application vulnerabilities: TAs are written in C and frequently contain memory corruption vulnerabilities. A buffer overflow in a TA can compromise the entire Secure World.
- Debug interfaces: JTAG and other debug interfaces can bypass TrustZone if not properly disabled in production.
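The shared-memory pitfall can be made concrete. Real TAs are written in C against the GlobalPlatform TEE Internal API; the sketch below uses Python purely to illustrate the validation a TA must apply to a Normal World-supplied (offset, length) pair: reject negative values, check bounds without integer overflow, and snapshot the data before parsing so the Normal World cannot change it after validation (TOCTOU). The function name and 4 KiB buffer size are illustrative.

```python
SHARED_BUF_SIZE = 0x1000  # 4 KiB buffer mapped from the Normal World

def read_from_shared(shared: bytes, offset: int, length: int) -> bytes:
    """
    Validate an untrusted (offset, length) pair before using it.
    Mirrors the checks a TrustZone TA must perform in C:
      - reject negative values (signed/unsigned confusion)
      - bound offset and length without overflowing offset + length
      - copy into Secure World-private memory before parsing (anti-TOCTOU)
    """
    if offset < 0 or length < 0:
        raise ValueError("negative offset/length")
    if offset > SHARED_BUF_SIZE or length > SHARED_BUF_SIZE - offset:
        raise ValueError("range exceeds shared buffer")
    return bytes(shared[offset:offset + length])

buf = bytes(range(256)) * 16  # 4 KiB of sample shared memory
assert read_from_shared(buf, 0x10, 4) == bytes([0x10, 0x11, 0x12, 0x13])
```

Note the second check is written as `length > SHARED_BUF_SIZE - offset` rather than `offset + length > SHARED_BUF_SIZE`; in C, the latter form overflows for attacker-chosen values and silently passes.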
Attack Surfaces and Exploitation Techniques
Side-Channel Attacks on TEE-Protected Models
Side-channel attacks are the primary threat to TEE-protected AI workloads. Even though the TEE encrypts memory, observable behaviors such as memory access patterns, cache timing, and power consumption can leak information about the protected computation.
Cache-based side channels are particularly devastating against neural network inference because the memory access pattern during matrix multiplication is directly correlated with the model weights and input data. The attacker and victim share the CPU cache hierarchy, and by monitoring cache line evictions, an attacker can reconstruct which memory locations the enclave accessed.
"""
Demonstration of cache timing side-channel analysis against
TEE-protected neural network inference.
This script analyzes timing variations in cache access patterns
to infer which neurons were activated during inference.
"""
import numpy as np
import ctypes
import time
from typing import List, Tuple
class CacheTimingAnalyzer:
"""
Analyzes cache timing measurements to extract information
about TEE-protected model inference.
"""
# Typical L1 cache line size on x86
CACHE_LINE_SIZE = 64
    # Threshold separating cache hits from misses, in the probe timer's
    # unit (perf_counter_ns here; with rdtsc, typical values are
    # 100-200 cycles - calibrate per system)
    HIT_MISS_THRESHOLD = 150
def __init__(self, num_samples: int = 10000):
self.num_samples = num_samples
self.timing_data: List[np.ndarray] = []
def prime_probe_round(
self,
monitored_cache_sets: List[int],
wait_callback: callable,
) -> np.ndarray:
"""
Execute one round of Prime+Probe on specified cache sets.
1. PRIME: Fill target cache sets with attacker's data
2. WAIT: Let the victim (enclave) execute
3. PROBE: Measure access time to attacker's data
- Slow access = victim evicted the line (accessed that cache set)
- Fast access = victim did not use that cache set
"""
num_sets = len(monitored_cache_sets)
timings = np.zeros(num_sets, dtype=np.int64)
# In a real attack, this would use rdtsc and carefully
# crafted eviction sets. This is a conceptual demonstration.
for i, cache_set in enumerate(monitored_cache_sets):
# Prime phase: access attacker's buffer at addresses
# mapping to the target cache set
# (omitted: actual memory access with precise addressing)
# Wait for victim computation
wait_callback()
# Probe phase: time re-access to attacker's buffer
start = time.perf_counter_ns()
# (omitted: actual memory access)
elapsed = time.perf_counter_ns() - start
timings[i] = elapsed
return timings
def collect_traces(
self,
monitored_sets: List[int],
trigger_inference: callable,
) -> np.ndarray:
"""Collect multiple timing traces across inference invocations."""
traces = []
for _ in range(self.num_samples):
timings = self.prime_probe_round(
monitored_sets,
wait_callback=trigger_inference,
)
traces.append(timings)
return np.array(traces)
def analyze_neuron_activation(
self,
traces: np.ndarray,
) -> np.ndarray:
"""
Infer neuron activation patterns from cache timing traces.
Cache sets that consistently show evictions (slow access)
during inference correspond to weight matrix regions that
were accessed, indicating activated neurons.
"""
# Classify each measurement as hit (0) or miss (1)
hit_miss = (traces > self.HIT_MISS_THRESHOLD).astype(np.float32)
# Average miss rate per cache set across all samples
miss_rates = hit_miss.mean(axis=0)
# High miss rate = frequently accessed by enclave = active neuron
# Low miss rate = not accessed = inactive neuron (e.g., ReLU zeroed)
activation_threshold = 0.5
inferred_activations = (miss_rates > activation_threshold).astype(int)
return inferred_activations
def reconstruct_weight_signs(
self,
traces_positive_input: np.ndarray,
traces_negative_input: np.ndarray,
) -> np.ndarray:
"""
Infer weight signs by comparing activation patterns
for positive vs negative inputs.
If a neuron activates for positive input but not negative,
the corresponding weight is likely positive (assuming ReLU).
"""
pos_activations = self.analyze_neuron_activation(traces_positive_input)
neg_activations = self.analyze_neuron_activation(traces_negative_input)
# +1 = positive weight, -1 = negative weight, 0 = indeterminate
weight_signs = pos_activations.astype(int) - neg_activations.astype(int)
        return weight_signs

Controlled-Channel Attacks
Controlled-channel attacks exploit the fact that the operating system (which is untrusted in the SGX threat model) controls page table mappings for enclave memory. By manipulating page permissions and observing page faults, a malicious OS can determine exactly which code and data pages the enclave accesses at page granularity.
For AI inference, this is devastating because:
- The sequence of accessed code pages reveals the model architecture (which layers execute in what order).
- The sequence of accessed data pages reveals which weight matrices are accessed, leaking the model structure.
- Input-dependent memory access patterns (such as those in attention mechanisms or conditional computation) directly leak information about the input.
#!/usr/bin/env bash
# Controlled-channel attack setup using SGX-Step framework
# SGX-Step provides precise single-stepping of enclave execution
# Reference: Van Bulck et al., "SGX-Step: A Practical Attack Framework
# for Precise Enclave Execution Control", SysTEX 2017
# Clone and build SGX-Step (requires SGX hardware and Linux kernel module)
git clone https://github.com/jovanbulck/sgx-step.git
cd sgx-step
# Build the kernel module for precise interrupt control
cd kernel
make
sudo insmod sgx-step.ko
# The attack uses APIC timer interrupts to single-step enclave execution.
# Each step reveals the instruction page and data page accessed.
# For a neural network inference:
# - Code page sequence -> model architecture (layer types, order)
# - Data page sequence -> weight access pattern -> weight values (partial)
# Example: Configure page-level monitoring for a target enclave
cd ../app
cat > config.h << 'CEOF'
#define VICTIM_ENCLAVE_PATH "/opt/ai-service/enclave.signed.so"
// Monitor data pages in the weight matrix region
#define MONITOR_DATA_START 0x7f0000000000ULL
#define MONITOR_DATA_END 0x7f0040000000ULL // 1GB region for model weights
// Interrupt frequency: once per ~100 instructions for coarse tracing
// or once per instruction for fine-grained extraction
#define SINGLE_STEP_MODE 0 // Set to 1 for instruction-level granularity
CEOF
echo "Build the controlled-channel monitor:"
make
echo "Run with: sudo ./app/monitor --enclave-pid <target_pid>"

Voltage and Frequency Manipulation (Plundervolt/VoltPillager)
Hardware-level attacks can compromise TEE security by manipulating the CPU's operating voltage or frequency. The Plundervolt attack demonstrated that undervolting an Intel CPU during SGX enclave execution could induce computational faults, causing incorrect results in AES-NI operations and potentially corrupting model inference results.
For AI workloads specifically:
- Fault injection during inference: By inducing bit flips in floating-point operations, an attacker can cause misclassification or extract model information through differential fault analysis.
- Training corruption: Faults during gradient computation can steer model training toward attacker-chosen outcomes without modifying the training data.
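A toy differential-fault example (illustrative NumPy, not a real glitching harness) shows why even a single induced bit flip is informative: faulting one weight shifts a dot product by an amount proportional to the matching input element, which is exactly the relation differential fault analysis inverts to recover weight information.

```python
import struct
import numpy as np

def flip_bit_fp32(value: float, bit: int) -> float:
    """Flip one bit in the IEEE-754 binary32 encoding of value."""
    (encoded,) = struct.unpack("<I", struct.pack("<f", value))
    (faulted,) = struct.unpack("<f", struct.pack("<I", encoded ^ (1 << bit)))
    return faulted

rng = np.random.default_rng(0)
w = rng.standard_normal(16)   # stand-in for one row of a weight matrix
x = rng.standard_normal(16)   # stand-in for an input vector

clean = w @ x
w_faulted = w.copy()
w_faulted[0] = flip_bit_fp32(w[0], 29)   # fault an exponent bit of w[0]
faulty = w_faulted @ x

# The output differential isolates one weight/input pair:
#   faulty - clean == (w_faulted[0] - w[0]) * x[0]
# so observing faulted outputs leaks information about w[0].
```

Repeating this with faults injected at different positions, and with chosen inputs, lets an attacker build a system of such differentials over the protected weights.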
Practical Examples
Extracting Model Architecture from Page Access Patterns
This example demonstrates how to analyze page fault logs from a controlled-channel attack to reconstruct the architecture of a neural network running inside an SGX enclave.
"""
Reconstruct neural network architecture from page fault traces
collected during controlled-channel attack on SGX enclave.
Each page fault reveals a 4KB-aligned memory access. By mapping
these to code and data regions of the inference binary, we can
determine layer types, sizes, and execution order.
"""
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional
import struct
class LayerType(Enum):
CONV2D = "conv2d"
LINEAR = "linear"
BATCHNORM = "batchnorm"
RELU = "relu"
ATTENTION = "attention"
UNKNOWN = "unknown"
@dataclass
class InferredLayer:
layer_type: LayerType
estimated_params: int
page_range: tuple[int, int]
code_pages: List[int]
# Known code page signatures for common operations in PyTorch/ONNX Runtime
# These are determined by reverse-engineering the inference binary
LAYER_SIGNATURES = {
# GEMM kernel touches specific code pages in MKL/oneDNN
LayerType.LINEAR: {
"code_pattern": [0x1000, 0x1001, 0x1002], # MKL SGEMM pages
"data_pattern": "sequential", # Linear weight access is sequential
},
LayerType.CONV2D: {
"code_pattern": [0x2000, 0x2001, 0x2002, 0x2003], # im2col + GEMM
"data_pattern": "strided", # Filter weights accessed with stride
},
LayerType.ATTENTION: {
"code_pattern": [0x1000, 0x1001, 0x3000, 0x3001], # GEMM + softmax
"data_pattern": "qkv_triple", # Three weight matrices accessed
},
}
def parse_page_fault_log(log_path: str) -> List[tuple[int, int, str]]:
"""
Parse page fault log from SGX-Step controlled-channel attack.
Returns list of (timestamp, page_address, access_type) tuples.
"""
events = []
with open(log_path, "r") as f:
for line in f:
parts = line.strip().split(",")
if len(parts) != 3:
continue
timestamp = int(parts[0])
page_addr = int(parts[1], 16)
access_type = parts[2] # "code" or "data"
events.append((timestamp, page_addr, access_type))
return events
def identify_layer_boundaries(
events: List[tuple[int, int, str]],
) -> List[tuple[int, int]]:
"""
Identify layer boundaries by detecting transitions between
code regions. Each layer type uses a distinct set of code pages.
"""
boundaries = []
current_code_set = set()
boundary_start = 0
for i, (ts, addr, atype) in enumerate(events):
if atype == "code":
if addr not in current_code_set and len(current_code_set) > 2:
# Significant change in code pages = new layer
boundaries.append((boundary_start, i))
boundary_start = i
current_code_set = set()
current_code_set.add(addr)
if boundary_start < len(events):
boundaries.append((boundary_start, len(events)))
return boundaries
def infer_architecture(log_path: str) -> List[InferredLayer]:
"""
Full pipeline: parse page faults -> identify layers -> classify.
"""
events = parse_page_fault_log(log_path)
boundaries = identify_layer_boundaries(events)
layers = []
for start, end in boundaries:
segment = events[start:end]
code_pages = sorted(set(
addr for _, addr, atype in segment if atype == "code"
))
data_pages = sorted(set(
addr for _, addr, atype in segment if atype == "data"
))
# Match code pages against known signatures
layer_type = LayerType.UNKNOWN
for ltype, sig in LAYER_SIGNATURES.items():
pattern = sig["code_pattern"]
if all(p in code_pages for p in pattern):
layer_type = ltype
break
# Estimate parameter count from number of unique data pages
# Each page is 4KB; fp32 weights are 4 bytes each
estimated_params = len(data_pages) * (4096 // 4)
layers.append(InferredLayer(
layer_type=layer_type,
estimated_params=estimated_params,
page_range=(data_pages[0] if data_pages else 0,
data_pages[-1] if data_pages else 0),
code_pages=code_pages,
))
return layers
if __name__ == "__main__":
import sys
if len(sys.argv) != 2:
print(f"Usage: {sys.argv[0]} <page_fault_log>")
sys.exit(1)
layers = infer_architecture(sys.argv[1])
print(f"Inferred {len(layers)} layers:")
for i, layer in enumerate(layers):
print(
f" Layer {i}: {layer.layer_type.value}, "
f"~{layer.estimated_params:,} params, "
f"pages {layer.page_range[0]:#x}-{layer.page_range[1]:#x}"
        )

Auditing SEV-SNP Attestation in Cloud Deployments
"""
Audit script for verifying AMD SEV-SNP attestation reports
in confidential AI VM deployments.
Checks that the VM is running with expected security properties:
- SNP is enabled (not just SEV or SEV-ES)
- Firmware version meets minimum requirements
- Guest policy enforces migration restrictions
- The TCB (Trusted Computing Base) version is current
"""
import hashlib
import struct
import json
from pathlib import Path
from dataclasses import dataclass
from typing import Optional
@dataclass
class SNPAttestationReport:
"""Parsed AMD SEV-SNP attestation report fields."""
version: int
guest_svn: int
policy: int
family_id: bytes
image_id: bytes
vmpl: int
signature_algo: int
current_tcb: int
platform_info: int
author_key_en: int
report_data: bytes
measurement: bytes # 384-bit hash of initial guest memory
host_data: bytes
id_key_digest: bytes
author_key_digest: bytes
report_id: bytes
chip_id: bytes
committed_tcb: int
current_build: int
committed_build: int
launch_tcb: int
signature: bytes
def parse_attestation_report(raw_report: bytes) -> SNPAttestationReport:
"""
Parse a raw SEV-SNP attestation report (0x4A0 bytes).
Reference: AMD SEV-SNP ABI Specification, Table 21.
"""
if len(raw_report) < 0x4A0:
raise ValueError(
f"Report too short: {len(raw_report)} bytes, "
f"expected at least 0x4A0"
)
version = struct.unpack_from("<I", raw_report, 0x0)[0]
guest_svn = struct.unpack_from("<I", raw_report, 0x4)[0]
policy = struct.unpack_from("<Q", raw_report, 0x8)[0]
family_id = raw_report[0x10:0x20]
image_id = raw_report[0x20:0x30]
vmpl = struct.unpack_from("<I", raw_report, 0x30)[0]
signature_algo = struct.unpack_from("<I", raw_report, 0x34)[0]
current_tcb = struct.unpack_from("<Q", raw_report, 0x38)[0]
platform_info = struct.unpack_from("<Q", raw_report, 0x40)[0]
author_key_en = struct.unpack_from("<I", raw_report, 0x48)[0]
report_data = raw_report[0x50:0x90]
measurement = raw_report[0x90:0xC0]
host_data = raw_report[0xC0:0xE0]
id_key_digest = raw_report[0xE0:0x110]
author_key_digest = raw_report[0x110:0x140]
report_id = raw_report[0x140:0x160]
chip_id = raw_report[0x1A0:0x1E0]
committed_tcb = struct.unpack_from("<Q", raw_report, 0x1E0)[0]
current_build = struct.unpack_from("<I", raw_report, 0x1E8)[0]
committed_build = struct.unpack_from("<I", raw_report, 0x1EC)[0]
launch_tcb = struct.unpack_from("<Q", raw_report, 0x1F0)[0]
signature = raw_report[0x2A0:0x4A0]
return SNPAttestationReport(
version=version, guest_svn=guest_svn, policy=policy,
family_id=family_id, image_id=image_id, vmpl=vmpl,
signature_algo=signature_algo, current_tcb=current_tcb,
platform_info=platform_info, author_key_en=author_key_en,
report_data=report_data, measurement=measurement,
host_data=host_data, id_key_digest=id_key_digest,
author_key_digest=author_key_digest, report_id=report_id,
chip_id=chip_id, committed_tcb=committed_tcb,
current_build=current_build, committed_build=committed_build,
launch_tcb=launch_tcb, signature=signature,
)
def audit_snp_report(
report: SNPAttestationReport,
expected_measurement: str,
min_tcb_version: int = 0x03000000000008,
) -> list[str]:
"""
Audit an SEV-SNP attestation report for security issues.
Returns a list of findings.
"""
findings = []
# Check report version
if report.version < 2:
findings.append(
"CRITICAL: Report version < 2. SNP attestation requires v2+."
)
# Verify measurement matches expected value
measurement_hex = report.measurement.hex()
if measurement_hex != expected_measurement:
findings.append(
f"CRITICAL: Measurement mismatch. "
f"Expected {expected_measurement}, got {measurement_hex}. "
f"The VM image may have been tampered with."
)
# Check VMPL (Virtual Machine Privilege Level)
if report.vmpl != 0:
findings.append(
f"WARNING: VMPL is {report.vmpl}, not 0. "
f"Higher VMPL has reduced privileges."
)
    # Check guest policy flags (SEV-SNP ABI spec, GUEST_POLICY field):
    # bit 16: SMT allowed (potential side-channel risk)
    # bit 18: association with a migration agent allowed (confidentiality risk)
    # bit 19: debugging allowed
    if report.policy & (1 << 16):
        findings.append(
            "WARNING: SMT is allowed by guest policy. "
            "Simultaneous multithreading enables side-channel attacks."
        )
    if report.policy & (1 << 18):
        findings.append(
            "CRITICAL: Migration is allowed by guest policy. "
            "A malicious hypervisor could migrate the VM to extract secrets."
        )
    if report.policy & (1 << 19):
        findings.append(
            "CRITICAL: Debugging is allowed by guest policy, which breaks "
            "confidentiality guarantees."
        )
# Check TCB version
if report.current_tcb < min_tcb_version:
findings.append(
f"WARNING: TCB version {report.current_tcb:#x} is below "
f"minimum {min_tcb_version:#x}. Firmware may have known "
f"vulnerabilities."
)
# Check platform info
if report.platform_info & 0x1:
findings.append(
"INFO: Platform reports SMT enabled at hardware level."
)
if not findings:
findings.append("PASS: All attestation checks passed.")
return findings
if __name__ == "__main__":
import sys
if len(sys.argv) != 3:
print(f"Usage: {sys.argv[0]} <report_file> <expected_measurement>")
sys.exit(1)
raw = Path(sys.argv[1]).read_bytes()
report = parse_attestation_report(raw)
findings = audit_snp_report(report, sys.argv[2])
for finding in findings:
        print(finding)

Defense and Mitigation
Defending AI workloads in TEEs requires a layered approach that acknowledges the limitations of hardware-based isolation:
Oblivious computation techniques such as Oblivious RAM (ORAM) can hide memory access patterns at significant performance cost. For AI inference, data-oblivious algorithms ensure that the sequence of memory accesses is independent of the input data, eliminating the primary side-channel vector. The OBLIVIATE framework demonstrated ORAM-based file I/O for SGX that could be applied to model weight loading.
Constant-time implementations for critical operations ensure that execution time does not depend on secret values. While challenging for floating-point neural network operations, techniques such as padding branch execution to constant time and avoiding data-dependent memory access patterns can reduce leakage.
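As a minimal sketch of the data-oblivious idea (illustrative Python only; genuinely constant-time code must be written in C or assembly and checked against the compiler's output), a secret-indexed table lookup can be replaced by a scan that touches every entry and selects the result arithmetically, so the memory access sequence is identical for every secret index:

```python
import numpy as np

def oblivious_lookup(table: np.ndarray, secret_index: int) -> float:
    """
    Read every entry of `table` and combine with arithmetic masks, so an
    attacker observing cache lines sees the same access sequence regardless
    of secret_index. A direct table[secret_index] would leak the index
    through which cache line is touched.
    """
    result = 0.0
    for i in range(len(table)):
        # mask is 1.0 only at the secret index, but every entry is read
        mask = float(i == secret_index)
        result += mask * float(table[i])
    return result

table = np.arange(8, dtype=np.float32) * 1.5
assert oblivious_lookup(table, 5) == 7.5
```

The cost is linear in table size per lookup, which is why oblivious techniques are applied selectively to secret-dependent accesses rather than to entire models.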
Attestation verification must be performed rigorously before sending sensitive data to a TEE-protected service. This includes:
- Verifying the enclave/VM measurement matches a known-good build
- Checking that security-relevant configuration flags (SMT policy, migration policy, debug mode) are set correctly
- Validating TCB versions against known vulnerability databases
- Implementing certificate pinning for attestation verification endpoints
Noise injection adds random dummy memory accesses and computation to obscure the real access patterns. While this does not eliminate side channels, it significantly increases the number of traces an attacker needs to collect, raising the attack cost.
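A hypothetical helper sketches the idea (names and the dummy-to-real ratio are illustrative): for every weight row the inference code genuinely needs, several randomly chosen decoy rows are also touched, in shuffled order, so real accesses are diluted in the attacker's trace. Consistent with the caveat above, this raises the number of traces needed rather than eliminating leakage.

```python
import numpy as np

rng = np.random.default_rng()

def noisy_weight_read(weights: np.ndarray, real_rows: list[int],
                      dummy_factor: int = 3) -> np.ndarray:
    """
    Touch dummy_factor random decoy rows per real row, in shuffled order,
    so cache-set observations mix real and dummy accesses. Returns only
    the real rows' data.
    """
    n = len(weights)
    dummies = rng.integers(0, n, size=dummy_factor * len(real_rows))
    schedule = list(real_rows) + [int(d) for d in dummies]
    rng.shuffle(schedule)
    sink = 0.0
    out = {}
    for row in schedule:
        data = weights[row]        # every scheduled row is actually read
        sink += float(data[0])     # keep the read from being optimized out
        if row in real_rows:
            out[row] = data
    return np.stack([out[r] for r in real_rows])
```

Usage: `noisy_weight_read(weight_matrix, [1, 3])` returns rows 1 and 3 while the trace shows roughly four times as many row touches at random locations.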
Hardware-software co-design approaches such as NVIDIA's Confidential Computing on H100 GPUs extend the TEE boundary to include GPU memory and computation, addressing the fundamental limitation of CPU-only TEEs for GPU-accelerated AI workloads. Organizations deploying confidential AI should prioritize these newer platforms.
Physical security measures including tamper-evident cases, voltage monitoring, and disabling debug interfaces protect against hardware-level attacks such as Plundervolt and VoltPillager. For cloud deployments, this responsibility falls on the cloud provider, making provider security certifications (SOC 2, Common Criteria) relevant trust signals.
Organizations should also implement defense-in-depth by combining TEEs with other protections: differential privacy to limit what can be learned even if inference inputs leak, model watermarking to detect stolen weights, and comprehensive audit logging of attestation events.
References
- Van Bulck, J., Piessens, F., & Strackx, R. (2017). "SGX-Step: A Practical Attack Framework for Precise Enclave Execution Control." Proceedings of the 2nd Workshop on System Software for Trusted Execution (SysTEX). https://github.com/jovanbulck/sgx-step
- AMD. (2024). "SEV-SNP ABI Specification." Revision 1.55. https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/specifications/56860.pdf
- MITRE ATLAS. "Machine Learning Attack Techniques: ML Model Theft via Side Channel." https://atlas.mitre.org/techniques/AML.T0024
- Murdock, K., Oswald, D., Garcia, F. D., Van Bulck, J., Gruss, D., & Piessens, F. (2020). "Plundervolt: Software-based Fault Injection Attacks against Intel SGX." Proceedings of the IEEE Symposium on Security and Privacy (S&P).
- NVIDIA. (2024). "Confidential Computing on NVIDIA Hopper GPUs." https://docs.nvidia.com/datacenter/tesla/confidential-computing/