Zero Trust Architecture for AI Infrastructure

Intermediate18 min readUpdated 2026-03-21

Implementing and attacking zero trust principles across ML training pipelines, inference endpoints, and model registries

infrastructure zero-trust network-security identity microsegmentation

概覽

Zero trust architecture (ZTA) operates on the principle that no network location, user, or system should be inherently trusted. Every access request must be authenticated, authorized, and continuously validated regardless of where it originates. While zero trust has been widely adopted for traditional enterprise infrastructure, its application to AI systems introduces unique challenges that create gaps attackers can 利用.

AI infrastructure has characteristics that strain zero trust implementations. Training clusters require high-bandwidth, low-latency GPU-to-GPU communication (often via RDMA/InfiniBand) that is difficult to intercept and inspect without introducing unacceptable performance overhead. Model artifacts are large (hundreds of gigabytes for modern LLMs) and must be transferred between registries, 訓練 systems, and serving infrastructure — creating pressure to bypass 安全 controls for performance. Feature stores, experiment trackers, and data pipelines often use service-to-service 認證 with long-lived credentials 因為 the overhead of 符元 rotation is seen as impractical for long-running 訓練 jobs.

Inference endpoints must respond in milliseconds, making per-request 授權 checks a performance concern.

These tensions between 安全 and performance create predictable gaps in zero trust implementations that red teamers can 識別 and 利用. This article examines how to apply zero trust principles to AI infrastructure, where implementations typically fall short, and how attackers target those gaps. The content aligns with NIST SP 800-207 (Zero Trust Architecture) and NIST AI RMF for AI-specific risk considerations.

Zero Trust Principles Applied to AI Infrastructure

Identity: Every Component Gets a Verifiable Identity

In a zero trust AI infrastructure, every component — from 訓練 jobs to 推論 endpoints to data pipelines — must have a cryptographically verifiable identity. This goes beyond user 認證 to include workload identity for automated processes.

SPIFFE (Secure Production Identity Framework For Everyone) provides a standard for workload identity that is well-suited to AI infrastructure. Each workload receives a SPIFFE Verifiable Identity Document (SVID), typically an X.509 certificate, that encodes its identity as a URI (e.g., spiffe://ai-platform/訓練/job-12345).

"""
SPIFFE-based workload identity verification for AI pipeline components.
Demonstrates how to verify that a 訓練 job, model registry, or
推論 endpoint has a valid identity before allowing access.
"""
 
import ssl
import socket
import json
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse
from cryptography import x509
from cryptography.x509.oid import ExtensionOID, NameOID
from cryptography.hazmat.primitives import hashes
 
 
@dataclass
class WorkloadIdentity:
    """Parsed SPIFFE identity from an X.509 SVID."""
    spiffe_id: str
    trust_domain: str
    workload_path: str
    certificate_hash: str
    not_valid_after: str
 
    @property
    def component_type(self) -> str:
        """Extract AI component type from SPIFFE path."""
        parts = self.workload_path.strip("/").split("/")
        if parts:
            return parts[0]  # e.g., "訓練", "推論", "registry"
        return "unknown"
 
 
def extract_spiffe_id(cert: x509.Certificate) -> Optional[WorkloadIdentity]:
    """
    Extract SPIFFE ID from X.509 certificate SAN extension.
    SPIFFE IDs are encoded as URI SANs in the format:
    spiffe://<trust-domain>/<workload-path>
    """
    try:
        san_ext = cert.extensions.get_extension_for_oid(
            ExtensionOID.SUBJECT_ALTERNATIVE_NAME
        )
        san = san_ext.value
 
        for uri in san.get_values_for_type(x509.UniformResourceIdentifier):
            if uri.startswith("spiffe://"):
                parsed = urlparse(uri)
                cert_hash = cert.fingerprint(hashes.SHA256()).hex()
                return WorkloadIdentity(
                    spiffe_id=uri,
                    trust_domain=parsed.hostname or "",
                    workload_path=parsed.path,
                    certificate_hash=cert_hash,
                    not_valid_after=str(cert.not_valid_after_utc),
                )
    except x509.ExtensionNotFound:
        return None
 
    return None
 
 
class AIZeroTrustVerifier:
    """
    Verify workload identity and enforce access policies
    for AI infrastructure components.
    """
 
    # Access control matrix: which components can access which
    ACCESS_POLICIES = {
        "訓練": {
            "allowed_targets": [
                "data-store",
                "registry",
                "experiment-tracker",
                "feature-store",
            ],
            "denied_targets": ["推論", "監控-admin"],
        },
        "推論": {
            "allowed_targets": [
                "registry",  # Read-only for model loading
                "feature-store",  # For feature retrieval
            ],
            "denied_targets": [
                "訓練",
                "data-store",  # Inference should not access raw 訓練資料
            ],
        },
        "registry": {
            "allowed_targets": ["data-store"],
            "denied_targets": ["訓練", "推論"],
        },
        "pipeline": {
            "allowed_targets": [
                "訓練",
                "registry",
                "data-store",
                "feature-store",
            ],
            "denied_targets": ["推論"],
        },
    }
 
    def __init__(self, trust_domain: str):
        self.trust_domain = trust_domain
 
    def verify_access(
        self,
        source: WorkloadIdentity,
        target_component: str,
    ) -> tuple[bool, str]:
        """
        Verify if a source workload is allowed to access a target component.
        Returns (allowed, reason).
        """
        # Verify trust domain
        if source.trust_domain != self.trust_domain:
            return False, (
                f"Trust domain mismatch: {source.trust_domain} "
                f"!= {self.trust_domain}"
            )
 
        # Look up policy for source component type
        policy = self.ACCESS_POLICIES.get(source.component_type)
        if policy is None:
            return False, (
                f"No policy defined for component type: "
                f"{source.component_type}"
            )
 
        if target_component in policy.get("denied_targets", []):
            return False, (
                f"{source.component_type} is explicitly denied "
                f"access to {target_component}"
            )
 
        if target_component in policy.get("allowed_targets", []):
            return True, "Access permitted by policy"
 
        # Default deny
        return False, (
            f"No explicit allow for {source.component_type} -> "
            f"{target_component}"
        )
 
    def audit_connection(
        self,
        peer_cert_pem: bytes,
        target_component: str,
    ) -> dict:
        """
        Full audit of an incoming connection:
        1. Parse certificate
        2. Extract SPIFFE identity
        3. Check access policy
        """
        cert = x509.load_pem_x509_certificate(peer_cert_pem)
        identity = extract_spiffe_id(cert)
 
        if identity is None:
            return {
                "allowed": False,
                "reason": "No SPIFFE ID in certificate",
                "identity": None,
            }
 
        allowed, reason = self.verify_access(identity, target_component)
        return {
            "allowed": allowed,
            "reason": reason,
            "identity": {
                "spiffe_id": identity.spiffe_id,
                "component_type": identity.component_type,
                "cert_hash": identity.certificate_hash,
            },
        }

Microsegmentation for AI Networks

AI 訓練 clusters typically use high-speed interconnects (InfiniBand, RoCE) for GPU-to-GPU communication during distributed 訓練. These networks are often treated as trusted 因為 of the performance sensitivity of collective operations (AllReduce, AllGather). This creates a significant blind spot in zero trust implementations.

The InfiniBand trust gap: InfiniBand networks used in GPU clusters do not support the same network policy enforcement available in Ethernet-based Kubernetes networks. Tools like Calico and Cilium can enforce microsegmentation for pod-to-pod Ethernet traffic, but InfiniBand traffic bypasses the kernel networking stack entirely through RDMA, making it invisible to eBPF-based network policies.

"""
Audit script for identifying zero trust gaps in AI infrastructure
network segmentation, with focus on GPU cluster interconnects.
"""
 
import subprocess
import json
import re
from typing import Optional
 
 
def audit_kubernetes_network_policies(namespace: str = "ml-platform") -> list[dict]:
    """
    Audit Kubernetes network policies for AI workload namespaces.
    Identifies missing policies that would allow unrestricted
    communication between components.
    """
    findings = []
 
    # Get all pods in the namespace
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, timeout=30,
    )
    pods = json.loads(result.stdout)
 
    # Get network policies
    result = subprocess.run(
        ["kubectl", "get", "networkpolicies", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, timeout=30,
    )
    policies = json.loads(result.stdout)
 
    # Check if default-deny exists
    has_default_deny = any(
        policy["metadata"]["name"].startswith("default-deny")
        for policy in policies.get("items", [])
    )
 
    if not has_default_deny:
        findings.append({
            "severity": "HIGH",
            "title": f"No default-deny policy in namespace {namespace}",
            "detail": (
                "Without a default-deny ingress/egress policy, all pods "
                "can communicate freely. AI components (訓練, 推論, "
                "registry) should be isolated by default."
            ),
        })
 
    # Check for pods with host networking
    for pod in pods.get("items", []):
        pod_name = pod["metadata"]["name"]
        spec = pod.get("spec", {})
 
        if spec.get("hostNetwork", False):
            findings.append({
                "severity": "HIGH",
                "title": f"Pod {pod_name} uses host networking",
                "detail": (
                    "Host networking bypasses all Kubernetes network "
                    "policies. This pod has unrestricted network access "
                    "to the node and potentially the InfiniBand fabric."
                ),
            })
 
        # Check for privileged containers (common for GPU workloads)
        for container in spec.get("containers", []):
            sec_ctx = container.get("securityContext", {})
            if sec_ctx.get("privileged", False):
                findings.append({
                    "severity": "HIGH",
                    "title": (
                        f"Privileged container: {pod_name}/"
                        f"{container['name']}"
                    ),
                    "detail": (
                        "Privileged containers can access all host "
                        "devices including InfiniBand HCAs, bypass "
                        "network namespaces, and escape container "
                        "isolation."
                    ),
                })
 
    # Check for RDMA/InfiniBand device access
    for pod in pods.get("items", []):
        pod_name = pod["metadata"]["name"]
        for container in pod.get("spec", {}).get("containers", []):
            resources = container.get("resources", {})
            limits = resources.get("limits", {})
            requests = resources.get("requests", {})
            all_resources = {**limits, **requests}
 
            for resource_name in all_resources:
                if "rdma" in resource_name or "infiniband" in resource_name:
                    findings.append({
                        "severity": "MEDIUM",
                        "title": (
                            f"RDMA device access: {pod_name}/"
                            f"{container['name']}"
                        ),
                        "detail": (
                            f"Container requests {resource_name}. RDMA "
                            f"traffic bypasses kernel networking and is "
                            f"not subject to NetworkPolicy enforcement."
                        ),
                    })
 
    return findings
 
 
def check_service_mesh_coverage(namespace: str = "ml-platform") -> list[dict]:
    """
    Verify that a service mesh (Istio/Linkerd) covers AI workloads
    and that mTLS is enforced.
    """
    findings = []
 
    # Check for Istio sidecar injection
    result = subprocess.run(
        [
            "kubectl", "get", "pods", "-n", namespace,
            "-o", "jsonpath={range .items[*]}{.metadata.name}{"
            "\\t}{.spec.containers[*].name}{\\n}{end}",
        ],
        capture_output=True, text=True, timeout=30,
    )
 
    for line in result.stdout.strip().split("\n"):
        if not line.strip():
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        pod_name = parts[0]
        containers = parts[1].split()
 
        has_sidecar = any(
            c in containers
            for c in ["istio-proxy", "linkerd-proxy", "envoy-sidecar"]
        )
 
        if not has_sidecar:
            findings.append({
                "severity": "MEDIUM",
                "title": f"No service mesh sidecar: {pod_name}",
                "detail": (
                    "This pod communicates without mTLS enforcement. "
                    "Traffic can be intercepted or spoofed by adjacent "
                    "workloads."
                ),
            })
 
    # Check Istio PeerAuthentication policy
    result = subprocess.run(
        [
            "kubectl", "get", "peerauthentication", "-n", namespace,
            "-o", "json",
        ],
        capture_output=True, text=True, timeout=30,
    )
 
    if result.returncode == 0:
        pa_policies = json.loads(result.stdout)
        strict_mtls = any(
            policy.get("spec", {}).get("mtls", {}).get("mode") == "STRICT"
            for policy in pa_policies.get("items", [])
        )
        if not strict_mtls:
            findings.append({
                "severity": "HIGH",
                "title": "mTLS not set to STRICT mode",
                "detail": (
                    "PERMISSIVE mTLS allows plaintext connections. "
                    "攻擊者 in the mesh can intercept 推論 "
                    "requests, model weights, and 訓練資料."
                ),
            })
 
    return findings

Attacking Zero Trust Gaps in AI Pipelines

Exploiting Implicit Trust Between Pipeline Stages

ML pipelines (built with tools like Kubeflow, Airflow, or custom systems) often establish trust between stages implicitly. A 訓練 stage produces a model artifact that the 評估 stage consumes, and the 評估 stage's approval triggers deployment. If the pipeline trusts artifacts from previous stages without verification, 攻擊者 who compromises any single stage can propagate through the entire pipeline.

"""
Demonstrate trust boundary violations in ML pipelines.
This script identifies pipeline stages that accept artifacts
from upstream stages without integrity verification.
"""
 
import hashlib
import json
import os
from pathlib import Path
from dataclasses import dataclass
from typing import Optional
 
 
@dataclass
class PipelineArtifact:
    """Represents an artifact passed between pipeline stages."""
    stage_name: str
    artifact_path: str
    expected_hash: Optional[str]
    actual_hash: Optional[str]
    is_signed: bool
    signature_valid: Optional[bool]
 
 
def audit_pipeline_artifacts(
    pipeline_run_dir: str,
) -> list[dict]:
    """
    Audit artifacts in a pipeline run directory for
    integrity verification gaps.
    """
    findings = []
    run_path = Path(pipeline_run_dir)
 
    if not run_path.exists():
        return [{"severity": "ERROR", "title": "Pipeline run directory not found",
                 "detail": f"{pipeline_run_dir} does not exist"}]
 
    # Look for common pipeline metadata files
    metadata_files = list(run_path.rglob("**/metadata.json")) + \
                     list(run_path.rglob("**/artifact_info.json"))
 
    for meta_file in metadata_files:
        try:
            with open(meta_file) as f:
                metadata = json.load(f)
        except (json.JSONDecodeError, IOError):
            continue
 
        stage_name = metadata.get("stage", meta_file.parent.name)
        artifacts = metadata.get("output_artifacts", [])
 
        for artifact in artifacts:
            art_path = artifact.get("path", "")
            has_hash = "sha256" in artifact or "hash" in artifact
            has_signature = "signature" in artifact
 
            if not has_hash:
                findings.append({
                    "severity": "HIGH",
                    "title": f"No integrity hash: {stage_name}/{art_path}",
                    "detail": (
                        f"Artifact from stage '{stage_name}' has no hash. "
                        f"A compromised upstream stage could substitute "
                        f"a malicious artifact (e.g., poisoned model weights)."
                    ),
                })
 
            if not has_signature:
                findings.append({
                    "severity": "MEDIUM",
                    "title": f"No signature: {stage_name}/{art_path}",
                    "detail": (
                        f"Artifact is not cryptographically signed. "
                        f"Even with a hash, the hash itself could be "
                        f"modified by a compromised pipeline controller."
                    ),
                })
 
            # Verify hash if present
            if has_hash:
                expected = artifact.get("sha256") or artifact.get("hash")
                full_path = run_path / art_path
                if full_path.exists():
                    actual = hashlib.sha256(
                        full_path.read_bytes()
                    ).hexdigest()
                    if actual != expected:
                        findings.append({
                            "severity": "CRITICAL",
                            "title": (
                                f"Hash mismatch: {stage_name}/{art_path}"
                            ),
                            "detail": (
                                f"Expected {expected}, got {actual}. "
                                f"Artifact may have been tampered with."
                            ),
                        })
 
    # Check for credential passing between stages
    env_files = list(run_path.rglob("**/.env")) + \
                list(run_path.rglob("**/secrets.*"))
    for env_file in env_files:
        findings.append({
            "severity": "HIGH",
            "title": f"Credentials in pipeline artifacts: {env_file}",
            "detail": (
                "Secrets stored in pipeline artifacts can be accessed "
                "by downstream stages and persisted in artifact storage."
            ),
        })
 
    return findings

Token and Credential 攻擊

Long-running 訓練 jobs often use service account 符元 or API keys with extended validity. In zero trust architectures, these should be short-lived and continuously validated. Common gaps include:

Static service account 符元 in Kubernetes that do not expire (pre-v1.24 default behavior)
雲端 IAM roles with overly broad 權限 attached to 訓練 node pools
Model registry credentials embedded in pipeline configurations
Experiment tracking API keys shared across all team members

攻擊者 who obtains a 訓練 job's credentials gains access to everything that 訓練 job can access: 訓練資料, 模型 registry, experiment tracking, and potentially other 雲端 resources through role chaining or federation.

"""
Credential exposure analysis for AI workloads in Kubernetes.
Identifies overly broad credentials, long-lived 符元, and
credential sharing patterns that violate zero trust principles.
"""
 
import subprocess
import json
import base64
from typing import Any
 
 
def audit_ai_credentials(namespace: str = "ml-platform") -> list[dict]:
    """
    Audit credentials available to AI workloads for zero trust
    violations: excessive scope, long validity, and sharing.
    """
    findings = []
 
    # Get all service accounts in the namespace
    result = subprocess.run(
        ["kubectl", "get", "serviceaccounts", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, timeout=30,
    )
    if result.returncode != 0:
        return findings
 
    service_accounts = json.loads(result.stdout)
 
    for sa in service_accounts.get("items", []):
        sa_name = sa["metadata"]["name"]
 
        # Check for mounted secrets
        secrets = sa.get("secrets", [])
        if len(secrets) > 0:
            findings.append({
                "severity": "MEDIUM",
                "title": f"Service account has bound secrets: {sa_name}",
                "detail": (
                    f"SA {sa_name} has {len(secrets)} bound secrets. "
                    f"In zero trust, prefer projected service account 符元 "
                    f"with expiration over static secrets."
                ),
            })
 
        # Check annotations for 雲端 IAM bindings
        annotations = sa.get("metadata", {}).get("annotations", {})
 
        # GKE Workload Identity
        gke_sa = annotations.get(
            "iam.gke.io/gcp-service-account", ""
        )
        if gke_sa:
            findings.append({
                "severity": "INFO",
                "title": f"GKE Workload Identity binding: {sa_name}",
                "detail": f"Bound to GCP SA: {gke_sa}. Verify scope is minimal.",
            })
 
        # EKS IRSA
        eks_role = annotations.get(
            "eks.amazonaws.com/role-arn", ""
        )
        if eks_role:
            findings.append({
                "severity": "INFO",
                "title": f"EKS IRSA binding: {sa_name}",
                "detail": f"Bound to IAM role: {eks_role}. Verify role policy scope.",
            })
 
    # Check for pods with environment variable credentials
    pods_result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, timeout=30,
    )
    if pods_result.returncode == 0:
        pods = json.loads(pods_result.stdout)
        sensitive_env_patterns = [
            "KEY", "SECRET", "PASSWORD", "TOKEN", "CREDENTIAL",
            "API_KEY", "ACCESS_KEY", "PRIVATE_KEY",
        ]
 
        for pod in pods.get("items", []):
            pod_name = pod["metadata"]["name"]
            for container in pod.get("spec", {}).get("containers", []):
                for env in container.get("env", []):
                    env_name = env.get("name", "").upper()
                    if any(p in env_name for p in sensitive_env_patterns):
                        # Check if it's from a secret reference (better)
                        # or a plaintext value (worse)
                        if "value" in env and env["value"]:
                            findings.append({
                                "severity": "CRITICAL",
                                "title": (
                                    f"Hardcoded credential: {pod_name} "
                                    f"env {env['name']}"
                                ),
                                "detail": (
                                    "Credential is hardcoded in pod spec "
                                    "as a plaintext value. Use Kubernetes "
                                    "secrets with projected volumes or "
                                    "external secret managers."
                                ),
                            })
                        elif "valueFrom" in env:
                            source = env["valueFrom"]
                            if "secretKeyRef" in source:
                                findings.append({
                                    "severity": "LOW",
                                    "title": (
                                        f"Secret-backed credential: "
                                        f"{pod_name} env {env['name']}"
                                    ),
                                    "detail": (
                                        f"From secret: "
                                        f"{source['secretKeyRef'].get('name')}. "
                                        f"Verify rotation policy."
                                    ),
                                })
 
    return findings

Continuous Verification and Device Posture

Zero trust architectures require continuous verification — not just authenticating once at connection time. For AI workloads, 這意味著:

Runtime integrity checking: Verify that the 訓練 script, model handler, or serving binary has not been modified since deployment. Container image digests should be verified at pod admission and periodically at runtime.
Node attestation: GPU nodes should attest their integrity before being trusted with sensitive model weights or 訓練資料. Hardware-based attestation (TPM, TEE attestation) provides stronger guarantees than software-only checks.
Behavioral 監控: Continuously monitor AI workload behavior for anomalies. A 訓練 job that suddenly begins making outbound network connections it has never made before, or an 推論 endpoint whose response latency distribution changes dramatically, may be compromised.
Token refresh under policy re-評估: When credentials are refreshed, the 授權 decision should be re-evaluated against current policy. This ensures that policy changes (such as revoking a team's access to a model) take effect within the 符元 lifetime.

Practical 範例

Zero Trust Compliance Checker for AI Platforms

#!/usr/bin/env bash
# Zero trust compliance audit for AI infrastructure on Kubernetes
# Checks for common violations of zero trust principles
 
set -euo pipefail
 
NAMESPACE="${1:-ml-platform}"
echo "=== Zero Trust Audit: Namespace $NAMESPACE ==="
 
echo ""
echo "--- 1. Default Deny Network Policies ---"
DENY_POLICIES=$(kubectl get networkpolicies -n "$NAMESPACE" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' 2>/dev/null \
    | grep -c "default-deny" || true)
if [ "$DENY_POLICIES" -eq 0 ]; then
    echo "[FAIL] No default-deny network policy found"
else
    echo "[PASS] Default-deny policy exists"
fi
 
echo ""
echo "--- 2. Service Mesh mTLS ---"
STRICT_MTLS=$(kubectl get peerauthentication -n "$NAMESPACE" \
    -o jsonpath='{range .items[*]}{.spec.mtls.mode}{"\n"}{end}' 2>/dev/null \
    | grep -c "STRICT" || true)
if [ "$STRICT_MTLS" -eq 0 ]; then
    echo "[FAIL] No STRICT mTLS PeerAuthentication policy"
else
    echo "[PASS] STRICT mTLS enforced"
fi
 
echo ""
echo "--- 3. Service Account Token Projection ---"
# Check for pods using legacy non-expiring 符元
LEGACY_TOKENS=$(kubectl get pods -n "$NAMESPACE" -o json 2>/dev/null \
    | python3 -c "
import json, sys
data = json.load(sys.stdin)
count = 0
for pod in data.get('items', []):
    for vol in pod.get('spec', {}).get('volumes', []):
        if 'secret' in vol and '符元' in vol.get('secret', {}).get('secretName', '').lower():
            count += 1
            print(f'  Legacy 符元: {pod[\"metadata\"][\"name\"]}')
print(f'Total: {count}')
" 2>/dev/null)
echo "$LEGACY_TOKENS"
 
echo ""
echo "--- 4. Privileged Containers (Zero Trust Violation) ---"
kubectl get pods -n "$NAMESPACE" -o json 2>/dev/null \
    | python3 -c "
import json, sys
data = json.load(sys.stdin)
for pod in data.get('items', []):
    for c in pod.get('spec', {}).get('containers', []):
        sc = c.get('securityContext', {})
        if sc.get('privileged'):
            print(f'  [FAIL] {pod[\"metadata\"][\"name\"]}/{c[\"name\"]} is privileged')
        if sc.get('runAsUser') == 0:
            print(f'  [WARN] {pod[\"metadata\"][\"name\"]}/{c[\"name\"]} runs as root')
" 2>/dev/null
 
echo ""
echo "--- 5. External Access Points ---"
echo "Services with LoadBalancer or NodePort (exposed externally):"
kubectl get svc -n "$NAMESPACE" -o json 2>/dev/null \
    | python3 -c "
import json, sys
data = json.load(sys.stdin)
for svc in data.get('items', []):
    svc_type = svc.get('spec', {}).get('type', 'ClusterIP')
    if svc_type in ('LoadBalancer', 'NodePort'):
        name = svc['metadata']['name']
        ports = svc['spec'].get('ports', [])
        port_str = ', '.join(str(p.get('port', '?')) for p in ports)
        print(f'  [WARN] {name} ({svc_type}): ports {port_str}')
" 2>/dev/null
 
echo ""
echo "=== Audit Complete ==="

防禦 and 緩解

實作 workload identity everywhere: Use SPIFFE/SPIRE or 雲端-native workload identity (GKE Workload Identity, EKS IRSA) for all AI components. Eliminate static credentials and service account keys. Training jobs should use short-lived 符元 that are rotated automatically.

Enforce mTLS for all service-to-service communication: Deploy a service mesh (Istio, Linkerd) in STRICT mTLS mode. For high-performance 訓練 networks using RDMA/InfiniBand, 實作 application-layer encryption where kernel-bypass networking prevents mesh-level enforcement.

Apply default-deny network policies: Every AI namespace should have a default-deny ingress and egress policy. Explicitly allow only required communication paths: 訓練 to data store, 推論 to model registry, pipeline controller to individual stages.

Verify artifacts at every boundary: Every pipeline stage should verify the integrity of incoming artifacts using cryptographic signatures, not just hashes. Use tools like Sigstore/cosign for model artifact signing and verification.

Short-lived credentials with continuous validation: Training jobs should receive credentials that expire before the job completes, requiring renewal through a 符元 exchange that re-evaluates 授權. This limits the blast radius of credential theft.

Monitor and alert on policy violations: 實作 continuous compliance 監控 that detects network policy changes, new privileged workloads, service mesh bypass, and credential anomalies. Integrate with SIEM for correlation with other 安全 events.

實作 data-level zero trust: Beyond network and identity, apply zero trust principles to data itself. 訓練資料 should carry metadata about its provenance and integrity. Model artifacts should be signed and verified at every loading point. Inference inputs and outputs should be validated against expected schemas and distributions. This data-level zero trust approach catches attacks that bypass network controls, such as 資料投毒 through legitimate pipeline components.

Segment by sensitivity level: Not all AI workloads require the same 安全 posture. A model that classifies product images has different risk than one that processes medical records. 實作 tiered zero trust zones where the strictest controls (hardware attestation, encrypted 推論, fully isolated networks) are reserved for the most sensitive AI workloads, while less sensitive workloads use lighter controls. This prevents the performance overhead of maximum 安全 from becoming a barrier to adoption.

Regularly 測試 zero trust controls: Zero trust architectures degrade over time as exceptions accumulate, configurations drift, and new components are added without proper integration. Schedule regular penetration 測試 specifically targeting zero trust boundaries — attempt lateral movement from 訓練 to 推論, from one tenant to another, from a compromised container to the host, and from an internal position to external data exfiltration. Each 測試 validates that the controls actually work, not just that they are configured.

參考文獻

Rose, S., Borchert, O., Mitchell, S., & Connelly, S. (2020). "Zero Trust Architecture." NIST Special Publication 800-207. https://doi.org/10.6028/NIST.SP.800-207
SPIFFE. (2024). "Secure Production Identity Framework for Everyone." https://spiffe.io/docs/latest/spiffe-about/overview/
NIST. (2023). "Artificial Intelligence Risk Management Framework (AI RMF 1.0)." https://www.nist.gov/artificial-intelligence/executive-order-safe-secure-and-trustworthy-artificial-intelligence
MITRE ATLAS. "Techniques: ML Supply Chain Compromise." https://atlas.mitre.org/techniques/AML.T0010
Google. (2024). "BeyondCorp: A New Approach to Enterprise 安全." https://雲端.google.com/beyondcorp
Kubernetes. (2024). "Network Policies." https://kubernetes.io/docs/concepts/services-networking/network-policies/

Zero Trust Architecture for AI Infrastructure

Intermediate18 min readUpdated 2026-03-21

Implementing and attacking zero trust principles across ML training pipelines, inference endpoints, and model registries

infrastructure zero-trust network-security identity microsegmentation

概覽

Inference endpoints must respond in milliseconds, making per-request 授權 checks a performance concern.

Zero Trust Principles Applied to AI Infrastructure

Identity: Every Component Gets a Verifiable Identity

"""
SPIFFE-based workload identity verification for AI pipeline components.
Demonstrates how to verify that a 訓練 job, model registry, or
推論 endpoint has a valid identity before allowing access.
"""
 
import ssl
import socket
import json
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse
from cryptography import x509
from cryptography.x509.oid import ExtensionOID, NameOID
from cryptography.hazmat.primitives import hashes
 
 
@dataclass
class WorkloadIdentity:
    """Parsed SPIFFE identity from an X.509 SVID."""
    spiffe_id: str
    trust_domain: str
    workload_path: str
    certificate_hash: str
    not_valid_after: str
 
    @property
    def component_type(self) -> str:
        """Extract AI component type from SPIFFE path."""
        parts = self.workload_path.strip("/").split("/")
        if parts:
            return parts[0]  # e.g., "訓練", "推論", "registry"
        return "unknown"
 
 
def extract_spiffe_id(cert: x509.Certificate) -> Optional[WorkloadIdentity]:
    """
    Extract SPIFFE ID from X.509 certificate SAN extension.
    SPIFFE IDs are encoded as URI SANs in the format:
    spiffe://<trust-domain>/<workload-path>
    """
    try:
        san_ext = cert.extensions.get_extension_for_oid(
            ExtensionOID.SUBJECT_ALTERNATIVE_NAME
        )
        san = san_ext.value
 
        for uri in san.get_values_for_type(x509.UniformResourceIdentifier):
            if uri.startswith("spiffe://"):
                parsed = urlparse(uri)
                cert_hash = cert.fingerprint(hashes.SHA256()).hex()
                return WorkloadIdentity(
                    spiffe_id=uri,
                    trust_domain=parsed.hostname or "",
                    workload_path=parsed.path,
                    certificate_hash=cert_hash,
                    not_valid_after=str(cert.not_valid_after_utc),
                )
    except x509.ExtensionNotFound:
        return None
 
    return None
 
 
class AIZeroTrustVerifier:
    """
    Verify workload identity and enforce access policies
    for AI infrastructure components.
    """
 
    # Access control matrix: which components can access which
    ACCESS_POLICIES = {
        "訓練": {
            "allowed_targets": [
                "data-store",
                "registry",
                "experiment-tracker",
                "feature-store",
            ],
            "denied_targets": ["推論", "監控-admin"],
        },
        "推論": {
            "allowed_targets": [
                "registry",  # Read-only for model loading
                "feature-store",  # For feature retrieval
            ],
            "denied_targets": [
                "訓練",
                "data-store",  # Inference should not access raw 訓練資料
            ],
        },
        "registry": {
            "allowed_targets": ["data-store"],
            "denied_targets": ["訓練", "推論"],
        },
        "pipeline": {
            "allowed_targets": [
                "訓練",
                "registry",
                "data-store",
                "feature-store",
            ],
            "denied_targets": ["推論"],
        },
    }
 
    def __init__(self, trust_domain: str):
        self.trust_domain = trust_domain
 
    def verify_access(
        self,
        source: WorkloadIdentity,
        target_component: str,
    ) -> tuple[bool, str]:
        """
        Verify if a source workload is allowed to access a target component.
        Returns (allowed, reason).
        """
        # Verify trust domain
        if source.trust_domain != self.trust_domain:
            return False, (
                f"Trust domain mismatch: {source.trust_domain} "
                f"!= {self.trust_domain}"
            )
 
        # Look up policy for source component type
        policy = self.ACCESS_POLICIES.get(source.component_type)
        if policy is None:
            return False, (
                f"No policy defined for component type: "
                f"{source.component_type}"
            )
 
        if target_component in policy.get("denied_targets", []):
            return False, (
                f"{source.component_type} is explicitly denied "
                f"access to {target_component}"
            )
 
        if target_component in policy.get("allowed_targets", []):
            return True, "Access permitted by policy"
 
        # Default deny
        return False, (
            f"No explicit allow for {source.component_type} -> "
            f"{target_component}"
        )
 
    def audit_connection(
        self,
        peer_cert_pem: bytes,
        target_component: str,
    ) -> dict:
        """
        Full audit of an incoming connection:
        1. Parse certificate
        2. Extract SPIFFE identity
        3. Check access policy
        """
        cert = x509.load_pem_x509_certificate(peer_cert_pem)
        identity = extract_spiffe_id(cert)
 
        if identity is None:
            return {
                "allowed": False,
                "reason": "No SPIFFE ID in certificate",
                "identity": None,
            }
 
        allowed, reason = self.verify_access(identity, target_component)
        return {
            "allowed": allowed,
            "reason": reason,
            "identity": {
                "spiffe_id": identity.spiffe_id,
                "component_type": identity.component_type,
                "cert_hash": identity.certificate_hash,
            },
        }

Microsegmentation for AI Networks

"""
Audit script for identifying zero trust gaps in AI infrastructure
network segmentation, with focus on GPU cluster interconnects.
"""
 
import subprocess
import json
import re
from typing import Optional
 
 
def audit_kubernetes_network_policies(namespace: str = "ml-platform") -> list[dict]:
    """
    Audit Kubernetes network policies for AI workload namespaces.
    Identifies missing policies that would allow unrestricted
    communication between components.
    """
    findings = []
 
    # Get all pods in the namespace
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, timeout=30,
    )
    pods = json.loads(result.stdout)
 
    # Get network policies
    result = subprocess.run(
        ["kubectl", "get", "networkpolicies", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, timeout=30,
    )
    policies = json.loads(result.stdout)
 
    # Check if default-deny exists
    has_default_deny = any(
        policy["metadata"]["name"].startswith("default-deny")
        for policy in policies.get("items", [])
    )
 
    if not has_default_deny:
        findings.append({
            "severity": "HIGH",
            "title": f"No default-deny policy in namespace {namespace}",
            "detail": (
                "Without a default-deny ingress/egress policy, all pods "
                "can communicate freely. AI components (訓練, 推論, "
                "registry) should be isolated by default."
            ),
        })
 
    # Check for pods with host networking
    for pod in pods.get("items", []):
        pod_name = pod["metadata"]["name"]
        spec = pod.get("spec", {})
 
        if spec.get("hostNetwork", False):
            findings.append({
                "severity": "HIGH",
                "title": f"Pod {pod_name} uses host networking",
                "detail": (
                    "Host networking bypasses all Kubernetes network "
                    "policies. This pod has unrestricted network access "
                    "to the node and potentially the InfiniBand fabric."
                ),
            })
 
        # Check for privileged containers (common for GPU workloads)
        for container in spec.get("containers", []):
            sec_ctx = container.get("securityContext", {})
            if sec_ctx.get("privileged", False):
                findings.append({
                    "severity": "HIGH",
                    "title": (
                        f"Privileged container: {pod_name}/"
                        f"{container['name']}"
                    ),
                    "detail": (
                        "Privileged containers can access all host "
                        "devices including InfiniBand HCAs, bypass "
                        "network namespaces, and escape container "
                        "isolation."
                    ),
                })
 
    # Check for RDMA/InfiniBand device access
    for pod in pods.get("items", []):
        pod_name = pod["metadata"]["name"]
        for container in pod.get("spec", {}).get("containers", []):
            resources = container.get("resources", {})
            limits = resources.get("limits", {})
            requests = resources.get("requests", {})
            all_resources = {**limits, **requests}
 
            for resource_name in all_resources:
                if "rdma" in resource_name or "infiniband" in resource_name:
                    findings.append({
                        "severity": "MEDIUM",
                        "title": (
                            f"RDMA device access: {pod_name}/"
                            f"{container['name']}"
                        ),
                        "detail": (
                            f"Container requests {resource_name}. RDMA "
                            f"traffic bypasses kernel networking and is "
                            f"not subject to NetworkPolicy enforcement."
                        ),
                    })
 
    return findings
 
 
def check_service_mesh_coverage(namespace: str = "ml-platform") -> list[dict]:
    """
    Verify that a service mesh (Istio/Linkerd) covers AI workloads
    and that mTLS is enforced.
    """
    findings = []
 
    # Check for Istio sidecar injection
    result = subprocess.run(
        [
            "kubectl", "get", "pods", "-n", namespace,
            "-o", "jsonpath={range .items[*]}{.metadata.name}{"
            "\\t}{.spec.containers[*].name}{\\n}{end}",
        ],
        capture_output=True, text=True, timeout=30,
    )
 
    for line in result.stdout.strip().split("\n"):
        if not line.strip():
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        pod_name = parts[0]
        containers = parts[1].split()
 
        has_sidecar = any(
            c in containers
            for c in ["istio-proxy", "linkerd-proxy", "envoy-sidecar"]
        )
 
        if not has_sidecar:
            findings.append({
                "severity": "MEDIUM",
                "title": f"No service mesh sidecar: {pod_name}",
                "detail": (
                    "This pod communicates without mTLS enforcement. "
                    "Traffic can be intercepted or spoofed by adjacent "
                    "workloads."
                ),
            })
 
    # Check Istio PeerAuthentication policy
    result = subprocess.run(
        [
            "kubectl", "get", "peerauthentication", "-n", namespace,
            "-o", "json",
        ],
        capture_output=True, text=True, timeout=30,
    )
 
    if result.returncode == 0:
        pa_policies = json.loads(result.stdout)
        strict_mtls = any(
            policy.get("spec", {}).get("mtls", {}).get("mode") == "STRICT"
            for policy in pa_policies.get("items", [])
        )
        if not strict_mtls:
            findings.append({
                "severity": "HIGH",
                "title": "mTLS not set to STRICT mode",
                "detail": (
                    "PERMISSIVE mTLS allows plaintext connections. "
                    "攻擊者 in the mesh can intercept 推論 "
                    "requests, model weights, and 訓練資料."
                ),
            })
 
    return findings

Attacking Zero Trust Gaps in AI Pipelines

Exploiting Implicit Trust Between Pipeline Stages

"""
Demonstrate trust boundary violations in ML pipelines.
This script identifies pipeline stages that accept artifacts
from upstream stages without integrity verification.
"""
 
import hashlib
import json
import os
from pathlib import Path
from dataclasses import dataclass
from typing import Optional
 
 
@dataclass
class PipelineArtifact:
    """Represents an artifact passed between pipeline stages."""
    stage_name: str
    artifact_path: str
    expected_hash: Optional[str]
    actual_hash: Optional[str]
    is_signed: bool
    signature_valid: Optional[bool]
 
 
def audit_pipeline_artifacts(
    pipeline_run_dir: str,
) -> list[dict]:
    """
    Audit artifacts in a pipeline run directory for
    integrity verification gaps.
    """
    findings = []
    run_path = Path(pipeline_run_dir)
 
    if not run_path.exists():
        return [{"severity": "ERROR", "title": "Pipeline run directory not found",
                 "detail": f"{pipeline_run_dir} does not exist"}]
 
    # Look for common pipeline metadata files
    metadata_files = list(run_path.rglob("**/metadata.json")) + \
                     list(run_path.rglob("**/artifact_info.json"))
 
    for meta_file in metadata_files:
        try:
            with open(meta_file) as f:
                metadata = json.load(f)
        except (json.JSONDecodeError, IOError):
            continue
 
        stage_name = metadata.get("stage", meta_file.parent.name)
        artifacts = metadata.get("output_artifacts", [])
 
        for artifact in artifacts:
            art_path = artifact.get("path", "")
            has_hash = "sha256" in artifact or "hash" in artifact
            has_signature = "signature" in artifact
 
            if not has_hash:
                findings.append({
                    "severity": "HIGH",
                    "title": f"No integrity hash: {stage_name}/{art_path}",
                    "detail": (
                        f"Artifact from stage '{stage_name}' has no hash. "
                        f"A compromised upstream stage could substitute "
                        f"a malicious artifact (e.g., poisoned model weights)."
                    ),
                })
 
            if not has_signature:
                findings.append({
                    "severity": "MEDIUM",
                    "title": f"No signature: {stage_name}/{art_path}",
                    "detail": (
                        f"Artifact is not cryptographically signed. "
                        f"Even with a hash, the hash itself could be "
                        f"modified by a compromised pipeline controller."
                    ),
                })
 
            # Verify hash if present
            if has_hash:
                expected = artifact.get("sha256") or artifact.get("hash")
                full_path = run_path / art_path
                if full_path.exists():
                    actual = hashlib.sha256(
                        full_path.read_bytes()
                    ).hexdigest()
                    if actual != expected:
                        findings.append({
                            "severity": "CRITICAL",
                            "title": (
                                f"Hash mismatch: {stage_name}/{art_path}"
                            ),
                            "detail": (
                                f"Expected {expected}, got {actual}. "
                                f"Artifact may have been tampered with."
                            ),
                        })
 
    # Check for credential passing between stages
    env_files = list(run_path.rglob("**/.env")) + \
                list(run_path.rglob("**/secrets.*"))
    for env_file in env_files:
        findings.append({
            "severity": "HIGH",
            "title": f"Credentials in pipeline artifacts: {env_file}",
            "detail": (
                "Secrets stored in pipeline artifacts can be accessed "
                "by downstream stages and persisted in artifact storage."
            ),
        })
 
    return findings

Token and Credential 攻擊

Long-running 訓練 jobs often use service account 符元 or API keys with extended validity. In zero trust architectures, these should be short-lived and continuously validated. Common gaps include:

Static service account 符元 in Kubernetes that do not expire (pre-v1.24 default behavior)
雲端 IAM roles with overly broad 權限 attached to 訓練 node pools
Model registry credentials embedded in pipeline configurations
Experiment tracking API keys shared across all team members

"""
Credential exposure analysis for AI workloads in Kubernetes.
Identifies overly broad credentials, long-lived 符元, and
credential sharing patterns that violate zero trust principles.
"""
 
import subprocess
import json
import base64
from typing import Any
 
 
def audit_ai_credentials(namespace: str = "ml-platform") -> list[dict]:
    """
    Audit credentials available to AI workloads for zero trust
    violations: excessive scope, long validity, and sharing.
    """
    findings = []
 
    # Get all service accounts in the namespace
    result = subprocess.run(
        ["kubectl", "get", "serviceaccounts", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, timeout=30,
    )
    if result.returncode != 0:
        return findings
 
    service_accounts = json.loads(result.stdout)
 
    for sa in service_accounts.get("items", []):
        sa_name = sa["metadata"]["name"]
 
        # Check for mounted secrets
        secrets = sa.get("secrets", [])
        if len(secrets) > 0:
            findings.append({
                "severity": "MEDIUM",
                "title": f"Service account has bound secrets: {sa_name}",
                "detail": (
                    f"SA {sa_name} has {len(secrets)} bound secrets. "
                    f"In zero trust, prefer projected service account 符元 "
                    f"with expiration over static secrets."
                ),
            })
 
        # Check annotations for 雲端 IAM bindings
        annotations = sa.get("metadata", {}).get("annotations", {})
 
        # GKE Workload Identity
        gke_sa = annotations.get(
            "iam.gke.io/gcp-service-account", ""
        )
        if gke_sa:
            findings.append({
                "severity": "INFO",
                "title": f"GKE Workload Identity binding: {sa_name}",
                "detail": f"Bound to GCP SA: {gke_sa}. Verify scope is minimal.",
            })
 
        # EKS IRSA
        eks_role = annotations.get(
            "eks.amazonaws.com/role-arn", ""
        )
        if eks_role:
            findings.append({
                "severity": "INFO",
                "title": f"EKS IRSA binding: {sa_name}",
                "detail": f"Bound to IAM role: {eks_role}. Verify role policy scope.",
            })
 
    # Check for pods with environment variable credentials
    pods_result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, timeout=30,
    )
    if pods_result.returncode == 0:
        pods = json.loads(pods_result.stdout)
        sensitive_env_patterns = [
            "KEY", "SECRET", "PASSWORD", "TOKEN", "CREDENTIAL",
            "API_KEY", "ACCESS_KEY", "PRIVATE_KEY",
        ]
 
        for pod in pods.get("items", []):
            pod_name = pod["metadata"]["name"]
            for container in pod.get("spec", {}).get("containers", []):
                for env in container.get("env", []):
                    env_name = env.get("name", "").upper()
                    if any(p in env_name for p in sensitive_env_patterns):
                        # Check if it's from a secret reference (better)
                        # or a plaintext value (worse)
                        if "value" in env and env["value"]:
                            findings.append({
                                "severity": "CRITICAL",
                                "title": (
                                    f"Hardcoded credential: {pod_name} "
                                    f"env {env['name']}"
                                ),
                                "detail": (
                                    "Credential is hardcoded in pod spec "
                                    "as a plaintext value. Use Kubernetes "
                                    "secrets with projected volumes or "
                                    "external secret managers."
                                ),
                            })
                        elif "valueFrom" in env:
                            source = env["valueFrom"]
                            if "secretKeyRef" in source:
                                findings.append({
                                    "severity": "LOW",
                                    "title": (
                                        f"Secret-backed credential: "
                                        f"{pod_name} env {env['name']}"
                                    ),
                                    "detail": (
                                        f"From secret: "
                                        f"{source['secretKeyRef'].get('name')}. "
                                        f"Verify rotation policy."
                                    ),
                                })
 
    return findings

Continuous Verification and Device Posture

Zero trust architectures require continuous verification — not just authenticating once at connection time. For AI workloads, 這意味著:

Runtime integrity checking: Verify that the 訓練 script, model handler, or serving binary has not been modified since deployment. Container image digests should be verified at pod admission and periodically at runtime.
Node attestation: GPU nodes should attest their integrity before being trusted with sensitive model weights or 訓練資料. Hardware-based attestation (TPM, TEE attestation) provides stronger guarantees than software-only checks.
Behavioral 監控: Continuously monitor AI workload behavior for anomalies. A 訓練 job that suddenly begins making outbound network connections it has never made before, or an 推論 endpoint whose response latency distribution changes dramatically, may be compromised.
Token refresh under policy re-評估: When credentials are refreshed, the 授權 decision should be re-evaluated against current policy. This ensures that policy changes (such as revoking a team's access to a model) take effect within the 符元 lifetime.

Practical 範例

Zero Trust Compliance Checker for AI Platforms

#!/usr/bin/env bash
# Zero trust compliance audit for AI infrastructure on Kubernetes
# Checks for common violations of zero trust principles
 
set -euo pipefail
 
NAMESPACE="${1:-ml-platform}"
echo "=== Zero Trust Audit: Namespace $NAMESPACE ==="
 
echo ""
echo "--- 1. Default Deny Network Policies ---"
DENY_POLICIES=$(kubectl get networkpolicies -n "$NAMESPACE" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' 2>/dev/null \
    | grep -c "default-deny" || true)
if [ "$DENY_POLICIES" -eq 0 ]; then
    echo "[FAIL] No default-deny network policy found"
else
    echo "[PASS] Default-deny policy exists"
fi
 
echo ""
echo "--- 2. Service Mesh mTLS ---"
STRICT_MTLS=$(kubectl get peerauthentication -n "$NAMESPACE" \
    -o jsonpath='{range .items[*]}{.spec.mtls.mode}{"\n"}{end}' 2>/dev/null \
    | grep -c "STRICT" || true)
if [ "$STRICT_MTLS" -eq 0 ]; then
    echo "[FAIL] No STRICT mTLS PeerAuthentication policy"
else
    echo "[PASS] STRICT mTLS enforced"
fi
 
echo ""
echo "--- 3. Service Account Token Projection ---"
# Check for pods using legacy non-expiring 符元
LEGACY_TOKENS=$(kubectl get pods -n "$NAMESPACE" -o json 2>/dev/null \
    | python3 -c "
import json, sys
data = json.load(sys.stdin)
count = 0
for pod in data.get('items', []):
    for vol in pod.get('spec', {}).get('volumes', []):
        if 'secret' in vol and '符元' in vol.get('secret', {}).get('secretName', '').lower():
            count += 1
            print(f'  Legacy 符元: {pod[\"metadata\"][\"name\"]}')
print(f'Total: {count}')
" 2>/dev/null)
echo "$LEGACY_TOKENS"
 
echo ""
echo "--- 4. Privileged Containers (Zero Trust Violation) ---"
kubectl get pods -n "$NAMESPACE" -o json 2>/dev/null \
    | python3 -c "
import json, sys
data = json.load(sys.stdin)
for pod in data.get('items', []):
    for c in pod.get('spec', {}).get('containers', []):
        sc = c.get('securityContext', {})
        if sc.get('privileged'):
            print(f'  [FAIL] {pod[\"metadata\"][\"name\"]}/{c[\"name\"]} is privileged')
        if sc.get('runAsUser') == 0:
            print(f'  [WARN] {pod[\"metadata\"][\"name\"]}/{c[\"name\"]} runs as root')
" 2>/dev/null
 
echo ""
echo "--- 5. External Access Points ---"
echo "Services with LoadBalancer or NodePort (exposed externally):"
kubectl get svc -n "$NAMESPACE" -o json 2>/dev/null \
    | python3 -c "
import json, sys
data = json.load(sys.stdin)
for svc in data.get('items', []):
    svc_type = svc.get('spec', {}).get('type', 'ClusterIP')
    if svc_type in ('LoadBalancer', 'NodePort'):
        name = svc['metadata']['name']
        ports = svc['spec'].get('ports', [])
        port_str = ', '.join(str(p.get('port', '?')) for p in ports)
        print(f'  [WARN] {name} ({svc_type}): ports {port_str}')
" 2>/dev/null
 
echo ""
echo "=== Audit Complete ==="

防禦 and 緩解

參考文獻

Rose, S., Borchert, O., Mitchell, S., & Connelly, S. (2020). "Zero Trust Architecture." NIST Special Publication 800-207. https://doi.org/10.6028/NIST.SP.800-207
SPIFFE. (2024). "Secure Production Identity Framework for Everyone." https://spiffe.io/docs/latest/spiffe-about/overview/
NIST. (2023). "Artificial Intelligence Risk Management Framework (AI RMF 1.0)." https://www.nist.gov/artificial-intelligence/executive-order-safe-secure-and-trustworthy-artificial-intelligence
MITRE ATLAS. "Techniques: ML Supply Chain Compromise." https://atlas.mitre.org/techniques/AML.T0010
Google. (2024). "BeyondCorp: A New Approach to Enterprise 安全." https://雲端.google.com/beyondcorp
Kubernetes. (2024). "Network Policies." https://kubernetes.io/docs/concepts/services-networking/network-policies/

Zero Trust Architecture for AI Infrastructure

Related articles

Zero Trust Architecture for AI Infrastructure

Related articles