Network Isolation for Cloud AI Workloads
Implementing network isolation strategies for cloud AI deployments including private endpoints, VPC configurations, service mesh integration, and data plane segmentation for LLM inference and training workloads.
Overview
Most cloud AI deployments send model API traffic over the public internet. When an application calls the OpenAI API, Azure OpenAI, or AWS Bedrock, the request travels from the application's compute environment, through the public internet, to the provider's API endpoint. This default configuration exposes sensitive data (prompts, model responses, system prompts, and user information) to network-level interception risks, and creates opportunities for data exfiltration that bypass application-level controls.
Network isolation for AI workloads goes beyond simply enabling a VPC endpoint. A properly isolated AI deployment segments traffic into distinct security zones: the inference plane (where model API calls happen), the data plane (where training data, embeddings, and documents flow), the control plane (where model configuration, guardrails, and monitoring operate), and the management plane (where IAM, secrets, and deployment tools reside). Each zone has different trust levels, data sensitivity, and access requirements.
This article provides implementation guidance for network isolation across the major cloud providers, with specific attention to the unique networking requirements of AI workloads: high-bandwidth model inference traffic, long-lived training job connections, real-time embedding pipeline data flows, and the complex multi-service communication patterns of agent-based architectures.
AI Network Architecture Patterns
Reference Architecture: Isolated AI Deployment
┌─────────────────────────────────────────────────────────────────┐
│ VPC (10.0.0.0/16) │
│ ┌───────────────────┐ ┌───────────────────┐ ┌──────────────┐ │
│ │ Application Subnet │ │ AI Inference │ │ Data Subnet │ │
│ │ (10.0.1.0/24) │ │ Subnet │ │ (10.0.3.0/24)│ │
│ │ │ │ (10.0.2.0/24) │ │ │ │
│ │ ┌─────────────┐ │ │ ┌──────────────┐ │ │ ┌──────────┐│ │
│ │ │ API Gateway │ │ │ │ Model Proxy │ │ │ │Vector DB ││ │
│ │ │ (ALB) │───────│ │ / Guardrails │ │ │ │(Pinecone ││ │
│ │ └─────────────┘ │ │ └──────┬───────┘ │ │ │ Private) ││ │
│ │ ┌─────────────┐ │ │ │ │ │ └──────────┘│ │
│ │ │ App Servers │───────│ ┌──────▼───────┐ │ │ ┌──────────┐│ │
│ │ │ (ECS/EKS) │ │ │ │ VPC Endpoint │ │ │ │S3 Bucket ││ │
│ │ └─────────────┘ │ │ │ (Bedrock) │ │ │ │(Training)││ │
│ └───────────────────┘ │ └──────┬───────┘ │ │ └──────────┘│ │
│ │ │ │ └──────────────┘ │
│ │ ▼ │ │
│ │ AWS Bedrock │ │
│ │ (Private Link) │ │
│ └───────────────────┘ │
│ ┌───────────────────┐ │
│ │ Management Subnet │ ← Only accessible from VPN/bastion │
│ │ (10.0.4.0/24) │ │
│ │ Secrets Manager, IAM, CloudWatch, Deployment Tools │
│ └───────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Network Zone Definitions
# network_isolation/zones.py
"""Network zone definitions and policy enforcement for AI workloads."""
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class NetworkZone(Enum):
    """Security zones for AI workload network segmentation."""
    APPLICATION = "application"    # User-facing application layer
    INFERENCE = "inference"        # Model inference and guardrails
    DATA = "data"                  # Vector DBs, document stores, training data
    MANAGEMENT = "management"      # Secrets, IAM, monitoring, deployment
    EXTERNAL_AI = "external_ai"    # Cloud AI service endpoints (Bedrock, etc.)


class TrafficDirection(Enum):
    INGRESS = "ingress"
    EGRESS = "egress"


@dataclass
class NetworkPolicy:
    """A network policy rule for AI workload segmentation."""
    policy_id: str
    name: str
    source_zone: NetworkZone
    destination_zone: NetworkZone
    direction: TrafficDirection
    protocol: str             # tcp, udp, https
    port_range: str           # "443", "6333-6334", "8080"
    action: str               # allow, deny
    justification: str
    data_classification: str  # What type of data flows on this path


# Define the allowed traffic flows between zones
AI_NETWORK_POLICIES: list[NetworkPolicy] = [
    # Application → Inference: app sends prompts to the model proxy
    NetworkPolicy(
        policy_id="NET-001",
        name="App to inference proxy",
        source_zone=NetworkZone.APPLICATION,
        destination_zone=NetworkZone.INFERENCE,
        direction=TrafficDirection.EGRESS,
        protocol="https",
        port_range="443",
        action="allow",
        justification="Application sends user prompts to the inference proxy for model processing",
        data_classification="confidential",
    ),
    # Inference → External AI: proxy calls the cloud AI service via private endpoint
    NetworkPolicy(
        policy_id="NET-002",
        name="Inference to cloud AI (private endpoint)",
        source_zone=NetworkZone.INFERENCE,
        destination_zone=NetworkZone.EXTERNAL_AI,
        direction=TrafficDirection.EGRESS,
        protocol="https",
        port_range="443",
        action="allow",
        justification="Inference proxy forwards validated requests to Bedrock/Azure OpenAI via VPC endpoint",
        data_classification="confidential",
    ),
    # Inference → Data: RAG pipeline retrieves context from the vector database
    NetworkPolicy(
        policy_id="NET-003",
        name="Inference to vector database",
        source_zone=NetworkZone.INFERENCE,
        destination_zone=NetworkZone.DATA,
        direction=TrafficDirection.EGRESS,
        protocol="https",
        port_range="6333-6334",
        action="allow",
        justification="RAG pipeline retrieves relevant context from the vector database for augmented generation",
        data_classification="confidential",
    ),
    # Inference → Management: fetch secrets and send logs
    NetworkPolicy(
        policy_id="NET-004",
        name="Inference to secrets manager",
        source_zone=NetworkZone.INFERENCE,
        destination_zone=NetworkZone.MANAGEMENT,
        direction=TrafficDirection.EGRESS,
        protocol="https",
        port_range="443",
        action="allow",
        justification="Inference layer retrieves API credentials from secrets manager",
        data_classification="restricted",
    ),
    # DENY: application should never directly access cloud AI endpoints
    NetworkPolicy(
        policy_id="NET-010",
        name="Block app direct AI access",
        source_zone=NetworkZone.APPLICATION,
        destination_zone=NetworkZone.EXTERNAL_AI,
        direction=TrafficDirection.EGRESS,
        protocol="https",
        port_range="443",
        action="deny",
        justification="All model API calls must go through the inference proxy for security inspection",
        data_classification="n/a",
    ),
    # DENY: data zone should never have internet access
    NetworkPolicy(
        policy_id="NET-011",
        name="Block data zone internet",
        source_zone=NetworkZone.DATA,
        destination_zone=NetworkZone.APPLICATION,
        direction=TrafficDirection.EGRESS,
        protocol="tcp",
        port_range="0-65535",
        action="deny",
        justification="Data zone contains sensitive training data and embeddings; no internet access permitted",
        data_classification="n/a",
    ),
    # DENY: inference zone should not initiate connections to arbitrary internet hosts
    NetworkPolicy(
        policy_id="NET-012",
        name="Block inference internet egress",
        source_zone=NetworkZone.INFERENCE,
        destination_zone=NetworkZone.APPLICATION,  # Stand-in for "internet" in this zone model
        direction=TrafficDirection.EGRESS,
        protocol="tcp",
        port_range="0-65535",
        action="deny",
        justification="Inference zone should only communicate with approved endpoints via private links",
        data_classification="n/a",
    ),
]


def validate_traffic_flow(
    source: NetworkZone,
    destination: NetworkZone,
    port: int,
    protocol: str = "https",
) -> dict:
    """
    Validate whether a traffic flow is permitted by the network policy.

    Args:
        source: Originating network zone.
        destination: Target network zone.
        port: Destination port number.
        protocol: Network protocol.

    Returns:
        Validation result with matching policy and action.
    """
    matching_policies = []
    for policy in AI_NETWORK_POLICIES:
        if policy.source_zone != source or policy.destination_zone != destination:
            continue
        # A generic "tcp" policy also matches tcp-based protocols such as https
        if policy.protocol != protocol and policy.protocol != "tcp":
            continue
        # Check port range (inclusive, e.g. "6333-6334") or exact port
        if "-" in policy.port_range:
            low, high = policy.port_range.split("-")
            if int(low) <= port <= int(high):
                matching_policies.append(policy)
        elif port == int(policy.port_range):
            matching_policies.append(policy)

    if not matching_policies:
        return {
            "allowed": False,
            "reason": "No matching policy; default deny",
            "matching_policies": [],
        }

    # If any explicit deny matches, traffic is blocked
    deny_policies = [p for p in matching_policies if p.action == "deny"]
    if deny_policies:
        return {
            "allowed": False,
            "reason": f"Explicitly denied by policy {deny_policies[0].policy_id}",
            "matching_policies": [p.policy_id for p in deny_policies],
        }

    allow_policies = [p for p in matching_policies if p.action == "allow"]
    if allow_policies:
        return {
            "allowed": True,
            "reason": f"Allowed by policy {allow_policies[0].policy_id}",
            "matching_policies": [p.policy_id for p in allow_policies],
            "data_classification": allow_policies[0].data_classification,
        }

    return {"allowed": False, "reason": "No allow policy matched", "matching_policies": []}
Provider-Specific Implementation
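Regardless of provider, the evaluation semantics of the zone catalog are worth sanity-checking before any endpoints are wired up: default deny, an explicit deny beats any allow, and port ranges like "6333-6334" match inclusively. A condensed, self-contained restatement of that matching logic with a few illustrative flows (the tuples mirror the NET-00x policies above; this is a sketch, not the zones.py module itself):

```python
def port_matches(port_range: str, port: int) -> bool:
    """True if `port` falls within a policy port_range like "443" or "6333-6334"."""
    if "-" in port_range:
        low, high = port_range.split("-")
        return int(low) <= port <= int(high)
    return port == int(port_range)

# (source_zone, dest_zone, port_range, action) tuples mirroring NET-001..NET-010
POLICIES = [
    ("application", "inference", "443", "allow"),     # NET-001: app -> model proxy
    ("inference", "external_ai", "443", "allow"),     # NET-002: proxy -> Bedrock/Azure OpenAI
    ("inference", "data", "6333-6334", "allow"),      # NET-003: RAG -> vector DB
    ("application", "external_ai", "443", "deny"),    # NET-010: no direct model access
]

def evaluate(src: str, dst: str, port: int) -> bool:
    """Default deny; an explicit deny always wins over an allow."""
    matches = [p for p in POLICIES
               if p[0] == src and p[1] == dst and port_matches(p[2], port)]
    if any(p[3] == "deny" for p in matches):
        return False
    return any(p[3] == "allow" for p in matches)

print(evaluate("application", "inference", 443))    # True: app -> proxy
print(evaluate("application", "external_ai", 443))  # False: explicit deny
print(evaluate("inference", "data", 6333))          # True: RAG query
print(evaluate("data", "external_ai", 443))         # False: no policy, default deny
```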
AWS: Private Endpoints for Bedrock
# network_isolation/aws_bedrock_private.py
"""AWS VPC Endpoint configuration for private Bedrock access."""
from __future__ import annotations

import logging

import boto3

logger = logging.getLogger(__name__)


class BedrockPrivateEndpointSetup:
    """Configures private VPC endpoint access for AWS Bedrock."""

    def __init__(self, region: str = "us-east-1"):
        self.region = region
        self.ec2_client = boto3.client("ec2", region_name=region)
        self.bedrock_client = boto3.client("bedrock", region_name=region)

    def create_vpc_endpoint(
        self,
        vpc_id: str,
        subnet_ids: list[str],
        security_group_ids: list[str],
    ) -> dict:
        """
        Create a VPC Interface Endpoint for AWS Bedrock Runtime.

        This eliminates the need for Bedrock API traffic to traverse
        the public internet.

        Args:
            vpc_id: ID of the VPC where the endpoint will be created.
            subnet_ids: Subnets in the inference zone for ENI placement.
            security_group_ids: Security groups to attach to the endpoint.

        Returns:
            Endpoint creation response with endpoint ID and DNS entries.
        """
        try:
            response = self.ec2_client.create_vpc_endpoint(
                VpcEndpointType="Interface",
                ServiceName=f"com.amazonaws.{self.region}.bedrock-runtime",
                VpcId=vpc_id,
                SubnetIds=subnet_ids,
                SecurityGroupIds=security_group_ids,
                PrivateDnsEnabled=True,  # Enables automatic DNS resolution
                TagSpecifications=[{
                    "ResourceType": "vpc-endpoint",
                    "Tags": [
                        {"Key": "Name", "Value": "bedrock-runtime-private"},
                        {"Key": "SecurityZone", "Value": "inference"},
                        {"Key": "ManagedBy", "Value": "ai-security-team"},
                    ],
                }],
            )
            endpoint_id = response["VpcEndpoint"]["VpcEndpointId"]
            dns_entries = response["VpcEndpoint"].get("DnsEntries", [])
            logger.info(f"Created Bedrock VPC endpoint: {endpoint_id}")
            return {
                "endpoint_id": endpoint_id,
                "dns_entries": dns_entries,
                "status": "pending",
            }
        except Exception as e:
            logger.error(f"Failed to create Bedrock VPC endpoint: {e}")
            raise

    def create_endpoint_security_group(
        self,
        vpc_id: str,
        inference_subnet_cidr: str,
    ) -> str:
        """
        Create a security group for the Bedrock VPC endpoint.

        Only allows HTTPS traffic from the inference subnet.
        """
        try:
            response = self.ec2_client.create_security_group(
                GroupName="bedrock-endpoint-sg",
                Description="Security group for Bedrock VPC endpoint - inference zone only",
                VpcId=vpc_id,
                TagSpecifications=[{
                    "ResourceType": "security-group",
                    "Tags": [
                        {"Key": "Name", "Value": "bedrock-endpoint-sg"},
                        {"Key": "SecurityZone", "Value": "inference"},
                    ],
                }],
            )
            sg_id = response["GroupId"]
            # Allow HTTPS from the inference subnet only
            self.ec2_client.authorize_security_group_ingress(
                GroupId=sg_id,
                IpPermissions=[{
                    "IpProtocol": "tcp",
                    "FromPort": 443,
                    "ToPort": 443,
                    "IpRanges": [{
                        "CidrIp": inference_subnet_cidr,
                        "Description": "HTTPS from inference subnet to Bedrock endpoint",
                    }],
                }],
            )
            logger.info(f"Created endpoint security group: {sg_id}")
            return sg_id
        except Exception as e:
            logger.error(f"Failed to create security group: {e}")
            raise

    def create_endpoint_policy(self) -> dict:
        """
        Create a VPC endpoint policy that restricts which Bedrock operations
        and models can be accessed through this endpoint.
        """
        policy = {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "AllowInvokeApprovedModels",
                    "Effect": "Allow",
                    "Principal": {"AWS": "*"},
                    "Action": [
                        "bedrock:InvokeModel",
                        "bedrock:InvokeModelWithResponseStream",
                    ],
                    "Resource": [
                        f"arn:aws:bedrock:{self.region}::foundation-model/anthropic.claude-*",
                        f"arn:aws:bedrock:{self.region}::foundation-model/amazon.titan-*",
                    ],
                    "Condition": {
                        "StringEquals": {
                            "aws:PrincipalOrgID": "o-your-org-id",
                        },
                    },
                },
                {
                    # Deny invocation of any model outside the approved list.
                    # NotResource inverts the match, so every other model ARN is blocked.
                    "Sid": "DenyUnapprovedModels",
                    "Effect": "Deny",
                    "Principal": {"AWS": "*"},
                    "Action": [
                        "bedrock:InvokeModel",
                        "bedrock:InvokeModelWithResponseStream",
                    ],
                    "NotResource": [
                        f"arn:aws:bedrock:{self.region}::foundation-model/anthropic.claude-*",
                        f"arn:aws:bedrock:{self.region}::foundation-model/amazon.titan-*",
                    ],
                },
            ],
        }
        return policy
Azure: Private Link for Azure OpenAI
# network_isolation/azure_openai_private.py
"""Azure Private Link configuration for Azure OpenAI Service."""
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


def generate_azure_private_endpoint_bicep(
    resource_group: str,
    vnet_name: str,
    subnet_name: str,
    openai_account_name: str,
    location: str = "eastus",
) -> str:
    """
    Generate an Azure Bicep template for an Azure OpenAI private endpoint.

    Returns the Bicep template as a string for deployment.
    """
    return f"""
// Azure OpenAI Private Endpoint - Bicep Template
// Deploys a private endpoint for Azure OpenAI Service in the inference subnet

param location string = '{location}'
param vnetName string = '{vnet_name}'
param subnetName string = '{subnet_name}'
param openaiAccountName string = '{openai_account_name}'

resource vnet 'Microsoft.Network/virtualNetworks@2023-09-01' existing = {{
  name: vnetName
}}

resource subnet 'Microsoft.Network/virtualNetworks/subnets@2023-09-01' existing = {{
  parent: vnet
  name: subnetName
}}

resource openaiAccount 'Microsoft.CognitiveServices/accounts@2023-10-01-preview' existing = {{
  name: openaiAccountName
}}

resource privateEndpoint 'Microsoft.Network/privateEndpoints@2023-09-01' = {{
  name: '${{openaiAccountName}}-pe'
  location: location
  properties: {{
    subnet: {{
      id: subnet.id
    }}
    privateLinkServiceConnections: [
      {{
        name: '${{openaiAccountName}}-connection'
        properties: {{
          privateLinkServiceId: openaiAccount.id
          groupIds: [
            'account'
          ]
        }}
      }}
    ]
  }}
  tags: {{
    SecurityZone: 'inference'
    ManagedBy: 'ai-security-team'
  }}
}}

resource privateDnsZone 'Microsoft.Network/privateDnsZones@2020-06-01' = {{
  name: 'privatelink.openai.azure.com'
  location: 'global'
}}

resource privateDnsZoneLink 'Microsoft.Network/privateDnsZones/virtualNetworkLinks@2020-06-01' = {{
  parent: privateDnsZone
  name: '${{vnetName}}-link'
  location: 'global'
  properties: {{
    virtualNetwork: {{
      id: vnet.id
    }}
    registrationEnabled: false
  }}
}}

resource privateDnsZoneGroup 'Microsoft.Network/privateEndpoints/privateDnsZoneGroups@2023-09-01' = {{
  parent: privateEndpoint
  name: 'default'
  properties: {{
    privateDnsZoneConfigs: [
      {{
        name: 'openai-dns-config'
        properties: {{
          privateDnsZoneId: privateDnsZone.id
        }}
      }}
    ]
  }}
}}

// Disable public network access on the OpenAI account. Set these properties
// where the account itself is declared: re-declaring the existing account in
// this template would conflict, and a partial update requires kind and sku.
//   properties: {{
//     publicNetworkAccess: 'Disabled'
//     networkAcls: {{
//       defaultAction: 'Deny'
//     }}
//   }}

output privateEndpointId string = privateEndpoint.id
output privateDnsZoneId string = privateDnsZone.id
"""
Network Monitoring for AI Workloads
# network_isolation/monitoring.py
"""Network monitoring and anomaly detection for AI workloads."""
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# RFC 1918 private address space
_RFC1918_NETS = (
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
)


@dataclass
class NetworkAnomaly:
    """A detected network anomaly in AI workload traffic."""
    anomaly_id: str
    severity: str  # critical, high, medium, low
    anomaly_type: str
    source_zone: str
    destination: str
    description: str
    detected_at: datetime
    evidence: dict


class AINetworkMonitor:
    """Monitors network traffic patterns for AI workload anomalies."""

    def __init__(self):
        self.anomalies: list[NetworkAnomaly] = []
        self._anomaly_counter = 0

    def analyze_flow_log(self, flow_record: dict) -> Optional[NetworkAnomaly]:
        """
        Analyze a VPC flow log record for AI-specific network anomalies.

        Checks for:
        - Unexpected egress from the inference zone to the internet
        - Data zone attempting external connections
        - Unusually large data transfers (potential model/data exfiltration)
        - Connections to unapproved AI service endpoints
        """
        src_ip = flow_record.get("srcaddr", "")
        dst_ip = flow_record.get("dstaddr", "")
        dst_port = flow_record.get("dstport", 0)
        bytes_transferred = flow_record.get("bytes", 0)

        # Check for data zone egress attempts
        if self._is_data_zone(src_ip) and not self._is_internal(dst_ip):
            self._anomaly_counter += 1
            return NetworkAnomaly(
                anomaly_id=f"NET-ANOM-{self._anomaly_counter:04d}",
                severity="critical",
                anomaly_type="data_zone_egress",
                source_zone="data",
                destination=dst_ip,
                description=(
                    f"Data zone IP {src_ip} attempted external connection to {dst_ip}:{dst_port}. "
                    "Data zone should have no internet access."
                ),
                detected_at=datetime.now(timezone.utc),
                evidence=flow_record,
            )

        # Check for large data transfers from the inference zone
        if self._is_inference_zone(src_ip) and bytes_transferred > 100 * 1024 * 1024:
            self._anomaly_counter += 1
            return NetworkAnomaly(
                anomaly_id=f"NET-ANOM-{self._anomaly_counter:04d}",
                severity="high",
                anomaly_type="large_transfer",
                source_zone="inference",
                destination=dst_ip,
                description=(
                    f"Large data transfer ({bytes_transferred / 1024 / 1024:.1f} MB) "
                    f"from inference zone to {dst_ip}. "
                    "Potential model weights or training data exfiltration."
                ),
                detected_at=datetime.now(timezone.utc),
                evidence=flow_record,
            )

        # Check for connections to unapproved AI endpoints.
        # Raw flow logs contain only IPs; this check assumes records enriched
        # with resolved hostnames (e.g., from DNS resolver query logs).
        approved_ai_endpoints = [
            "bedrock-runtime", "openai.azure.com",
            "aiplatform.googleapis.com",
        ]
        if (self._is_inference_zone(src_ip)
                and not self._is_internal(dst_ip)
                and not any(ep in str(dst_ip) for ep in approved_ai_endpoints)):
            self._anomaly_counter += 1
            return NetworkAnomaly(
                anomaly_id=f"NET-ANOM-{self._anomaly_counter:04d}",
                severity="medium",
                anomaly_type="unapproved_endpoint",
                source_zone="inference",
                destination=dst_ip,
                description=(
                    f"Inference zone connected to unapproved external endpoint {dst_ip}:{dst_port}. "
                    "Only approved AI service endpoints should be accessible."
                ),
                detected_at=datetime.now(timezone.utc),
                evidence=flow_record,
            )

        return None

    def _is_data_zone(self, ip: str) -> bool:
        """Check if an IP belongs to the data zone subnet (10.0.3.0/24)."""
        return ip.startswith("10.0.3.")

    def _is_inference_zone(self, ip: str) -> bool:
        """Check if an IP belongs to the inference zone subnet (10.0.2.0/24)."""
        return ip.startswith("10.0.2.")

    def _is_internal(self, ip: str) -> bool:
        """Check if an IP is in an RFC 1918 private range."""
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            return False
        return any(addr in net for net in _RFC1918_NETS)
Network Isolation for Agent Architectures
Agent-based AI systems that use MCP servers, function calling, or tool-use frameworks create particularly complex network requirements. An agent may need to communicate with multiple external services during a single conversation turn: an LLM API, a vector database, a web search API, a code execution sandbox, and a file storage service. Each of these connections has a different trust level and data sensitivity.
Micro-Segmentation for Agent Tool Calls
Apply micro-segmentation principles to agent tool-call traffic. Rather than allowing the agent's compute environment to reach all tool endpoints directly, route tool calls through a proxy layer that enforces per-tool network policies:
- LLM API calls: Route through the inference zone's VPC endpoint (private, no internet exposure)
- Vector database queries: Route within the VPC to the data zone (private, no internet exposure)
- Web search API calls: Route through a NAT gateway with domain allow-listing (only approved search API domains)
- Code execution: Route to an isolated sandbox subnet with no egress (the sandbox has no network access beyond receiving the code payload and returning results)
This architecture prevents a compromised tool from becoming a lateral movement vector. If an attacker achieves code execution through a code interpreter tool, they are contained in the sandbox subnet with no ability to reach the LLM, the vector database, or the internet.
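One lightweight way to express the per-tool routing above is a proxy-side egress allow-list keyed by tool name, with unknown tools and unlisted destinations denied by default. A minimal sketch; the tool names and destination values are hypothetical illustrations, not a real proxy API:

```python
# Hypothetical per-tool egress policy enforced by the agent's tool-call proxy.
# Each tool may reach only its designated destinations; everything else is denied.
TOOL_EGRESS_POLICY: dict[str, set[str]] = {
    "llm_api":       {"vpce-bedrock-runtime"},    # inference zone VPC endpoint only
    "vector_search": {"10.0.3.10:6333"},          # data zone vector DB
    "web_search":    {"api.search.example.com"},  # NAT gateway + domain allow-list
    "code_exec":     set(),                       # sandbox: no egress at all
}

def tool_call_allowed(tool: str, destination: str) -> bool:
    """Default-deny check: unknown tools and unlisted destinations are blocked."""
    return destination in TOOL_EGRESS_POLICY.get(tool, set())

print(tool_call_allowed("vector_search", "10.0.3.10:6333"))  # True
print(tool_call_allowed("code_exec", "10.0.3.10:6333"))      # False: sandbox has no egress
print(tool_call_allowed("web_search", "169.254.169.254"))    # False: metadata service blocked
```

The empty set for the code execution tool captures the containment property described above: even a fully compromised sandbox has no path to the vector database or the internet.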
Data Exfiltration Prevention
AI workloads present unique data exfiltration risks because the model itself can encode sensitive data into its outputs. A prompt injection attack that instructs the model to encode PII into its response text can exfiltrate data through the legitimate response channel, bypassing traditional DLP tools that inspect network traffic.
Network-level controls complement but do not replace application-level output filtering:
- Egress filtering: Block all outbound traffic from the inference zone except to approved private endpoints. This prevents direct exfiltration via arbitrary HTTP calls.
- DNS exfiltration prevention: Use a private DNS resolver that logs all queries and blocks queries to suspicious or newly registered domains. DNS tunneling is a common exfiltration technique.
- Response size monitoring: Implement alerts for unusually large model API responses, which may indicate data extraction attempts through the response channel.
- Content inspection: Deploy a reverse proxy between the inference layer and the application layer that inspects model responses for PII, system prompt fragments, and encoded data before they reach users.
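The response-size control above can be as simple as a rolling comparison in the inference proxy. A sketch with assumed thresholds (the window size, ratio, and byte floor are illustrative tuning parameters, not recommendations):

```python
from collections import deque

class ResponseSizeMonitor:
    """Flags model API responses that are unusually large relative to recent history."""

    def __init__(self, window: int = 100, ratio: float = 5.0, floor_bytes: int = 64 * 1024):
        self.sizes: deque[int] = deque(maxlen=window)  # rolling window of response sizes
        self.ratio = ratio              # alert if response > ratio x rolling average
        self.floor_bytes = floor_bytes  # never alert below this absolute size

    def check(self, response_bytes: int) -> bool:
        """Record the size; return True if it should raise an alert."""
        avg = sum(self.sizes) / len(self.sizes) if self.sizes else None
        self.sizes.append(response_bytes)
        if avg is None or response_bytes < self.floor_bytes:
            return False
        return response_bytes > self.ratio * avg

monitor = ResponseSizeMonitor()
for _ in range(20):
    monitor.check(2_000)           # typical ~2 KB completions
print(monitor.check(500_000))      # True: 500 KB response, possible extraction
print(monitor.check(2_500))        # False: back to normal
```

In practice the alert would feed the same pipeline as the flow-log anomalies, and the window would be kept per model or per tenant so one chatty workload does not mask another.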
Network Policy Drift Detection
Network configurations drift over time. New services are added, security groups are modified, and routing rules change. Implement continuous monitoring for network policy violations:
- Use AWS VPC Flow Logs, Azure NSG Flow Logs, or GCP VPC Flow Logs to capture all traffic between zones
- Feed flow logs into the AINetworkMonitor to detect unauthorized traffic patterns
- Alert on any traffic flow that does not match an explicit allow policy
- Run a weekly network policy audit that compares the current security group and network ACL configurations against the baseline defined in the policy catalog
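The weekly audit in the last step reduces to a set comparison between observed rules and the policy baseline. A provider-neutral sketch (the rule tuples are illustrative; in practice they would be normalized from a security group or NSG export):

```python
# Each rule is (source, destination, port, protocol); values are illustrative.
BASELINE_RULES = {
    ("10.0.1.0/24", "10.0.2.0/24", 443, "tcp"),   # app -> inference proxy
    ("10.0.2.0/24", "10.0.3.0/24", 6333, "tcp"),  # inference -> vector DB
    ("10.0.2.0/24", "vpce-bedrock", 443, "tcp"),  # inference -> Bedrock endpoint
}

def detect_drift(observed: set) -> dict:
    """Classify rules as unauthorized additions or missing baseline entries."""
    return {
        "unauthorized": sorted(observed - BASELINE_RULES),  # alert: widened attack surface
        "missing": sorted(BASELINE_RULES - observed),       # alert: may break workloads
    }

# Simulated snapshot from a config export: someone opened SSH into the data zone.
observed = set(BASELINE_RULES) | {("0.0.0.0/0", "10.0.3.0/24", 22, "tcp")}
drift = detect_drift(observed)
print(drift["unauthorized"])  # [('0.0.0.0/0', '10.0.3.0/24', 22, 'tcp')]
print(drift["missing"])       # []
```

Both directions matter: an unauthorized addition widens the attack surface silently, while a missing baseline rule usually surfaces as an outage and tempts an operator into an over-broad emergency fix.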
Validation and Testing
After implementing network isolation, validate that the controls work as expected:
- Positive tests: Verify that legitimate AI traffic flows through private endpoints. Confirm that model API calls succeed without public internet access.
- Negative tests: Attempt to reach cloud AI services from subnets that should not have access. Verify that direct internet access from the inference and data zones is blocked.
- Exfiltration simulation: Attempt to exfiltrate data from the inference zone to an external HTTP endpoint. Verify that the network monitoring detects and alerts on the attempt.
- DNS leak testing: Verify that DNS resolution for AI service endpoints resolves to private endpoint IPs, not public IPs. Run nslookup bedrock-runtime.us-east-1.amazonaws.com from within the VPC and confirm it returns a private IP address.
- Failover testing: Disable the private endpoint and verify that AI traffic does not automatically fail over to public internet access. The application should fail with a connection error rather than silently routing through the public internet.
- Lateral movement testing: From a compromised container in the application zone, attempt to directly access the vector database in the data zone. The connection should be blocked; only the inference zone should have data zone access.
- Agent tool isolation testing: If using agent architectures, verify that tool execution environments (code sandboxes, web browsers) cannot reach internal services. Send a crafted tool call that attempts to make an HTTP request to an internal IP address and verify it is blocked.
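The DNS leak check can be scripted with only the standard library. A sketch: the function does a live lookup by default, but accepts a captured address list so the check can also run against recorded resolver logs (the "example" hostname and addresses below are illustrative):

```python
from __future__ import annotations

import ipaddress
import socket

def resolves_privately(hostname: str, addresses: list[str] | None = None) -> bool:
    """True if every resolved address for `hostname` is RFC 1918 private.

    Pass `addresses` to check a captured resolution instead of doing a live lookup.
    """
    if addresses is None:
        addresses = [info[4][0] for info in
                     socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)]
    return all(ipaddress.ip_address(a).is_private for a in addresses)

# From inside the VPC, the private-DNS name should pass this check:
#   resolves_privately("bedrock-runtime.us-east-1.amazonaws.com")
# Against captured resolutions:
print(resolves_privately("example", ["10.0.2.45"]))             # True: private ENI address
print(resolves_privately("example", ["10.0.2.45", "3.5.1.2"]))  # False: a public IP leaked
```

Requiring all records to be private (rather than any) is deliberate: a single public A record is enough for a client to bypass the private endpoint.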
Common Mistakes
- Enabling private DNS without disabling public access. Creating a VPC endpoint for Bedrock but not disabling the public endpoint means the application can still fall back to public internet access if the private endpoint fails. Explicitly deny public access using IAM conditions or provider-level settings.
- Over-broad security groups. Security groups that allow all traffic from 10.0.0.0/16 to the VPC endpoint defeat the purpose of zone segmentation. Each security group should allow traffic only from its intended source subnet.
- Forgetting about DNS. VPC endpoints require private DNS to be enabled, and the VPC must have DNS hostnames and DNS resolution enabled. Without proper DNS configuration, API calls will resolve to public IPs and bypass the private endpoint.
- Not accounting for model API streaming. Many AI applications use streaming responses (Server-Sent Events). Network configurations that close idle connections after a short timeout will interrupt streaming responses. Configure load balancers and NAT gateways with appropriate idle timeout values (at least 300 seconds for AI inference).
- Ignoring the management plane. Securing the inference and data planes while leaving the management plane (secrets manager, deployment tools, monitoring) on the public internet creates a backdoor. Apply the same network isolation principles to management traffic.
- Using public model endpoints as fallback. Some AI SDK client libraries automatically fall back to public endpoints when private endpoints are unavailable. Verify that your client configuration explicitly fails rather than falling back, and test this behavior during endpoint outages.
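For the streaming-timeout pitfall, a minimal sketch of raising an ALB idle timeout. The attribute key is AWS's documented idle_timeout.timeout_seconds; the helper names and the 300-second minimum enforcement are this sketch's own conventions, and the apply step needs real AWS credentials and a load balancer ARN:

```python
def streaming_idle_timeout_attributes(timeout_seconds: int = 300) -> list[dict]:
    """Build the ALB attribute payload that keeps SSE/streaming connections alive."""
    if timeout_seconds < 300:
        raise ValueError("AI streaming responses need at least a 300s idle timeout")
    return [{"Key": "idle_timeout.timeout_seconds", "Value": str(timeout_seconds)}]

def apply_to_alb(alb_arn: str, timeout_seconds: int = 600) -> None:
    """Apply the timeout to a load balancer (requires AWS credentials at call time)."""
    import boto3
    boto3.client("elbv2").modify_load_balancer_attributes(
        LoadBalancerArn=alb_arn,
        Attributes=streaming_idle_timeout_attributes(timeout_seconds),
    )

print(streaming_idle_timeout_attributes(600))
```

NAT gateway idle timeouts, by contrast, are fixed per provider, so long-lived training connections behind NAT generally need TCP keepalives at the client rather than a gateway setting.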
Cost Considerations
Network isolation adds cost to AI deployments. VPC endpoints incur hourly charges and per-GB data processing fees. Private Link across providers adds transit costs. NAT gateways for controlled egress charge per GB processed. For AI workloads that generate significant API traffic (particularly embedding generation pipelines that process millions of documents), these costs can be substantial.
However, the alternative of running AI traffic over the public internet with sensitive prompts, system configurations, and user data carries risk costs that typically far exceed the network infrastructure costs. A single data breach involving model-exposed PII can cost millions in regulatory fines, legal fees, and reputation damage. Frame the network isolation investment against this risk cost, not against the zero-cost baseline of using public endpoints.
For cost optimization, consider that VPC endpoints can be shared across multiple services in the same VPC. A single Bedrock Runtime endpoint serves all applications in the VPC. Use AWS PrivateLink endpoint policies to restrict which applications can use the endpoint rather than creating separate endpoints per application. Similarly, on Azure, a single private endpoint for Azure OpenAI Service can serve all workloads within the virtual network, with network security groups controlling which subnets can reach the endpoint.
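To frame the shared-endpoint argument concretely, a back-of-envelope estimator. The hourly and per-GB rates below are placeholders, not actual prices; substitute current pricing for your provider and region:

```python
def vpc_endpoint_monthly_cost(gb_processed: float, azs: int = 2,
                              hourly_rate: float = 0.01,
                              per_gb_rate: float = 0.01) -> float:
    """Rough monthly cost of one interface endpoint.

    Rates are illustrative placeholders. An interface endpoint bills per
    availability zone per hour, plus a data processing fee per GB.
    """
    hours_per_month = 730  # average
    return azs * hours_per_month * hourly_rate + gb_processed * per_gb_rate

# One shared Bedrock endpoint handling 2 TB/month of inference traffic
# costs the same whether one application or twenty share it:
print(round(vpc_endpoint_monthly_cost(2048), 2))
```

The fixed hourly component is why endpoint sharing pays off: splitting the same 2 TB across five per-application endpoints would multiply the hourly charges fivefold while the data processing cost stays constant.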
References
- AWS. "Access Amazon Bedrock Using an Interface VPC Endpoint," https://docs.aws.amazon.com/bedrock/latest/userguide/vpc-interface-endpoints.html
- Microsoft. "Configure Azure AI Services Virtual Networks," https://learn.microsoft.com/en-us/azure/ai-services/cognitive-services-virtual-networks
- Google Cloud. "VPC Service Controls for Vertex AI," https://cloud.google.com/vertex-ai/docs/general/vpc-service-controls
- NIST SP 800-125B, "Secure Virtual Network Configuration for Virtual Machine Monitors," https://csrc.nist.gov/publications/detail/sp/800-125b/final