Network Isolation for Cloud AI Workloads
Implementing network isolation strategies for cloud AI deployments including private endpoints, VPC configurations, service mesh integration, and data plane segmentation for LLM inference and training workloads.
Overview
Most cloud AI deployments send model API traffic over the public internet. When an application calls the OpenAI API, Azure OpenAI, or AWS Bedrock, the request travels from the application's compute environment, through the public internet, to the provider's API endpoint. This default configuration exposes sensitive data — prompts, model responses, system prompts, and user information — to network-level interception risks, and creates opportunities for data exfiltration that bypass application-level controls.
Network isolation for AI workloads goes beyond simply enabling a VPC endpoint. A properly isolated AI deployment segments traffic into distinct security zones: the inference plane (where model API calls happen), the data plane (where training data, embeddings, and documents flow), the control plane (where model configuration, guardrails, and monitoring operate), and the management plane (where IAM, secrets, and deployment tools reside). Each zone has different trust levels, data sensitivity, and access requirements.
This article provides implementation guidance for network isolation on the major cloud providers, with worked examples for AWS and Azure, and specific attention to the networking requirements unique to AI workloads: high-bandwidth model inference traffic, long-lived training job connections, real-time embedding pipeline data flows, and the complex multi-service communication patterns of agent-based architectures.
AI Network Architecture Patterns
Reference Architecture: Isolated AI Deployment
┌─────────────────────────────────────────────────────────────────┐
│ VPC (10.0.0.0/16) │
│ ┌───────────────────┐ ┌───────────────────┐ ┌──────────────┐ │
│ │ Application Subnet │ │ AI Inference │ │ Data Subnet │ │
│ │ (10.0.1.0/24) │ │ Subnet │ │ (10.0.3.0/24)│ │
│ │ │ │ (10.0.2.0/24) │ │ │ │
│ │ ┌─────────────┐ │ │ ┌──────────────┐ │ │ ┌──────────┐│ │
│ │ │ API Gateway │ │ │ │ Model Proxy │ │ │ │Vector DB ││ │
│ │ │ (ALB) │───────│ │ / Guardrails │ │ │ │(Pinecone ││ │
│ │ └─────────────┘ │ │ └──────┬───────┘ │ │ │ Private) ││ │
│ │ ┌─────────────┐ │ │ │ │ │ └──────────┘│ │
│ │ │ App Servers │───────│ ┌──────▼───────┐ │ │ ┌──────────┐│ │
│ │ │ (ECS/EKS) │ │ │ │ VPC Endpoint │ │ │ │S3 Bucket ││ │
│ │ └─────────────┘ │ │ │ (Bedrock) │ │ │ │(Training)││ │
│ └───────────────────┘ │ └──────┬───────┘ │ │ └──────────┘│ │
│ │ │ │ └──────────────┘ │
│ │ ▼ │ │
│ │ AWS Bedrock │ │
│ │ (Private Link) │ │
│ └───────────────────┘ │
│ ┌───────────────────┐ │
│ │ Management Subnet │ ← Only accessible from VPN/bastion │
│ │ (10.0.4.0/24) │ │
│ │ Secrets Manager, IAM, CloudWatch, Deployment Tools │
│ └───────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Network Zone Definitions
# network_isolation/zones.py
"""Network zone definitions and policy enforcement for AI workloads."""
from __future__ import annotations
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional
class NetworkZone(Enum):
"""Security zones for AI workload network segmentation."""
APPLICATION = "application" # User-facing application layer
INFERENCE = "inference" # Model inference and guardrails
DATA = "data" # Vector DBs, document stores, training data
MANAGEMENT = "management" # Secrets, IAM, monitoring, deployment
EXTERNAL_AI = "external_ai" # Cloud AI service endpoints (Bedrock, etc.)
class TrafficDirection(Enum):
INGRESS = "ingress"
EGRESS = "egress"
@dataclass
class NetworkPolicy:
"""A network policy rule for AI workload segmentation."""
policy_id: str
name: str
source_zone: NetworkZone
destination_zone: NetworkZone
direction: TrafficDirection
protocol: str # tcp, udp, https
port_range: str # "443", "6333-6334", "8080"
action: str # allow, deny
justification: str
data_classification: str # What type of data flows on this path
# Define the allowed traffic flows between zones
AI_NETWORK_POLICIES: list[NetworkPolicy] = [
# Application → Inference: App sends prompts to the model proxy
NetworkPolicy(
policy_id="NET-001",
name="App to inference proxy",
source_zone=NetworkZone.APPLICATION,
destination_zone=NetworkZone.INFERENCE,
direction=TrafficDirection.EGRESS,
protocol="https",
port_range="443",
action="allow",
justification="Application sends user prompts to the inference proxy for model processing",
data_classification="confidential",
),
# Inference → External AI: Proxy calls cloud AI service via private endpoint
NetworkPolicy(
policy_id="NET-002",
name="Inference to cloud AI (private endpoint)",
source_zone=NetworkZone.INFERENCE,
destination_zone=NetworkZone.EXTERNAL_AI,
direction=TrafficDirection.EGRESS,
protocol="https",
port_range="443",
action="allow",
justification="Inference proxy forwards validated requests to Bedrock/Azure OpenAI via VPC endpoint",
data_classification="confidential",
),
# Inference → Data: RAG pipeline retrieves context from vector database
NetworkPolicy(
policy_id="NET-003",
name="Inference to vector database",
source_zone=NetworkZone.INFERENCE,
destination_zone=NetworkZone.DATA,
direction=TrafficDirection.EGRESS,
protocol="https",
port_range="6333-6334",
action="allow",
justification="RAG pipeline retrieves relevant context from vector database for augmented generation",
data_classification="confidential",
),
# Inference → Management: Fetch secrets and send logs
NetworkPolicy(
policy_id="NET-004",
name="Inference to secrets manager",
source_zone=NetworkZone.INFERENCE,
destination_zone=NetworkZone.MANAGEMENT,
direction=TrafficDirection.EGRESS,
protocol="https",
port_range="443",
action="allow",
justification="Inference layer retrieves API credentials from secrets manager",
data_classification="restricted",
),
# DENY: Application should never directly access cloud AI endpoints
NetworkPolicy(
policy_id="NET-010",
name="Block app direct AI access",
source_zone=NetworkZone.APPLICATION,
destination_zone=NetworkZone.EXTERNAL_AI,
direction=TrafficDirection.EGRESS,
protocol="https",
port_range="443",
action="deny",
justification="All model API calls must go through the inference proxy for security inspection",
data_classification="n/a",
),
# DENY: Data zone should never have internet access
NetworkPolicy(
policy_id="NET-011",
name="Block data zone internet",
source_zone=NetworkZone.DATA,
destination_zone=NetworkZone.APPLICATION,
direction=TrafficDirection.EGRESS,
protocol="tcp",
port_range="0-65535",
action="deny",
justification="Data zone contains sensitive training data and embeddings; no internet access permitted",
data_classification="n/a",
),
# DENY: Inference zone should not initiate connections to arbitrary internet hosts
NetworkPolicy(
policy_id="NET-012",
name="Block inference internet egress",
source_zone=NetworkZone.INFERENCE,
destination_zone=NetworkZone.APPLICATION, # Represents "internet"
direction=TrafficDirection.EGRESS,
protocol="tcp",
port_range="0-65535",
action="deny",
justification="Inference zone should only communicate with approved endpoints via private links",
data_classification="n/a",
),
]
def validate_traffic_flow(
source: NetworkZone,
destination: NetworkZone,
port: int,
protocol: str = "https",
) -> dict:
"""
Validate whether a traffic flow is permitted by the network policy.
Args:
source: Originating network zone.
destination: Target network zone.
port: Destination port number.
protocol: Network protocol.
Returns:
Validation result with matching policy and action.
"""
matching_policies = []
for policy in AI_NETWORK_POLICIES:
if policy.source_zone != source or policy.destination_zone != destination:
continue
if policy.protocol != protocol and policy.protocol != "tcp":
continue
# Check port range
if "-" in policy.port_range:
low, high = policy.port_range.split("-")
if int(low) <= port <= int(high):
matching_policies.append(policy)
elif port == int(policy.port_range):
matching_policies.append(policy)
if not matching_policies:
return {
"allowed": False,
"reason": "No matching policy — default deny",
"matching_policies": [],
}
# If any explicit deny matches, traffic is blocked
deny_policies = [p for p in matching_policies if p.action == "deny"]
if deny_policies:
return {
"allowed": False,
"reason": f"Explicitly denied by policy {deny_policies[0].policy_id}",
"matching_policies": [p.policy_id for p in deny_policies],
}
allow_policies = [p for p in matching_policies if p.action == "allow"]
if allow_policies:
return {
"allowed": True,
"reason": f"Allowed by policy {allow_policies[0].policy_id}",
"matching_policies": [p.policy_id for p in allow_policies],
"data_classification": allow_policies[0].data_classification,
}
    return {"allowed": False, "reason": "No allow policy matched", "matching_policies": []}
Provider-Specific Implementation
AWS: Private Endpoints for Bedrock
# network_isolation/aws_bedrock_private.py
"""AWS VPC Endpoint configuration for private Bedrock access."""
from __future__ import annotations
import json
import logging
from typing import Optional
import boto3
logger = logging.getLogger(__name__)
class BedrockPrivateEndpointSetup:
"""Configures private VPC endpoint access for AWS Bedrock."""
def __init__(self, region: str = "us-east-1"):
self.region = region
self.ec2_client = boto3.client("ec2", region_name=region)
self.bedrock_client = boto3.client("bedrock", region_name=region)
def create_vpc_endpoint(
self,
vpc_id: str,
subnet_ids: list[str],
security_group_ids: list[str],
) -> dict:
"""
Create a VPC Interface Endpoint for AWS Bedrock Runtime.
This eliminates the need for Bedrock API traffic to traverse
the public internet.
Args:
vpc_id: ID of the VPC where the endpoint will be created.
subnet_ids: Subnets in the inference zone for ENI placement.
security_group_ids: Security groups to attach to the endpoint.
Returns:
Endpoint creation response with endpoint ID and DNS entries.
"""
try:
response = self.ec2_client.create_vpc_endpoint(
VpcEndpointType="Interface",
ServiceName=f"com.amazonaws.{self.region}.bedrock-runtime",
VpcId=vpc_id,
SubnetIds=subnet_ids,
SecurityGroupIds=security_group_ids,
PrivateDnsEnabled=True, # Enables automatic DNS resolution
TagSpecifications=[{
"ResourceType": "vpc-endpoint",
"Tags": [
{"Key": "Name", "Value": "bedrock-runtime-private"},
{"Key": "SecurityZone", "Value": "inference"},
{"Key": "ManagedBy", "Value": "ai-security-team"},
],
}],
)
endpoint_id = response["VpcEndpoint"]["VpcEndpointId"]
dns_entries = response["VpcEndpoint"].get("DnsEntries", [])
logger.info(f"Created Bedrock VPC endpoint: {endpoint_id}")
return {
"endpoint_id": endpoint_id,
"dns_entries": dns_entries,
"status": "pending",
}
except Exception as e:
logger.error(f"Failed to create Bedrock VPC endpoint: {e}")
raise
def create_endpoint_security_group(
self,
vpc_id: str,
inference_subnet_cidr: str,
) -> str:
"""
Create a security group for the Bedrock VPC endpoint.
Only allows HTTPS traffic from the inference subnet.
"""
try:
response = self.ec2_client.create_security_group(
GroupName="bedrock-endpoint-sg",
Description="Security group for Bedrock VPC endpoint - inference zone only",
VpcId=vpc_id,
TagSpecifications=[{
"ResourceType": "security-group",
"Tags": [
{"Key": "Name", "Value": "bedrock-endpoint-sg"},
{"Key": "SecurityZone", "Value": "inference"},
],
}],
)
sg_id = response["GroupId"]
# Allow HTTPS from inference subnet only
self.ec2_client.authorize_security_group_ingress(
GroupId=sg_id,
IpPermissions=[{
"IpProtocol": "tcp",
"FromPort": 443,
"ToPort": 443,
"IpRanges": [{
"CidrIp": inference_subnet_cidr,
"Description": "HTTPS from inference subnet to Bedrock endpoint",
}],
}],
)
logger.info(f"Created endpoint security group: {sg_id}")
return sg_id
except Exception as e:
logger.error(f"Failed to create security group: {e}")
raise
def create_endpoint_policy(self) -> dict:
"""
Create a VPC endpoint policy that restricts which Bedrock operations
and models can be accessed through this endpoint.
"""
policy = {
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowInvokeApprovedModels",
"Effect": "Allow",
"Principal": {"AWS": "*"},
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream",
],
"Resource": [
f"arn:aws:bedrock:{self.region}::foundation-model/anthropic.claude-*",
f"arn:aws:bedrock:{self.region}::foundation-model/amazon.titan-*",
],
"Condition": {
"StringEquals": {
"aws:PrincipalOrgID": "o-your-org-id",
},
},
},
{
"Sid": "DenyUnapprovedModels",
"Effect": "Deny",
"Principal": {"AWS": "*"},
"Action": "bedrock:InvokeModel",
"Resource": "*",
"Condition": {
"StringNotLike": {
"bedrock:ModelId": [
"anthropic.claude-*",
"amazon.titan-*",
],
},
},
},
],
}
        return policy
Azure: Private Link for Azure OpenAI
# network_isolation/azure_openai_private.py
"""Azure Private Link configuration for Azure OpenAI Service."""
from __future__ import annotations
import logging
from typing import Optional
logger = logging.getLogger(__name__)
def generate_azure_private_endpoint_bicep(
resource_group: str,
vnet_name: str,
subnet_name: str,
openai_account_name: str,
location: str = "eastus",
) -> str:
"""
Generate Azure Bicep template for Azure OpenAI Private Endpoint.
Returns the Bicep template as a string for deployment.
"""
return f"""
// Azure OpenAI Private Endpoint - Bicep Template
// Deploys a private endpoint for Azure OpenAI Service in the inference subnet
param location string = '{location}'
param vnetName string = '{vnet_name}'
param subnetName string = '{subnet_name}'
param openaiAccountName string = '{openai_account_name}'
resource vnet 'Microsoft.Network/virtualNetworks@2023-09-01' existing = {{
name: vnetName
}}
resource subnet 'Microsoft.Network/virtualNetworks/subnets@2023-09-01' existing = {{
parent: vnet
name: subnetName
}}
resource openaiAccount 'Microsoft.CognitiveServices/accounts@2023-10-01-preview' existing = {{
name: openaiAccountName
}}
resource privateEndpoint 'Microsoft.Network/privateEndpoints@2023-09-01' = {{
name: '${{openaiAccountName}}-pe'
location: location
properties: {{
subnet: {{
id: subnet.id
}}
privateLinkServiceConnections: [
{{
name: '${{openaiAccountName}}-connection'
properties: {{
privateLinkServiceId: openaiAccount.id
groupIds: [
'account'
]
}}
}}
]
}}
tags: {{
SecurityZone: 'inference'
ManagedBy: 'ai-security-team'
}}
}}
resource privateDnsZone 'Microsoft.Network/privateDnsZones@2020-06-01' = {{
name: 'privatelink.openai.azure.com'
location: 'global'
}}
resource privateDnsZoneLink 'Microsoft.Network/privateDnsZones/virtualNetworkLinks@2020-06-01' = {{
parent: privateDnsZone
name: '${{vnetName}}-link'
location: 'global'
properties: {{
virtualNetwork: {{
id: vnet.id
}}
registrationEnabled: false
}}
}}
resource privateDnsZoneGroup 'Microsoft.Network/privateEndpoints/privateDnsZoneGroups@2023-09-01' = {{
parent: privateEndpoint
name: 'default'
properties: {{
privateDnsZoneConfigs: [
{{
name: 'openai-dns-config'
properties: {{
privateDnsZoneId: privateDnsZone.id
}}
}}
]
}}
}}
// Disable public network access on the OpenAI account
resource openaiNetworkRules 'Microsoft.CognitiveServices/accounts@2023-10-01-preview' = {{
name: openaiAccountName
location: location
properties: {{
publicNetworkAccess: 'Disabled'
networkAcls: {{
defaultAction: 'Deny'
}}
}}
}}
output privateEndpointId string = privateEndpoint.id
output privateDnsZoneId string = privateDnsZone.id
"""
Network Monitoring for AI Workloads
# network_isolation/monitoring.py
"""Network monitoring and anomaly detection for AI workloads."""
from __future__ import annotations
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
@dataclass
class NetworkAnomaly:
"""A detected network anomaly in AI workload traffic."""
anomaly_id: str
severity: str # critical, high, medium, low
anomaly_type: str
source_zone: str
destination: str
description: str
detected_at: datetime
evidence: dict
class AINetworkMonitor:
"""Monitors network traffic patterns for AI workload anomalies."""
def __init__(self):
self.anomalies: list[NetworkAnomaly] = []
self._anomaly_counter = 0
def analyze_flow_log(self, flow_record: dict) -> Optional[NetworkAnomaly]:
"""
Analyze a VPC flow log record for AI-specific network anomalies.
Checks for:
- Unexpected egress from inference zone to internet
- Data zone attempting external connections
- Unusually large data transfers (potential model/data exfiltration)
- Connections to unapproved AI service endpoints
"""
src_ip = flow_record.get("srcaddr", "")
dst_ip = flow_record.get("dstaddr", "")
dst_port = flow_record.get("dstport", 0)
bytes_transferred = flow_record.get("bytes", 0)
action = flow_record.get("action", "")
# Check for data zone egress attempts
if self._is_data_zone(src_ip) and not self._is_internal(dst_ip):
self._anomaly_counter += 1
return NetworkAnomaly(
anomaly_id=f"NET-ANOM-{self._anomaly_counter:04d}",
severity="critical",
anomaly_type="data_zone_egress",
source_zone="data",
destination=dst_ip,
description=(
f"Data zone IP {src_ip} attempted external connection to {dst_ip}:{dst_port}. "
"Data zone should have no internet access."
),
detected_at=datetime.utcnow(),
evidence=flow_record,
)
# Check for large data transfers from inference zone
if self._is_inference_zone(src_ip) and bytes_transferred > 100 * 1024 * 1024:
self._anomaly_counter += 1
return NetworkAnomaly(
anomaly_id=f"NET-ANOM-{self._anomaly_counter:04d}",
severity="high",
anomaly_type="large_transfer",
source_zone="inference",
destination=dst_ip,
description=(
f"Large data transfer ({bytes_transferred / 1024 / 1024:.1f} MB) "
f"from inference zone to {dst_ip}. "
"Potential model weights or training data exfiltration."
),
detected_at=datetime.utcnow(),
evidence=flow_record,
)
# Check for connections to unapproved AI endpoints
approved_ai_endpoints = [
"bedrock-runtime", "openai.azure.com",
"aiplatform.googleapis.com",
]
if (self._is_inference_zone(src_ip)
and not self._is_internal(dst_ip)
and not any(ep in str(dst_ip) for ep in approved_ai_endpoints)):
self._anomaly_counter += 1
return NetworkAnomaly(
anomaly_id=f"NET-ANOM-{self._anomaly_counter:04d}",
severity="medium",
anomaly_type="unapproved_endpoint",
source_zone="inference",
destination=dst_ip,
description=(
f"Inference zone connected to unapproved external endpoint {dst_ip}:{dst_port}. "
"Only approved AI service endpoints should be accessible."
),
detected_at=datetime.utcnow(),
evidence=flow_record,
)
return None
def _is_data_zone(self, ip: str) -> bool:
"""Check if an IP belongs to the data zone subnet."""
return ip.startswith("10.0.3.")
def _is_inference_zone(self, ip: str) -> bool:
"""Check if an IP belongs to the inference zone subnet."""
return ip.startswith("10.0.2.")
    def _is_internal(self, ip: str) -> bool:
        """Check if an IP is in a private RFC 1918 range (10/8, 172.16/12, 192.168/16)."""
        if ip.startswith("10.") or ip.startswith("192.168."):
            return True
        # 172.16.0.0/12 spans 172.16.x.x through 172.31.x.x, not just 172.16.x.x
        octets = ip.split(".")
        return (len(octets) == 4 and octets[0] == "172"
                and octets[1].isdigit() and 16 <= int(octets[1]) <= 31)
Network Isolation for Agent Architectures
Agent-based AI systems that use MCP servers, function calling, or tool-use frameworks create particularly complex network requirements. An agent may need to communicate with multiple external services during a single conversation turn — an LLM API, a vector database, a web search API, a code execution sandbox, and a file storage service. Each of these connections has different trust levels and data sensitivity.
Micro-Segmentation for Agent Tool Calls
Apply micro-segmentation principles to agent tool-call traffic. Rather than allowing the agent's compute environment to reach all tool endpoints directly, route tool calls through a proxy layer that enforces per-tool network policies:
- LLM API calls: Route through the inference zone's VPC endpoint (private, no internet exposure)
- Vector database queries: Route within the VPC to the data zone (private, no internet exposure)
- Web search API calls: Route through a NAT gateway with domain allow-listing (only approved search API domains)
- Code execution: Route to an isolated sandbox subnet with no egress (the sandbox has no network access beyond receiving the code payload and returning results)
This architecture prevents a compromised tool from becoming a lateral movement vector. If an attacker achieves code execution through a code interpreter tool, they are contained in the sandbox subnet with no ability to reach the LLM, the vector database, or the internet.
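The per-tool routing rules above can be captured in a small policy table that the proxy layer consults before dispatching any tool call. The sketch below is illustrative: the tool names, route labels, and allow-listed domain are assumptions for demonstration, not part of the reference architecture.

```python
# agent_tool_policy.py — illustrative per-tool egress policy for an agent proxy layer.
# Tool names, route labels, and domains below are hypothetical examples.
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolEgressPolicy:
    tool_name: str
    route: str                  # network path the proxy uses for this tool
    internet_egress: bool       # may the tool reach the public internet?
    allowed_domains: tuple = () # domain allow-list for NAT-routed tools


TOOL_POLICIES = {
    "llm_api":       ToolEgressPolicy("llm_api", "vpc_endpoint", False),
    "vector_search": ToolEgressPolicy("vector_search", "data_zone", False),
    "web_search":    ToolEgressPolicy("web_search", "nat_gateway", True,
                                      ("api.search.example.com",)),
    "code_exec":     ToolEgressPolicy("code_exec", "sandbox_subnet", False),
}


def resolve_route(tool_name: str, requested_host: str) -> str:
    """Map a tool call to an approved network path; default-deny everything else."""
    policy = TOOL_POLICIES.get(tool_name)
    if policy is None:
        raise PermissionError(f"unknown tool {tool_name!r}: default deny")
    if not policy.internet_egress:
        # Private-only tools ignore caller-supplied hosts and use the fixed path,
        # so a prompt-injected destination cannot redirect the traffic.
        return policy.route
    if requested_host in policy.allowed_domains:
        return policy.route
    raise PermissionError(f"{tool_name} may not reach {requested_host}")
```

Note that private-only tools never honor a caller-supplied destination; this is what keeps an injected URL in a tool argument from turning into arbitrary egress.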
Data Exfiltration Prevention
AI workloads present unique data exfiltration risks because the model itself can encode sensitive data into its outputs. A prompt injection attack that instructs the model to encode PII into its response text can exfiltrate data through the legitimate response channel, bypassing traditional DLP tools that inspect network traffic.
Network-level controls complement but do not replace application-level output filtering:
- Egress filtering: Block all outbound traffic from the inference zone except to approved private endpoints. This prevents direct exfiltration via arbitrary HTTP calls.
- DNS exfiltration prevention: Use a private DNS resolver that logs all queries and blocks queries to suspicious or newly-registered domains. DNS tunneling is a common exfiltration technique.
- Response size monitoring: Implement alerts for unusually large model API responses, which may indicate data extraction attempts through the response channel.
- Content inspection: Deploy a reverse proxy between the inference layer and the application layer that inspects model responses for PII, system prompt fragments, and encoded data before they reach users.
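The response size monitoring control can be as simple as comparing each response against a rolling baseline. A minimal sketch follows; the window size, sigma multiplier, and absolute floor are illustrative assumptions to tune for your traffic.

```python
# response_size_monitor.py — flags unusually large model API responses against a
# rolling baseline. Window, sigma, and floor values are illustrative assumptions.
from collections import deque
from statistics import mean, stdev


class ResponseSizeMonitor:
    def __init__(self, window: int = 100, sigma: float = 3.0, floor_bytes: int = 64_000):
        self.sizes: deque = deque(maxlen=window)
        self.sigma = sigma
        self.floor_bytes = floor_bytes  # never alert below this absolute size

    def observe(self, size_bytes: int) -> bool:
        """Record a response size; return True if it should raise an alert."""
        alert = False
        if len(self.sizes) >= 10 and size_bytes >= self.floor_bytes:
            baseline = mean(self.sizes)
            spread = stdev(self.sizes) or 1.0  # avoid zero spread on uniform history
            alert = size_bytes > baseline + self.sigma * spread
        self.sizes.append(size_bytes)
        return alert
```

The absolute floor prevents alert noise on small responses; the sigma threshold catches responses far outside the workload's normal distribution, which is the signature of bulk data moving through the response channel.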
Network Policy Drift Detection
Network configurations drift over time. New services are added, security groups are modified, and routing rules change. Implement continuous monitoring for network policy violations:
- Use AWS VPC Flow Logs, Azure NSG Flow Logs, or GCP VPC Flow Logs to capture all traffic between zones
- Feed flow logs into the AINetworkMonitor to detect unauthorized traffic patterns
- Alert on any traffic flow that does not match an explicit allow policy
- Run a weekly network policy audit that compares the current security group and network ACL configurations against the baseline defined in the policy catalog
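The weekly audit reduces to a set comparison between observed rules and the baseline. The sketch below assumes security-group rules have already been fetched and normalized into (source CIDR, protocol, port) tuples; the baseline entries are illustrative.

```python
# sg_drift_audit.py — compare observed security-group rules against the policy
# baseline. The baseline entries below are illustrative assumptions.

Rule = tuple  # (source_cidr, protocol, port)

BASELINE: set = {
    ("10.0.1.0/24", "tcp", 443),   # app subnet -> inference proxy
    ("10.0.2.0/24", "tcp", 443),   # inference subnet -> Bedrock endpoint
    ("10.0.2.0/24", "tcp", 6333),  # inference subnet -> vector DB
}


def audit(observed: set) -> dict:
    """Report rules that drifted from the baseline in either direction."""
    return {
        "unauthorized": sorted(observed - BASELINE),  # added outside the catalog
        "missing": sorted(BASELINE - observed),       # expected rules removed
        "compliant": observed == BASELINE,
    }
```

Both directions matter: an unauthorized rule is a new attack path, while a missing rule usually means a legitimate flow broke or someone is mid-change without a ticket.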
Validation and Testing
After implementing network isolation, validate that the controls work as expected:
- Positive tests: Verify that legitimate AI traffic flows through private endpoints. Confirm that model API calls succeed without public internet access.
- Negative tests: Attempt to reach cloud AI services from subnets that should not have access. Verify that direct internet access from the inference and data zones is blocked.
- Exfiltration simulation: Attempt to exfiltrate data from the inference zone to an external HTTP endpoint. Verify that the network monitoring detects and alerts on the attempt.
- DNS leak testing: Verify that DNS resolution for AI service endpoints resolves to private endpoint IPs, not public IPs. Run nslookup bedrock-runtime.us-east-1.amazonaws.com from within the VPC and confirm it returns a private IP address.
- Failover testing: Disable the private endpoint and verify that AI traffic does not automatically fail over to public internet access. The application should fail with a connection error rather than silently routing through the public internet.
- Lateral movement testing: From a compromised container in the application zone, attempt to directly access the vector database in the data zone. The connection should be blocked — only the inference zone should have data zone access.
- Agent tool isolation testing: If using agent architectures, verify that tool execution environments (code sandboxes, web browsers) cannot reach internal services. Send a crafted tool call that attempts to make an HTTP request to an internal IP address and verify it is blocked.
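The DNS leak test can be partially automated. Resolution must run from inside the VPC, but classifying the results is a one-liner with the standard library; a sketch:

```python
# dns_leak_check.py — classify resolved endpoint addresses as private or public.
# Run inside the VPC; any public address for an AI service endpoint indicates
# traffic is not using the private endpoint's DNS zone.
import ipaddress
import socket


def resolve_all(hostname: str) -> list:
    """Resolve a hostname to all of its addresses."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def leaked_addresses(addresses: list) -> list:
    """Return any addresses that are NOT in a private (RFC 1918 / ULA) range."""
    return [a for a in addresses if not ipaddress.ip_address(a).is_private]


# Usage (from a host inside the VPC):
#   addrs = resolve_all("bedrock-runtime.us-east-1.amazonaws.com")
#   assert not leaked_addresses(addrs), f"DNS leak: {leaked_addresses(addrs)}"
```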
Common Mistakes
- Enabling private DNS without disabling public access. Creating a VPC endpoint for Bedrock but not disabling the public endpoint means the application can still fall back to public internet access if the private endpoint fails. Explicitly deny public access using IAM conditions or provider-level settings.
- Over-broad security groups. Security groups that allow all traffic from 10.0.0.0/16 to the VPC endpoint defeat the purpose of zone segmentation. Each security group should allow traffic only from its intended source subnet.
- Forgetting about DNS. VPC endpoints require private DNS to be enabled, and the VPC must have DNS hostnames and DNS resolution enabled. Without proper DNS configuration, API calls will resolve to public IPs and bypass the private endpoint.
- Not accounting for model API streaming. Many AI applications use streaming responses (Server-Sent Events). Network configurations that close idle connections after a short timeout will interrupt streaming responses. Configure load balancers and NAT gateways with appropriate idle timeout values (at least 300 seconds for AI inference).
- Ignoring the management plane. Securing the inference and data planes while leaving the management plane (secrets manager, deployment tools, monitoring) on the public internet creates a backdoor. Apply the same network isolation principles to management traffic.
- Using public model endpoints as fallback. Some AI SDK client libraries automatically fall back to public endpoints when private endpoints are unavailable. Verify that your client configuration explicitly fails rather than falling back, and test this behavior during endpoint outages.
Cost Considerations
Network isolation adds cost to AI deployments. VPC endpoints incur hourly charges and per-GB data processing fees. Private Link across providers adds transit costs. NAT gateways for controlled egress charge per GB processed. For AI workloads that generate significant API traffic — particularly embedding generation pipelines that process millions of documents — these costs can be substantial.
However, the alternative — running AI traffic over the public internet with sensitive prompts, system configurations, and user data — carries risk costs that typically far exceed the network infrastructure costs. A single data breach involving model-exposed PII can cost millions in regulatory fines, legal fees, and reputation damage. Frame the network isolation investment against this risk cost, not against the zero-cost baseline of using public endpoints.
For cost optimization, consider that VPC endpoints can be shared across multiple services in the same VPC. A single Bedrock Runtime endpoint serves all applications in the VPC. Use AWS PrivateLink endpoint policies to restrict which applications can use the endpoint rather than creating separate endpoints per application. Similarly, on Azure, a single Private Endpoint for Azure OpenAI Service can serve all workloads within the virtual network, with network security groups controlling which subnets can reach the endpoint.
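To make the trade-off concrete, the shared-endpoint cost model is simple arithmetic: an hourly charge per availability zone plus a per-GB processing fee. The rates below are placeholders, not current AWS pricing; substitute your provider's published rates.

```python
# endpoint_cost_model.py — rough monthly cost of a shared VPC interface endpoint.
# HOURLY_RATE and GB_RATE are illustrative placeholders, NOT real pricing.
HOURLY_RATE = 0.01  # $ per AZ-hour per endpoint (placeholder)
GB_RATE = 0.01      # $ per GB processed (placeholder)


def monthly_endpoint_cost(azs: int, gb_processed: float, hours: int = 730) -> float:
    """Hourly charge across AZs plus per-GB data processing for one month."""
    return azs * hours * HOURLY_RATE + gb_processed * GB_RATE


# A single endpoint in 3 AZs handling 5 TB of inference traffic per month:
# monthly_endpoint_cost(3, 5_000) -> 3*730*0.01 + 5_000*0.01 = 21.9 + 50.0 = 71.9
```

Because the hourly component is fixed per endpoint, consolidating applications onto one shared endpoint (as described above) amortizes it; only the per-GB term scales with traffic.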
References
- AWS. "Access Amazon Bedrock Using an Interface VPC Endpoint," https://docs.aws.amazon.com/bedrock/latest/userguide/vpc-interface-endpoints.html
- Microsoft. "Configure Azure AI Services Virtual Networks," https://learn.microsoft.com/en-us/azure/ai-services/cognitive-services-virtual-networks
- Google Cloud. "VPC Service Controls for Vertex AI," https://cloud.google.com/vertex-ai/docs/general/vpc-service-controls
- NIST SP 800-125B, "Secure Virtual Network Configuration for Virtual Machine Monitors," https://csrc.nist.gov/publications/detail/sp/800-125b/final