Data Residency and Sovereignty for Cloud AI
Managing data residency, sovereignty, and cross-border transfer requirements for cloud AI services including GDPR, AI Act, and regional model deployment strategies.
Overview
Data residency and sovereignty are among the most complex challenges in cloud AI security. When an organization sends a prompt to a cloud-hosted foundation model, the data in that prompt may cross jurisdictional boundaries, be logged in regions the organization did not anticipate, or be used for model improvement in ways that violate data protection commitments. For custom model training, the training data, model weights, and inference results all have their own residency requirements that may differ from each other.
The regulatory landscape makes this even more demanding. The EU AI Act (entered into force in August 2024, with obligations phasing in through 2027) imposes requirements on providers and deployers of AI systems, several of which intersect with data sovereignty. GDPR's data transfer restrictions apply to any personal data in prompts or training sets. Industry-specific regulations (HIPAA, financial services rules, government classification requirements) add further constraints.
This article maps out the data flows that create residency risks, shows how to configure region-locked deployments on each major cloud, and provides assessment techniques for verifying that data stays where it should.
Understanding AI Data Flows
Where Data Moves in Cloud AI
A single API call to a cloud AI service creates multiple data flows, each with its own residency implications:
User Input (Prompt)
|
v
[Application Layer] -- Region A
|
v
[API Gateway / Load Balancer] -- May route to any region
|
v
[AI Service Endpoint] -- Region B (configured)
|
+---> [Model Inference] -- Region B
| |
| v
| [Response Generation]
|
+---> [Logging / Telemetry] -- Region C (may differ)
|
+---> [Content Safety Filter] -- Region D (may differ)
|
+---> [Abuse Monitoring] -- Region E (global)
|
+---> [Model Improvement] -- Varies by provider opt-in/opt-out
|
v
[Response to Application]
The critical insight is that even when you deploy the AI service in a specific region, supporting services (logging, content filtering, abuse monitoring) may process data in different regions. Each cloud provider handles this differently.
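The flows in the diagram above can be captured as a small inventory that mechanically flags anything leaving the configured region. The following sketch uses illustrative flow names and regions; it is not tied to any provider's actual telemetry paths:

```python
from dataclasses import dataclass

@dataclass
class DataFlow:
    name: str
    source_region: str
    destination_region: str
    carries_prompt_data: bool

def find_cross_region_flows(flows):
    """Return flows whose destination differs from their source region."""
    return [f for f in flows if f.destination_region != f.source_region]

# Illustrative flows mirroring the diagram; regions are examples only
flows = [
    DataFlow("model_inference", "eu-west-1", "eu-west-1", True),
    DataFlow("abuse_monitoring", "eu-west-1", "global", True),
    DataFlow("usage_metadata", "eu-west-1", "us-east-1", False),
]

# Flows carrying prompt data out of the configured region need review first
risky = [f.name for f in find_cross_region_flows(flows) if f.carries_prompt_data]
print(risky)  # ['abuse_monitoring']
```

Keeping this inventory in code (rather than a spreadsheet) makes it easy to re-check after every architecture change.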
Data Classification for AI Workloads
Before configuring residency controls, classify the data flowing through your AI system:
| Data Type | Examples | Residency Sensitivity | Typical Regulation |
|---|---|---|---|
| Prompts containing PII | Customer support conversations, HR queries | High | GDPR Art. 44-49, CCPA |
| Prompts with trade secrets | Code review, strategy documents | High | Trade secret law, NDA obligations |
| Training data with personal data | Customer interaction logs, annotated datasets | Very High | GDPR Art. 44-49, sector regulations |
| Model weights (fine-tuned) | Custom models trained on organization data | Medium | May encode personal data (GDPR recital 26) |
| Inference results | Model outputs, classifications, summaries | Medium-High | Derived data may inherit source classification |
| Usage metadata | Token counts, latency, model version | Low | Generally not regulated |
| Content filter logs | Blocked prompts and reasons | High | Contains the harmful/sensitive content itself |
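One way to make the classification table actionable is a lookup that drives configuration decisions. The category keys and sensitivity labels below are illustrative, not a normative taxonomy:

```python
# Illustrative mapping of the classification table above; the keys and
# sensitivity labels are examples, not a normative taxonomy.
RESIDENCY_SENSITIVITY = {
    "prompt_with_pii": "high",
    "prompt_with_trade_secrets": "high",
    "training_data_personal": "very_high",
    "fine_tuned_weights": "medium",
    "inference_results": "medium_high",
    "usage_metadata": "low",
    "content_filter_logs": "high",
}

def requires_region_lock(data_type: str) -> bool:
    """Fail closed: unknown data types are treated as sensitive."""
    return RESIDENCY_SENSITIVITY.get(data_type, "high") != "low"

print(requires_region_lock("usage_metadata"))  # False
```

The fail-closed default matters: data that has not been classified yet should be treated as if it needs a region-locked path.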
AWS Data Residency for AI Services
Bedrock Region Configuration
AWS Bedrock processes data in the region where the service is invoked. However, model availability varies by region, creating tension between residency requirements and model access.
import boto3
from botocore.config import Config
def create_region_locked_bedrock_client(
region: str,
enforce_region: bool = True,
) -> dict:
"""Create a Bedrock client locked to a specific region with verification."""
# Prevent any cross-region fallback
config = Config(
region_name=region,
retries={"max_attempts": 3, "mode": "standard"},
)
session = boto3.Session(region_name=region)
bedrock = session.client("bedrock", config=config)
bedrock_runtime = session.client("bedrock-runtime", config=config)
if enforce_region:
        # Probe the regional endpoint; this call fails if the region is invalid or unreachable
available_models = bedrock.list_foundation_models()
model_count = len(available_models.get("modelSummaries", []))
return {
"region": region,
"runtime_endpoint": f"https://bedrock-runtime.{region}.amazonaws.com",
"management_endpoint": f"https://bedrock.{region}.amazonaws.com",
"models_available": model_count,
"clients": {
"bedrock": bedrock,
"bedrock_runtime": bedrock_runtime,
},
}
return {"bedrock": bedrock, "bedrock_runtime": bedrock_runtime}
def generate_region_restriction_scp(allowed_regions: list) -> dict:
"""Generate an SCP that restricts Bedrock to specific regions."""
return {
"Version": "2012-10-17",
"Statement": [
{
"Sid": "RestrictBedrockRegions",
"Effect": "Deny",
"Action": [
"bedrock:*",
],
"Resource": "*",
"Condition": {
"StringNotEquals": {
"aws:RequestedRegion": allowed_regions
}
},
},
{
"Sid": "RestrictSageMakerRegions",
"Effect": "Deny",
"Action": [
"sagemaker:*",
],
"Resource": "*",
"Condition": {
"StringNotEquals": {
"aws:RequestedRegion": allowed_regions
}
},
},
],
    }
AWS Data Processing Addendum Considerations
AWS commits to processing data in the region you select for Bedrock invocations. However, several ancillary data flows require attention:
- Model invocation logging: If enabled, logs are stored in the S3 bucket or CloudWatch log group you specify. These must be in the same region as the Bedrock endpoint.
- Abuse monitoring: AWS may process data for abuse detection outside the selected region, subject to the AWS Data Processing Addendum. Organizations with strict residency requirements should review the DPA carefully.
- CloudTrail logs: Management events are logged in the region where the API call is made. Data events (if enabled) follow the same pattern. CloudTrail organization trails may aggregate logs to a central bucket in a different region.
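The SCP generator above produces only the policy document; creating and attaching it is a separate AWS Organizations call. A minimal sketch, assuming management-account credentials (the policy name and description here are illustrative placeholders):

```python
import json

def build_scp_request(policy_document: dict, name: str = "ai-region-restriction") -> dict:
    """Build the create_policy request body for AWS Organizations.

    The name and description are illustrative placeholders.
    """
    return {
        "Name": name,
        "Description": "Deny Bedrock and SageMaker outside approved regions",
        "Type": "SERVICE_CONTROL_POLICY",
        "Content": json.dumps(policy_document),
    }

# To apply it (requires management-account credentials):
#   import boto3
#   org = boto3.client("organizations")
#   resp = org.create_policy(**build_scp_request(scp_document))
#   org.attach_policy(
#       PolicyId=resp["Policy"]["PolicySummary"]["Id"],
#       TargetId=target_ou_id,  # an OU or account ID
#   )
```

Note that SCPs only constrain member accounts; they do not apply to the management account itself, so AI workloads should never run there.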
def audit_bedrock_data_residency(
session: boto3.Session,
target_region: str,
) -> dict:
"""Audit Bedrock deployment for data residency compliance."""
findings = []
# Check invocation logging destination
bedrock = session.client("bedrock", region_name=target_region)
try:
logging_config = bedrock.get_model_invocation_logging_configuration()
config = logging_config.get("loggingConfig", {})
# Check S3 logging destination region
s3_config = config.get("s3Config", {})
if s3_config:
bucket_name = s3_config.get("bucketName", "")
s3 = session.client("s3")
try:
location = s3.get_bucket_location(Bucket=bucket_name)
bucket_region = location.get("LocationConstraint") or "us-east-1"
if bucket_region != target_region:
findings.append({
"severity": "HIGH",
"finding": "Invocation logs stored in different region",
"detail": f"Bedrock is in {target_region} but logs go to "
f"S3 bucket in {bucket_region}. This creates "
f"cross-region data transfer of prompts/responses.",
})
except Exception:
pass
# Check CloudWatch logging destination
cw_config = config.get("cloudWatchConfig", {})
if cw_config:
log_group = cw_config.get("logGroupName", "")
findings.append({
"severity": "INFO",
"finding": f"CloudWatch logging to {log_group}",
"detail": "CloudWatch log groups are regional. Verify this log group "
"is in the same region as the Bedrock endpoint.",
})
except Exception:
findings.append({
"severity": "MEDIUM",
"finding": "Cannot verify invocation logging configuration",
})
# Check CloudTrail configuration
cloudtrail = session.client("cloudtrail", region_name=target_region)
try:
trails = cloudtrail.describe_trails()
for trail in trails.get("trailList", []):
if trail.get("IsMultiRegionTrail"):
s3_bucket = trail.get("S3BucketName", "")
findings.append({
"severity": "MEDIUM",
"finding": "Multi-region CloudTrail trail detected",
"detail": f"Trail {trail['Name']} aggregates logs from all regions "
f"to bucket {s3_bucket}. Bedrock management events from "
f"{target_region} may be stored in a different region.",
})
except Exception:
pass
    return {"target_region": target_region, "findings": findings}
Azure Data Residency for AI Services
Azure OpenAI Regional Deployment
Azure OpenAI Service processes data in the region where the resource is created. Azure provides stronger regional commitments than some other providers through its EU Data Boundary program.
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
def audit_azure_openai_data_residency(
subscription_id: str,
resource_group: str,
account_name: str,
required_geography: str = "europe",
) -> dict:
"""Audit Azure OpenAI resource for data residency compliance."""
credential = DefaultAzureCredential()
client = CognitiveServicesManagementClient(credential, subscription_id)
findings = []
account = client.accounts.get(resource_group, account_name)
resource_location = account.location
# Map Azure regions to geographies
eu_regions = [
"westeurope", "northeurope", "francecentral", "francesouth",
"germanywestcentral", "germanynorth", "swedencentral",
"swedensouth", "switzerlandnorth", "switzerlandwest",
"norwayeast", "norwaywest", "uksouth", "ukwest",
]
if required_geography == "europe" and resource_location not in eu_regions:
findings.append({
"severity": "CRITICAL",
"finding": f"Resource deployed outside EU: {resource_location}",
            "detail": "Azure OpenAI resources processing EU personal data "
                      "should run in an EU region unless the transfer is "
                      "covered by an adequacy decision or another GDPR "
                      "Art. 44-49 mechanism.",
})
# Check data storage and processing commitments
# Azure OpenAI with abuse monitoring opt-out
if account.properties.restrict_outbound_network_access:
findings.append({
"severity": "INFO",
"finding": "Outbound network access restricted",
"detail": "Reduces risk of data leaving the deployment region.",
})
# Check diagnostic settings for cross-region logging
# (Diagnostic settings might send data to Log Analytics in another region)
from azure.mgmt.monitor import MonitorManagementClient
monitor = MonitorManagementClient(credential, subscription_id)
resource_uri = account.id
for setting in monitor.diagnostic_settings.list(resource_uri):
if setting.workspace_id:
# Check if Log Analytics workspace is in the same region
findings.append({
"severity": "MEDIUM",
"finding": "Diagnostic logs sent to Log Analytics workspace",
"detail": f"Verify workspace {setting.workspace_id} is in the same "
f"region as the OpenAI resource ({resource_location}). "
f"Cross-region diagnostic data transfer can violate residency.",
})
# List deployments and check model availability
deployments = client.deployments.list(resource_group, account_name)
for deployment in deployments:
findings.append({
"severity": "INFO",
"finding": f"Model deployment: {deployment.name}",
"detail": f"Model: {deployment.properties.model.name}, "
f"Version: {deployment.properties.model.version}. "
f"Data processed in {resource_location}.",
})
return {
"resource_location": resource_location,
"required_geography": required_geography,
"findings": findings,
    }
Azure EU Data Boundary
For organizations operating under EU data sovereignty requirements, Azure's EU Data Boundary commitment (generally available since January 2024) ensures that customer data for covered services stays within the EU. Azure OpenAI is included in the EU Data Boundary when:
- The resource is deployed in an EU region
- Abuse monitoring data processing stays within the EU (organizations can request modified abuse monitoring that keeps data in-region)
- Content filtering logs remain in the deployment region
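Region membership checks like the one below can guard deployments at provisioning time. The region list is an illustrative snapshot and should be verified against Microsoft's current EU Data Boundary documentation; note that UK regions sit outside the EU Data Boundary even though the UK holds a GDPR adequacy decision:

```python
# Illustrative snapshot of EU Data Boundary regions (EU plus EFTA);
# verify against Microsoft's published list before relying on it.
# UK regions are deliberately excluded: the UK is outside the EU Data
# Boundary despite its GDPR adequacy decision.
EU_DATA_BOUNDARY_REGIONS = {
    "westeurope", "northeurope", "francecentral", "francesouth",
    "germanywestcentral", "germanynorth", "swedencentral",
    "switzerlandnorth", "switzerlandwest", "norwayeast", "norwaywest",
    "italynorth", "spaincentral", "polandcentral",
}

def within_eu_data_boundary(region: str) -> bool:
    """Check an Azure region name against the illustrative EU boundary list."""
    return region.lower().replace(" ", "") in EU_DATA_BOUNDARY_REGIONS

print(within_eu_data_boundary("westeurope"))  # True
print(within_eu_data_boundary("uksouth"))     # False
```

A check like this is a useful pre-deployment gate in CI/CD pipelines that provision Azure OpenAI resources.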
GCP Data Residency for AI Services
Vertex AI Regional Configuration
GCP Vertex AI endpoints are regional resources. Data processed by Vertex AI stays in the specified region for inference. However, some features (such as model evaluation and certain pipeline operations) may process data in the US even when the endpoint is in another region.
from google.cloud import aiplatform
def audit_vertex_data_residency(
project_id: str,
required_region: str = "europe-west4",
) -> dict:
"""Audit Vertex AI resources for data residency compliance."""
findings = []
# Check all locations where Vertex AI resources exist
locations_to_check = [
"us-central1", "us-east1", "us-west1",
"europe-west1", "europe-west4", "europe-west2",
"asia-southeast1", "asia-northeast1",
]
for location in locations_to_check:
try:
aiplatform.init(project=project_id, location=location)
endpoints = aiplatform.Endpoint.list()
models = aiplatform.Model.list()
if endpoints or models:
in_required = location == required_region
severity = "INFO" if in_required else "HIGH"
findings.append({
"severity": severity,
"location": location,
"finding": f"Vertex AI resources found in {location}",
"detail": f"{len(endpoints)} endpoints, {len(models)} models. "
+ ("In required region." if in_required
else f"Outside required region ({required_region})."),
})
except Exception:
continue
# Check organization policy constraints for location restriction
try:
from google.cloud import orgpolicy_v2
orgpolicy_client = orgpolicy_v2.OrgPolicyClient()
policy = orgpolicy_client.get_policy(
name=f"projects/{project_id}/policies/gcp.resourceLocations"
)
if policy.spec and policy.spec.rules:
allowed_locations = []
for rule in policy.spec.rules:
if rule.values and rule.values.allowed_values:
allowed_locations.extend(rule.values.allowed_values)
findings.append({
"severity": "INFO",
"finding": "Organization policy restricts resource locations",
"detail": f"Allowed locations: {', '.join(allowed_locations)}",
})
else:
findings.append({
"severity": "HIGH",
"finding": "No location restriction organization policy",
"detail": "Without gcp.resourceLocations constraint, Vertex AI "
"resources can be created in any region.",
})
except Exception:
findings.append({
"severity": "MEDIUM",
"finding": "Cannot check organization policy for location restrictions",
})
    return {"required_region": required_region, "findings": findings}
GCP Organization Policy Constraints
Use GCP Organization Policy to enforce data residency at the organizational level:
def recommend_residency_org_policies(target_geography: str) -> dict:
"""Recommend organization policies for data residency enforcement."""
eu_locations = [
"in:eu-locations", # GCP's built-in EU location group
]
return {
"resource_location_constraint": {
"constraint": "constraints/gcp.resourceLocations",
"description": "Restrict where Vertex AI and other resources can be created",
"spec": {
"rules": [{
"values": {
"allowedValues": eu_locations if target_geography == "eu" else [],
}
}]
},
},
"vertex_ai_specific": {
"constraint": "constraints/aiplatform.allowedModels",
"description": "Restrict which models can be deployed (some models "
"may have different data processing locations)",
"note": "Not all models are available in all regions. Restricting "
"models to those available in your target region prevents "
"users from deploying models that require cross-region processing.",
},
"cloud_storage": {
"constraint": "constraints/storage.locations",
"description": "Restrict Cloud Storage bucket locations for training data",
"spec": {
"rules": [{
"values": {
"allowedValues": eu_locations if target_geography == "eu" else [],
}
}]
},
},
    }
Compliance Assessment Framework
Data Flow Mapping Checklist
For each AI workload, map and document these data flows:
- Prompt data path: Application -> API gateway -> AI service endpoint -> model inference -> response
- Logging data path: AI service -> logging service -> storage (S3/GCS/Azure Storage)
- Content filtering data path: AI service -> content safety service -> filter logs
- Training data path: Storage -> AI service -> model artifacts -> artifact storage
- Monitoring data path: AI service -> telemetry service -> monitoring dashboard
- Backup/DR data path: Primary region -> replication target
For each path, document:
- Source and destination regions
- Whether personal data is present
- Whether cross-border transfer occurs
- What legal basis covers the transfer (if cross-border)
- What technical controls enforce the documented path
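The documentation checklist above can be encoded as a record type so that gaps are flagged mechanically rather than discovered during an audit. A minimal sketch with illustrative field names:

```python
from dataclasses import dataclass, field

@dataclass
class DocumentedDataFlow:
    """One row of the data flow map; field names are illustrative."""
    path: str
    source_region: str
    destination_region: str
    contains_personal_data: bool
    transfer_legal_basis: str = ""
    enforcing_controls: list = field(default_factory=list)

    def gaps(self) -> list:
        """Flag cross-border personal-data flows missing a legal basis or controls."""
        issues = []
        cross_border = self.source_region != self.destination_region
        if cross_border and self.contains_personal_data and not self.transfer_legal_basis:
            issues.append("cross-border personal data without documented legal basis")
        if not self.enforcing_controls:
            issues.append("no technical control enforces the documented path")
        return issues

flow = DocumentedDataFlow(
    path="AI service -> logging -> S3",
    source_region="eu-west-1",
    destination_region="us-east-1",
    contains_personal_data=True,
)
print(flow.gaps())
```

Running `gaps()` across every documented flow gives a quick residency posture report that can be regenerated after each change.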
Regulatory Mapping
| Regulation | Applies When | Key Requirement | Cloud AI Implication |
|---|---|---|---|
| GDPR Art. 44-49 | Processing EU personal data | Adequate transfer mechanisms for cross-border | AI endpoints must be in EU or covered by adequacy decision |
| EU AI Act Art. 10 | Training high-risk AI systems | Data governance including geographic considerations | Training data location must be documented and controlled |
| CCPA/CPRA | Processing California consumer data | Disclosure of cross-border transfers | AI service data processing locations must be disclosed |
| China PIPL | Processing Chinese personal data | Data localization for certain categories | AI processing must remain in mainland China for covered data |
| India DPDP Act | Processing Indian personal data | Government-notified transfer restrictions | May require in-country AI processing for certain data |
References
- European Commission, "Adequacy decisions," https://commission.europa.eu/law/law-topic/data-protection/international-dimension-data-protection/adequacy-decisions_en
- European Parliament, "Regulation (EU) 2024/1689 (AI Act)," https://eur-lex.europa.eu/eli/reg/2024/1689
- NIST, "AI Risk Management Framework (AI RMF 1.0)," January 2023, https://www.nist.gov/itl/ai-risk-management-framework
- Microsoft, "Azure EU Data Boundary," https://learn.microsoft.com/en-us/privacy/eudb/eu-data-boundary-learn
- AWS, "AWS Data Residency," https://aws.amazon.com/compliance/data-residency/
Review Questions
- An organization deploys Azure OpenAI in the westeurope region for GDPR compliance. What additional data flow could create a cross-border transfer?
- What is the most effective technical control for preventing AI resources from being created outside approved regions on GCP?