IAM Best Practices for Cloud AI Services
Cross-cloud IAM best practices for securing AI services on AWS, Azure, and GCP, covering least privilege, service identity management, cross-account access, and policy automation.
Overview
Identity and Access Management is the most critical security control for cloud AI services. Unlike traditional cloud workloads where IAM primarily controls access to infrastructure and data, AI service IAM must also control who can invoke models, who can train or fine-tune models, who can modify safety configurations, and who can access the prompts and responses flowing through the system.
The challenge is that AI services introduce permission categories that do not map cleanly to existing IAM frameworks. A data scientist who needs to invoke a model for inference has a fundamentally different risk profile than one who needs to submit fine-tuning jobs. A CI/CD pipeline that deploys model configurations requires different permissions than one that deploys application code. This article provides actionable guidance for designing and implementing IAM policies that address these AI-specific concerns across all three major cloud platforms.
The AI-Specific IAM Challenge
Why Traditional IAM Falls Short
Traditional cloud IAM was designed around resource types (virtual machines, storage buckets, databases) and CRUD operations (create, read, update, delete). AI services introduce actions that do not fit this model:
| Action Type | Traditional Equivalent | AI-Specific Risk |
|---|---|---|
| Model invocation | API call / function execution | Cost abuse, prompt/response data exposure, content safety bypass |
| Fine-tuning job creation | Compute job submission | Training data poisoning, privilege escalation via service role, model theft |
| Guardrail/safety config modification | Security policy change | Weakening organizational safety controls, compliance violations |
| Model deployment | Application deployment | Exposing untested or malicious models to production traffic |
| Prompt/response logging access | Log access | Exposure of sensitive user inputs and model outputs |
The key insight is that model invocation is both a compute action (it costs money and uses resources) and a data action (the prompts and responses contain information). Most IAM policies treat it as only one or the other.
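The dual nature can be made concrete in a single policy sketch. This is an illustrative example, not a fixed convention: the model ID and the invocation log-group name are assumptions. The Allow statement handles the compute dimension (which model may be invoked, bounding cost exposure), while the Deny statement handles the data dimension (no read access to logged prompts and responses).

```python
def dual_nature_invoke_policy(region: str, model_id: str) -> dict:
    """Sketch: treat model invocation as both a compute and a data action."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Compute dimension: restrict which model can be invoked.
                "Effect": "Allow",
                "Action": ["bedrock:InvokeModel"],
                "Resource": f"arn:aws:bedrock:{region}::foundation-model/{model_id}",
            },
            {   # Data dimension: block reads of logged prompts/responses.
                # Log-group name is a hypothetical placeholder.
                "Effect": "Deny",
                "Action": ["logs:GetLogEvents", "logs:FilterLogEvents"],
                "Resource": "arn:aws:logs:*:*:log-group:/bedrock/invocation-logs*",
            },
        ],
    }
```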
The Service Identity Problem
AI workloads often require service identities (service accounts, managed identities, IAM roles) that access multiple services: the AI API itself, storage for training data, logging services, secret managers for API keys, and sometimes databases for RAG contexts. This creates a tension between operational convenience (one service identity with broad access) and security (separate identities per concern).
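One way to resolve the tension is to enumerate the concerns explicitly and mint one narrowly scoped identity per concern. The sketch below illustrates that split; every identity name, bucket name, and log-group path is a hypothetical placeholder, and the action lists are deliberately minimal.

```python
def per_concern_identities(app_name: str) -> dict:
    """Sketch: split one broad AI service identity into per-concern identities."""
    return {
        # Runtime identity: can call the model API and nothing else.
        f"{app_name}-invoker": {
            "purpose": "Invoke the model API only",
            "actions": ["bedrock:InvokeModel"],
            "resources": ["arn:aws:bedrock:*::foundation-model/anthropic.*"],
        },
        # Training identity: reads training data, submits tuning jobs.
        f"{app_name}-trainer": {
            "purpose": "Read training data and submit fine-tuning jobs",
            "actions": ["s3:GetObject", "bedrock:CreateModelCustomizationJob"],
            "resources": [f"arn:aws:s3:::{app_name}-training-data/*"],
        },
        # Logging identity: write-only access to inference logs.
        f"{app_name}-logger": {
            "purpose": "Write (not read) inference logs",
            "actions": ["logs:CreateLogStream", "logs:PutLogEvents"],
            "resources": [f"arn:aws:logs:*:*:log-group:/{app_name}/inference:*"],
        },
    }
```

A compromise of the runtime identity then exposes only invocation, not the training data or the logged prompts.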
AWS IAM for AI Services
Least Privilege for Bedrock
AWS Bedrock permissions should be scoped by three dimensions: action (what the identity can do), resource (which models or resources), and condition (from where and when).
import json
def generate_bedrock_least_privilege_policy(
allowed_model_ids: list,
account_id: str,
region: str = "us-east-1",
    require_vpc_endpoint: str | None = None,
) -> dict:
"""Generate a least-privilege IAM policy for Bedrock model invocation."""
model_arns = [
f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
for model_id in allowed_model_ids
]
statements = [
{
"Sid": "AllowModelInvocation",
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream",
],
"Resource": model_arns,
},
{
"Sid": "DenyModelManagement",
"Effect": "Deny",
"Action": [
"bedrock:CreateModelCustomizationJob",
"bedrock:CreateProvisionedModelThroughput",
"bedrock:DeleteCustomModel",
"bedrock:CreateGuardrail",
"bedrock:UpdateGuardrail",
"bedrock:DeleteGuardrail",
],
"Resource": "*",
},
]
# Add VPC endpoint condition if specified
if require_vpc_endpoint:
statements[0]["Condition"] = {
"StringEquals": {
                "aws:SourceVpce": require_vpc_endpoint
}
}
return {
"Version": "2012-10-17",
"Statement": statements,
}
def generate_bedrock_admin_policy(
account_id: str,
region: str = "us-east-1",
) -> dict:
"""Generate a policy for Bedrock administrators (deployment, not invocation)."""
return {
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowBedrockManagement",
"Effect": "Allow",
"Action": [
"bedrock:ListFoundationModels",
"bedrock:GetFoundationModel",
"bedrock:ListCustomModels",
"bedrock:GetCustomModel",
"bedrock:ListGuardrails",
"bedrock:GetGuardrail",
                    "bedrock:GetModelInvocationLoggingConfiguration",
],
"Resource": "*",
},
{
"Sid": "AllowGuardrailManagement",
"Effect": "Allow",
"Action": [
"bedrock:CreateGuardrail",
"bedrock:UpdateGuardrail",
"bedrock:CreateGuardrailVersion",
],
                "Resource": f"arn:aws:bedrock:{region}:{account_id}:guardrail/*",
},
            {
                "Sid": "DenyModelInvocation",
                "Effect": "Deny",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": "*",
            },
],
}SageMaker IAM Separation
For AWS SageMaker, separate IAM roles for different lifecycle phases:
def generate_sagemaker_role_policies() -> dict:
"""Generate separated IAM policies for SageMaker lifecycle phases."""
return {
"data_scientist_policy": {
"Version": "2012-10-17",
"Statement": [
{
"Sid": "NotebookAccess",
"Effect": "Allow",
"Action": [
"sagemaker:CreatePresignedNotebookInstanceUrl",
"sagemaker:DescribeNotebookInstance",
"sagemaker:StartNotebookInstance",
"sagemaker:StopNotebookInstance",
],
"Resource": "arn:aws:sagemaker:*:*:notebook-instance/${aws:PrincipalTag/notebook-name}",
},
{
"Sid": "TrainingJobAccess",
"Effect": "Allow",
"Action": [
"sagemaker:CreateTrainingJob",
"sagemaker:DescribeTrainingJob",
"sagemaker:ListTrainingJobs",
],
"Resource": "*",
"Condition": {
"StringEquals": {
"sagemaker:ResourceTag/team": "${aws:PrincipalTag/team}"
}
},
},
],
},
"mlops_deployment_policy": {
"Version": "2012-10-17",
"Statement": [
{
"Sid": "EndpointManagement",
"Effect": "Allow",
"Action": [
"sagemaker:CreateEndpoint",
"sagemaker:CreateEndpointConfig",
"sagemaker:UpdateEndpoint",
"sagemaker:DeleteEndpoint",
"sagemaker:DescribeEndpoint",
],
"Resource": "*",
},
{
"Sid": "DenyTraining",
"Effect": "Deny",
"Action": [
"sagemaker:CreateTrainingJob",
"sagemaker:CreateHyperParameterTuningJob",
],
"Resource": "*",
},
],
},
"training_execution_role": {
"Version": "2012-10-17",
"Statement": [
{
"Sid": "S3TrainingDataAccess",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:ListBucket",
],
"Resource": [
                        "arn:aws:s3:::ml-training-data-*",
                        "arn:aws:s3:::ml-training-data-*/*",
],
},
{
"Sid": "S3ModelOutput",
"Effect": "Allow",
"Action": [
"s3:PutObject",
],
"Resource": "arn:aws:s3:::ml-model-artifacts-*/*",
},
{
"Sid": "CloudWatchLogging",
"Effect": "Allow",
"Action": [
"logs:CreateLogStream",
"logs:PutLogEvents",
"logs:CreateLogGroup",
],
"Resource": "arn:aws:logs:*:*:log-group:/aws/sagemaker/*",
},
],
},
    }
Azure RBAC for AI Services
Custom Role Definitions
Azure's built-in roles for Cognitive Services are often too broad. Create custom role definitions that align with your organization's AI service usage patterns:
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.identity import DefaultAzureCredential
import uuid
def create_azure_openai_readonly_role(
subscription_id: str,
) -> dict:
"""Create a custom role that allows Azure OpenAI invocation but not management."""
credential = DefaultAzureCredential()
auth_client = AuthorizationManagementClient(credential, subscription_id)
role_definition = {
"properties": {
"roleName": "Azure OpenAI Invoker",
"description": "Can invoke Azure OpenAI models but cannot manage "
"deployments, content filters, or resource configuration.",
"type": "CustomRole",
            "permissions": [
{
"actions": [],
"notActions": [],
"dataActions": [
"Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action",
"Microsoft.CognitiveServices/accounts/OpenAI/deployments/completions/action",
                        "Microsoft.CognitiveServices/accounts/OpenAI/deployments/embeddings/action",
],
"notDataActions": [
"Microsoft.CognitiveServices/accounts/OpenAI/deployments/write",
"Microsoft.CognitiveServices/accounts/OpenAI/deployments/delete",
],
}
],
"assignableScopes": [f"/subscriptions/{subscription_id}"],
}
}
role_id = str(uuid.uuid4())
scope = f"/subscriptions/{subscription_id}"
result = auth_client.role_definitions.create_or_update(
scope, role_id, role_definition
)
return {
"role_id": result.id,
"role_name": result.role_name,
    }
Managed Identity Patterns
Use system-assigned or user-assigned managed identities for all Azure OpenAI access. The following pattern shows how to configure an App Service to use managed identity for Azure OpenAI:
from azure.mgmt.web import WebSiteManagementClient
def configure_app_managed_identity(
subscription_id: str,
resource_group: str,
app_name: str,
openai_endpoint: str,
) -> dict:
"""Configure an App Service to use managed identity for Azure OpenAI."""
credential = DefaultAzureCredential()
web_client = WebSiteManagementClient(credential, subscription_id)
# Enable system-assigned managed identity
web_client.web_apps.update(
resource_group,
app_name,
{
"identity": {"type": "SystemAssigned"},
},
)
# Get the identity principal ID
app = web_client.web_apps.get(resource_group, app_name)
principal_id = app.identity.principal_id
# Set app settings to use managed identity (not API keys)
web_client.web_apps.update_application_settings(
resource_group,
app_name,
{
"properties": {
"AZURE_OPENAI_ENDPOINT": openai_endpoint,
"AZURE_OPENAI_AUTH_TYPE": "managed_identity",
# No API key setting -- forces managed identity usage
}
},
)
return {
"principal_id": principal_id,
"auth_type": "managed_identity",
"detail": "Assign 'Cognitive Services OpenAI User' role to this principal ID.",
    }
GCP IAM for AI Services
IAM Conditions for Vertex AI
GCP IAM supports conditions that restrict when and where permissions are effective. Use conditions to limit Vertex AI access to specific resources and time windows:
def create_conditional_vertex_binding(
project_id: str,
member: str,
allowed_endpoints: list,
) -> dict:
"""Create a conditional IAM binding for Vertex AI endpoints."""
    from google.cloud import resourcemanager_v3
from google.type import expr_pb2
client = resourcemanager_v3.ProjectsClient()
# Build condition expression that restricts to specific endpoints
endpoint_conditions = " || ".join([
f'resource.name.startsWith("projects/{project_id}/locations/us-central1/endpoints/{ep}")'
for ep in allowed_endpoints
])
condition = expr_pb2.Expr(
expression=endpoint_conditions,
title="Restrict to approved endpoints",
description="Limits prediction access to specific Vertex AI endpoints.",
)
# This would be applied via set_iam_policy on the project
binding = {
"role": "roles/aiplatform.user",
"members": [member],
"condition": {
"expression": endpoint_conditions,
"title": "Restrict to approved endpoints",
},
}
return {
"binding": binding,
"detail": "Apply this binding via gcloud or Terraform. "
"IAM conditions on Vertex AI restrict which specific "
"resources the member can access.",
}Workload Identity Federation
For CI/CD pipelines and external workloads accessing Vertex AI, use Workload Identity Federation instead of service account keys:
def setup_workload_identity_pool(
project_id: str,
project_number: str,
github_org: str,
github_repo: str,
) -> dict:
"""Create a Workload Identity Pool for GitHub Actions accessing Vertex AI."""
    from google.cloud import iam_v1
# Configuration for workload identity federation
pool_config = {
"pool_id": "github-actions-pool",
"display_name": "GitHub Actions",
"provider_config": {
"provider_id": "github",
            "issuer_uri": "https://token.actions.githubusercontent.com",
"attribute_mapping": {
"google.subject": "assertion.sub",
"attribute.repository": "assertion.repository",
"attribute.actor": "assertion.actor",
"attribute.ref": "assertion.ref",
},
"attribute_condition": (
f'assertion.repository == "{github_org}/{github_repo}" '
f'&& assertion.ref == "refs/heads/main"'
),
},
"service_account_binding": {
"service_account": f"vertex-ai-deployer@{project_id}.iam.gserviceaccount.com",
"role": "roles/iam.workloadIdentityUser",
"member": (
f"principalSet://iam.googleapis.com/projects/{project_number}"
f"/locations/global/workloadIdentityPools/github-actions-pool/"
f"attribute.repository/{github_org}/{github_repo}"
),
},
}
    return pool_config
Cross-Cloud IAM Patterns
Shared AI Model Access
When organizations use AI services across multiple clouds, identity federation enables cross-cloud access without managing separate credentials:
def design_cross_cloud_ai_access() -> dict:
    """Design patterns for cross-cloud AI service access."""
return {
"pattern_1_hub_spoke": {
"description": "Central identity provider (e.g., Entra ID) federated to all clouds",
"aws_config": "SAML or OIDC federation to IAM Identity Center",
"azure_config": "Native Entra ID integration",
"gcp_config": "Workforce Identity Federation with Entra ID as IdP",
"pros": ["Single identity source", "Centralized access reviews", "Consistent MFA"],
"cons": ["Single point of failure", "Complex initial setup"],
},
"pattern_2_service_mesh": {
            "description": "Cloud-native service identities with cross-cloud trust",
            "aws_config": "IAM roles with OIDC federation from GCP/Azure",
            "azure_config": "Managed identities with federated credentials",
            "gcp_config": "Workload Identity Federation with AWS/Azure tokens",
            "pros": ["No long-lived credentials", "Cloud-native"],
            "cons": ["Complex trust relationships", "Different semantics per cloud"],
},
    }
Automated IAM Assessment
Cross-Cloud IAM Scanner
Automate IAM assessment across all three clouds with a unified scanning framework:
import json
from dataclasses import dataclass
from typing import Protocol
from enum import Enum
class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    INFO = "info"
@dataclass
class IAMFinding:
    cloud: str
    service: str
    severity: Severity
    title: str
    detail: str
    resource: str
    recommendation: str
class CloudIAMScanner(Protocol):
    def scan(self) -> list[IAMFinding]: ...
@dataclass
class AWSBedrockIAMScanner:
    session: object  # boto3.Session
    account_id: str
    def scan(self) -> list[IAMFinding]:
        findings = []
        iam = self.session.client("iam")
        # Check for wildcard Bedrock permissions in customer-managed policies
        paginator = iam.get_paginator("list_policies")
        for page in paginator.paginate(Scope="Local"):
            for policy in page["Policies"]:
                version = iam.get_policy_version(
                    PolicyArn=policy["Arn"],
                    VersionId=policy["DefaultVersionId"],
                )
                document = version["PolicyVersion"]["Document"]
                if isinstance(document, str):
                    document = json.loads(document)
                for statement in document.get("Statement", []):
                    actions = statement.get("Action", [])
                    if isinstance(actions, str):
                        actions = [actions]
                    if statement.get("Effect") == "Allow" and "bedrock:*" in actions:
                        findings.append(IAMFinding(
                            cloud="AWS",
                            service="Bedrock",
                            severity=Severity.HIGH,
                            title="Wildcard Bedrock permissions",
                            detail=f"Policy {policy['PolicyName']} grants bedrock:* access",
                            resource=policy["Arn"],
                            recommendation="Scope to specific actions: bedrock:InvokeModel "
                                           "for consumers, specific management actions for admins.",
                        ))
        return findings
def run_cross_cloud_scan(scanners: list[CloudIAMScanner]) -> dict:
    """Run IAM scans across all configured cloud environments."""
    all_findings = []
    for scanner in scanners:
        all_findings.extend(scanner.scan())
    summary = {
        "total_findings": len(all_findings),
        "by_severity": {},
        "by_cloud": {},
        "findings": [
            {
                "cloud": f.cloud,
                "service": f.service,
                "severity": f.severity.value,
                "title": f.title,
                "detail": f.detail,
                "recommendation": f.recommendation,
            }
            for f in all_findings
        ],
    }
    for f in all_findings:
        summary["by_severity"][f.severity.value] = summary["by_severity"].get(f.severity.value, 0) + 1
        summary["by_cloud"][f.cloud] = summary["by_cloud"].get(f.cloud, 0) + 1
    return summary
IAM Governance Recommendations
Policy Design Principles
- Separate invocation from management: Model consumers should never have permissions to create, modify, or delete AI resources. Separate IAM policies for invocation and management prevent lateral movement from a compromised application to infrastructure control.
- Scope by model: Do not grant access to all models when only specific ones are needed. Use resource-level permissions to restrict which models an identity can invoke.
- Enforce network conditions: Require that AI API calls originate from approved network locations (VPC endpoints, private endpoints, Private Service Connect) using IAM condition keys.
- Time-bound elevated access: Use just-in-time access tools (AWS IAM Identity Center temporary access, Azure PIM, GCP IAM Conditions with time expressions) for administrative operations.
- Tag-based access control: Use attribute-based access control (ABAC) with tags/labels to dynamically scope permissions based on team, environment, and data classification.
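The last principle can be sketched as one reusable ABAC policy. This is a hedged example: `aws:ResourceTag` conditions apply only to taggable Bedrock resources (such as custom models or provisioned throughput, not foundation models themselves), and the tag keys `team` and `env` are assumed naming conventions, not requirements.

```python
def abac_invoke_policy() -> dict:
    """Sketch: one ABAC policy serves every team by matching principal
    tags against resource tags at evaluation time."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    # Principal may only invoke resources carrying its own
                    # team and environment tags.
                    "aws:ResourceTag/team": "${aws:PrincipalTag/team}",
                    "aws:ResourceTag/env": "${aws:PrincipalTag/env}",
                }
            },
        }],
    }
```

Because the policy is parameterized by tags rather than by team name, onboarding a new team requires tagging its principals and resources, not writing a new policy.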
Regular Access Review
Conduct quarterly access reviews focused on AI-specific permissions. The review should answer:
- Who can invoke each model, and do they still need that access?
- Who can modify safety/guardrail configurations?
- Are service identities scoped to the minimum necessary permissions?
- Are there any cross-account or cross-project access paths that bypass intended controls?
- Are deprecated models still accessible to identities that should have migrated?
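The first review question can be partially automated. The helper below is a minimal sketch: it assumes you have already gathered, per principal, a flat list of allowed actions (for example via the scanner above or an access-analysis tool), and the input record format is an illustrative assumption rather than a standard schema.

```python
def who_can_invoke(inventory: list[dict]) -> list[str]:
    """Return principals whose gathered action lists permit model invocation.
    Each inventory record is assumed to look like
    {"principal": ..., "allowed_actions": [...]}."""
    # Literal wildcards are treated as granting invocation; a fuller
    # implementation would expand patterns like "bedrock:Invoke*".
    invoke_actions = {"bedrock:InvokeModel", "bedrock:*", "*"}
    return sorted(
        record["principal"]
        for record in inventory
        if invoke_actions & set(record["allowed_actions"])
    )
```

Running this each quarter against a fresh inventory turns "who can invoke each model?" from an interview question into a diffable report.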
References
- AWS, "Security best practices in IAM," https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html
- Microsoft, "Azure RBAC best practices," https://learn.microsoft.com/en-us/azure/role-based-access-control/best-practices
- Google Cloud, "IAM best practices," https://cloud.google.com/iam/docs/using-iam-securely
- NIST SP 800-207, "Zero Trust Architecture," August 2020, https://csrc.nist.gov/publications/detail/sp/800-207/final
- CSA, "Security Guidance for Critical Areas of Focus in Cloud Computing v4.0," https://cloudsecurityalliance.org/research/guidance
Why should model invocation permissions be separated from model management permissions in IAM policies?
What is the recommended approach for CI/CD pipelines that need to deploy Vertex AI models on GCP?