IAM Best Practices for Cloud AI Services
Cross-cloud IAM best practices for securing AI services on AWS, Azure, and GCP, covering least privilege, service identity management, cross-account access, and policy automation.
Overview
Identity and Access Management is the most critical security control for cloud AI services. Unlike traditional cloud workloads where IAM primarily controls access to infrastructure and data, AI service IAM must also control who can invoke models, who can train or fine-tune models, who can modify safety configurations, and who can access the prompts and responses flowing through the system.
The challenge is that AI services introduce permission categories that do not map cleanly to existing IAM frameworks. A data scientist who needs to invoke a model for inference has a fundamentally different risk profile than one who needs to submit fine-tuning jobs. A CI/CD pipeline that deploys model configurations requires different permissions than one that deploys application code. This article provides actionable guidance for designing and implementing IAM policies that address these AI-specific concerns across all three major cloud platforms.
The AI-Specific IAM Challenge
Why Traditional IAM Falls Short
Traditional cloud IAM was designed around resource types (virtual machines, storage buckets, databases) and CRUD operations (create, read, update, delete). AI services introduce actions that do not fit this model:
| Action Type | Traditional Equivalent | AI-Specific Risk |
|---|---|---|
| Model invocation | API call / function execution | Cost abuse, prompt/response data exposure, content safety bypass |
| Fine-tuning job creation | Compute job submission | Training data poisoning, privilege escalation via service role, model theft |
| Guardrail/safety config modification | Security policy change | Weakening organizational safety controls, compliance violations |
| Model deployment | Application deployment | Exposing untested or malicious models to production traffic |
| Prompt/response logging access | Log access | Exposure of sensitive user inputs and model outputs |
The key insight is that model invocation is both a compute action (it costs money and uses resources) and a data action (the prompts and responses contain information). Most IAM policies treat it as only one or the other.
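The dual nature can be made concrete in a single policy sketch. This is an illustrative example, not a fixed convention: the model ID and the invocation log-group name are assumptions. The Allow statement handles the compute dimension (which model may be invoked, bounding cost exposure), while the Deny statement handles the data dimension (no read access to logged prompts and responses).

```python
def dual_nature_invoke_policy(region: str, model_id: str) -> dict:
    """Sketch: treat model invocation as both a compute and a data action."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Compute dimension: restrict which model can be invoked.
                "Effect": "Allow",
                "Action": ["bedrock:InvokeModel"],
                "Resource": f"arn:aws:bedrock:{region}::foundation-model/{model_id}",
            },
            {   # Data dimension: block reads of logged prompts/responses.
                # Log-group name is a hypothetical placeholder.
                "Effect": "Deny",
                "Action": ["logs:GetLogEvents", "logs:FilterLogEvents"],
                "Resource": "arn:aws:logs:*:*:log-group:/bedrock/invocation-logs*",
            },
        ],
    }
```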
The Service Identity Problem
AI workloads often require service identities (service accounts, managed identities, IAM roles) that access multiple services: the AI API itself, storage for training data, logging services, secret managers for API keys, and sometimes databases for RAG contexts. This creates a tension between operational convenience (one service identity with broad access) and security (separate identities per concern).
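One way to resolve the tension is to enumerate the concerns explicitly and mint one narrowly scoped identity per concern. The sketch below illustrates that split; every identity name, bucket name, and log-group path is a hypothetical placeholder, and the action lists are deliberately minimal.

```python
def per_concern_identities(app_name: str) -> dict:
    """Sketch: split one broad AI service identity into per-concern identities."""
    return {
        # Runtime identity: can call the model API and nothing else.
        f"{app_name}-invoker": {
            "purpose": "Invoke the model API only",
            "actions": ["bedrock:InvokeModel"],
            "resources": ["arn:aws:bedrock:*::foundation-model/anthropic.*"],
        },
        # Training identity: reads training data, submits tuning jobs.
        f"{app_name}-trainer": {
            "purpose": "Read training data and submit fine-tuning jobs",
            "actions": ["s3:GetObject", "bedrock:CreateModelCustomizationJob"],
            "resources": [f"arn:aws:s3:::{app_name}-training-data/*"],
        },
        # Logging identity: write-only access to inference logs.
        f"{app_name}-logger": {
            "purpose": "Write (not read) inference logs",
            "actions": ["logs:CreateLogStream", "logs:PutLogEvents"],
            "resources": [f"arn:aws:logs:*:*:log-group:/{app_name}/inference:*"],
        },
    }
```

A compromise of the runtime identity then exposes only invocation, not the training data or the logged prompts.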
AWS IAM for AI Services
Least Privilege for Bedrock
AWS Bedrock permissions should be scoped by three dimensions: action (what the identity can do), resource (which models or resources), and condition (from where and when).
import json
def generate_bedrock_least_privilege_policy(
allowed_model_ids: list,
account_id: str,
region: str = "us-east-1",
    require_vpc_endpoint: str | None = None,
) -> dict:
"""Generate a least-privilege IAM policy for Bedrock model invocation."""
model_arns = [
f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
for model_id in allowed_model_ids
]
statements = [
{
"Sid": "AllowModelInvocation",
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream",
],
"Resource": model_arns,
},
{
"Sid": "DenyModelManagement",
"Effect": "Deny",
"Action": [
"bedrock:CreateModelCustomizationJob",
"bedrock:CreateProvisionedModelThroughput",
"bedrock:DeleteCustomModel",
"bedrock:CreateGuardrail",
"bedrock:UpdateGuardrail",
"bedrock:DeleteGuardrail",
],
"Resource": "*",
},
]
# Add VPC endpoint condition if specified
if require_vpc_endpoint:
statements[0]["Condition"] = {
"StringEquals": {
                "aws:SourceVpce": require_vpc_endpoint
}
}
return {
"Version": "2012-10-17",
"Statement": statements,
}
def generate_bedrock_admin_policy(
account_id: str,
region: str = "us-east-1",
) -> dict:
"""Generate a policy for Bedrock administrators (deployment, not invocation)."""
return {
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowBedrockManagement",
"Effect": "Allow",
"Action": [
"bedrock:ListFoundationModels",
"bedrock:GetFoundationModel",
"bedrock:ListCustomModels",
"bedrock:GetCustomModel",
"bedrock:ListGuardrails",
"bedrock:GetGuardrail",
                    "bedrock:GetModelInvocationLoggingConfiguration",
],
"Resource": "*",
},
{
"Sid": "AllowGuardrailManagement",
"Effect": "Allow",
"Action": [
"bedrock:CreateGuardrail",
"bedrock:UpdateGuardrail",
"bedrock:CreateGuardrailVersion",
],
                "Resource": f"arn:aws:bedrock:{region}:{account_id}:guardrail/*",
},
            {
                "Sid": "DenyModelInvocation",
                "Effect": "Deny",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": "*",
            },
],
}SageMaker IAM Separation
For AWS SageMaker, separate IAM roles for different lifecycle phases:
def generate_sagemaker_role_policies() -> dict:
"""Generate separated IAM policies for SageMaker lifecycle phases."""
return {
"data_scientist_policy": {
"Version": "2012-10-17",
"Statement": [
{
"Sid": "NotebookAccess",
"Effect": "Allow",
"Action": [
"sagemaker:CreatePresignedNotebookInstanceUrl",
"sagemaker:DescribeNotebookInstance",
"sagemaker:StartNotebookInstance",
"sagemaker:StopNotebookInstance",
],
"Resource": "arn:aws:sagemaker:*:*:notebook-instance/${aws:PrincipalTag/notebook-name}",
},
{
"Sid": "TrainingJobAccess",
"Effect": "Allow",
"Action": [
"sagemaker:CreateTrainingJob",
"sagemaker:DescribeTrainingJob",
"sagemaker:ListTrainingJobs",
],
"Resource": "*",
"Condition": {
"StringEquals": {
"sagemaker:ResourceTag/team": "${aws:PrincipalTag/team}"
}
},
},
],
},
"mlops_deployment_policy": {
"Version": "2012-10-17",
"Statement": [
{
"Sid": "EndpointManagement",
"Effect": "Allow",
"Action": [
"sagemaker:CreateEndpoint",
"sagemaker:CreateEndpointConfig",
"sagemaker:UpdateEndpoint",
"sagemaker:DeleteEndpoint",
"sagemaker:DescribeEndpoint",
],
"Resource": "*",
},
{
"Sid": "DenyTraining",
"Effect": "Deny",
"Action": [
"sagemaker:CreateTrainingJob",
"sagemaker:CreateHyperParameterTuningJob",
],
"Resource": "*",
},
],
},
"training_execution_role": {
"Version": "2012-10-17",
"Statement": [
{
"Sid": "S3TrainingDataAccess",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:ListBucket",
],
"Resource": [
                        "arn:aws:s3:::ml-training-data-*",
                        "arn:aws:s3:::ml-training-data-*/*",
],
},
{
"Sid": "S3ModelOutput",
"Effect": "Allow",
"Action": [
"s3:PutObject",
],
"Resource": "arn:aws:s3:::ml-model-artifacts-*/*",
},
{
"Sid": "CloudWatchLogging",
"Effect": "Allow",
"Action": [
"logs:CreateLogStream",
"logs:PutLogEvents",
"logs:CreateLogGroup",
],
"Resource": "arn:aws:logs:*:*:log-group:/aws/sagemaker/*",
},
],
},
    }
Azure RBAC for AI Services
Custom Role Definitions
Azure's built-in roles for Cognitive Services are often too broad. Create custom role definitions that align with your organization's AI service usage patterns:
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.identity import DefaultAzureCredential
import uuid
def create_azure_openai_readonly_role(
subscription_id: str,
) -> dict:
"""Create a custom role that allows Azure OpenAI invocation but not management."""
credential = DefaultAzureCredential()
auth_client = AuthorizationManagementClient(credential, subscription_id)
role_definition = {
"properties": {
"roleName": "Azure OpenAI Invoker",
"description": "Can invoke Azure OpenAI models but cannot manage "
"deployments, content filters, or resource configuration.",
"type": "CustomRole",
            "permissions": [
{
"actions": [],
"notActions": [],
"dataActions": [
"Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action",
"Microsoft.CognitiveServices/accounts/OpenAI/deployments/completions/action",
                        "Microsoft.CognitiveServices/accounts/OpenAI/deployments/embeddings/action",
],
"notDataActions": [
"Microsoft.CognitiveServices/accounts/OpenAI/deployments/write",
"Microsoft.CognitiveServices/accounts/OpenAI/deployments/delete",
],
}
],
"assignableScopes": [f"/subscriptions/{subscription_id}"],
}
}
role_id = str(uuid.uuid4())
scope = f"/subscriptions/{subscription_id}"
result = auth_client.role_definitions.create_or_update(
scope, role_id, role_definition
)
return {
"role_id": result.id,
"role_name": result.role_name,
    }
Managed Identity Patterns
Use system-assigned or user-assigned managed identities for all Azure OpenAI access. The following pattern shows how to configure an App Service to use managed identity for Azure OpenAI:
from azure.mgmt.web import WebSiteManagementClient
def configure_app_managed_identity(
subscription_id: str,
resource_group: str,
app_name: str,
openai_endpoint: str,
) -> dict:
"""Configure an App Service to use managed identity for Azure OpenAI."""
credential = DefaultAzureCredential()
web_client = WebSiteManagementClient(credential, subscription_id)
# Enable system-assigned managed identity
web_client.web_apps.update(
resource_group,
app_name,
{
"identity": {"type": "SystemAssigned"},
},
)
# Get the identity principal ID
app = web_client.web_apps.get(resource_group, app_name)
principal_id = app.identity.principal_id
# Set app settings to use managed identity (not API keys)
web_client.web_apps.update_application_settings(
resource_group,
app_name,
{
"properties": {
"AZURE_OPENAI_ENDPOINT": openai_endpoint,
"AZURE_OPENAI_AUTH_TYPE": "managed_identity",
# No API key setting -- forces managed identity usage
}
},
)
return {
"principal_id": principal_id,
"auth_type": "managed_identity",
"detail": "Assign 'Cognitive Services OpenAI User' role to this principal ID.",
    }
GCP IAM for AI Services
IAM Conditions for Vertex AI
GCP IAM supports conditions that restrict when and where permissions are effective. Use conditions to limit Vertex AI access to specific resources and time windows:
def create_conditional_vertex_binding(
project_id: str,
member: str,
allowed_endpoints: list,
) -> dict:
"""Create a conditional IAM binding for Vertex AI endpoints."""
    from google.cloud import resourcemanager_v3
from google.type import expr_pb2
client = resourcemanager_v3.ProjectsClient()
# Build condition expression that restricts to specific endpoints
endpoint_conditions = " || ".join([
f'resource.name.startsWith("projects/{project_id}/locations/us-central1/endpoints/{ep}")'
for ep in allowed_endpoints
])
condition = expr_pb2.Expr(
expression=endpoint_conditions,
title="Restrict to approved endpoints",
description="Limits prediction access to specific Vertex AI endpoints.",
)
# This would be applied via set_iam_policy on the project
binding = {
"role": "roles/aiplatform.user",
"members": [member],
"condition": {
"expression": endpoint_conditions,
"title": "Restrict to approved endpoints",
},
}
return {
"binding": binding,
"detail": "Apply this binding via gcloud or Terraform. "
"IAM conditions on Vertex AI restrict which specific "
"resources the member can access.",
}Workload Identity Federation
For CI/CD pipelines and external workloads accessing Vertex AI, use Workload Identity Federation instead of service account keys:
def setup_workload_identity_pool(
project_id: str,
project_number: str,
github_org: str,
github_repo: str,
) -> dict:
"""Create a Workload Identity Pool for GitHub Actions accessing Vertex AI."""
    from google.cloud import iam_v1
# Configuration for workload identity federation
pool_config = {
"pool_id": "github-actions-pool",
"display_name": "GitHub Actions",
"provider_config": {
"provider_id": "github",
            "issuer_uri": "https://token.actions.githubusercontent.com",
"attribute_mapping": {
"google.subject": "assertion.sub",
"attribute.repository": "assertion.repository",
"attribute.actor": "assertion.actor",
"attribute.ref": "assertion.ref",
},
"attribute_condition": (
f'assertion.repository == "{github_org}/{github_repo}" '
f'&& assertion.ref == "refs/heads/main"'
),
},
"service_account_binding": {
"service_account": f"vertex-ai-deployer@{project_id}.iam.gserviceaccount.com",
"role": "roles/iam.workloadIdentityUser",
"member": (
f"principalSet://iam.googleapis.com/projects/{project_number}"
f"/locations/global/workloadIdentityPools/github-actions-pool/"
f"attribute.repository/{github_org}/{github_repo}"
),
},
}
    return pool_config
Cross-Cloud IAM Patterns
Shared AI Model Access
When organizations use AI services across multiple clouds, identity federation enables cross-cloud access without managing separate credentials:
def design_cross_cloud_ai_access() -> dict:
    """Design patterns for cross-cloud AI service access."""
return {
"pattern_1_hub_spoke": {
"description": "Central identity provider (e.g., Entra ID) federated to all clouds",
"aws_config": "SAML or OIDC federation to IAM Identity Center",
"azure_config": "Native Entra ID integration",
"gcp_config": "Workforce Identity Federation with Entra ID as IdP",
"pros": ["Single identity source", "Centralized access reviews", "Consistent MFA"],
"cons": ["Single point of failure", "Complex initial setup"],
},
"pattern_2_service_mesh": {
            "description": "Cloud-native service identities with cross-cloud trust",
            "aws_config": "IAM roles with OIDC federation from GCP/Azure",
            "azure_config": "Managed identities with federated credentials",
            "gcp_config": "Workload Identity Federation with AWS/Azure tokens",
            "pros": ["No long-lived credentials", "Cloud-native"],
            "cons": ["Complex trust relationships", "Different semantics per cloud"],
},
    }
Automated IAM Assessment
Cross-Cloud IAM Scanner
Automate IAM assessment across all three clouds with a unified scanning framework:
import json
from dataclasses import dataclass
from typing import Protocol
from enum import Enum
class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    INFO = "info"
@dataclass
class IAMFinding:
    cloud: str
    service: str
    severity: Severity
    title: str
    detail: str
    resource: str
    recommendation: str
class CloudIAMScanner(Protocol):
    def scan(self) -> list[IAMFinding]: ...
@dataclass
class AWSBedrockIAMScanner:
    session: object  # boto3.Session
    account_id: str
    def scan(self) -> list[IAMFinding]:
        findings = []
        iam = self.session.client("iam")
        # Check for wildcard Bedrock permissions in customer-managed policies
        paginator = iam.get_paginator("list_policies")
        for page in paginator.paginate(Scope="Local"):
            for policy in page["Policies"]:
                version = iam.get_policy_version(
                    PolicyArn=policy["Arn"],
                    VersionId=policy["DefaultVersionId"],
                )
                document = version["PolicyVersion"]["Document"]
                if isinstance(document, str):
                    document = json.loads(document)
                for statement in document.get("Statement", []):
                    actions = statement.get("Action", [])
                    if isinstance(actions, str):
                        actions = [actions]
                    if statement.get("Effect") == "Allow" and "bedrock:*" in actions:
                        findings.append(IAMFinding(
                            cloud="AWS",
                            service="Bedrock",
                            severity=Severity.HIGH,
                            title="Wildcard Bedrock permissions",
                            detail=f"Policy {policy['PolicyName']} grants bedrock:* access",
                            resource=policy["Arn"],
                            recommendation="Scope to specific actions: bedrock:InvokeModel "
                                           "for consumers, specific management actions for admins.",
                        ))
        return findings
def run_cross_cloud_scan(scanners: list[CloudIAMScanner]) -> dict:
    """Run IAM scans across all configured cloud environments."""
    all_findings = []
    for scanner in scanners:
        all_findings.extend(scanner.scan())
    summary = {
        "total_findings": len(all_findings),
        "by_severity": {},
        "by_cloud": {},
        "findings": [
            {
                "cloud": f.cloud,
                "service": f.service,
                "severity": f.severity.value,
                "title": f.title,
                "detail": f.detail,
                "recommendation": f.recommendation,
            }
            for f in all_findings
        ],
    }
    for f in all_findings:
        summary["by_severity"][f.severity.value] = summary["by_severity"].get(f.severity.value, 0) + 1
        summary["by_cloud"][f.cloud] = summary["by_cloud"].get(f.cloud, 0) + 1
    return summary
IAM Governance Recommendations
Policy Design Principles
- Separate invocation from management: Model consumers should never have permissions to create, modify, or delete AI resources. Separate IAM policies for invocation and management prevent lateral movement from a compromised application to infrastructure control.
- Scope by model: Do not grant access to all models when only specific ones are needed. Use resource-level permissions to restrict which models an identity can invoke.
- Enforce network conditions: Require that AI API calls originate from approved network locations (VPC endpoints, private endpoints, Private Service Connect) using IAM condition keys.
- Time-bound elevated access: Use just-in-time access tools (AWS IAM Identity Center temporary access, Azure PIM, GCP IAM Conditions with time expressions) for administrative operations.
- Tag-based access control: Use attribute-based access control (ABAC) with tags/labels to dynamically scope permissions based on team, environment, and data classification.
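The last principle can be sketched as one reusable ABAC policy. This is a hedged example: `aws:ResourceTag` conditions apply only to taggable Bedrock resources (such as custom models or provisioned throughput, not foundation models themselves), and the tag keys `team` and `env` are assumed naming conventions, not requirements.

```python
def abac_invoke_policy() -> dict:
    """Sketch: one ABAC policy serves every team by matching principal
    tags against resource tags at evaluation time."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    # Principal may only invoke resources carrying its own
                    # team and environment tags.
                    "aws:ResourceTag/team": "${aws:PrincipalTag/team}",
                    "aws:ResourceTag/env": "${aws:PrincipalTag/env}",
                }
            },
        }],
    }
```

Because the policy is parameterized by tags rather than by team name, onboarding a new team requires tagging its principals and resources, not writing a new policy.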
Regular Access Review
Conduct quarterly access reviews focused on AI-specific permissions. The review should answer:
- Who can invoke each model, and do they still need that access?
- Who can modify safety/guardrail configurations?
- Are service identities scoped to the minimum necessary permissions?
- Are there any cross-account or cross-project access paths that bypass intended controls?
- Are deprecated models still accessible to identities that should have migrated?
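The first review question can be partially automated. The helper below is a minimal sketch: it assumes you have already gathered, per principal, a flat list of allowed actions (for example via the scanner above or an access-analysis tool), and the input record format is an illustrative assumption rather than a standard schema.

```python
def who_can_invoke(inventory: list[dict]) -> list[str]:
    """Return principals whose gathered action lists permit model invocation.
    Each inventory record is assumed to look like
    {"principal": ..., "allowed_actions": [...]}."""
    # Literal wildcards are treated as granting invocation; a fuller
    # implementation would expand patterns like "bedrock:Invoke*".
    invoke_actions = {"bedrock:InvokeModel", "bedrock:*", "*"}
    return sorted(
        record["principal"]
        for record in inventory
        if invoke_actions & set(record["allowed_actions"])
    )
```

Running this each quarter against a fresh inventory turns "who can invoke each model?" from an interview question into a diffable report.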
References
- AWS, "Security best practices in IAM," https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html
- Microsoft, "Azure RBAC best practices," https://learn.microsoft.com/en-us/azure/role-based-access-control/best-practices
- Google Cloud, "IAM best practices," https://cloud.google.com/iam/docs/using-iam-securely
- NIST SP 800-207, "Zero Trust Architecture," August 2020, https://csrc.nist.gov/publications/detail/sp/800-207/final
- CSA, "Security Guidance for Critical Areas of Focus in Cloud Computing v4.0," https://cloudsecurityalliance.org/research/guidance
Why should model invocation permissions be separated from model management permissions in IAM policies?
What is the recommended approach for CI/CD pipelines that need to deploy Vertex AI models on GCP?