Cloud ML Platform Security (AWS/Azure/GCP)
Security comparison of cloud ML platforms including AWS SageMaker, Azure Machine Learning, and Google Vertex AI. IAM configuration, data security, model serving, and platform-specific attack surfaces.
Cloud ML platforms (AWS SageMaker, Azure Machine Learning, Google Vertex AI) provide managed infrastructure for training, deploying, and serving ML models — including LLMs. Each platform inherits its parent cloud's security model while adding ML-specific attack surface. Red teaming cloud ML deployments requires understanding both cloud security fundamentals and ML-specific vulnerabilities.
Platform Comparison Overview
| Feature | AWS SageMaker | Azure ML | Google Vertex AI |
|---|---|---|---|
| IAM model | AWS IAM (roles, policies) | Azure RBAC + Azure AD | Google Cloud IAM |
| Network isolation | VPC, PrivateLink | VNet, Private Endpoints | VPC, Private Service Connect |
| Data encryption | KMS (at rest), TLS (transit) | Azure Key Vault, TLS | Cloud KMS, TLS |
| Model registry | SageMaker Model Registry | Azure ML Model Registry | Vertex AI Model Registry |
| Serving | SageMaker Endpoints, Serverless | Managed Endpoints, AKS | Vertex AI Endpoints, GKE |
| Notebook environment | SageMaker Studio, Notebook Instances | Azure ML Compute Instances | Vertex AI Workbench |
| Compliance | HIPAA, SOC, FedRAMP | HIPAA, SOC, FedRAMP, DoD IL | HIPAA, SOC, FedRAMP |
IAM Misconfiguration Risks
IAM misconfigurations are the most common and highest-impact vulnerability class in cloud ML deployments.
AWS SageMaker IAM Risks
```python
# DANGEROUS: overly permissive SageMaker execution role
overly_permissive_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "sagemaker:*",
        "Resource": "*"
    }, {
        "Effect": "Allow",
        "Action": "s3:*",          # Full S3 access: can read ALL buckets
        "Resource": "*"
    }, {
        "Effect": "Allow",
        "Action": "iam:PassRole",  # Can pass any role
        "Resource": "*"
    }]
}
# This role can:
# - access any S3 bucket (training data, other projects' data)
# - create endpoints with any role attached
# - modify any SageMaker resource
# - escalate privileges via iam:PassRole (pass a more privileged role to a resource it controls)
```
Azure ML IAM Risks
| Misconfiguration | Risk | Test |
|---|---|---|
| Contributor role on workspace | Full control over ML resources, data, and models | Check role assignments — Contributor is often over-assigned |
| Shared managed identity | Multiple services share one identity, cross-service access | Enumerate managed identity assignments |
| Missing data plane RBAC | Control plane access does not restrict data plane operations | Test data access after restricting control plane roles |
| Service principal over-provisioning | Automation accounts with excessive scope | Review SP permissions across resource groups |
Google Vertex AI IAM Risks
| Misconfiguration | Risk | Test |
|---|---|---|
| roles/aiplatform.admin over-assignment | Full control over all Vertex AI resources | Review IAM bindings at project level |
| Missing VPC Service Controls | Data exfiltration via Vertex AI APIs | Test data access from outside perimeter |
| Default service account usage | Compute Engine default SA has broad project-level permissions | Check if default SA is used for ML workloads |
| Cross-project access | Vertex AI resources accessible from other projects | Test cross-project API calls |
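The wildcard-permission patterns in these tables can be caught statically before deployment. A minimal sketch of a check for the SageMaker PassRole escalation combination (the function name and logic are illustrative, not part of any AWS SDK):

```python
def find_passrole_escalation(policy: dict) -> list:
    """Flag an IAM policy that combines unrestricted iam:PassRole with
    broad SageMaker permissions -- together they let a caller attach a
    more privileged role to a SageMaker resource and act as that role."""
    def actions(stmt):
        a = stmt.get("Action", [])
        return [a] if isinstance(a, str) else a

    passrole_any = False
    sagemaker_broad = False
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        for action in actions(stmt):
            if action == "iam:PassRole" and stmt.get("Resource") == "*":
                passrole_any = True
            if action in ("sagemaker:*", "*"):
                sagemaker_broad = True

    findings = []
    if passrole_any and sagemaker_broad:
        findings.append("iam:PassRole on * combined with sagemaker:* "
                        "allows privilege escalation via role attachment")
    return findings

risky = {"Version": "2012-10-17", "Statement": [
    {"Effect": "Allow", "Action": "sagemaker:*", "Resource": "*"},
    {"Effect": "Allow", "Action": "iam:PassRole", "Resource": "*"},
]}
findings = find_passrole_escalation(risky)  # one finding expected
```

Running the same check against a role whose PassRole statement is scoped to a single role ARN returns no findings, which is the target state.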
Data Security Across the ML Lifecycle
Data Collection → Preparation → Training → Model Storage → Serving → Inference

| Stage | Primary security concern |
|---|---|
| Data collection | Data source access control and authentication |
| Preparation | Feature store access and encryption |
| Training | Training data exposure in compute |
| Model storage | Model artifact encryption |
| Serving | Endpoint authentication and rate limiting |
| Inference | Response data leakage |
Training Data Security
| Risk | AWS SageMaker | Azure ML | Vertex AI |
|---|---|---|---|
| Data at rest | S3 SSE or KMS | Azure Storage encryption | Cloud Storage encryption |
| Data in training | EBS encryption on training instances | Compute encryption | Boot disk + data disk encryption |
| Data access logging | CloudTrail + S3 access logs | Azure Monitor + Storage Analytics | Cloud Audit Logs |
| Cross-account data access | S3 bucket policies + IAM | Shared access signatures | IAM + VPC Service Controls |
| Data exfiltration prevention | VPC endpoints, no internet access | Private endpoints, NSGs | VPC Service Controls |
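Cross-account data access is worth auditing directly, since a single bucket-policy statement can expose training data to an untrusted account. A sketch of a static check against an allow-list of account IDs (illustrative helper, not a library API; account ID is field 5 of a principal ARN):

```python
def cross_account_grants(bucket_policy: dict, trusted_accounts: set) -> list:
    """Return Allow statements whose principal is '*' or belongs to an
    AWS account outside the trusted set."""
    findings = []
    for stmt in bucket_policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        if principal == "*":          # anonymous access: always a finding
            findings.append(stmt)
            continue
        arns = (principal or {}).get("AWS", [])
        if isinstance(arns, str):
            arns = [arns]
        for arn in arns:
            parts = arn.split(":")
            if len(parts) > 4 and parts[4] not in trusted_accounts:
                findings.append(stmt)
                break
    return findings

policy = {"Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::222222222222:root"},
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::training-data/*",
}]}
flagged = cross_account_grants(policy, trusted_accounts={"111111111111"})
```

The same idea applies to Azure shared access signatures and GCP IAM bindings: enumerate every principal with data-plane access and diff it against the set of identities that should have it.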
Model Artifact Security
| Concern | Description | Test Approach |
|---|---|---|
| Model access control | Who can download or use model artifacts? | Attempt to access model artifacts with different credentials |
| Model integrity | Can model artifacts be tampered with? | Check for write access to model storage, verify signing |
| Model versioning | Are old (potentially vulnerable) model versions accessible? | Enumerate available model versions |
| Serialization risks | Model format (pickle, safetensors) determines code execution risk | Identify model serialization format, test for deserialization attacks |
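The serialization row deserves emphasis: loading a pickle-format model artifact executes arbitrary code, so write access to model storage is equivalent to code execution on every host that loads the model. A minimal demonstration with a deliberately benign payload (`sorted` stands in for what could be `os.system`):

```python
import pickle

class MaliciousModel:
    """Unpickling this object calls an attacker-chosen callable --
    here a harmless sorted(), but it could be any importable function."""
    def __reduce__(self):
        return (sorted, ([3, 1, 2],))

payload = pickle.dumps(MaliciousModel())  # what a tampered artifact contains
result = pickle.loads(payload)            # "loading the model" runs the callable
# result is [1, 2, 3]: code ran during deserialization, no model needed
```

Formats such as safetensors avoid this class of attack by storing only tensor data with no executable component, which is why identifying the serialization format is the first test step.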
Model Serving Security
Endpoint Authentication
| Feature | AWS SageMaker | Azure ML | Vertex AI |
|---|---|---|---|
| Default auth | IAM SigV4 | Key or token-based | IAM + API key |
| Custom auth | Lambda authorizer | Azure AD integration | IAP + IAM |
| Public endpoint risk | Can be exposed via API Gateway | Can be exposed publicly | Can be exposed publicly |
| Private endpoint | PrivateLink | Private Endpoint | Private Service Connect |
| mTLS | Via API Gateway custom domain | Available | Available |
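SageMaker's default IAM auth means every invocation must carry a SigV4 signature derived from the caller's secret key, which is why an unauthenticated request to a correctly configured endpoint fails before reaching the model. A sketch of the documented SigV4 signing-key derivation (request canonicalization and the final signature are omitted; the key and date values are AWS's published example inputs):

```python
import hmac
import hashlib

def sigv4_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive the AWS SigV4 signing key via the documented HMAC-SHA256 chain:
    date -> region -> service -> 'aws4_request'."""
    def h(key: bytes, msg: str) -> bytes:
        return hmac.new(key, msg.encode(), hashlib.sha256).digest()
    k_date = h(("AWS4" + secret_key).encode(), date)
    k_region = h(k_date, region)
    k_service = h(k_region, service)
    return h(k_service, "aws4_request")

key = sigv4_signing_key("wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY",
                        "20120215", "us-east-1", "iam")
```

When testing an endpoint, the interesting cases are the paths that bypass this chain entirely: an API Gateway fronting the endpoint with no authorizer, or a key-based Azure ML endpoint whose key has leaked into a notebook.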
Endpoint Attack Surface
Test Authentication Bypass
Attempt to invoke model endpoints without valid credentials. Check for misconfigured API gateways, missing auth on health check endpoints, and default credentials.
Test Input Validation
Send malformed, oversized, and adversarial inputs to model endpoints. Check for error messages that reveal model architecture, framework versions, or infrastructure details.
Test for Model Extraction
Systematically query the endpoint to extract model behavior. With enough queries, an attacker can build a functional copy of the model (model stealing). Check rate limiting and query logging.
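To make the extraction risk concrete, here is a toy victim: an unknown linear model behind a query-only interface, fully recovered with d+1 queries. A real attack against a nonlinear model needs far more queries and only an approximate copy, but the principle is identical, which is why query budgets and rate limits matter:

```python
def victim_endpoint(x):
    """Stand-in for a remote model endpoint: the attacker can query it
    but cannot see the hidden weights."""
    hidden_w, hidden_b = [2.0, -1.0, 0.5], 3.0
    return sum(w * xi for w, xi in zip(hidden_w, x)) + hidden_b

d = 3
b_hat = victim_endpoint([0.0] * d)                 # query at the origin -> bias
w_hat = [victim_endpoint([1.0 if i == j else 0.0 for i in range(d)]) - b_hat
         for j in range(d)]                        # one query per basis vector
# w_hat == [2.0, -1.0, 0.5] and b_hat == 3.0: model stolen in 4 queries
```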
Test for Data Leakage in Responses
Check whether model responses contain training data, PII, or other sensitive information. Test for membership inference (determining whether a specific data point was in the training set).
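Membership inference exploits the fact that overfit models assign systematically lower loss to examples they were trained on, so a simple loss threshold can separate members from non-members. A toy simulation with synthetic loss distributions (the separation here is exaggerated for clarity; real attacks see much smaller gaps):

```python
import random

random.seed(0)
# Synthetic per-example losses: training-set members score lower
member_losses = [random.gauss(0.2, 0.05) for _ in range(100)]
non_member_losses = [random.gauss(0.8, 0.05) for _ in range(100)]

threshold = 0.5  # attacker's decision rule: loss < threshold => "member"
correct = (sum(l < threshold for l in member_losses)
           + sum(l >= threshold for l in non_member_losses))
accuracy = correct / 200
# With well-separated losses the attack is near-perfect (accuracy ~ 1.0)
```

Against a live endpoint the attacker needs per-example confidence or loss signals, so one practical mitigation is limiting what the response exposes (top-1 label instead of full probability vectors).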
Test Network Isolation
Verify that model endpoints cannot be reached from unauthorized networks. Check VPC configurations, security groups, and network policies.
Notebook Environment Security
ML notebook environments (SageMaker Studio, Azure ML Compute, Vertex AI Workbench) are high-value targets because they typically have broad data access and code execution capabilities:
| Risk | Description | Mitigation |
|---|---|---|
| Internet access | Notebooks with internet access can exfiltrate data | Use VPC/VNet with no internet egress |
| Shared instances | Multiple users sharing notebook infrastructure | Use per-user instances with separate credentials |
| Persistent credentials | Long-lived credentials stored in notebook environments | Use temporary credentials, metadata service v2 (IMDSv2) |
| Package installation | Users install arbitrary packages that may be malicious | Restrict to approved package repositories |
| Root access | Notebook users with root can modify security controls | Restrict to non-root access |
Cross-Platform Security Checklist
| Security Control | What to Verify | Priority |
|---|---|---|
| IAM least privilege | Are ML roles scoped to minimum necessary permissions? | Critical |
| Network isolation | Are ML resources in private subnets with no unnecessary internet access? | Critical |
| Data encryption | Is data encrypted at rest and in transit? Are keys managed properly? | High |
| Endpoint authentication | Are model serving endpoints properly authenticated? | High |
| Logging and monitoring | Are ML API calls, data access, and model invocations logged? | High |
| Model artifact integrity | Are model artifacts access-controlled and integrity-verified? | Medium |
| Notebook security | Are notebooks isolated, credentialed appropriately, and network-restricted? | High |
| Supply chain | Are ML frameworks, packages, and pre-trained models from trusted sources? | Medium |
Related Topics
- Infrastructure Security: API Security -- API-level security testing for ML endpoints
- Supply Chain Security -- model artifact and dependency integrity
- LangChain & LlamaIndex Security -- framework-specific vulnerabilities on cloud platforms
- API Provider Security Comparison -- comparing security controls across LLM API providers
References
- "AWS SageMaker Security Best Practices" - Amazon Web Services (2024) - AWS guidance on securing ML workloads including IAM, networking, and encryption
- "Azure Machine Learning Security Baseline" - Microsoft (2024) - Security controls and configuration for Azure ML deployments
- "Google Cloud Vertex AI Security Overview" - Google Cloud (2024) - Security architecture and best practices for Vertex AI workloads
- "CIS Benchmarks for Cloud ML Services" - Center for Internet Security (2024) - Hardening benchmarks for cloud ML platform configurations