Capstone: Cloud AI Security Assessment
Assess AI deployment security across AWS, Azure, and GCP cloud platforms, producing a comprehensive cloud AI security assessment report.
Overview
AI workloads in the cloud introduce security considerations that go beyond traditional cloud security assessments. Model endpoints need protection from adversarial inputs. Training data in object storage needs access controls that account for ML pipeline service accounts. Model registries can become supply chain attack vectors. GPU instances have unique cost exposure risks. And the managed AI services from each cloud provider (SageMaker, Azure AI, Vertex AI) have their own security configurations that most cloud security teams have never audited.
This capstone project challenges you to conduct a security assessment of AI deployments across the three major cloud providers. You will evaluate infrastructure security, model endpoint protection, data pipeline security, and cloud-specific AI service configurations.
Prerequisites
- Cloud AI Security — Cloud-specific AI security considerations
- Infrastructure Security — General infrastructure assessment methodology
- Training Pipeline Attacks — Supply chain and pipeline risks
- Defenses and Guardrails — Security controls to evaluate
- Basic familiarity with at least one major cloud provider (AWS, Azure, or GCP)
- Understanding of IAM, networking, and object storage concepts
Project Brief
Scenario
You are a cloud security specialist at Stratos Security Consulting. Your client, AeroTech Dynamics, is an aerospace engineering firm that has deployed AI workloads across multiple cloud providers:
AWS Deployment:
- SageMaker endpoints hosting computer vision models for quality inspection
- S3 buckets containing training data (factory floor images, labeled defect data)
- Lambda functions for inference preprocessing and postprocessing
- SageMaker Model Registry for model versioning and deployment
Azure Deployment:
- Azure OpenAI Service powering an internal engineering assistant chatbot
- Azure AI Search (formerly Cognitive Search) for RAG over engineering documentation
- Azure Blob Storage for document uploads and embedding caches
- Azure Key Vault for API key management
GCP Deployment:
- Vertex AI pipelines for model training and evaluation
- Cloud Storage buckets for training datasets and model artifacts
- Cloud Functions for inference API middleware
- Artifact Registry for container images used in training pipelines
AeroTech wants a security assessment focused specifically on their AI workloads — not a general cloud security audit, but an AI-specific assessment that covers model security, data pipeline integrity, and AI service configuration.
Assessment Scope
## In Scope
- AI service configurations (SageMaker, Azure OpenAI, Vertex AI)
- Model endpoint security (authentication, authorization, input validation)
- Training data storage and access controls
- Model registry and artifact security
- AI pipeline security (training, evaluation, deployment)
- IAM roles and policies specific to AI workloads
- Network security for AI endpoints
- Cost and resource controls for AI compute
## Out of Scope
- General cloud security posture (VPC, general IAM, compliance)
- Non-AI application workloads
- Physical security and on-premises infrastructure
- Social engineering and phishing
Deliverables
Primary Deliverables
| Deliverable | Description | Weight |
|---|---|---|
| Assessment report | Cloud AI security assessment covering all three providers | 40% |
| Configuration review | Detailed review of AI service configurations with specific misconfigurations | 25% |
| Risk matrix | Cloud AI risk matrix mapping threats to assets across providers | 15% |
| Remediation guide | Provider-specific remediation steps with IaC examples | 20% |
Rubric Criteria
- Coverage (20%) — All three providers assessed with AI-specific (not generic cloud) findings
- Technical Depth (25%) — Findings demonstrate understanding of cloud AI service internals and specific misconfigurations
- Provider Specificity (15%) — Remediation is specific to each provider's service, not generic advice
- Risk Prioritization (20%) — Findings are prioritized by realistic exploitation likelihood and business impact
- Actionability (20%) — Remediation includes specific configuration changes, CLI commands, or IaC snippets
Phased Approach
Phase 1: Reconnaissance and Scoping (2 hours)
Inventory AI assets across providers
Build a comprehensive inventory of AI-related resources across all three clouds: endpoints, storage, registries, pipelines, service accounts, and network configurations. Use cloud provider inventory tools (AWS Config, Azure Resource Graph, GCP Asset Inventory) or review IaC templates.
Map AI-specific IAM roles
Identify all IAM roles, service accounts, and permissions associated with AI workloads. Pay attention to: SageMaker execution roles, Azure AI service principals, Vertex AI service accounts, and any cross-service permissions (e.g., can the training pipeline role also deploy models?).
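Part of this permission sweep can be automated once the policy documents are pulled. A minimal sketch in Python that flags wildcard grants in an IAM policy document — the role policy, bucket ARN, and warning format below are hypothetical, and a real review would also inspect `Condition` blocks and trust policies:

```python
def find_overbroad_statements(policy: dict) -> list[str]:
    """Flag Allow statements with wildcard actions or resources --
    a common smell in AI-pipeline execution roles."""
    warnings = []
    for i, stmt in enumerate(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if any(a == "*" or a.endswith(":*") for a in actions):
            warnings.append(f"statement {i}: wildcard action {actions}")
        if "*" in resources:
            warnings.append(f"statement {i}: wildcard resource")
    return warnings

# Hypothetical SageMaker execution role policy showing both smells
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "sagemaker:*", "Resource": "*"},
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::aerotech-training-data/*"},
    ],
}
for warning in find_overbroad_statements(policy):
    print(warning)
```

The same check translates directly to Azure role definitions (`Actions: ["*"]`) and GCP bindings on primitive roles like `roles/editor`.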
Identify data flows
Map how data flows through each AI pipeline: where training data is stored, how it reaches the training compute, where models are stored after training, how models are deployed to endpoints, and how inference requests reach the model. Each flow is an attack surface.
Phase 2: Service Configuration Review (4 hours)
Assess AWS SageMaker security
Review: endpoint authentication and authorization (IAM vs. API key), VPC configuration for endpoints (are they publicly accessible?), encryption at rest and in transit for model artifacts, SageMaker execution role permissions (least privilege), Model Registry access controls, and S3 bucket policies for training data.
Assess Azure OpenAI and AI Search security
Review: Azure OpenAI API key rotation and managed identity usage, content filtering configuration, network access restrictions (private endpoints vs. public), Azure AI Search index permissions (who can read what?), diagnostic logging configuration, and Key Vault access policies for API key storage.
Assess GCP Vertex AI security
Review: Vertex AI endpoint authentication (service account vs. API key), VPC Service Controls enforcement, Cloud Storage bucket permissions for training data and model artifacts, Artifact Registry access controls for training containers, pipeline service account permissions, and audit logging configuration.
Cross-provider analysis
Identify inconsistencies across providers: is one provider's deployment significantly less secure than others? Are there shared credentials or service accounts that cross provider boundaries? Is there a single point of compromise that would affect deployments on multiple providers?
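Shared credentials are the easiest cross-provider finding to mechanize: fingerprint every credential found during reconnaissance (for example, a hash of each key found in CI configs and pipeline definitions) and intersect the per-provider sets. A small sketch, with hypothetical fingerprints:

```python
def shared_credentials(*inventories: set[str]) -> set[str]:
    """Return credential fingerprints that appear in more than one
    provider's inventory -- each is a cross-cloud single point of compromise."""
    seen: set[str] = set()
    shared: set[str] = set()
    for inventory in inventories:
        shared |= seen & inventory
        seen |= inventory
    return shared

# Hypothetical fingerprints harvested during reconnaissance
aws_creds   = {"sha256:ab12", "sha256:cd34"}
azure_creds = {"sha256:cd34", "sha256:ef56"}
gcp_creds   = {"sha256:gh78"}

print(shared_credentials(aws_creds, azure_creds, gcp_creds))
```

Here the fingerprint present in both the AWS and Azure inventories is flagged; compromising that one secret would affect two providers at once.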
Phase 3: Vulnerability Testing (4 hours)
Test model endpoint security
For accessible endpoints, test: authentication bypass attempts, rate limiting effectiveness, input validation (malformed inputs, oversized payloads), inference API abuse (model extraction through query volume), and error message information leakage.
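An input-validation harness usually starts from a fixed set of malformed bodies replayed against each endpoint. A sketch of such a probe set — the payload names and the 6 MB size threshold are illustrative assumptions, not provider-documented limits:

```python
import json

def endpoint_probe_payloads(max_body_bytes: int = 6_000_000) -> dict[str, bytes]:
    """Probe bodies for input-validation testing of an inference endpoint.
    The default size threshold is an assumption; adjust per provider."""
    return {
        "empty_body":     b"",
        "truncated_json": b'{"image": ',                      # malformed JSON
        "wrong_type":     json.dumps({"image": 12345}).encode(),  # int, not base64
        "oversized":      b"A" * (max_body_bytes + 1),        # exceeds assumed limit
    }

for name, body in endpoint_probe_payloads().items():
    print(name, len(body))
```

For each probe, record the status code, latency, and response body: a stack trace or framework error page in the response is itself an information-leakage finding.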
Test data access controls
Verify that training data storage has appropriate access controls: can an unauthenticated user access S3 buckets? Can a read-only role modify training data? Are model artifacts in Cloud Storage or Blob Storage properly restricted? Test for public bucket misconfiguration.
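One concrete check: given a bucket policy pulled during the review, decide whether it grants object reads to everyone. A simplified sketch — the policy shown is hypothetical, and a real check must also handle `Condition` blocks, `NotPrincipal`, and account-level public access block settings:

```python
def allows_public_read(bucket_policy: dict) -> bool:
    """True if any Allow statement grants object reads to all principals.
    Simplified: ignores Condition, NotPrincipal, and public access blocks."""
    for stmt in bucket_policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        is_public = principal == "*" or (
            isinstance(principal, dict) and principal.get("AWS") == "*")
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        if is_public and any(a in ("s3:GetObject", "s3:*", "*") for a in actions):
            return True
    return False

# Hypothetical policy on the factory-floor training-data bucket
policy = {"Statement": [{"Effect": "Allow", "Principal": "*",
                         "Action": "s3:GetObject",
                         "Resource": "arn:aws:s3:::factory-floor-images/*"}]}
print(allows_public_read(policy))  # True -- training data is world-readable
```

The equivalent checks on the other providers are `allUsers`/`allAuthenticatedUsers` bindings on Cloud Storage and anonymous (`Blob`/`Container`) public access levels on Azure Blob Storage.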
Test model registry integrity
Assess whether the model registry (SageMaker Model Registry, Azure ML Registry, Artifact Registry) is protected against unauthorized model uploads or modifications. Could an attacker with compromised pipeline credentials replace a model with a backdoored version?
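One integrity control to test for here: whether deployments pin model artifacts to the digest recorded at registration time, so a swapped artifact is caught before it reaches an endpoint. A minimal sketch of the verification step (the artifact bytes are a stand-in):

```python
import hashlib

def verify_artifact(artifact_bytes: bytes, registry_sha256: str) -> bool:
    """Compare a downloaded model artifact against the digest recorded
    at registration time; a mismatch means the artifact was replaced."""
    return hashlib.sha256(artifact_bytes).hexdigest() == registry_sha256

model = b"\x00fake-model-weights"
recorded = hashlib.sha256(model).hexdigest()
print(verify_artifact(model, recorded))                # True
print(verify_artifact(model + b"backdoor", recorded))  # False -- tampered
```

If the registry stores no digest at all, or if the pipeline role that writes models can also rewrite the recorded digests, note that as a finding: digest pinning only helps when the digest itself is write-protected.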
Test cost exposure
Evaluate cost controls: are there budget alerts for AI compute? Can an attacker trigger expensive training jobs or scale up GPU instances? Is there a maximum instance count limit? Test whether rate limiting prevents API abuse that would generate excessive costs.
Phase 4: Reporting (2 hours)
Write the assessment report
Structure the report by provider and by risk category. For each finding: describe the misconfiguration, explain the AI-specific risk (why this matters more for AI workloads than generic compute), provide evidence, and include provider-specific remediation.
Build the risk matrix
Create a risk matrix that maps cloud AI threats (model theft, data poisoning, supply chain compromise, cost exhaustion, unauthorized access) against assets across all three providers. Highlight which provider has the strongest and weakest posture for each threat.
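Cell ratings are easier to defend when they come from an explicit likelihood-times-impact rule rather than gut feel. A sketch of one such convention — the 1-5 scales, thresholds, and per-provider scores below are illustrative, not a standard scoring system:

```python
def severity(likelihood: int, impact: int) -> str:
    """Map a 1-5 likelihood x 1-5 impact score to a severity band.
    The thresholds are an illustrative convention, not a standard."""
    score = likelihood * impact
    if score >= 20:
        return "CRITICAL"
    if score >= 12:
        return "HIGH"
    if score >= 6:
        return "MEDIUM"
    return "LOW"

# Hypothetical model-theft scores per provider
for provider, (likelihood, impact) in {
    "AWS": (5, 5), "Azure": (1, 4), "GCP": (3, 3)
}.items():
    print(provider, severity(likelihood, impact))
```

Publishing the rule alongside the matrix lets the client re-score cells as remediations land, instead of treating the ratings as fixed opinions.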
Produce remediation guide with IaC examples
For each finding, provide specific remediation steps including CLI commands or IaC snippets (Terraform, CloudFormation, ARM templates) that implement the fix. This makes remediation immediately actionable for the client's DevOps team.
Example Output
Example Finding: AWS SageMaker
## Finding: SageMaker Endpoint Publicly Accessible Without IAM Auth
**Provider:** AWS
**Service:** SageMaker Real-time Inference
**Severity:** Critical
**Category:** Model Endpoint Security
### Description
The quality inspection model endpoint (endpoint-qc-vision-prod) is deployed
without VPC configuration and with IAM authentication disabled. The endpoint
is accessible from the public internet and accepts inference requests from
any source without authentication.
### AI-Specific Risk
Unlike a generic API endpoint, an exposed model endpoint enables:
- **Model extraction:** An attacker can send systematic queries to
reconstruct the model's decision boundary, stealing proprietary
IP in the quality inspection model
- **Adversarial input testing:** An attacker can probe the model's
weaknesses to craft adversarial images that pass quality inspection
- **Cost exhaustion:** Unauthenticated access allows unlimited inference
requests, running up GPU instance costs
### Evidence
```bash
# Unsigned request succeeds: IAM authentication is not enforced
aws sagemaker-runtime invoke-endpoint \
  --no-sign-request \
  --endpoint-name endpoint-qc-vision-prod \
  --body '{"image": "base64..."}' \
  --content-type application/json \
  output.json
# Returns: 200 OK with inference result
```
### Remediation
```terraform
# VPC configuration is set on the model resource, not the endpoint config
resource "aws_sagemaker_model" "qc_vision" {
  name               = "qc-vision"
  execution_role_arn = aws_iam_role.sagemaker_exec.arn
  # ... container config

  vpc_config {
    security_group_ids = [aws_security_group.sagemaker_sg.id]
    subnets            = aws_subnet.private[*].id
  }
}

resource "aws_sagemaker_endpoint_config" "qc_vision" {
  name = "endpoint-qc-vision-prod"

  production_variants {
    variant_name = "primary"
    model_name   = aws_sagemaker_model.qc_vision.name
    # ... instance type and count
  }
}

# InvokeEndpoint calls must be SigV4-signed; grant them only through a
# least-privilege policy:
resource "aws_iam_policy" "sagemaker_invoke" {
  name = "sagemaker-invoke-qc-vision"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "sagemaker:InvokeEndpoint"
      Resource = aws_sagemaker_endpoint.qc_vision.arn
    }]
  })
}
```
### Example Risk Matrix
```markdown
| Threat Category | AWS (SageMaker) | Azure (OpenAI) | GCP (Vertex AI) |
|---------------------|------------------|-----------------|-----------------|
| Model theft | CRITICAL — public endpoint | LOW — private endpoint | MEDIUM — API key only |
| Data poisoning | MEDIUM — S3 ACL gaps | LOW — RBAC enforced | HIGH — public bucket |
| Supply chain | HIGH — registry open | MEDIUM — no signing | HIGH — no image scanning |
| Cost exhaustion | CRITICAL — no limits | LOW — quotas set | MEDIUM — no budget alert |
| Unauthorized access | HIGH — overprivileged roles | LOW — managed identity | MEDIUM — shared SA |
```
Hints
Why is a publicly accessible SageMaker model endpoint a more severe finding for an AI workload than a publicly accessible generic API endpoint?