SageMaker Exploitation

Advanced | 10 min read | Updated 2026-03-15

Red team attack methodology for Amazon SageMaker: endpoint exploitation, notebook instance attacks, training job manipulation, model artifact tampering, and VPC misconfigurations in ML workloads.

aws sagemaker endpoint-exploitation notebook-attacks training-job model-artifacts vpc red-team

SageMaker Exploitation

Amazon SageMaker covers the full ML lifecycle -- from data labeling and notebook-based development to distributed training, model hosting, and MLOps automation. This breadth creates a large attack surface. Unlike Bedrock, where the model infrastructure is fully managed, SageMaker gives customers control over compute instances, container images, training scripts, and deployment configurations. Every point of customer control is a point of potential misconfiguration.

Endpoint Exploitation

Endpoint Enumeration

SageMaker endpoints host trained models for real-time inference. Each endpoint runs one or more model variants on dedicated compute instances.

# List all active endpoints
aws sagemaker list-endpoints --status-equals InService \
  --query 'Endpoints[].{Name:EndpointName,Created:CreationTime}' --output table
 
# Get endpoint details
aws sagemaker describe-endpoint --endpoint-name <name>
 
# Get endpoint configuration (reveals instance types, model artifacts)
aws sagemaker describe-endpoint-config --endpoint-config-name <config-name>
 
# Get model details (reveals container image and model artifact location)
aws sagemaker describe-model --model-name <model-name>
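
The three describe calls above chain naturally: the endpoint names its configuration, and each production variant in the configuration names a model. A minimal Python sketch of that chain follows. It assumes `sm_client` is a boto3 SageMaker client (for example `boto3.client("sagemaker")`); the function name `trace_endpoint` and the shape of the returned dictionaries are illustrative choices, not part of any AWS API.

```python
def trace_endpoint(sm_client, endpoint_name):
    """Follow endpoint -> endpoint config -> model(s) and collect key fields."""
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config = sm_client.describe_endpoint_config(
        EndpointConfigName=endpoint["EndpointConfigName"]
    )
    findings = []
    for variant in config.get("ProductionVariants", []):
        model = sm_client.describe_model(ModelName=variant["ModelName"])
        # Single-container models expose PrimaryContainer;
        # multi-container models expose a Containers list instead.
        containers = model.get("Containers") or [model.get("PrimaryContainer", {})]
        for container in containers:
            findings.append({
                "variant": variant.get("VariantName"),
                "instance_type": variant.get("InstanceType"),
                "model": variant["ModelName"],
                "image": container.get("Image"),
                "artifact": container.get("ModelDataUrl"),
                "role": model.get("ExecutionRoleArn"),
            })
    return findings
```

Each returned entry ties a variant to its container image, artifact location, and execution role, which are the inputs to the later sections on artifact tampering and overprivileged roles.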

Model Extraction Through Inference

With sagemaker:InvokeEndpoint access, an attacker can query the endpoint systematically to approximate the model's behavior:

import boto3, json
 
client = boto3.client('sagemaker-runtime')
 
def query_endpoint(payload):
    response = client.invoke_endpoint(
        EndpointName='production-model',
        ContentType='application/json',
        Body=json.dumps(payload)
    )
    return json.loads(response['Body'].read())
 
# Systematic querying to map model behavior
# For classification models: query with crafted inputs to map decision boundaries
# For generative models: extract training data through targeted prompting
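
One way to make "mapping decision boundaries" concrete is to bisect the line segment between two inputs that receive different labels. The sketch below is an illustration under stated assumptions: `classify` is a hypothetical callable that wraps `query_endpoint` and returns a single label, inputs are lists of floats, and the label flips only once along the segment. Real response formats are model-specific, so the wrapper is left to the reader.

```python
def find_boundary(classify, x_a, x_b, steps=20):
    """Bisect between two differently labeled inputs.

    Returns the interpolation factor t in [0, 1] at which the label
    changes when moving from x_a (t=0) to x_b (t=1).
    """
    label_a = classify(x_a)
    if classify(x_b) == label_a:
        raise ValueError("inputs must receive different labels")
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        point = [a + mid * (b - a) for a, b in zip(x_a, x_b)]
        if classify(point) == label_a:
            lo = mid   # still on x_a's side of the boundary
        else:
            hi = mid   # crossed to the other side
    return (lo + hi) / 2
```

Each call costs `steps + 2` invocations, so the query volume for mapping many boundary points grows quickly and is itself a detectable signal in endpoint metrics.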

Endpoint Exposure

SageMaker endpoints are accessed through the AWS API (HTTPS via the SageMaker runtime SDK). However, common misconfigurations expose them more broadly:

Misconfiguration	Risk	Detection
Lambda proxy without auth	Public API Gateway routes to SageMaker	Check API Gateway configurations for SageMaker integrations
Overprivileged endpoint role	Endpoint IAM role with S3/DynamoDB access	Describe the endpoint execution role
Missing VPC endpoint policy	Any principal in the VPC can invoke	Check VPC endpoint policy for `sagemaker.runtime`
Cross-account access	Resource policies allowing external accounts	Review model and endpoint resource policies

Notebook Instance Attacks

Notebook as Pivot Point

SageMaker notebook instances are Jupyter servers running on EC2 instances with an IAM role attached. They are commonly the weakest link in the ML pipeline because:

Persistent compute with credentials: The notebook IAM role is accessible via the instance metadata service
Development environment: Data scientists install arbitrary packages, download external data, and run untrusted code
Network access: Notebooks often have access to both public internet and private VPC resources
Long-lived: Notebooks frequently run for weeks or months without patching

Credential Theft

# From a compromised notebook instance
# Access the IAM role credentials
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/<role-name>
 
# Check what the notebook role can do
aws sts get-caller-identity
aws iam list-attached-role-policies --role-name <notebook-role>
 
# Common notebook role overprivileges:
# - s3:* (access all training data, model artifacts)
# - sagemaker:* (full SageMaker control)
# - bedrock:* (invoke any model)
# - logs:* (read all CloudWatch logs)
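
To check a role's policies for the service-wide wildcards listed above, the policy document can be parsed directly. The sketch below assumes the document has already been retrieved as a Python dict (for example from `aws iam get-policy-version`, if the role is allowed to read IAM); the function name `wildcard_actions` is illustrative.

```python
def wildcard_actions(policy_document):
    """Return service-wide wildcard actions granted by Allow statements."""
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):      # a single statement may not be wrapped in a list
        statements = [statements]
    flagged = set()
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):      # Action may be a string or a list
            actions = [actions]
        for action in actions:
            if action == "*" or action.endswith(":*"):
                flagged.add(action)
    return sorted(flagged)
```

This only inspects `Action`; it does not evaluate `NotAction`, resource scoping, conditions, or permission boundaries, so an empty result is not proof of least privilege.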

Lifecycle Configuration Attacks

SageMaker notebook lifecycle configurations run shell scripts when a notebook instance is created and each time it starts. An attacker who can modify these scripts achieves persistent code execution:

# List lifecycle configurations
aws sagemaker list-notebook-instance-lifecycle-configs
 
# Get the script content
aws sagemaker describe-notebook-instance-lifecycle-config \
  --notebook-instance-lifecycle-config-name <name>
 
# The script is base64-encoded. Decode to review:
# Look for credential handling, network configuration,
# package installations from untrusted sources
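
The decode-and-review step can be sketched in Python. The function below takes the response of `describe-notebook-instance-lifecycle-config` as a dict, decodes the base64 `Content` of each `OnCreate` and `OnStart` entry, and flags a few patterns. The pattern list and the name `review_lifecycle_config` are assumptions for illustration and are far from exhaustive.

```python
import base64
import re

# Illustrative patterns only; extend for the environment under review.
SUSPICIOUS = {
    "remote script piped to shell": re.compile(r"(curl|wget)[^\n|]*\|\s*(sudo\s+)?(ba)?sh"),
    "package install": re.compile(r"\bpip3?\s+install\b|\bconda\s+install\b"),
    "hardcoded AWS key id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def review_lifecycle_config(response):
    """Decode each lifecycle script and list which patterns it matches."""
    results = []
    for hook in ("OnCreate", "OnStart"):
        for entry in response.get(hook, []):
            script = base64.b64decode(entry["Content"]).decode("utf-8", errors="replace")
            hits = [name for name, pattern in SUSPICIOUS.items() if pattern.search(script)]
            results.append({"hook": hook, "script": script, "hits": hits})
    return results
```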

Notebook Content Extraction

Notebooks (.ipynb files) stored on the instance or in connected Git repositories often contain:

Database connection strings and credentials
API keys for external services
Training data samples
Model architecture details and hyperparameters
Internal service URLs and endpoints
Data pipeline configurations
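
A notebook file is JSON, so its cells can be searched for the items above without opening Jupyter. The following sketch scans cell sources with a few regular expressions. The patterns and the name `scan_notebook` are illustrative assumptions; the scan covers only cell source, not cell outputs, and will produce both false positives and misses.

```python
import json
import re

PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "connection_string": re.compile(r"\b\w+://[^\s:/@]+:[^\s@/]+@[^\s/'\"]+"),
    "secret_assignment": re.compile(
        r"(?i)\b(api[_-]?key|secret|password|token)\b\s*=\s*['\"][^'\"]+['\"]"
    ),
}

def scan_notebook(notebook_json):
    """Return (cell_index, pattern_name) pairs for matches in cell sources."""
    notebook = json.loads(notebook_json)
    findings = []
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        if isinstance(source, list):   # source may be a list of lines or one string
            source = "".join(source)
        for name, pattern in PATTERNS.items():
            if pattern.search(source):
                findings.append((index, name))
    return findings
```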

Training Job Manipulation

Training Data Access

Training jobs consume data from S3, EFS, or FSx. Identifying and accessing training data is a high-priority target:

# List recent training jobs
aws sagemaker list-training-jobs --sort-by CreationTime --sort-order Descending
 
# Get training job details including data source
aws sagemaker describe-training-job --training-job-name <name>
# Look for InputDataConfig[].DataSource.S3DataSource.S3Uri
# and OutputDataConfig.S3OutputPath
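
The fields named in the comments above can be pulled out of the describe response programmatically. This sketch assumes `job` is the dict returned by `describe_training_job` in boto3; the function name and the returned layout are illustrative.

```python
def training_data_locations(job):
    """Collect input channels, output path, and artifact path from a training job."""
    inputs = []
    for channel in job.get("InputDataConfig", []):
        source = channel.get("DataSource", {})
        if "S3DataSource" in source:
            location = source["S3DataSource"]["S3Uri"]
        elif "FileSystemDataSource" in source:   # EFS or FSx for Lustre
            fs = source["FileSystemDataSource"]
            location = f"{fs['FileSystemType']}:{fs['FileSystemId']}{fs['DirectoryPath']}"
        else:
            continue
        inputs.append((channel.get("ChannelName"), location))
    return {
        "inputs": inputs,
        "output": job.get("OutputDataConfig", {}).get("S3OutputPath"),
        "artifact": job.get("ModelArtifacts", {}).get("S3ModelArtifacts"),
    }
```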

Training Pipeline Poisoning

Identify the training data source
From the training job configuration, extract the S3 URI containing the training data. Check for write access to this bucket or prefix.
Understand the data format
Download sample training data to understand the format (CSV, JSON, JSONL, Parquet, etc.) and schema. The poisoned data must match this format exactly.
Craft poisoned data
Create data that introduces the desired model behavior. For classification models, inject mislabeled examples. For generative models, inject examples containing the target behavior (e.g., backdoor triggers).
Inject and trigger
Write the poisoned data to the training S3 prefix. If training is scheduled, wait for the next run. If training is triggered manually, the attacker must wait for the next retraining cycle or use social engineering to prompt one.
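
The format-matching requirement in the second step can be illustrated for JSONL data. The sketch below infers a simple field-to-type schema from sample lines and checks whether a new record conforms to it. Both function names are hypothetical, and the schema model (top-level field names and Python type names only) is a deliberate simplification; the same check is also useful defensively, as an ingestion-time validation.

```python
import json

def jsonl_schema(lines):
    """Map each top-level field to the set of type names seen in the samples."""
    schema = {}
    for line in lines:
        if not line.strip():
            continue   # skip blank lines
        for key, value in json.loads(line).items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema

def conforms(record, schema):
    """True if the record has exactly the sampled fields with already-seen types."""
    return set(record) == set(schema) and all(
        type(value).__name__ in schema[key] for key, value in record.items()
    )
```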

Training Container Attacks

SageMaker training jobs run in Docker containers. Organizations use either AWS-provided containers or custom containers from ECR. Attacks against custom containers include:

Container image tampering: Replace the training container in ECR with a modified version that exfiltrates training data or installs a backdoor in the trained model
Dependency confusion: If the container installs packages at runtime, inject malicious packages through dependency confusion attacks
Resource abuse: Training instances have GPU access; a compromised training container can use these GPUs for cryptocurrency mining or other compute-intensive attacks
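
Image tampering depends on the job referencing the image by a tag that can be re-pointed, rather than by an immutable digest. A small sketch for classifying an image reference (such as the training image URI from a job's algorithm specification) follows; the function name `image_pinning` and its three return values are illustrative.

```python
def image_pinning(image_uri):
    """Classify an image reference as 'digest', 'tag', or 'implicit-latest'."""
    if "@sha256:" in image_uri:
        return "digest"
    # A tag is a ':' in the final path component; an earlier ':' may be a registry port.
    last_component = image_uri.rsplit("/", 1)[-1]
    return "tag" if ":" in last_component else "implicit-latest"
```

A `tag` result is only exploitable if the repository permits tag overwrites, so it should be read together with the ECR repository's tag mutability setting.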

Model Artifact Tampering

Artifact Storage

Trained models are stored as artifacts (typically model.tar.gz) in S3. The artifact contains model weights, configuration files, and sometimes inference code.

# Find model artifacts
aws sagemaker describe-model --model-name <name>
# Look for PrimaryContainer.ModelDataUrl
 
# Download and inspect
aws s3 cp s3://bucket/path/model.tar.gz .
tar xzf model.tar.gz
# Inspect model files, code/ directory, inference.py
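
Listing an artifact's members before extracting it is safer than running `tar xzf` on untrusted input. The sketch below uses Python's standard `tarfile` module to sort members into executable code, pickle-style serialized files (which can run code when loaded), and unsafe paths. The suffix lists and the name `inspect_artifact` are assumptions for illustration.

```python
import tarfile

CODE_SUFFIXES = (".py", ".sh")
# Formats that are commonly pickle-based; loading them can execute code.
PICKLE_SUFFIXES = (".pkl", ".pickle", ".joblib", ".pt", ".pth")

def inspect_artifact(fileobj):
    """List members of a model.tar.gz without extracting anything."""
    report = {"code": [], "pickle_like": [], "unsafe_paths": []}
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive.getmembers():
            name = member.name
            if name.startswith("/") or ".." in name.split("/"):
                report["unsafe_paths"].append(name)   # would escape the target directory
            if name.endswith(CODE_SUFFIXES):
                report["code"].append(name)
            elif name.endswith(PICKLE_SUFFIXES):
                report["pickle_like"].append(name)
    return report
```

Typical use is `inspect_artifact(open("model.tar.gz", "rb"))` on the downloaded file.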

Model Replacement

If write access to the model artifact's S3 location is available, the production model can be replaced:

Download the existing model artifact
Modify the model (change weights, inject a backdoor in the inference code)
Upload the modified artifact to the same S3 location
The next endpoint update or deployment picks up the modified model

Model Registry Attacks

SageMaker Model Registry tracks model versions with approval workflows. Attack vectors:

Approve unapproved models: If IAM allows sagemaker:UpdateModelPackage, change model status from PendingManualApproval to Approved
Register backdoored models: Register a new model version pointing to a tampered artifact
Model group hijacking: Create model versions in existing model groups, potentially displacing legitimate models in automated pipelines
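
Before assessing any of these vectors, it helps to see how versions in a model group are distributed across approval states. The read-only sketch below assumes `sm_client` is a boto3 SageMaker client and pages through `list_model_packages`; the function name `packages_by_status` is illustrative.

```python
def packages_by_status(sm_client, group_name):
    """Group model package ARNs in a model group by approval status."""
    summary = {}
    kwargs = {"ModelPackageGroupName": group_name}
    while True:
        page = sm_client.list_model_packages(**kwargs)
        for package in page.get("ModelPackageSummaryList", []):
            status = package.get("ModelApprovalStatus")
            summary.setdefault(status, []).append(package["ModelPackageArn"])
        token = page.get("NextToken")
        if not token:
            return summary
        kwargs["NextToken"] = token   # request the next page
```

Versions listed under `PendingManualApproval` are the ones an `UpdateModelPackage` permission would allow an attacker to promote.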

VPC Misconfigurations

Network Isolation Failures

SageMaker supports VPC deployment for network isolation, but common misconfigurations undermine it:

Component	Misconfiguration	Impact
Notebook instances	Direct internet access enabled	Data exfiltration, C2 communication
Training jobs	No VPC configuration	Training data accessed over public internet
Endpoints	Public subnet with public IP	Endpoint accessible from the internet
VPC endpoints	Missing or overpermissive policies	Unauthorized service access within VPC
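
The notebook and training job rows of the table can be checked against describe responses. The sketch below assumes the inputs are the dicts returned by boto3's `describe_notebook_instance` and `describe_training_job`. The function names and finding strings are illustrative, and the network isolation check is an addition beyond the table, based on the `EnableNetworkIsolation` field of the training job response.

```python
def notebook_findings(notebook):
    """Flag network findings from a describe_notebook_instance response."""
    findings = []
    if notebook.get("DirectInternetAccess") == "Enabled":
        findings.append("direct internet access enabled")
    return findings

def training_job_findings(job):
    """Flag network findings from a describe_training_job response."""
    findings = []
    if not job.get("VpcConfig"):
        findings.append("no VPC configuration")
    if not job.get("EnableNetworkIsolation", False):
        findings.append("network isolation disabled")
    return findings
```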

Inter-Service Communication

SageMaker components communicate with other AWS services (S3, ECR, CloudWatch). Without VPC endpoints, this traffic traverses the public internet. With VPC endpoints but without restrictive policies, any principal in the VPC can access these services.

# Check if notebook has direct internet access
aws sagemaker describe-notebook-instance --notebook-instance-name <name>
# DirectInternetAccess: "Enabled" is a finding
 
# Check VPC endpoint policies
aws ec2 describe-vpc-endpoints \
  --filters "Name=service-name,Values=com.amazonaws.*.sagemaker.runtime"
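
The policy returned for each VPC endpoint is a JSON string in the `PolicyDocument` field. A minimal sketch for flagging statements that allow any principal without conditions is shown below; the function name `permissive_statements` is illustrative, and the check is intentionally narrow (it ignores `Action` and `Resource` scoping).

```python
import json

def permissive_statements(policy_document):
    """Return Allow statements open to any principal with no Condition."""
    policy = json.loads(policy_document)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    flagged = []
    for stmt in statements:
        principal = stmt.get("Principal")
        any_principal = principal == "*" or (
            isinstance(principal, dict) and principal.get("AWS") == "*"
        )
        if stmt.get("Effect") == "Allow" and any_principal and not stmt.get("Condition"):
            flagged.append(stmt)
    return flagged
```

An endpoint left on the default full-access policy will be flagged by this check, which corresponds to the "missing or overpermissive policies" row in the table above.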

References

SageMaker Security Documentation -- Official security best practices
SageMaker Network Isolation -- VPC and network isolation guidance
SageMaker Model Registry -- Model versioning and approval workflows
