SageMaker Exploitation
Red team attack methodology for Amazon SageMaker: endpoint exploitation, notebook instance attacks, training job manipulation, model artifact tampering, and VPC misconfigurations in ML workloads.
Amazon SageMaker covers the full ML lifecycle: data labeling, notebook-based development, distributed training, model hosting, and MLOps automation. This breadth creates a large attack surface. Unlike Bedrock, where the model infrastructure is fully managed, SageMaker gives customers control over compute instances, container images, training scripts, and deployment configurations. Every point of customer control is a point of potential misconfiguration.
Endpoint Exploitation
Endpoint Enumeration
SageMaker endpoints host trained models for real-time inference. Each endpoint runs one or more model variants (production variants) on dedicated compute instances.
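An endpoint resolves through its endpoint configuration to one or more production variants, and each variant to a model whose container image, artifact location, and execution role are the interesting targets. The following is a minimal sketch of walking that chain, assuming `sm` behaves like `boto3.client("sagemaker")` and that variants are classic model-based ones; the function name `walk_endpoint` is hypothetical.

```python
from typing import Any


def walk_endpoint(sm: Any, endpoint_name: str) -> list:
    """Resolve an endpoint to its variants, models, container images, and artifacts.

    `sm` is assumed to behave like boto3.client("sagemaker"); only
    describe_endpoint, describe_endpoint_config, and describe_model are called.
    """
    endpoint = sm.describe_endpoint(EndpointName=endpoint_name)
    config = sm.describe_endpoint_config(
        EndpointConfigName=endpoint["EndpointConfigName"]
    )
    results = []
    for variant in config["ProductionVariants"]:
        model_name = variant.get("ModelName")
        if model_name is None:
            # Variants backed by inference components may carry no ModelName; skipped here.
            continue
        model = sm.describe_model(ModelName=model_name)
        # Single-container models use PrimaryContainer; inference pipelines use Containers.
        containers = model.get("Containers") or [model.get("PrimaryContainer", {})]
        results.append({
            "variant": variant["VariantName"],
            # Serverless variants have a ServerlessConfig instead of an InstanceType.
            "instance_type": variant.get("InstanceType", "serverless"),
            "model": model_name,
            "execution_role": model.get("ExecutionRoleArn"),
            "images": [c.get("Image") for c in containers],
            "artifacts": [c.get("ModelDataUrl") for c in containers],
        })
    return results
```

Usage: `walk_endpoint(boto3.client("sagemaker"), "<name>")`. Each returned entry ties a variant to the S3 artifact and IAM role that later sections target.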
```bash
# List all active endpoints
aws sagemaker list-endpoints --status-equals InService \
  --query 'Endpoints[].{Name:EndpointName,Created:CreationTime}' --output table

# Get endpoint details (includes the endpoint configuration name)
aws sagemaker describe-endpoint --endpoint-name <name>

# Get endpoint configuration (reveals instance types and the model behind each variant)
aws sagemaker describe-endpoint-config --endpoint-config-name <config-name>

# Get model details (reveals container image, model artifact location, and execution role)
aws sagemaker describe-model --model-name <model-name>
```

Model Extraction Through Inference
With sagemaker:InvokeEndpoint access, systematic querying can approximate the model's behavior:
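```python
import json

import boto3

# Runtime client for inference calls (separate from the "sagemaker" control-plane client)
client = boto3.client('sagemaker-runtime')


def query_endpoint(payload):
    """Send one JSON payload to the endpoint and return the decoded JSON response."""
    response = client.invoke_endpoint(
        EndpointName='production-model',  # target endpoint name
        ContentType='application/json',
        Accept='application/json',
        Body=json.dumps(payload)
    )
    return json.loads(response['Body'].read())

# Systematic querying to map model behavior:
# - Classification models: query with crafted inputs to map decision boundaries
# - Generative models: attempt to extract training data through targeted prompting
```

Endpoint Exposure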
SageMaker endpoints are not exposed directly to the internet; they are invoked through the SageMaker Runtime API (InvokeEndpoint over HTTPS, authenticated with SigV4). However, common misconfigurations expose them more broadly:
| Misconfiguration | Risk | Detection |
|---|---|---|
| Lambda proxy without auth | A public API Gateway route invokes SageMaker through an unauthenticated Lambda | Check API Gateway configurations for SageMaker or Lambda integrations |
| Overprivileged endpoint role | The model's execution role has broad S3/DynamoDB access | Describe the model's execution role and its attached policies |
| Permissive VPC endpoint policy | The default full-access policy lets any principal with valid credentials invoke through the endpoint | Check the VPC endpoint policy for sagemaker.runtime |
| Cross-account access | Resource or trust policies allow external accounts | Review model package group resource policies and role trust policies |
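For the overprivileged-role row, a quick first pass is to flag wildcard actions in the role's policy documents (as returned by `aws iam get-role-policy` or `get-policy-version`). The sketch below is a hypothetical helper, `wildcard_actions`, that only looks at `Action` in `Allow` statements; it does not evaluate `NotAction`, resource scoping, or conditions.

```python
import json
from typing import Union


def wildcard_actions(policy_document: Union[str, dict]) -> list:
    """Return actions in Allow statements that are "*" or end in ":*" (e.g. "s3:*")."""
    doc = json.loads(policy_document) if isinstance(policy_document, str) else policy_document
    statements = doc.get("Statement", [])
    if isinstance(statements, dict):  # a policy may hold a single statement object
        statements = [statements]
    findings = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])  # NotAction statements are not analyzed here
        if isinstance(actions, str):
            actions = [actions]
        findings.extend(a for a in actions if a == "*" or a.endswith(":*"))
    return findings
```

The helper accepts either a JSON string or an already-parsed dict, because boto3 returns IAM policy documents decoded while exported files are usually raw JSON.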
Notebook Instance Attacks
Notebook as Pivot Point
SageMaker notebook instances are Jupyter servers running on managed ML compute instances with an attached IAM execution role. They are commonly the weakest link in the ML pipeline because:
- Persistent compute with credentials: The notebook IAM role is accessible via the instance metadata service
- Development environment: Data scientists install arbitrary packages, download external data, and run untrusted code
- Network access: Notebooks often have access to both public internet and private VPC resources
- Long-lived: Notebooks frequently run for weeks or months without patching
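Several of these risk factors are visible in the `describe-notebook-instance` response. The following is a minimal sketch with a hypothetical helper, `notebook_findings`, that reads fields returned by boto3's `describe_notebook_instance`; it only reports settings that are explicitly present and does not assume service defaults for missing keys.

```python
def notebook_findings(desc: dict) -> list:
    """Flag pivot-point risk factors in a describe-notebook-instance response."""
    findings = []
    subnet = desc.get("SubnetId")
    internet = desc.get("DirectInternetAccess") == "Enabled"
    if internet and subnet:
        # Reachable from the internet side and attached to private VPC resources
        findings.append(f"bridges the public internet and VPC subnet {subnet}")
    elif internet:
        findings.append("direct internet access enabled")
    if desc.get("RootAccess") == "Enabled":
        findings.append("notebook users have root access")
    imds = desc.get("InstanceMetadataServiceConfiguration") or {}
    if imds.get("MinimumInstanceMetadataServiceVersion") == "1":
        findings.append("IMDSv1 allowed: role credentials reachable with a plain HTTP GET")
    if desc.get("NotebookInstanceLifecycleConfigName"):
        findings.append("lifecycle config attached: " + desc["NotebookInstanceLifecycleConfigName"])
    return findings
```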
Credential Theft
```bash
# From a compromised notebook instance:
# list the role name, then fetch its temporary credentials from the instance metadata service
# (IMDSv1 requests; instances that require IMDSv2 need a session token first)
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/<role-name>

# Check what the notebook role can do
aws sts get-caller-identity
aws iam list-attached-role-policies --role-name <notebook-role>   # managed policies
aws iam list-role-policies --role-name <notebook-role>            # inline policies

# Common notebook role overprivileges:
# - s3:* (access all training data, model artifacts)
# - sagemaker:* (full SageMaker control)
# - bedrock:* (invoke any model)
# - logs:* (read all CloudWatch logs)
```

Lifecycle Configuration Attacks
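SageMaker notebook lifecycle configurations run shell scripts when a notebook instance is created (OnCreate) and every time it starts (OnStart). An attacker who can modify these scripts (for example, with sagemaker:UpdateNotebookInstanceLifecycleConfig) gains code execution that persists across notebook restarts: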
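Because the describe call returns each script as base64 under `OnCreate[].Content` and `OnStart[].Content`, decoding and triage can be scripted. This sketch assumes the boto3 `describe_notebook_instance_lifecycle_config` response shape; `RISKY_PATTERNS` and `review_lifecycle_config` are hypothetical names, and the patterns are illustrative starting points rather than a complete ruleset.

```python
import base64
import re

# Illustrative review patterns; extend them for the environment under test.
RISKY_PATTERNS = {
    "remote download": re.compile(r"\b(curl|wget)\b"),
    "package install": re.compile(r"\bpip3? install\b|\bconda install\b"),
    "credential handling": re.compile(
        r"AWS_(ACCESS_KEY_ID|SECRET_ACCESS_KEY|SESSION_TOKEN)|169\.254\.169\.254"
    ),
    "background process": re.compile(r"\bnohup\b|&\s*$"),
}


def review_lifecycle_config(resp: dict) -> dict:
    """Decode OnCreate/OnStart scripts and flag lines matching RISKY_PATTERNS.

    Returns {hook: [(line_number, label, line), ...]}.
    """
    report = {}
    for hook in ("OnCreate", "OnStart"):
        hits = []
        for entry in resp.get(hook, []):
            script = base64.b64decode(entry["Content"]).decode("utf-8", errors="replace")
            for lineno, line in enumerate(script.splitlines(), start=1):
                for label, pattern in RISKY_PATTERNS.items():
                    if pattern.search(line):
                        hits.append((lineno, label, line.strip()))
        report[hook] = hits
    return report
```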
```bash
# List lifecycle configurations
aws sagemaker list-notebook-instance-lifecycle-configs

# Get the script content (returned base64-encoded) and decode it for review
# (use OnCreate[0].Content for the creation script)
aws sagemaker describe-notebook-instance-lifecycle-config \
  --notebook-instance-lifecycle-config-name <name> \
  --query 'OnStart[0].Content' --output text | base64 --decode

# Look for credential handling, network configuration,
# and package installations from untrusted sources
```

Notebook Content Extraction
Notebooks (.ipynb files) stored on the instance or in connected Git repositories often contain:
- Database connection strings and credentials
- API keys for external services
- Training data samples
- Model architecture details and hyperparameters
- Internal service URLs and endpoints
- Data pipeline configurations
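Since `.ipynb` files are JSON, cell sources and stream outputs can be scanned directly for items like those listed above. The following is a minimal sketch; `SECRET_PATTERNS` and `scan_notebook` are hypothetical names, the three patterns are illustrative only, and dedicated secret scanners ship far larger rule sets. Rich outputs (for example `text/plain` inside `execute_result`) are not inspected here.

```python
import json
import re

# Illustrative patterns only; not an exhaustive secret-detection ruleset.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\b(AKIA|ASIA)[0-9A-Z]{16}\b"),
    "connection string with password": re.compile(r"\b[a-z][a-z0-9+]*://[^\s:/@]+:[^\s@]+@"),
    "generic secret assignment": re.compile(
        r"(?i)\b(api_?key|secret|token|password)\s*=\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_notebook(ipynb_text: str) -> list:
    """Return (cell_index, finding) pairs from cell sources and stream-output text."""
    nb = json.loads(ipynb_text)
    findings = []
    for idx, cell in enumerate(nb.get("cells", [])):
        chunks = [cell.get("source", "")]
        for out in cell.get("outputs", []):
            chunks.append(out.get("text", ""))  # present on stream outputs
        # nbformat stores multi-line text either as a string or a list of strings
        text = "\n".join("".join(c) if isinstance(c, list) else c for c in chunks)
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(text):
                findings.append((idx, label))
    return findings
```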
Training Job Manipulation
Training Data Access
Training jobs read data from S3, EFS, or FSx for Lustre. Identifying and accessing training data is a high-priority objective:
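The fields worth collecting from a training job description sit in a few fixed places. The following is a minimal sketch that assumes the boto3 `describe_training_job` response shape; `training_job_locations` is a hypothetical helper.

```python
def training_job_locations(desc: dict) -> dict:
    """Collect data, artifact, image, and role locations from a describe-training-job response."""
    inputs = []
    for channel in desc.get("InputDataConfig", []):
        source = channel.get("DataSource", {})
        name = channel.get("ChannelName", "?")
        if "S3DataSource" in source:
            inputs.append((name, source["S3DataSource"]["S3Uri"]))
        elif "FileSystemDataSource" in source:
            fs = source["FileSystemDataSource"]
            # EFS or FSx for Lustre: report type, file system ID, and directory path
            inputs.append((name, f'{fs["FileSystemType"]}:{fs["FileSystemId"]}{fs["DirectoryPath"]}'))
    return {
        "inputs": inputs,
        "output_prefix": desc.get("OutputDataConfig", {}).get("S3OutputPath"),
        "model_artifact": desc.get("ModelArtifacts", {}).get("S3ModelArtifacts"),
        "image": desc.get("AlgorithmSpecification", {}).get("TrainingImage"),
        "role": desc.get("RoleArn"),
    }
```

The input prefixes are where write access matters for poisoning, and the output prefix and model artifact are where write access matters for artifact tampering.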
```bash
# List recent training jobs
aws sagemaker list-training-jobs --sort-by CreationTime --sort-order Descending

# Get training job details, including data sources
aws sagemaker describe-training-job --training-job-name <name>
# Look for InputDataConfig[].DataSource.S3DataSource.S3Uri
# and OutputDataConfig.S3OutputPath
```

Training Pipeline Poisoning
Identify the training data source
From the training job configuration, extract the S3 URI containing the training data. Check for write access to this bucket or prefix.
Understand the data format
Download sample training data to understand the format (CSV, JSON, JSONL, Parquet, etc.) and schema. The poisoned data must match this format exactly.
Craft poisoned data
Create data that introduces the desired model behavior. For classification models, inject mislabeled examples. For generative models, inject examples containing the target behavior (e.g., backdoor triggers).
Inject and trigger
Write the poisoned data to the training S3 prefix. If training runs on a schedule, wait for the next run. If training is started manually, wait for the next retraining cycle or use social engineering to prompt one.
Training Container Attacks
SageMaker training jobs run in Docker containers. Organizations use either AWS-provided containers or custom containers from ECR. Attacks against custom containers include:
- Container image tampering: Replace the training container in ECR with a modified version that exfiltrates training data or installs a backdoor in the trained model
- Dependency confusion: If the container installs packages at runtime, inject malicious packages through dependency confusion attacks
- Resource abuse: Training instances often have GPUs; a compromised training container can use them for cryptocurrency mining or other compute-intensive workloads
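Whether image tampering takes effect depends partly on how the job references its image (`AlgorithmSpecification.TrainingImage`). A tag such as `:latest` resolves to whatever is currently pushed under that tag, unless the ECR repository has tag immutability enabled, while a `@sha256:` digest always pulls the same image. The sketch below uses a hypothetical `image_reference_kind` helper to classify the reference.

```python
def image_reference_kind(image_uri: str) -> str:
    """Classify a container image URI as digest-pinned or tag-referenced."""
    if "@sha256:" in image_uri:
        return "digest"  # immutable: a replacement push produces a different digest
    # The tag follows the last ':' in the final path segment
    # (a ':' earlier in the URI would be a registry port).
    last_segment = image_uri.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return "tag:" + last_segment.split(":", 1)[1]
    return "tag:latest"  # an untagged reference resolves to ':latest'
```

Tag-referenced images in a repository with mutable tags are the ones where a push to ECR silently changes what the next training job runs.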
Model Artifact Tampering
Artifact Storage
Trained models are stored as artifacts (typically model.tar.gz) in S3. The artifact contains model weights, configuration files, and sometimes inference code.
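An archive can be triaged without extracting it to disk, which also avoids path-traversal surprises from untrusted tarballs. The following is a minimal sketch that uses the standard-library `tarfile` module; `inspect_model_archive` and the suffix lists are hypothetical and illustrative. The pickle-like suffixes cover formats that may deserialize arbitrary Python objects when loaded.

```python
import io
import tarfile

# Illustrative, not exhaustive.
CODE_SUFFIXES = (".py", ".sh")
PICKLE_SUFFIXES = (".pkl", ".pickle", ".joblib", ".pt", ".pth", ".bin")


def inspect_model_archive(data: bytes) -> dict:
    """List model.tar.gz members of interest without extracting them."""
    report = {"inference_code": [], "pickle_like": [], "suspicious_paths": []}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        for member in tar.getmembers():
            name = member.name
            # Absolute paths, '..' components, and links can escape the extraction directory
            if name.startswith("/") or ".." in name.split("/") or member.issym() or member.islnk():
                report["suspicious_paths"].append(name)
            if name.startswith(("code/", "./code/")) and name.endswith(CODE_SUFFIXES):
                report["inference_code"].append(name)
            if name.endswith(PICKLE_SUFFIXES):
                report["pickle_like"].append(name)
    return report
```

Usage: `inspect_model_archive(open("model.tar.gz", "rb").read())`.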
```bash
# Find model artifacts (PrimaryContainer.ModelDataUrl)
aws sagemaker describe-model --model-name <name> \
  --query 'PrimaryContainer.ModelDataUrl' --output text

# Download and inspect
aws s3 cp s3://bucket/path/model.tar.gz .
tar xzf model.tar.gz
# Inspect model files, the code/ directory, and inference.py
```

Model Replacement
If write access to the model artifact's S3 location is available, the production model can be replaced:
- Download the existing model artifact
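- Modify the model (change weights, or inject a backdoor into the inference code)
- Upload the modified artifact to the same S3 location
- The modified model is loaded the next time the artifact is pulled: on an endpoint update, a new deployment, or a scale-out that launches new instances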
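The same metadata supports a quick check for whether a replacement has already happened. An artifact is normally written before the model that references it is created, so an S3 `LastModified` later than the model's `CreationTime` suggests a rewrite. The sketch below assumes boto3 `describe_model` and S3 `head_object` responses, where both timestamps are timezone-aware datetimes. `artifact_drift` and its `baseline_etag` input, an ETag recorded at approval time, are hypothetical.

```python
from typing import Optional


def artifact_drift(model_desc: dict, head_object: dict, baseline_etag: Optional[str] = None) -> list:
    """Compare an artifact's S3 metadata against the SageMaker model that references it."""
    findings = []
    created = model_desc["CreationTime"]    # from describe_model
    modified = head_object["LastModified"]  # from s3 head_object on ModelDataUrl
    if modified > created:
        findings.append(f"artifact rewritten {modified - created} after the model was created")
    if baseline_etag is not None and head_object.get("ETag") != baseline_etag:
        # Re-uploading identical bytes with different multipart settings also changes the ETag.
        findings.append("ETag differs from the recorded baseline")
    return findings
```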
Model Registry Attacks
SageMaker Model Registry tracks model versions with approval workflows. Attack vectors:
- Approve unapproved models: If IAM allows sagemaker:UpdateModelPackage, change the model status from PendingManualApproval to Approved
- Register backdoored models: Register a new model version pointing to a tampered artifact
- Model group hijacking: Create model versions in existing model groups, potentially displacing legitimate models in automated pipelines
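These vectors matter most when deployment automation selects a model by approval status. A common pattern, assumed here, is a pipeline that deploys the highest-versioned `Approved` package in a group. The following is a minimal sketch of that selection over the `ModelPackageSummaryList` returned by `list_model_packages`. `latest_approved` is a hypothetical helper, and whether a given pipeline actually behaves this way must be confirmed.

```python
from typing import Optional


def latest_approved(summaries: list) -> Optional[dict]:
    """Return the highest-versioned Approved package, mirroring an assumed
    'deploy latest approved' pipeline rule; None if nothing is approved."""
    approved = [s for s in summaries if s.get("ModelApprovalStatus") == "Approved"]
    return max(approved, key=lambda s: s.get("ModelPackageVersion", 0), default=None)
```

Under that rule, any version an attacker registers and approves with a higher version number becomes the one the pipeline deploys.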
VPC Misconfigurations
Network Isolation Failures
SageMaker supports VPC deployment for network isolation, but common misconfigurations undermine it:
| Component | Misconfiguration | Impact |
|---|---|---|
| Notebook instances | Direct internet access enabled | Data exfiltration, C2 communication |
| Training jobs | No VPC configuration | Training data accessed over the public internet |
| Endpoints | No VPC configuration or network isolation | Model container can reach the internet (exfiltration path) |
| VPC endpoints | Missing or overpermissive policies | Unauthorized service access within VPC |
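Both `describe_training_job` and `describe_model` responses carry `VpcConfig` and `EnableNetworkIsolation`, so one check covers the training and endpoint rows of the table. The following is a minimal sketch with a hypothetical `isolation_gaps` helper. Note that a VPC configuration limits egress only as far as that VPC's routes and security groups do.

```python
def isolation_gaps(resource_name: str, desc: dict) -> list:
    """Check VpcConfig and EnableNetworkIsolation in describe-training-job or describe-model output."""
    gaps = []
    vpc = desc.get("VpcConfig") or {}
    if not vpc.get("Subnets"):
        gaps.append(f"{resource_name}: no customer VpcConfig")
    if not desc.get("EnableNetworkIsolation", False):
        gaps.append(f"{resource_name}: network isolation disabled (container egress not blocked by SageMaker)")
    return gaps
```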
Inter-Service Communication
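SageMaker components communicate with other AWS services (S3, ECR, CloudWatch). Without VPC endpoints, this traffic leaves the VPC through a NAT or internet gateway to the services' public endpoints. With VPC endpoints but no restrictive endpoint policies, the default full-access policy lets any principal with valid credentials use the endpoint, including credentials from other AWS accounts. This can open an exfiltration path, for example writing to an attacker-owned S3 bucket.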
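`aws ec2 describe-vpc-endpoints` returns each endpoint's `PolicyDocument` as a JSON string, so open policies can be flagged mechanically. The following is a minimal sketch with a hypothetical helper, `open_endpoint_statements`. It treats an `Allow` statement for any principal with no `Condition` block as open. It does not judge whether a condition that is present actually scopes access (for example, `aws:PrincipalOrgID`).

```python
import json
from typing import Union


def open_endpoint_statements(policy: Union[str, dict]) -> list:
    """Return Allow statements that apply to any principal and carry no Condition."""
    doc = json.loads(policy) if isinstance(policy, str) else policy
    statements = doc.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    open_stmts = []
    for stmt in statements:
        principal = stmt.get("Principal")
        any_principal = principal == "*" or (
            isinstance(principal, dict) and principal.get("AWS") in ("*", ["*"])
        )
        if stmt.get("Effect") == "Allow" and any_principal and "Condition" not in stmt:
            open_stmts.append(stmt)
    return open_stmts
```

The default full-access endpoint policy (`"Principal": "*"`, `"Action": "*"`, no condition) is reported as open.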
```bash
# Check if the notebook has direct internet access
aws sagemaker describe-notebook-instance --notebook-instance-name <name>
# DirectInternetAccess: "Enabled" is a finding

# Check VPC endpoint policies
aws ec2 describe-vpc-endpoints \
  --filters "Name=service-name,Values=com.amazonaws.*.sagemaker.runtime"
```

Related Topics
- AWS AI Services Overview -- Service landscape and enumeration
- IAM for AI Services -- IAM patterns and misconfigurations
- Infrastructure & Supply Chain -- Supply chain attack techniques applicable to ML containers
- AI Cost & Billing Attacks -- GPU compute abuse through SageMaker
You have compromised a SageMaker notebook instance and extracted the IAM role credentials. Which action would provide the most persistent impact on the ML pipeline?
A SageMaker model artifact stored in S3 contains a code/ directory with inference.py. Why is write access to this S3 location a critical finding?