MLflow Attack Surface
Security analysis of MLflow: tracking server authentication weaknesses, artifact store access control, model registry tampering, SQL injection in tracking queries, and exploitation techniques for both open-source and managed deployments.
MLflow is one of the most widely deployed open-source ML lifecycle platforms, with components for experiment tracking, model packaging, model registry, and deployment. Its open-source roots mean that security was retrofitted rather than designed in. By default, an MLflow tracking server has no authentication, no authorization, and no encryption, and many production deployments keep these defaults.
Tracking Server Authentication
The Default: No Authentication
Open-source MLflow does not enable authentication by default. The tracking server exposes a REST API on port 5000 (bound to localhost unless started with a host such as `--host 0.0.0.0`) and accepts any request without credentials. This means any client with network access can:
- Read all experiment metadata, parameters, metrics, and tags
- Download all logged artifacts (models, datasets, code)
- Create, modify, or delete experiments and runs
- Register models in the Model Registry
- Transition model versions through deployment stages
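Whether a given server still runs with these defaults can be checked with a single unauthenticated request. The sketch below assumes the documented `POST /api/2.0/mlflow/experiments/search` endpoint; the function name `probe_tracking_server` and its return labels are illustrative, not part of MLflow. A 401 or 403 suggests that an authentication layer (the basic-auth app or a reverse proxy) sits in front of the API.

```python
import json
import urllib.error
import urllib.request


def probe_tracking_server(base_url: str, timeout: float = 5.0) -> str:
    """Send one unauthenticated search request and classify the response.

    Returns "open" when the API answers with JSON and no credentials,
    "auth-required" on HTTP 401/403, and "unexpected:<detail>" otherwise.
    """
    url = base_url.rstrip("/") + "/api/2.0/mlflow/experiments/search"
    request = urllib.request.Request(
        url,
        data=json.dumps({"max_results": 1}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            # A proxy that redirects to a login page returns HTML, not JSON
            if not content_type.startswith("application/json"):
                return "unexpected:non-json"
            return "open"
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for non-2xx responses
        if err.code in (401, 403):
            return "auth-required"
        return f"unexpected:{err.code}"
```

Against a default, network-exposed deployment, `probe_tracking_server("http://target-mlflow:5000")` should return `"open"`. Connection failures surface as `urllib.error.URLError` and are left to the caller.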
Authentication Options and Their Weaknesses
| Method | Implementation | Weakness |
|---|---|---|
| Basic auth app | MLflow 2.5+ ships an experimental `basic-auth` app (`mlflow server --app-name basic-auth`) | Default admin account and default READ permission for every user unless `basic_auth.ini` is changed; credentials in plaintext in client configs |
| Reverse proxy (nginx/Envoy) | HTTP basic auth or OAuth2 proxy in front of MLflow | MLflow itself unaware of identity, no per-user authorization |
| Cloud IAM | Cloud provider identity for managed MLflow | Depends entirely on correct IAM policy configuration |
| Custom middleware | Organization-built authentication layer | Varies; often bypassed by direct API access |
Bypass Techniques
Even when authentication is configured, common bypass vectors include:
Direct database access. MLflow stores tracking metadata either in a local file store or in a SQL database (SQLite, MySQL, or PostgreSQL). If the database or file store is reachable (same machine, shared network, exposed port), an attacker can read it directly, bypassing any API-level authentication.
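To make the database bypass concrete: with read access to a SQLite-backed store, a few lines of `sqlite3` reproduce what the REST API would have returned, with no credentials involved. This sketch assumes MLflow's SQL schema, in which run parameters live in a `params` table with `key`, `value`, and `run_uuid` columns; the helper name `dump_params` is hypothetical.

```python
import sqlite3


def dump_params(db_path: str) -> list[tuple[str, str, str]]:
    """Read every logged run parameter straight from the backing database.

    Assumes MLflow's SQL schema, where parameters are stored in the
    `params` table as (key, value, run_uuid). No MLflow API call is made,
    so any API-level authentication is never consulted.
    """
    # Open read-only so the inspection does not modify the file
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT run_uuid, key, value FROM params ORDER BY run_uuid, key"
        ).fetchall()
    finally:
        conn.close()
    return rows
```

The same query works against MySQL or PostgreSQL backends with the corresponding driver; only the connection step changes.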
Artifact store direct access. Artifacts are stored separately from tracking metadata: on a local filesystem, S3, GCS, Azure Blob, or HDFS. If the artifact store is accessible independently (a public S3 bucket, an NFS mount), API authentication is irrelevant for artifact access.
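Locating the artifact store usually takes one API call, because each experiment record carries an `artifact_location`. The sketch below assumes the JSON shape of the documented `experiments/search` response (an `experiments` array whose entries include `artifact_location`); `artifact_stores` is a hypothetical helper that groups those URIs by scheme and bucket so that independently reachable stores stand out.

```python
from urllib.parse import urlparse


def artifact_stores(search_response: dict) -> set[tuple[str, str]]:
    """Return the distinct (scheme, bucket-or-host) pairs behind the experiments.

    `search_response` is the decoded JSON body of an experiments/search call.
    """
    stores = set()
    for experiment in search_response.get("experiments", []):
        parsed = urlparse(experiment.get("artifact_location", ""))
        # "s3://bucket/0" -> ("s3", "bucket"); a bare local path has no scheme
        stores.add((parsed.scheme or "file", parsed.netloc))
    return stores
```

A `mlflow-artifacts` scheme usually indicates that the server proxies artifact access itself, in which case the underlying bucket is configured on the server rather than exposed in this field.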
Internal network exposure. Teams deploy MLflow on internal networks assuming network isolation provides security. VPN compromise, SSRF vulnerabilities in other applications, or a compromised developer machine all provide network access.
Artifact Store Access Control
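MLflow separates metadata (stored in the tracking database) from artifacts (stored in configurable blob storage). This separation creates a split-brain access control problem: permissions enforced on the tracking API do not automatically apply to the storage that holds the artifacts.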
Common Artifact Store Configurations
| Store Type | Access Control | Attack Vector |
|---|---|---|
| Local filesystem | OS file permissions | Path traversal, symlink attacks |
| S3 | IAM-based | Overly permissive bucket policies, SSRF for credential theft |
| GCS | IAM-based | Service account key exposure, uniform bucket-level access misconfiguration |
| Azure Blob | RBAC or SAS tokens | SAS token leakage, overly permissive container access |
| HDFS | Kerberos or none | Often no authentication in development clusters |
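For local filesystem stores, the standard defense against path traversal and symlink escapes is to resolve the requested path and confirm it still sits under the artifact root. A minimal sketch, using a hypothetical `resolve_artifact` helper rather than MLflow's own validation code:

```python
import os


def resolve_artifact(artifact_root: str, requested: str) -> str:
    """Resolve `requested` under `artifact_root`, rejecting escapes.

    os.path.realpath follows symlinks, so a link pointing outside the
    root is caught as well as "../" sequences and absolute paths.
    """
    root = os.path.realpath(artifact_root)
    candidate = os.path.realpath(os.path.join(root, requested))
    # commonpath equals root only when candidate is root or inside it
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"path escapes artifact root: {requested!r}")
    return candidate
```

Resolving before comparing matters: a plain string-prefix check on the unresolved path would accept `1/../../etc/passwd` or a symlink planted inside the root.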
S3 Artifact Store Exploitation
The most common production configuration uses S3 for artifact storage. Typical weaknesses:
1. Identify the S3 bucket. The tracking server configuration (`--default-artifact-root`) or the `artifact_location` field in API responses reveals the artifact store URI (e.g., `s3://company-mlflow-artifacts/`).
2. Test direct bucket access. Attempt to list and read objects using the attacker's own AWS credentials or credentials taken from a compromised EC2 instance's metadata service. Organizations frequently grant overly broad S3 access to the roles used by training jobs.
3. Download model artifacts. Model checkpoints, datasets, and configuration files are stored as S3 objects. Direct download bypasses any MLflow API-level access controls.
4. Upload poisoned artifacts. If write access exists, replace model checkpoints with poisoned versions. The MLflow tracking metadata still points to the same S3 path, so the substitution is invisible through the MLflow UI.
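Because the tracking metadata records only a path, a swapped object goes unnoticed unless something records what the artifact should contain. One defensive pattern, sketched here under the assumption that a SHA-256 digest was captured when the model was logged and is kept somewhere the attacker cannot write (a run tag on the same unauthenticated server would not qualify), is to recompute the digest before loading. Both helper names are hypothetical.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: str, expected_hex: str) -> bool:
    """Compare the recomputed digest with the one recorded at logging time."""
    return sha256_file(path) == expected_hex.lower()
```

The check is only as trustworthy as the stored digest: an attacker who can overwrite both the object and the recorded hash defeats it.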
SQL Injection in Tracking Queries
MLflow's tracking API accepts search queries for experiments and runs using a filter string syntax. With a SQL backend store, the implementation translates these filter strings into SQL queries against the backing database.
Vulnerable Endpoints
The /api/2.0/mlflow/runs/search and /api/2.0/mlflow/experiments/search endpoints accept a filter_string parameter. While MLflow uses parameterized queries for most operations, custom filter expressions in certain configurations and versions have been vulnerable to injection.
Attack Patterns
Filter string manipulation. The filter syntax allows expressions like metrics.accuracy > 0.9 and params.model_name = 'gpt'. Injecting SQL through crafted parameter names or values can extract data beyond the intended query scope.
Tag-based injection. Run tags accept arbitrary string values. If tag values are not properly sanitized when used in filter queries, they become an injection vector.
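The risk sits at the step where a filter value is turned into SQL. The self-contained sketch below uses a throwaway SQLite table (hypothetical, not MLflow's query builder) to show why a value such as `1' OR '1'='1` widens a query when spliced into the SQL text but matches nothing when bound as a parameter.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a tiny stand-in for a run-parameter table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE params (run_uuid TEXT, key TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO params VALUES (?, ?, ?)",
        [("run-a", "name", "baseline"), ("run-b", "name", "secret-project")],
    )
    return conn


def search_concatenated(conn: sqlite3.Connection, value: str) -> list:
    # VULNERABLE: the user-supplied value becomes part of the SQL text
    sql = f"SELECT run_uuid FROM params WHERE key = 'name' AND value = '{value}'"
    return conn.execute(sql).fetchall()


def search_parameterized(conn: sqlite3.Connection, value: str) -> list:
    # SAFE: the driver sends the value separately from the SQL text
    sql = "SELECT run_uuid FROM params WHERE key = 'name' AND value = ?"
    return conn.execute(sql, (value,)).fetchall()
```

With the payload, the concatenated query becomes `... AND value = '1' OR '1'='1'`, and because `AND` binds tighter than `OR`, the trailing `'1'='1'` makes every row match. The parameterized version treats the whole payload as a literal string.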
```python
import mlflow

# Point the client at the target tracking server
mlflow.set_tracking_uri("http://target-mlflow:5000")

# Probe the filter-string parser for SQL injection.
# These are detection payloads, not exploits: a server that validates
# filter strings strictly should reject each of them with an error.
test_payloads = [
    "params.name = '1' OR '1'='1'",
    "params.name = '1'; SELECT 1--",
    "metrics.accuracy > 0 UNION SELECT 1,2,3--",
    "tags.mlflow.runName = '1' OR 1=1--",
]

for payload in test_payloads:
    try:
        # search_runs returns a pandas DataFrame by default
        runs = mlflow.search_runs(
            experiment_ids=["0"],
            filter_string=payload,
        )
        print(f"Payload returned {len(runs)} results: {payload}")
    except Exception as e:
        print(f"Error with payload {payload!r}: {e}")
```
Model Registry Tampering
The MLflow Model Registry manages model versions and their deployment lifecycle stages: None, Staging, Production, and Archived. Tampering with the registry is a direct path to deploying a malicious model.
Stage Transition Attacks
Model versions move through stages via API calls. Without authentication, any client can:
- Transition a malicious model version to "Production" stage
- Transition the legitimate production model to "Archived"
- Modify model version descriptions to hide the change
- Delete model version tags that might trigger review
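A simple detective control for these transitions is to compare what the registry currently serves against an approved list kept outside MLflow. The sketch assumes version records shaped like the `name`, `version`, and `current_stage` attributes of MLflow's `ModelVersion` objects, passed as plain dicts so the logic runs without a server; `unapproved_production` and the allowlist format are hypothetical.

```python
def unapproved_production(
    versions: list[dict], approved: dict[str, str]
) -> list[tuple[str, str]]:
    """Flag Production versions that differ from the approved version per model.

    `versions`: records with "name", "version", and "current_stage" keys.
    `approved`: model name -> the single version that should be in Production.
    """
    flagged = []
    for record in versions:
        if record["current_stage"] != "Production":
            continue
        name, version = record["name"], str(record["version"])
        if approved.get(name) != version:
            flagged.append((name, version))
    return sorted(flagged)
```

Against a live server, the records could be built from `MlflowClient().search_model_versions(...)` results. Keeping the allowlist outside the registry is the point: an attacker who can transition stages should not also be able to edit the reference.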
Version Confusion
The registry assigns incrementing version numbers. An attacker can:
- Register a new model version with poisoned weights
- Rely on pipelines that reference the "latest version" or the "Production" stage to pick up and serve the poisoned version
- Blend in, because the version history shows the new version as the latest legitimate-looking addition
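The exposure depends on how consumers address the model. MLflow model URIs take forms such as `models:/<name>/<version>`, `models:/<name>/<stage>`, and, with aliases, `models:/<name>@<alias>`; only an explicit version number pins an immutable registry entry. The helper below is a hypothetical lint, not an MLflow API, that flags floating references in deployment configs.

```python
def is_floating_model_uri(uri: str) -> bool:
    """Return True when a models:/ URI can resolve to a different version later.

    Assumed URI forms: models:/<name>/<version>, models:/<name>/<stage or "latest">,
    and models:/<name>@<alias>. Only an all-digit version counts as pinned.
    """
    prefix = "models:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a registry URI: {uri!r}")
    path = uri[len(prefix):]
    if "@" in path:
        # Aliases can be reassigned to any version at any time
        return True
    name, sep, reference = path.rpartition("/")
    if not sep or not name:
        raise ValueError(f"missing version or stage: {uri!r}")
    return not reference.isdigit()
```

Pinning narrows the registry attack but does not stop the artifact-store substitution described earlier, since a pinned version still points at a mutable storage path.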
Model Name Squatting
MLflow's registry uses flat model names within a single namespace. An attacker with write access can:
- Register models with names similar to legitimate models (typosquatting)
- Register models with names that deployment pipelines might reference
- Overwrite model descriptions to redirect users to malicious resources
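Defenders can catch near-duplicate names by comparing registered names against a list of known-good models. A sketch using the standard library's `difflib.get_close_matches`; the 0.85 similarity cutoff and the helper name `lookalike_names` are arbitrary assumptions to tune per registry.

```python
import difflib


def lookalike_names(
    registered: list[str], trusted: list[str], cutoff: float = 0.85
) -> dict[str, list[str]]:
    """Map each non-trusted registered name to the trusted names it resembles."""
    trusted_set = set(trusted)
    # Compare case-insensitively but report the trusted names as written
    by_lower = {name.lower(): name for name in trusted}
    suspicious = {}
    for name in registered:
        if name in trusted_set:
            continue  # exact, known-good name
        matches = difflib.get_close_matches(
            name.lower(), list(by_lower), n=3, cutoff=cutoff
        )
        if matches:
            suspicious[name] = [by_lower[m] for m in matches]
    return suspicious
```

Case variants such as `Production-LLM` are flagged deliberately, since a pipeline or a human reviewer may not distinguish them from the trusted name.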
```python
from mlflow.tracking import MlflowClient

client = MlflowClient("http://target-mlflow:5000")

# List all registered models and the latest version in each stage
for model in client.search_registered_models():
    print(f"Model: {model.name}")
    for version in model.latest_versions:
        print(f"  v{version.version} | Stage: {version.current_stage}")

# Transition the attacker's model version to Production.
# Stages are deprecated since MLflow 2.9; on newer servers the equivalent
# move is client.set_registered_model_alias(name, alias, version).
client.transition_model_version_stage(
    name="production-llm",
    version="42",  # Attacker's poisoned version
    stage="Production",
    archive_existing_versions=True,  # Archives the legitimate version
)
```
Managed MLflow Security Differences
Databricks Managed MLflow
Databricks provides MLflow as a managed service with significant security improvements over open-source MLflow:
- Unity Catalog integration for fine-grained access control
- Workspace-level isolation
- Audit logging for all operations
- Network access controls (private endpoints, IP allowlists)
Remaining risks: Unity Catalog misconfigurations, overly permissive workspace access grants, service principal key leakage, cross-workspace access through shared storage.
Azure ML MLflow Integration
Azure ML provides an MLflow-compatible API with Azure Active Directory authentication:
- AAD-based identity and RBAC
- Managed identity for compute-to-tracking 認證
- Virtual network integration
Remaining risks: RBAC role assignments that are too broad, managed identity over-permissioning, artifact storage access through direct Azure Storage APIs bypassing ML workspace controls.
Reconnaissance and Enumeration
Identifying MLflow Instances
MLflow tracking servers can be discovered through:
- Port scanning for 5000/tcp (the default port)
- HTTP responses that identify MLflow, such as the web UI or the plain-text `/version` endpoint
- API endpoint probing (e.g., `/api/2.0/mlflow/experiments/search`, or `/api/2.0/mlflow/experiments/list` on pre-2.0 servers)
- Cloud metadata services revealing MLflow URIs in environment variables
- CI/CD configuration files referencing `MLFLOW_TRACKING_URI`
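Finding the tracking URI in a repository is often a plain text search. The sketch below, a hypothetical helper using only `re`, pulls `MLFLOW_TRACKING_URI` assignments out of YAML-style (`KEY: value`), shell or dotenv (`KEY=value`), and `export KEY=value` lines, including YAML list items; values that are themselves variable references such as `${VAR}` are skipped.

```python
import re

# KEY: value, KEY=value, export KEY=value, optionally as a YAML "- " item
_TRACKING_URI = re.compile(
    r"""^[ \t]*(?:-[ \t]*)?(?:export[ \t]+)?MLFLOW_TRACKING_URI[ \t]*[:=][ \t]*["']?([^"'\s#]+)""",
    re.MULTILINE,
)


def find_tracking_uris(text: str) -> list[str]:
    """Return literal MLFLOW_TRACKING_URI values found in config or script text."""
    # Skip values that point at another variable rather than a URI
    return [m for m in _TRACKING_URI.findall(text) if not m.startswith("$")]
```

Running it over every file in a checkout (CI definitions, Dockerfiles, `.env` files) gives a quick list of tracking servers to probe.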
Information Gathering from an Exposed Instance
| API Endpoint | Information Revealed |
|---|---|
| `/api/2.0/mlflow/experiments/search` (`experiments/list` before MLflow 2.0) | All experiment names and metadata, including artifact locations |
| `/api/2.0/mlflow/runs/search` | Run configurations, metrics, parameters |
| `/api/2.0/mlflow/artifacts/list` | Artifact paths and structure |
| `/api/2.0/mlflow/registered-models/search` (`registered-models/list` before MLflow 2.0) | Production model registry |
| `/version` | MLflow version (for CVE matching) |
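On an open instance, the search endpoints are paginated, so complete enumeration means following `next_page_token` until it disappears. The sketch keeps transport out of the picture by taking a `post_json(path, body)` callable, a hypothetical stand-in for any HTTP client; the request fields `max_results` and `page_token` and the response fields `experiments` and `next_page_token` follow the documented `experiments/search` API.

```python
from typing import Callable, Iterator

SEARCH_PATH = "/api/2.0/mlflow/experiments/search"


def iter_experiments(
    post_json: Callable[[str, dict], dict], page_size: int = 100
) -> Iterator[dict]:
    """Yield every experiment by following next_page_token across pages."""
    token = None
    while True:
        body = {"max_results": page_size}
        if token:
            body["page_token"] = token
        page = post_json(SEARCH_PATH, body)
        yield from page.get("experiments", [])
        token = page.get("next_page_token")
        # The last page omits the token (or returns an empty one)
        if not token:
            return
```

`runs/search` paginates the same way, returning a `runs` array alongside `next_page_token`.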
Hardening Recommendations
Immediate Actions
- Enable authentication using the built-in auth plugin or a reverse proxy
- Restrict network access to the tracking server and artifact store
- Audit artifact store permissions: ensure S3/GCS/Azure Blob access is scoped appropriately
- Update MLflow to the latest version to address known vulnerabilities
- Enable TLS for all tracking server communications
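Several of these items can be checked from the client side. MLflow clients read `MLFLOW_TRACKING_URI`, `MLFLOW_TRACKING_PASSWORD`, and `MLFLOW_TRACKING_INSECURE_TLS` from the environment; the audit function below is a hypothetical sketch that flags plaintext HTTP, disabled certificate verification, and passwords stored in the environment. Treating both `"true"` and `"1"` as enabled is a conservative assumption.

```python
from typing import Mapping
from urllib.parse import urlparse


def audit_client_env(env: Mapping[str, str]) -> list[str]:
    """Return human-readable findings for an MLflow client environment."""
    findings = []
    if urlparse(env.get("MLFLOW_TRACKING_URI", "")).scheme == "http":
        findings.append("tracking URI uses plaintext HTTP")
    if env.get("MLFLOW_TRACKING_INSECURE_TLS", "").lower() in ("true", "1"):
        findings.append("TLS certificate verification is disabled")
    if env.get("MLFLOW_TRACKING_PASSWORD"):
        findings.append("password stored in environment variable")
    return findings
```

`audit_client_env(os.environ)` runs the check against the current process, for example as a step in CI before training jobs start.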
Ongoing Security
- Implement audit logging for all API operations
- Monitor for unusual query patterns (bulk data extraction, filter string anomalies)
- Regularly review model registry stage transitions
- Scan experiment logs for credential leakage
- Separate development and production MLflow instances
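Credential leakage in experiment logs tends to follow recognizable patterns: parameters named like secrets, or values shaped like known key formats. A sketch over a plain `{key: value}` mapping of logged params or tags; the key-name pattern and the AWS access key ID pattern (`AKIA` or `ASIA` followed by 16 uppercase letters or digits) are heuristics, not an exhaustive ruleset, and `find_leaked_credentials` is a hypothetical name.

```python
import re

# Secret-like words as whole key segments, so "max_tokens" is not flagged
SECRET_KEY_NAME = re.compile(
    r"(?:^|[_.-])(?:password|passwd|secret|api[_-]?key|access[_-]?key|token)(?:$|[_.-])",
    re.IGNORECASE,
)
AWS_ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_leaked_credentials(logged: dict[str, str]) -> list[tuple[str, str]]:
    """Return (key, reason) pairs for params or tags that look like secrets."""
    hits = []
    for key, value in sorted(logged.items()):
        if SECRET_KEY_NAME.search(key) and value:
            hits.append((key, "secret-like parameter name"))
        elif AWS_ACCESS_KEY_ID.search(str(value)):
            hits.append((key, "AWS access key ID pattern"))
    return hits
```

The same function can run over the `params` and `tags` of each run returned by a search, or over rows pulled straight from the database. Expect false positives and tune the key list to local naming conventions.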
References
- MLflow Security Documentation: authentication plugin documentation
- MLflow REST API Reference: full API specification
- CVE entries for MLflow: known vulnerabilities
An organization deploys MLflow with nginx basic authentication in front of the tracking server. The artifact store uses S3 with a broadly permissive IAM policy. Why does the nginx authentication not fully protect the ML assets?