MLflow Security Hardening · Beginner · 12 min read · Updated 2026-03-20. Securing MLflow deployments against unauthorized access, experiment tampering, and model registry poisoning.
MLflow is the most widely adopted open-source platform for managing the machine learning lifecycle. It provides experiment tracking, model packaging, a model registry, and deployment tools. Organizations use MLflow to track training runs, compare model performance, store model artifacts, and manage model versions from development through production.
The security problem with MLflow is that it was designed as a data science productivity tool, not as a security-critical system. The default deployment has no authentication, no authorization, and exposes a REST API that allows anyone with network access to read all experiments, modify model artifacts, register new models, and transition models to production. This is not a hypothetical concern: Protect AI's huntr bug bounty program has documented multiple critical vulnerabilities in MLflow, and internet-facing MLflow instances with no authentication remain common.
This article covers the attack surface of MLflow deployments, the specific hardening steps required to secure them, and red-team techniques for assessing MLflow security. The vulnerabilities described here map to OWASP LLM Top 10 2025 LLM06 (Excessive Agency) when MLflow is integrated into automated deployment pipelines, and to MITRE ATLAS AML.T0010 (ML Supply Chain Compromise).
MLflow consists of several components, each with its own attack surface:
Component | Purpose | Default Exposure | Risk
Tracking Server | Records experiment parameters, metrics, artifacts | HTTP API on port 5000 | Unauthenticated read/write
Model Registry | Stores and versions trained models | Via Tracking Server API | Model replacement/poisoning
Artifact Store | Stores model files, datasets, logs | S3, GCS, Azure Blob, or local filesystem | Direct access to model files
Backend Store | Metadata database (SQLite, MySQL, PostgreSQL) | Depends on deployment | SQL injection (in older versions)
MLflow UI | Web dashboard for experiment visualization | Same port as Tracking Server | No CSRF protection by default
Out of the box, mlflow server starts with no authentication:
# This is how most MLflow tutorials start: completely open
mlflow server --host 0.0.0.0 --port 5000
# Anyone on the network can now:
# - Read all experiments and runs
# - Modify any experiment data
# - Upload malicious model artifacts
# - Transition any model to "Production" stage
# - Delete experiments and runs
The MLflow REST API is fully functional without any credentials:
import requests
from typing import Dict, List, Any


class MLflowSecurityScanner:
    """Scan an MLflow deployment for security misconfigurations."""

    def __init__(self, mlflow_url: str):
        self.base_url = mlflow_url.rstrip("/")

    def check_authentication(self) -> Dict[str, Any]:
        """Test whether the MLflow API requires authentication."""
        endpoints = [
            "/api/2.0/mlflow/experiments/search",
            "/api/2.0/mlflow/registered-models/search",
            "/api/2.0/mlflow/runs/search",
        ]
        results = {"authenticated": True, "open_endpoints": []}
        for endpoint in endpoints:
            try:
                resp = requests.get(
                    f"{self.base_url}{endpoint}",
                    timeout=10,
                    # No credentials provided
                )
                if resp.status_code == 200:
                    results["authenticated"] = False
                    results["open_endpoints"].append(endpoint)
            except requests.RequestException:
                pass
        return results

    def enumerate_experiments(self) -> List[Dict]:
        """Enumerate all accessible experiments."""
        resp = requests.post(
            f"{self.base_url}/api/2.0/mlflow/experiments/search",
            json={"max_results": 1000},
            timeout=10,
        )
        if resp.status_code == 200:
            return resp.json().get("experiments", [])
        return []

    def enumerate_registered_models(self) -> List[Dict]:
        """Enumerate all registered models in the model registry."""
        resp = requests.get(
            f"{self.base_url}/api/2.0/mlflow/registered-models/search",
            params={"max_results": 1000},
            timeout=10,
        )
        if resp.status_code == 200:
            return resp.json().get("registered_models", [])
        return []

    def check_artifact_access(self, run_id: str) -> Dict[str, Any]:
        """Check whether artifacts can be listed for a given run."""
        resp = requests.get(
            f"{self.base_url}/api/2.0/mlflow/artifacts/list",
            params={"run_id": run_id},
            timeout=10,
        )
        if resp.status_code == 200:
            artifacts = resp.json().get("files", [])
            return {
                "accessible": True,
                "artifact_count": len(artifacts),
                "artifacts": [a.get("path") for a in artifacts[:10]],
            }
        return {"accessible": False}

    def full_scan(self) -> Dict[str, Any]:
        """Run a comprehensive security scan."""
        results = {
            "target": self.base_url,
            "authentication": self.check_authentication(),
        }
        if not results["authentication"]["authenticated"]:
            experiments = self.enumerate_experiments()
            results["experiments"] = {
                "count": len(experiments),
                "names": [e.get("name") for e in experiments[:20]],
            }
            models = self.enumerate_registered_models()
            results["registered_models"] = {
                "count": len(models),
                "names": [m.get("name") for m in models[:20]],
            }
        return results
MLflow 2.5+ includes a built-in authentication plugin. Enable it by starting the server with the --app-name flag:
# Enable basic authentication
mlflow server \
--host 0.0.0.0 \
--port 5000 \
--app-name basic-auth \
--backend-store-uri postgresql://mlflow:password@db:5432/mlflow \
--default-artifact-root s3://mlflow-artifacts/
# The default admin credentials come from the auth config file
# (basic_auth.ini, below); change the admin password immediately after the
# first start, e.g. via the /api/2.0/mlflow/users/update-password endpoint.
Configure authorization with the basic_auth.ini file:
[mlflow]
default_permission = READ
database_uri = sqlite:///basic_auth.db
admin_username = admin
# Placeholder only; set a strong password before the first start
admin_password = changeme
authorization_function = mlflow.server.auth:authenticate_request_basic_auth
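Once basic auth is on, clients authenticate via the MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD environment variables, or plain HTTP basic auth. A quick probe, sketched below, confirms the server now rejects anonymous requests; the host and credentials are placeholders for your deployment:

```python
import requests


def probe_auth(base_url: str, username: str, password: str) -> dict:
    """Hit the experiments/search endpoint with and without credentials
    and report the status code seen in each mode."""
    endpoint = f"{base_url.rstrip('/')}/api/2.0/mlflow/experiments/search"
    anon = requests.post(endpoint, json={"max_results": 1}, timeout=10)
    authed = requests.post(
        endpoint,
        json={"max_results": 1},
        auth=(username, password),
        timeout=10,
    )
    return {"anonymous": anon.status_code, "authenticated": authed.status_code}
```

On a correctly hardened server the anonymous request should come back 401 and the authenticated one 200; any other combination means the auth plugin is not actually enforcing.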
For production deployments, use a reverse proxy (NGINX, Envoy, or a cloud load balancer) with proper authentication:
# /etc/nginx/sites-available/mlflow
# limit_req_zone must live at the http{} level (sites-available files are
# included inside http{}), so define it outside the server{} block.
limit_req_zone $binary_remote_addr zone=mlflow:10m rate=10r/s;

server {
    listen 443 ssl;
    server_name mlflow.internal.company.com;
    ssl_certificate /etc/ssl/certs/mlflow.crt;
    ssl_certificate_key /etc/ssl/private/mlflow.key;

    # Require client certificate authentication
    ssl_client_certificate /etc/ssl/certs/ca.crt;
    ssl_verify_client on;

    location / {
        # Rate limiting
        limit_req zone=mlflow burst=20 nodelay;

        # Forward to MLflow server
        proxy_pass http://127.0.0.1:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Security headers
        add_header X-Content-Type-Options nosniff;
        add_header X-Frame-Options DENY;
        add_header Content-Security-Policy "default-src 'self'";
    }

    # Block direct access to artifact download endpoints from external networks
    location /api/2.0/mlflow/artifacts/ {
        # Only allow from internal CIDR
        allow 10.0.0.0/8;
        deny all;
        proxy_pass http://127.0.0.1:5000;
    }
}
For organizations using SSO, integrate MLflow with an identity provider:
"""
範例: MLflow 認證 middleware using OAuth2.
Place in a custom MLflow plugin or reverse proxy.
"""
from functools import wraps
from flask import request, jsonify
import requests
from typing import Optional
class OAuth2Middleware :
"""OAuth2 認證 middleware for MLflow."""
def __init__ ( self , issuer_url : str , client_id : str , required_scopes : list ):
self .issuer_url = issuer_url
self .client_id = client_id
self .required_scopes = required_scopes
The MLflow model registry is a critical target because it is often integrated directly into deployment pipelines. If an attacker can register a malicious model or transition a poisoned model to the "Production" stage, that model may be automatically deployed.
import hashlib
from pathlib import Path
from typing import Dict, Optional

import mlflow


class ModelRegistryGuard:
    """Security controls for the MLflow model registry."""

    def __init__(self, tracking_uri: str, allowed_signers: list):
        mlflow.set_tracking_uri(tracking_uri)
        self.allowed_signers = allowed_signers

    def compute_model_hash(self, model_uri: str) -> str:
        """Compute the SHA-256 hash of a registered model's artifacts."""
        local_path = mlflow.artifacts.download_artifacts(model_uri)
        sha256 = hashlib.sha256()
        for file_path in sorted(Path(local_path).rglob("*")):
            if file_path.is_file():
                with open(file_path, "rb") as f:
                    for chunk in iter(lambda: f.read(8192), b""):
                        sha256.update(chunk)
        return sha256.hexdigest()

    def verify_model_before_promotion(
        self,
        model_name: str,
        version: int,
        expected_hash: Optional[str] = None,
    ) -> Dict:
        """Verify a model's integrity before promoting it to production."""
        client = mlflow.tracking.MlflowClient()
        model_version = client.get_model_version(model_name, str(version))
        checks = {"model": model_name, "version": version, "passed": True, "checks": []}

        # Check 1: Verify the model was created by an authorized user
        run = client.get_run(model_version.run_id)
        creator = run.info.user_id
        if creator not in self.allowed_signers:
            checks["passed"] = False
            checks["checks"].append({
                "check": "authorized_creator",
                "status": "FAIL",
                "detail": f"Model created by unauthorized user: {creator}",
            })
        else:
            checks["checks"].append({"check": "authorized_creator", "status": "PASS"})

        # Check 2: Verify model artifact integrity
        if expected_hash:
            model_uri = f"models:/{model_name}/{version}"
            actual_hash = self.compute_model_hash(model_uri)
            if actual_hash != expected_hash:
                checks["passed"] = False
                checks["checks"].append({
                    "check": "artifact_integrity",
                    "status": "FAIL",
                    "detail": f"Hash mismatch: expected {expected_hash}, got {actual_hash}",
                })
            else:
                checks["checks"].append({"check": "artifact_integrity", "status": "PASS"})

        # Check 3: Record which experiment the model was logged from
        experiment = client.get_experiment(run.info.experiment_id)
        checks["checks"].append({
            "check": "experiment_source",
            "status": "INFO",
            "detail": f"Model from experiment: {experiment.name}",
        })

        # Check 4: Check for suspicious tags or metadata
        suspicious_tags = ["pickle", "exec", "eval", "subprocess", "os.system"]
        model_tags = model_version.tags or {}
        for tag_key, tag_value in model_tags.items():
            for suspicious in suspicious_tags:
                if suspicious in str(tag_value).lower():
                    checks["passed"] = False
                    checks["checks"].append({
                        "check": "suspicious_metadata",
                        "status": "FAIL",
                        "detail": f"Suspicious content in tag '{tag_key}': contains '{suspicious}'",
                    })

        return checks
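A deployment pipeline can use a guard like the one above as a promotion gate: the stage transition only happens if every check passes. A minimal sketch, assuming a guard and an MlflowClient are injected; the model name and pinned hash in any call would be deployment-specific placeholders:

```python
# Hypothetical promotion gate: refuse the stage transition unless the
# guard's integrity checks all pass.
def promote_if_verified(guard, client, model_name: str, version: int,
                        expected_hash: str) -> bool:
    """Run the guard's checks; raise instead of promoting if any fail."""
    report = guard.verify_model_before_promotion(
        model_name, version, expected_hash=expected_hash
    )
    if not report["passed"]:
        failed = [c for c in report["checks"] if c["status"] == "FAIL"]
        raise RuntimeError(
            f"Refusing promotion of {model_name} v{version}: {failed}"
        )
    client.transition_model_version_stage(
        name=model_name, version=str(version), stage="Production"
    )
    return True
```

Wiring the gate into CI (rather than trusting humans to run it) is what makes the check meaningful; a manual transition through the UI bypasses it entirely, which is another argument for restricting registry write permissions.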
The artifact store holds the actual model files, datasets, and other artifacts. Depending on the backend, this could be a local filesystem, S3 bucket, GCS bucket, or Azure Blob Storage. The artifact store must be secured independently of the MLflow tracking server:
import boto3
from botocore.exceptions import ClientError
from typing import Dict


class ArtifactStoreAuditor:
    """Audit the security of MLflow artifact storage backends."""

    def audit_s3_bucket(self, bucket_name: str) -> Dict:
        """Audit an S3 bucket used for MLflow artifacts."""
        s3 = boto3.client("s3")
        findings = []

        # Check bucket policy (S3 raises a ClientError with code
        # NoSuchBucketPolicy when no policy is attached)
        try:
            s3.get_bucket_policy(Bucket=bucket_name)
            findings.append({
                "check": "bucket_policy",
                "status": "INFO",
                "detail": "Bucket policy exists; review for overly permissive access",
            })
        except ClientError as e:
            if e.response["Error"]["Code"] != "NoSuchBucketPolicy":
                raise
            findings.append({
                "check": "bucket_policy",
                "status": "WARNING",
                "detail": "No bucket policy; access controlled only by IAM",
            })

        # Check public access block
        try:
            public_access = s3.get_public_access_block(Bucket=bucket_name)
            config = public_access["PublicAccessBlockConfiguration"]
            all_blocked = all([
                config.get("BlockPublicAcls", False),
                config.get("IgnorePublicAcls", False),
                config.get("BlockPublicPolicy", False),
                config.get("RestrictPublicBuckets", False),
            ])
            findings.append({
                "check": "public_access_block",
                "status": "PASS" if all_blocked else "FAIL",
                "detail": "All public access blocked" if all_blocked else "Public access not fully blocked",
            })
        except ClientError:
            findings.append({
                "check": "public_access_block",
                "status": "FAIL",
                "detail": "Could not verify public access block settings",
            })

        # Check encryption
        try:
            s3.get_bucket_encryption(Bucket=bucket_name)
            findings.append({
                "check": "encryption",
                "status": "PASS",
                "detail": "Server-side encryption enabled",
            })
        except ClientError:
            findings.append({
                "check": "encryption",
                "status": "FAIL",
                "detail": "Server-side encryption not configured",
            })

        # Check versioning (important for rollback after poisoning)
        versioning = s3.get_bucket_versioning(Bucket=bucket_name)
        status = versioning.get("Status", "Disabled")
        findings.append({
            "check": "versioning",
            "status": "PASS" if status == "Enabled" else "WARNING",
            "detail": f"Versioning: {status}",
        })

        return {"bucket": bucket_name, "findings": findings}
MLflow should never be directly accessible from the internet. Place it behind a VPN or private network:
#!/bin/bash
# Firewall rules to restrict MLflow access (iptables example)

# Allow access only from internal networks
iptables -A INPUT -p tcp --dport 5000 -s 10.0.0.0/8 -j ACCEPT
iptables -A INPUT -p tcp --dport 5000 -s 172.16.0.0/12 -j ACCEPT

# Log blocked access attempts, then drop everything else.
# The LOG rule must come before the DROP rule or it will never match.
iptables -A INPUT -p tcp --dport 5000 -j LOG --log-prefix "MLFLOW_BLOCKED: "
iptables -A INPUT -p tcp --dport 5000 -j DROP
Always use TLS for MLflow communication, especially when the artifact store or backend store is on a separate host:
# Start MLflow with TLS
mlflow server \
--host 0.0.0.0 \
--port 5000 \
--gunicorn-opts "--certfile=/etc/ssl/certs/mlflow.crt --keyfile=/etc/ssl/private/mlflow.key" \
--backend-store-uri postgresql://mlflow:password@db:5432/mlflow \
--default-artifact-root s3://mlflow-artifacts/
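On the client side, MLflow reads the MLFLOW_TRACKING_SERVER_CERT_PATH environment variable to find a custom CA bundle, which is needed when the server certificate is signed by an internal CA rather than a public one. The hostname and paths below are placeholders for your deployment:

```shell
# Client side: point the MLflow client at the TLS endpoint and supply the
# internal CA bundle so certificate verification succeeds.
export MLFLOW_TRACKING_URI="https://mlflow.internal.company.com"
export MLFLOW_TRACKING_SERVER_CERT_PATH="/etc/ssl/certs/internal-ca.pem"

# Optional sanity check of the chain (requires network access to the server):
# openssl s_client -connect mlflow.internal.company.com:443 \
#   -CAfile /etc/ssl/certs/internal-ca.pem -brief < /dev/null
```

Without the CA bundle, clients either fail TLS verification or, worse, get configured to skip verification entirely, which silently reopens the door to man-in-the-middle interception of credentials and artifacts.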
MLflow does not provide detailed audit logging by default. Implement a custom logging solution:
import json
import logging
from datetime import datetime, timezone


class MLflowAuditLogger:
    """Audit logger for MLflow operations."""

    def __init__(self, log_file: str = "/var/log/mlflow/audit.json"):
        self.logger = logging.getLogger("mlflow.audit")
        handler = logging.FileHandler(log_file)
        handler.setFormatter(logging.Formatter("%(message)s"))
        self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)

    def log_event(
        self,
        action: str,
        user: str,
        resource_type: str,
        resource_id: str,
        details: dict = None,
        source_ip: str = None,
    ) -> None:
        """Log an audit event as a single JSON line."""
        event = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "user": user,
            "resource_type": resource_type,
            "resource_id": resource_id,
            "source_ip": source_ip,
            "details": details or {},
        }
        self.logger.info(json.dumps(event))

    def audit_model_transition(
        self,
        user: str,
        model_name: str,
        version: str,
        from_stage: str,
        to_stage: str,
        source_ip: str = None,
    ) -> None:
        """Log a model stage transition; critical for supply-chain security."""
        self.log_event(
            action="model_transition",
            user=user,
            resource_type="registered_model_version",
            resource_id=f"{model_name}/v{version}",
            details={
                "from_stage": from_stage,
                "to_stage": to_stage,
                "alert": to_stage.lower() == "production",
            },
            source_ip=source_ip,
        )
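MLflow exposes no hook system for registry events, so one pragmatic option is to wrap the client call site so every transition is logged before it executes. A sketch under that assumption; the wrapper below is hypothetical glue code, not an MLflow API, and the audit and client objects are injected:

```python
# Hypothetical wrapper: log the stage transition via an audit logger,
# then perform it through the (injected) MlflowClient-like object.
def audited_transition(audit, client, user: str, model_name: str,
                       version: str, to_stage: str, source_ip: str = None):
    """Look up the current stage, emit an audit event, then transition."""
    mv = client.get_model_version(model_name, version)
    audit.audit_model_transition(
        user=user,
        model_name=model_name,
        version=version,
        from_stage=mv.current_stage,
        to_stage=to_stage,
        source_ip=source_ip,
    )
    client.transition_model_version_stage(
        name=model_name, version=version, stage=to_stage
    )
```

The limitation is the same as with the promotion gate: transitions made directly through the UI or raw API bypass the wrapper, so proxy-level access logs remain the backstop.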
MLflow has had several significant security vulnerabilities discovered through responsible disclosure:
CVE-2023-6831: Path traversal vulnerability in MLflow allowing arbitrary file read via the artifact download API. Attackers could read any file on the MLflow server by crafting a malicious artifact path.
CVE-2024-27132 : Remote code execution via MLflow recipes. Crafted recipe configurations could execute arbitrary Python code on the server.
CVE-2023-6977 : Path traversal in the MLflow artifact upload endpoint allowing file writes outside the artifact directory.
These CVEs demonstrate that MLflow's security posture requires active monitoring and prompt patching. Subscribe to MLflow's security advisories and maintain an upgrade cadence.
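The upgrade cadence can itself be enforced in CI: fail the pipeline if the installed MLflow falls below a floor version your team has cleared against known CVEs. A minimal standard-library sketch; the floor version in the trailing comment is an illustrative placeholder, not a statement about which release fixes which CVE:

```python
import re


def version_tuple(v: str) -> tuple:
    """Parse '2.10.1' into (2, 10, 1); a non-numeric suffix like 'rc1' is ignored."""
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if not m:
            break
        parts.append(int(m.group()))
    return tuple(parts)


def check_min_version(installed: str, minimum: str) -> bool:
    """True if the installed version is at or above the approved floor."""
    return version_tuple(installed) >= version_tuple(minimum)


# In CI, compare the deployed package against a team-approved floor, e.g.:
# from importlib.metadata import version
# assert check_min_version(version("mlflow"), "2.12.0"), "MLflow below approved floor"
```

Teams already using the packaging library may prefer its `packaging.version.Version` class, which handles pre-releases and local version segments more rigorously than this sketch.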
Enable authentication immediately: never run MLflow without authentication in any environment, including development
Use a reverse proxy with TLS termination, rate limiting, and additional authentication layers
Restrict network access to MLflow to internal networks only; never expose it to the internet
Secure the artifact store independently with encryption, access controls, and versioning
Implement audit logging for all model registry operations, especially stage transitions
Verify model integrity with checksums before any model is promoted to production
Patch regularly: MLflow has had critical CVEs; monitor security advisories
Use a dedicated database (PostgreSQL/MySQL) instead of the default SQLite for the backend store
Implement RBAC to restrict who can register models, transition stages, and delete experiments
MLflow Documentation — https://mlflow.org/docs/latest/
CVE-2023-6831 — MLflow path traversal file read vulnerability
CVE-2024-27132 — MLflow remote code execution via recipes
CVE-2023-6977 — MLflow path traversal in artifact upload
Protect AI huntr — https://huntr.com/ — bug bounty platform where many MLflow vulnerabilities were reported
MITRE ATLAS — AML.T0010 (ML Supply Chain Compromise)
OWASP LLM Top 10 2025 — LLM06 (Excessive Agency)