ML CI/CD Security
Security overview of ML continuous integration and deployment pipelines: how ML CI/CD differs from traditional CI/CD, unique attack surfaces in training workflows, and the security implications of automated model building and deployment.
ML CI/CD pipelines automate the flow from code commit to trained model to production deployment. They inherit all the security challenges of traditional CI/CD -- secret management, build isolation, artifact integrity -- and add ML-specific concerns: training data validation, model behavioral testing, and the management of extremely expensive compute resources. A compromised ML pipeline does not just deploy bad code; it deploys a model whose behavior cannot be fully predicted or audited.
ML CI/CD Pipeline Architecture
A typical ML CI/CD pipeline extends the traditional software pipeline with ML-specific stages:
Traditional CI/CD Stages (Inherited)
| Stage | ML Context | Security Concern |
|---|---|---|
| Source control | Training code, configs, pipeline definitions | Code injection, unauthorized modifications |
| Dependency installation | ML frameworks, custom packages | Supply chain attacks, dependency confusion |
| Build | Container image creation with ML frameworks | Image poisoning, layer caching attacks |
| Test | Unit tests for data processing code | Insufficient coverage of ML-specific code paths |
| Deploy | Model serving infrastructure provisioning | Infrastructure misconfiguration |
ML-Specific Stages (Added)
| Stage | Purpose | Security Concern |
|---|---|---|
| Data validation | Verify training data integrity and quality | Data poisoning bypass, schema exploitation |
| Training | Execute model training on GPU infrastructure | Compute hijacking, checkpoint manipulation |
| Evaluation | Run benchmarks and behavioral tests | Evaluation gaming, insufficient safety testing |
| Model registration | Store trained model in registry | Artifact substitution, unsigned registration |
| Safety gates | Automated safety and bias checks | Gate bypass, threshold manipulation |
| Canary deployment | Gradual traffic shifting to new model | Canary metric manipulation |
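The staged structure above can be sketched as an ordered sequence of gates, where a failing stage halts promotion. This is a minimal illustration, not any particular platform's API; the stage names mirror the tables above and the check functions are placeholders for real validation logic.

```python
# Minimal sketch: an ML pipeline as an ordered list of (stage, gate) pairs.
# Each gate must pass before the next stage runs; a failed safety gate
# stops the model from ever reaching registration or deployment.
from typing import Callable

def run_pipeline(stages: list[tuple[str, Callable[[], bool]]]) -> list[str]:
    """Run stages in order; stop at the first failing gate."""
    completed = []
    for name, check in stages:
        if not check():
            break  # a failed stage halts promotion
        completed.append(name)
    return completed

# Illustrative run: the safety gate fails, so registration never happens.
stages = [
    ("data_validation", lambda: True),
    ("training", lambda: True),
    ("evaluation", lambda: True),
    ("safety_gates", lambda: False),
    ("model_registration", lambda: True),
]
```

The key design point is that later stages are unreachable when an earlier gate fails, so an attacker must defeat the gate itself rather than simply skipping it.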
How ML CI/CD Differs from Traditional CI/CD
Execution Time and Cost
Traditional CI/CD pipelines run in minutes. ML training pipelines run for hours to weeks and consume thousands of dollars of GPU time per run. This has direct security implications:
- Compute hijacking is profitable. Cryptocurrency mining on GPU infrastructure during training jobs is a real and growing threat.
- Rerunning is expensive. If a pipeline is compromised, re-running training to produce a clean model costs significant time and money.
- Long execution windows. Multi-day training runs provide extended windows for attackers to modify training data, inject code, or exfiltrate artifacts.
Data as a First-Class Input
Traditional CI/CD processes code and configuration. ML CI/CD also processes data, which introduces:
- Data provenance tracking requirements
- Training data validation as a pipeline stage
- Data access permissions that span the pipeline
- The possibility of data-level attacks (poisoning) that have no equivalent in software CI/CD
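Provenance tracking and data validation can be combined in a simple pattern: record a content hash per training file when the dataset is approved, then refuse to train if any hash has drifted. A minimal sketch, using in-memory bytes in place of real files:

```python
# Sketch: per-file content hashing for training-data provenance.
# The data-validation stage compares a freshly computed manifest against
# the manifest recorded when the dataset was approved.
import hashlib

def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each logical file name to the SHA-256 of its contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}

def validate_data(files: dict[str, bytes], expected: dict[str, str]) -> bool:
    """Fail the data-validation stage if any file was added, removed, or modified."""
    return build_manifest(files) == expected
```

This catches tampering between approval and training, but not poisoning that was present when the manifest was recorded; content hashing complements, rather than replaces, semantic data validation.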
Non-Deterministic Outputs
Software builds can be made deterministic: with reproducible-build tooling, the same source produces the same binary. ML training is stochastic: the same data and code produce different weights on each run (random initialization, GPU nondeterminism, data ordering). This means:
- Artifact verification cannot rely on reproducible builds
- Model comparison requires behavioral testing, not binary comparison
- Rollback requires keeping old model artifacts, not just old source code
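Since retrained models rarely match byte-for-byte, equivalence has to be checked behaviorally. One common-sense sketch (the 0.99 threshold is illustrative): measure prediction agreement on a fixed evaluation set instead of comparing artifact hashes.

```python
# Sketch: behavioral comparison of two models via prediction agreement
# on a fixed evaluation set, instead of byte-level artifact comparison.

def agreement(preds_a: list[int], preds_b: list[int]) -> float:
    """Fraction of evaluation inputs on which two models agree."""
    assert len(preds_a) == len(preds_b)
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def behaviorally_equivalent(preds_a: list[int],
                            preds_b: list[int],
                            threshold: float = 0.99) -> bool:
    """Accept a retrained model only if it agrees with the baseline
    on at least `threshold` of the evaluation set."""
    return agreement(preds_a, preds_b) >= threshold
```

Note the security caveat: a backdoored model can agree with the baseline on the entire evaluation set and diverge only on trigger inputs, so agreement checks bound regression risk but do not prove integrity.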
Artifact Size and Transfer
Model artifacts range from megabytes to hundreds of gigabytes. Moving these artifacts through a CI/CD pipeline introduces:
- Long transfer times creating windows for man-in-the-middle attacks
- Large storage requirements for artifact versioning
- Network bandwidth as a limiting factor for pipeline speed
- Integrity verification overhead for hashing large files
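The integrity-verification overhead can be kept manageable by hashing artifacts as a stream rather than loading hundreds of gigabytes into memory. A minimal sketch (the 8 MiB chunk size is an arbitrary choice):

```python
# Sketch: streaming SHA-256 over fixed-size chunks, so a multi-gigabyte
# model artifact can be verified without loading it fully into memory.
import hashlib
import io

def hash_artifact(stream, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Compute the SHA-256 hex digest of an artifact read in chunks."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()
```

In practice the digest would be computed immediately after training, stored alongside the artifact, and re-checked at every transfer boundary in the pipeline.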
Pipeline Platforms and Their Security Models
Common ML Pipeline Platforms
| Platform | Isolation Model | Authentication | Audit Logging |
|---|---|---|---|
| GitHub Actions | Container-based; self-hosted runners often shared | OIDC, PATs | GitHub audit log |
| GitLab CI | Docker executors, Kubernetes runners | Job tokens, OIDC | GitLab audit events |
| Kubeflow Pipelines | Kubernetes pods per step | Kubernetes RBAC | Kubernetes audit log |
| AWS SageMaker Pipelines | Container per step, IAM roles | IAM, VPC | CloudTrail |
| Vertex AI Pipelines | Container per step, service accounts | IAM | Cloud Audit Logs |
| Azure ML Pipelines | Compute instances, managed endpoints | AAD, managed identity | Activity log |
| Airflow/Dagster | Variable (local, Docker, Kubernetes) | Plugin-dependent | Operator-dependent |
Self-Hosted Runner Risks
ML pipelines commonly use self-hosted runners for GPU access. Self-hosted runners introduce significant risks:
- Shared state between jobs. GPU memory, local filesystems, and cached dependencies persist between jobs unless explicitly cleaned.
- Elevated privileges. GPU access requires device mounts or privileged containers.
- Network exposure. Runners on internal networks access training data stores, model registries, and cloud services.
- Long-lived credentials. Runners cache cloud credentials for data access.
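One mitigation for shared-state and cached-credential risks is per-job hygiene: between jobs, scrub credential-like environment variables and wipe the workspace. A sketch of the environment-scrubbing half, with illustrative variable-name patterns:

```python
# Sketch: scrub credential-like environment variables on a shared
# self-hosted runner before the next job starts. The patterns below
# are illustrative, not exhaustive.
import re

CREDENTIAL_PATTERNS = [r"^AWS_", r"_TOKEN$", r"_KEY$", r"_SECRET$"]

def scrub_env(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of the environment with credential-like variables removed."""
    return {
        k: v for k, v in env.items()
        if not any(re.search(p, k) for p in CREDENTIAL_PATTERNS)
    }
```

Pattern-based scrubbing is best-effort; ephemeral runners that are destroyed after each job remain the stronger control.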
Secret Management in ML Pipelines
ML pipelines require access to numerous credentials:
| Secret | Used For | Common Mismanagement |
|---|---|---|
| Cloud credentials (AWS, GCP, Azure) | Data access, compute provisioning | Overly broad IAM roles, long-lived keys |
| Model registry tokens | Push/pull model artifacts | Shared tokens, no rotation |
| Experiment tracking API keys | Log metrics and artifacts | Keys in source code or environment |
| Data warehouse credentials | Training data access | Static credentials in pipeline configs |
| Hugging Face tokens | Download base models | Tokens with unnecessary write access |
| Container registry credentials | Pull training containers | Registry-wide access from pipeline tokens |
Secret Exposure Vectors
- Pipeline logs. Training frameworks often print their full configuration, including credentials, unless explicitly suppressed.
- Experiment tracking. Auto-logging captures environment variables containing secrets.
- Cached layers. Docker image layers may contain secrets from build arguments.
- Shared filesystems. Credentials written to shared storage by one pipeline step are readable by all steps.
- Error messages. Failed API calls may log the request including authentication headers.
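A common defense against log-based exposure is value-based redaction: the pipeline registers every secret it injects and strips those values from log lines before they are persisted. A minimal sketch:

```python
# Sketch: redact registered secret values from pipeline log lines.
# Secrets are matched by value rather than pattern, so the pipeline must
# register every secret it injects into a job.

def redact(line: str, secrets: list[str]) -> str:
    """Replace any registered secret value appearing in a log line."""
    for secret in secrets:
        if secret:
            line = line.replace(secret, "[REDACTED]")
    return line
```

Value-based redaction misses encoded or transformed secrets (e.g. base64 in an auth header), which is one reason short-lived credentials are preferable: a leaked token that expires in minutes limits the damage redaction failed to prevent.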
Compute Resource Security
GPU Infrastructure Threats
| Threat | Impact | Mitigation |
|---|---|---|
| Cryptomining | Financial loss, training delays | Resource monitoring, egress filtering |
| GPU memory scavenging | Data leakage from previous tenants | Memory clearing between jobs |
| Model exfiltration | IP theft via compute access | Egress restrictions, DLP |
| Training manipulation | Backdoored models | Job isolation, integrity checks |
Resource Allocation Attacks
An attacker with pipeline access can:
- Launch expensive training jobs for cryptomining
- Reserve GPU resources to deny service to legitimate training
- Modify resource limits to cause out-of-memory failures at critical training stages
- Schedule jobs that exhaust compute quotas, blocking production model retraining
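Several of these abuses show up as abnormal compute consumption, which suggests a simple baseline check: flag any run whose GPU-hour usage far exceeds the historical median for that pipeline. A sketch, with an illustrative 3x multiplier:

```python
# Sketch: flag training runs whose GPU-hour consumption is far above the
# historical median for the same pipeline (e.g. a job hijacked for mining
# or deliberately bloated to exhaust quotas).
import statistics

def is_anomalous(gpu_hours: float,
                 history: list[float],
                 multiplier: float = 3.0) -> bool:
    """Return True if this run's cost exceeds multiplier x historical median."""
    if not history:
        return False  # no baseline yet; cannot judge
    return gpu_hours > multiplier * statistics.median(history)
```

The median is used rather than the mean so that a few past outliers do not inflate the baseline and mask new abuse.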
Pipeline-Level Defenses
Isolation Controls
- Run each pipeline step in an ephemeral container with no persistent state
- Clear GPU memory between training jobs
- Use separate service accounts per pipeline stage with minimum required permissions
- Implement network segmentation between pipeline stages
Integrity Controls
- Sign all pipeline definitions and prevent unauthorized modifications
- Verify dependencies at each step (hash verification, version pinning)
- Validate training data integrity before starting training
- Sign model artifacts immediately after training completes
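The artifact-signing control can be illustrated with a keyed MAC over the artifact hash. Here HMAC-SHA256 with a pipeline-held key stands in for a full asymmetric signing scheme such as Sigstore; the principle is the same: sign at training time, verify at deployment time.

```python
# Sketch: sign a model artifact's hash immediately after training, so the
# deployment stage can verify the artifact came from this pipeline and
# was not substituted in transit. HMAC stands in for real code signing.
import hashlib
import hmac

def sign_artifact(artifact: bytes, key: bytes) -> str:
    """Sign the SHA-256 digest of the artifact with a pipeline-held key."""
    digest = hashlib.sha256(artifact).digest()
    return hmac.new(key, digest, hashlib.sha256).hexdigest()

def verify_artifact(artifact: bytes, key: bytes, signature: str) -> bool:
    """Constant-time check that the artifact matches its recorded signature."""
    return hmac.compare_digest(sign_artifact(artifact, key), signature)
```

An asymmetric scheme is preferable in practice because the deployment stage then needs only the public key, so compromising the verifier does not yield signing capability.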
Access Controls
- Restrict who can modify pipeline definitions
- Require approval for pipeline changes that affect production
- Use short-lived credentials (OIDC tokens) instead of long-lived secrets
- Implement break-glass procedures for emergency pipeline modifications
Monitoring Controls
- Alert on unusual compute usage patterns
- Monitor pipeline execution times for anomalies
- Track data access patterns during training jobs
- Log all artifact operations (upload, download, modification)
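Artifact-operation logging can be as simple as an append-only record of who did what to which artifact. A sketch with illustrative field names:

```python
# Sketch: an append-only audit record for artifact operations, so
# unexpected downloads or modifications can be reviewed after the fact.
from datetime import datetime, timezone

audit_log: list[dict] = []

def record(op: str, artifact: str, principal: str) -> None:
    """Append one artifact operation (upload | download | modify) to the log."""
    audit_log.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "op": op,
        "artifact": artifact,
        "principal": principal,
    })
```

In a real deployment the log would live in write-once storage outside the pipeline's own credentials, so an attacker who compromises the pipeline cannot erase the evidence.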
Related Topics
- Pipeline Poisoning -- Specific pipeline attack techniques
- Artifact Integrity -- Ensuring ML artifact authenticity
- Deployment Gates -- Security checkpoints in ML deployment