July 2026: Supply Chain Audit Challenge
Audit the complete supply chain of an ML project -- from training data and dependencies to model artifacts and deployment configuration. This challenge requires breadth across multiple security domains and develops the systematic thinking needed for comprehensive security assessments.
Challenge Overview
| Detail | Value |
|---|---|
| Duration | July 1--31, 2026 |
| Difficulty | Advanced |
| Estimated Time | 12--18 hours |
| Prerequisites | Software supply chain security concepts, Python packaging, ML pipeline familiarity, container security basics |
| Max Points | 100 |
Scenario
You have been given access to the complete source repository for "SentiBot," a sentiment analysis service. The repository contains everything needed to train, evaluate, package, and deploy the model. Your task is to audit this project for supply chain security issues at every level.
The project is structured as a typical ML application:
```
sentibot/
├── data/
│   ├── raw/            # Training data sources
│   ├── processed/      # Preprocessed datasets
│   └── scripts/        # Data collection and processing scripts
├── models/
│   ├── base/           # Base model checkpoints
│   ├── fine-tuned/     # Fine-tuned model artifacts
│   └── configs/        # Training configurations
├── src/
│   ├── training/       # Training code
│   ├── inference/      # Inference server code
│   ├── preprocessing/  # Data preprocessing pipeline
│   └── evaluation/     # Evaluation scripts
├── deploy/
│   ├── docker/         # Dockerfiles
│   ├── k8s/            # Kubernetes manifests
│   ├── terraform/      # Infrastructure as code
│   └── ci/             # CI/CD pipeline configs
├── tests/              # Test suite
├── requirements.txt    # Python dependencies
├── setup.py            # Package configuration
├── pyproject.toml      # Build configuration
└── README.md
```
Audit Categories
Category 1: Dependency Security (25 points)
Audit all dependencies of the project for known vulnerabilities and supply chain risks.
1.1 Python Dependencies (10 points)
- Examine `requirements.txt` and `pyproject.toml` for pinned vs. unpinned dependencies
- Check for known vulnerabilities (CVEs) in declared dependencies
- Identify dependencies from unusual or potentially compromised sources
- Look for typosquatting risks in package names
- Check for dependencies that have been abandoned or have concerning maintainer changes
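A first pass over pinning can be automated. The sketch below (sample requirements are illustrative, and `pip-audit` or a resolver would still be needed for CVE checks) flags any requirement line that is not pinned to an exact version with `==`:

```python
def find_unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version.

    A pinned line uses `==`; anything with >=, ~=, a bare name, or a
    direct URL is flagged for manual review.
    """
    flagged = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        if "==" not in line:
            flagged.append(line)
    return flagged

reqs = """\
torch==2.3.1
numpy>=1.24          # floating lower bound
transformers
requests @ https://example.com/requests.tar.gz
"""
print(find_unpinned(reqs))
# ['numpy>=1.24', 'transformers', 'requests @ https://example.com/requests.tar.gz']
```

Unpinned dependencies are not automatically vulnerabilities, but each one is a point where a future `pip install` can pull in code that was never reviewed.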
1.2 Container Dependencies (8 points)
- Audit Dockerfiles for base image security (unpinned tags, outdated images, unnecessary packages)
- Check for secrets baked into container layers
- Evaluate the container build process for injection points
- Review multi-stage build hygiene (leaking build artifacts into runtime images)
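Base-image pinning can be checked the same way. A minimal sketch (the regex is deliberately simple and the sample Dockerfile is invented) that flags `FROM` lines not pinned by digest:

```python
import re

# Capture the image reference from each FROM line, including multi-stage
# builds ("FROM image AS stage").
FROM_RE = re.compile(r"^FROM\s+(\S+)", re.IGNORECASE | re.MULTILINE)

def unpinned_base_images(dockerfile: str) -> list[str]:
    """Flag base images that use a mutable tag instead of a sha256 digest."""
    return [ref for ref in FROM_RE.findall(dockerfile) if "@sha256:" not in ref]

dockerfile = """\
FROM python:3.11-slim
FROM nvidia/cuda@sha256:0123abc... AS runtime
"""
print(unpinned_base_images(dockerfile))  # ['python:3.11-slim']
```

A tag like `python:3.11-slim` can be repointed by the registry at any time; a digest pin cannot.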
1.3 Infrastructure Dependencies (7 points)
- Review Terraform modules for pinned versions and source integrity
- Audit Kubernetes manifests for security misconfigurations (privileged containers, host network access, missing resource limits)
- Check CI/CD pipeline configs for injection vulnerabilities and secret handling
Category 2: Data Provenance (20 points)
Audit the training data pipeline for integrity and provenance issues.
2.1 Data Source Verification (10 points)
- Trace every training data source to its origin
- Verify that data collection scripts fetch from expected sources with integrity checks
- Check for data sources that could be manipulated by an adversary (public URLs without checksums, user-contributed content, scraped websites)
- Look for data that has been included without proper licensing
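The integrity-check pattern to look for is simple: a digest pinned in the repository next to the data-source URL, verified before the fetched bytes are used. A minimal sketch (the payload is inlined here so it runs without network access):

```python
import hashlib

# In the real pipeline this digest would be committed alongside the data URL;
# here it is computed from the inlined "download" so the sketch is runnable.
PINNED_SHA256 = hashlib.sha256(b"id,text,label\n1,great product,pos\n").hexdigest()

def verify_download(payload: bytes, expected_sha256: str) -> bytes:
    """Refuse to use fetched training data unless its digest matches the pin."""
    actual = hashlib.sha256(payload).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"checksum mismatch: {actual} != {expected_sha256}")
    return payload

data = verify_download(b"id,text,label\n1,great product,pos\n", PINNED_SHA256)
print("verified", len(data), "bytes")
```

Any data-collection script that fetches from a public URL without such a check is a finding: the source can change silently between runs.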
2.2 Data Pipeline Security (10 points)
- Review preprocessing scripts for injection vulnerabilities (e.g., `eval()` calls on data content, pickle deserialization of untrusted data)
- Check for data validation gaps that would allow poisoned samples to enter the training set
- Verify that processed data checksums are validated before training
- Look for race conditions or TOCTOU issues in the data pipeline
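A validation gate before samples enter the training set is the kind of control whose absence you are auditing for. A minimal sketch (field names, label whitelist, and the length bound are all illustrative):

```python
def valid_sample(sample: dict) -> bool:
    """Minimal validation gate: required fields, label whitelist, length bound.

    Real pipelines would add encoding checks, deduplication, and
    distribution-level poisoning detection on top of this.
    """
    return (
        isinstance(sample.get("text"), str)
        and 0 < len(sample["text"]) <= 5000
        and sample.get("label") in {"pos", "neg", "neutral"}
    )

raw = [
    {"text": "works great", "label": "pos"},
    {"text": "terrible", "label": "5; DROP TABLE"},  # label outside whitelist
    {"text": "", "label": "neg"},                    # empty text
]
clean = [s for s in raw if valid_sample(s)]
print(len(clean))  # 1
```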
Category 3: Model Security (25 points)
Audit the model artifacts and training process for security issues.
3.1 Base Model Provenance (10 points)
- Verify the source and integrity of the base model checkpoint
- Check how the base model was downloaded (pinned hash vs. mutable URL)
- Look for model serialization format risks (pickle-based formats that allow arbitrary code execution)
- Verify that model files have not been tampered with since download
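Tamper detection reduces to the same pinned-digest pattern, streamed so multi-gigabyte checkpoints don't need to fit in memory. A sketch (the "checkpoint" here is a temp file standing in for a real artifact):

```python
import hashlib
import pathlib
import tempfile

def file_sha256(path: pathlib.Path) -> str:
    """Stream the file in 1 MiB chunks and return its hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Simulate a downloaded checkpoint; in the audit you would compare the
# on-disk model against the digest recorded at download time.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
    f.write(b"\x00fake-checkpoint-bytes")
    ckpt = pathlib.Path(f.name)

pinned = file_sha256(ckpt)          # record this at download time
assert file_sha256(ckpt) == pinned  # re-verify before every training run
print("checkpoint digest:", pinned[:16], "...")
```

If the project downloads its base model from a mutable URL with no recorded digest, there is nothing to compare against, and that itself is a finding.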
3.2 Training Process Security (8 points)
- Review training configurations for settings that could introduce vulnerabilities (e.g., a learning rate high enough to rapidly overwrite the base model's safety training)
- Check for logging of training metrics that could indicate data poisoning
- Verify that training runs are reproducible from declared inputs
- Look for training shortcuts that bypass validation or evaluation steps
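Reproducibility starts with declared seeds. A toy sketch of the property to verify (a real run would also seed NumPy and the ML framework, and pin dataloader ordering):

```python
import random

def seeded_run(seed: int, n: int = 5) -> list[float]:
    """Stand-in for a training step: with the seed declared, two runs from
    the same inputs must produce identical draws."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

assert seeded_run(42) == seeded_run(42)  # reproducible
assert seeded_run(42) != seeded_run(43)  # the seed actually matters
print("reproducibility check passed")
```

If two runs from the same code, data, and config diverge, some undeclared input is influencing the model, and an attacker may be able to influence it too.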
3.3 Model Export and Serving (7 points)
- Audit the model export process for injection points
- Check that exported models are in safe serialization formats (safetensors vs. pickle)
- Verify model signing or integrity checking in the deployment pipeline
- Review the inference server for deserialization vulnerabilities
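Pickle-based artifacts can be triaged without ever loading them, using the stdlib `pickletools` to look for opcodes that execute code at load time. A sketch (the opcode set is a reasonable heuristic, not an exhaustive one):

```python
import pickle
import pickletools

# Opcodes that cause imports or calls when the pickle is loaded. Any model
# artifact containing them deserves manual review before deserialization.
DANGEROUS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def pickle_is_suspicious(payload: bytes) -> bool:
    """Scan the opcode stream without executing it."""
    return any(op.name in DANGEROUS for op, _, _ in pickletools.genops(payload))

benign = pickle.dumps({"weights": [0.1, 0.2]})

class Payload:
    def __reduce__(self):  # executes on load -- the classic pickle exploit
        return (print, ("pwned",))

malicious = pickle.dumps(Payload())
print(pickle_is_suspicious(benign), pickle_is_suspicious(malicious))  # False True
```

This is why safetensors exists: it stores raw tensors with no opcode stream, so loading it cannot run attacker-supplied code.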
Category 4: Deployment Security (20 points)
Audit the deployment pipeline and runtime configuration.
4.1 CI/CD Pipeline (10 points)
- Review CI/CD configuration for command injection vulnerabilities
- Check for secrets in pipeline logs or environment variables
- Verify that pipeline runs use pinned tool versions
- Look for missing security gates (no vulnerability scanning, no model validation before deployment)
- Check for pull request pipeline triggers that could execute attacker-controlled code
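Tool pinning in CI can be spot-checked with a regex pass. A sketch for GitHub-Actions-style `uses:` references (the sample workflow and SHA are invented; a YAML parser would be more robust):

```python
import re

USES_RE = re.compile(r"uses:\s*([\w./-]+)@([\w.-]+)")
SHA_RE = re.compile(r"[0-9a-f]{40}")  # a full commit SHA is immutable

def unpinned_actions(workflow: str) -> list[str]:
    """Flag action references pinned to a mutable tag instead of a commit SHA."""
    return [
        f"{action}@{ref}"
        for action, ref in USES_RE.findall(workflow)
        if not SHA_RE.fullmatch(ref)
    ]

workflow = """\
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b
"""
print(unpinned_actions(workflow))  # ['actions/checkout@v4']
```

A tag like `@v4` can be repointed by the action's maintainer (or by whoever compromises their account), which is exactly the mutable-reference risk this category asks you to find.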
4.2 Runtime Configuration (10 points)
- Review Kubernetes manifests for security context misconfigurations
- Check for exposed management interfaces or debug endpoints
- Verify that secrets are managed through a proper secrets manager (not hardcoded or in environment variables)
- Review network policies for proper segmentation
- Check for overly permissive IAM roles or service account permissions
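The manifest checks above are mechanical once the YAML is parsed. A sketch of the logic over an already-parsed pod spec (the spec is a hypothetical example; tools like Checkov implement much larger rule sets):

```python
def pod_findings(spec: dict) -> list[str]:
    """Flag a few of the misconfigurations listed above in a parsed pod spec."""
    findings = []
    if spec.get("hostNetwork"):
        findings.append("hostNetwork enabled")
    for c in spec.get("containers", []):
        sc = c.get("securityContext", {})
        if sc.get("privileged"):
            findings.append(f"{c['name']}: privileged container")
        if not c.get("resources", {}).get("limits"):
            findings.append(f"{c['name']}: no resource limits")
    return findings

spec = {
    "hostNetwork": True,
    "containers": [
        {"name": "inference", "securityContext": {"privileged": True}},
    ],
}
print(pod_findings(spec))
# ['hostNetwork enabled', 'inference: privileged container', 'inference: no resource limits']
```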
Category 5: Bonus Findings (10 points)
Additional points for:
- Discovering issues that do not fit neatly into the categories above
- Demonstrating attack chains that combine multiple supply chain weaknesses
- Providing a risk-prioritized remediation roadmap
- Identifying systemic issues in the project's security practices
Audit Methodology
Phase 1: Inventory (2--3 hours)
Before looking for vulnerabilities, build a complete inventory:
- Bill of Materials. List every dependency, tool, and external resource used by the project.
- Data flow diagram. Map how data moves from sources through processing, training, and into the deployed model.
- Trust boundaries. Identify where the project trusts external inputs (downloaded models, fetched data, user-contributed code).
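The Python slice of the bill of materials can be generated directly from the environment. A minimal sketch using the stdlib (a full BOM would also cover OS packages, models, and data sources):

```python
import importlib.metadata

def python_bom() -> list[tuple[str, str]]:
    """Every installed distribution and its version, sorted for easy diffing
    between the training and serving environments."""
    entries = []
    for dist in importlib.metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken .dist-info entries
            entries.append((name, dist.version))
    return sorted(entries)

for name, version in python_bom()[:5]:
    print(f"{name}=={version}")
```

Diffing this output against `requirements.txt` quickly surfaces dependencies that are installed but never declared, or declared but resolved to unexpected versions.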
Phase 2: Automated Scanning (2--3 hours)
Use automated tools to find known issues:
| Tool Category | What It Finds | Limitations |
|---|---|---|
| Dependency scanners (pip-audit, safety) | Known CVEs in Python packages | Only finds published vulnerabilities |
| Container scanners (Trivy, Grype) | OS and language package vulnerabilities in containers | Does not audit application logic |
| IaC scanners (Checkov, tfsec) | Misconfigurations in Terraform and Kubernetes | Rule-based, misses novel issues |
| SAST (Bandit, Semgrep) | Code-level security issues | High false positive rate, misses ML-specific issues |
Phase 3: Manual Review (6--8 hours)
Automated tools miss the most interesting supply chain issues. Manual review should focus on:
- Trust decisions. Every place the code trusts an external input is a potential vulnerability. Review each one.
- Serialization boundaries. Data crossing serialization boundaries (pickle, JSON, protobuf) is a common attack vector.
- Build reproducibility. Can you reproduce the same model artifact from the same inputs? If not, something in the pipeline is non-deterministic and potentially manipulable.
- Access control gaps. Who can modify each component? A dependency that anyone can publish a new version of is different from one maintained by a trusted organization.
Phase 4: Reporting (2--3 hours)
Organize findings by risk level and provide actionable remediation:
```markdown
## Finding: [Title]

**Risk Level:** Critical/High/Medium/Low
**Category:** Dependency/Data/Model/Deployment
**CVSS:** [Score if applicable]

### Description
[What the issue is and where it exists]

### Evidence
[Specific files, lines, or configurations that demonstrate the issue]

### Attack Scenario
[How an adversary could exploit this issue]

### Remediation
[Specific steps to fix the issue, with code examples where helpful]

### Priority
[Why this should be fixed now/soon/eventually]
```

Planted Vulnerabilities
The repository contains a mix of realistic vulnerabilities:
- Obvious issues that any security scanner would catch (5--6 findings)
- Subtle issues that require manual review to identify (8--10 findings)
- ML-specific issues that traditional security tools miss entirely (4--5 findings)
- Red herrings that look suspicious but are not actually exploitable (2--3 instances)
Part of the challenge is distinguishing real vulnerabilities from false positives and prioritizing findings by actual risk.
Scoring
| Category | Points | What Earns Full Marks |
|---|---|---|
| Dependency Security | 25 | All planted dependency issues found with severity assessment |
| Data Provenance | 20 | Complete data flow audit with provenance gaps identified |
| Model Security | 25 | Model artifact integrity issues and serialization risks found |
| Deployment Security | 20 | CI/CD and runtime misconfigurations documented |
| Bonus | 10 | Attack chains, prioritized remediation, systemic analysis |
Why Supply Chain Auditing Is Different for ML
Traditional software supply chain auditing focuses on code dependencies and build pipelines. ML projects have additional surfaces that traditional tools do not cover:
Data as a Dependency
In traditional software, dependencies are libraries and frameworks -- versioned, published, and (ideally) reviewed. In ML, training data is a dependency with none of these properties:
- Training data is often unversioned. The same data source may change over time without any record.
- Data provenance is rarely tracked. The chain of custody from original source to training input is usually undocumented.
- Data integrity is hard to verify. A poisoned training sample looks identical to a legitimate one without knowing the correct label.
This challenge forces you to think about data with the same rigor you apply to code dependencies: where did it come from, how was it verified, and what happens if it is compromised?
Models as Build Artifacts
A trained model is the ML equivalent of a compiled binary. Like a binary, it is opaque -- you cannot easily inspect its behavior by reading its weights. Unlike a binary, it cannot be reproducibly built from source in most cases:
- Non-deterministic training means the same code and data may produce different models.
- Base models downloaded from public repositories are trusted on reputation, not on verifiable provenance.
- Model serialization formats (especially pickle-based formats) can contain arbitrary executable code.
The GPU Trust Boundary
ML training and inference run on GPU hardware that introduces its own trust considerations:
- GPU drivers and CUDA libraries are part of the trusted computing base but are rarely audited.
- Shared GPU environments (cloud instances, shared clusters) may leak information between tenants through shared memory.
- GPU-specific optimizations (mixed precision, quantization) alter model behavior in ways that may have security implications.
Connecting to Professional Practice
Supply chain auditing is one of the most sought-after skills in AI security. Organizations deploying ML systems need to answer questions like:
- "Can we trust this model from Hugging Face?"
- "What happens if our training data provider is compromised?"
- "Are our CI/CD pipelines secure against an insider threat?"
- "How do we verify that the model in production is the model we trained?"
This challenge develops the systematic thinking and technical skills needed to answer these questions for real organizations.
Further Reading
- Infrastructure & Supply Chain -- supply chain security fundamentals
- LLMOps Security -- operational security for ML pipelines
- Fine-Tuning Security -- security of fine-tuning processes
- August 2026 Challenge -- the next challenge