Supply Chain Risks in AI Code Generation
Analysis of supply chain attack vectors introduced by AI code generation tools, including dependency confusion, typosquatting, and training data poisoning.
Overview
AI code generation tools introduce a new dimension to software supply chain risk. Traditional supply chain attacks target the dependencies that human developers choose. AI code generation adds several new vectors: models can hallucinate packages that do not exist (creating opportunities for typosquatting), suggest outdated or vulnerable dependency versions, and propagate insecure patterns learned from compromised training data.
This article examines the intersection of AI code generation and supply chain security, covering the specific attack vectors that emerge when LLMs recommend, import, and configure software dependencies.
The AI Supply Chain Threat Model
Traditional vs. AI-Augmented Supply Chain Risks
Traditional software supply chain attacks require the attacker to compromise or impersonate a real dependency. AI code generation changes the equation in two fundamental ways:
- Hallucinated packages: LLMs can suggest packages that have never existed. An attacker who registers these hallucinated package names can execute code on any developer who follows the AI's suggestion.
- Statistical pattern reproduction: LLMs suggest dependencies based on patterns in training data, not on current security assessments. A package that was popular during the training window but is now known-vulnerable will continue to be suggested.
from dataclasses import dataclass
from enum import Enum


class SupplyChainVector(Enum):
    HALLUCINATED_PACKAGE = "hallucinated_package"
    TYPOSQUATTED_DEPENDENCY = "typosquatted_dependency"
    VULNERABLE_VERSION = "vulnerable_version"
    DEPRECATED_PACKAGE = "deprecated_package"
    TRAINING_DATA_POISONING = "training_data_poisoning"
    COMPROMISED_TRANSITIVE = "compromised_transitive_dependency"


@dataclass
class AISupplyChainRisk:
    vector: SupplyChainVector
    description: str
    ai_specific: bool  # Is this risk unique to AI code generation?
    detection_difficulty: str
    example: str


AI_SUPPLY_CHAIN_RISKS = [
    AISupplyChainRisk(
        vector=SupplyChainVector.HALLUCINATED_PACKAGE,
        description="LLM suggests a package name that does not exist on any registry",
        ai_specific=True,
        detection_difficulty="easy",  # pip install fails, but attacker can register first
        example="LLM suggests 'pip install flask-security-utils' - package doesn't exist",
    ),
    AISupplyChainRisk(
        vector=SupplyChainVector.TYPOSQUATTED_DEPENDENCY,
        description="LLM suggests a misspelled package name that is a malicious typosquat",
        ai_specific=False,  # Humans also typosquat, but LLMs do it more consistently
        detection_difficulty="medium",
        example="LLM suggests 'requets' instead of 'requests'",
    ),
    AISupplyChainRisk(
        vector=SupplyChainVector.VULNERABLE_VERSION,
        description="LLM suggests a dependency version with known CVEs",
        ai_specific=True,  # LLMs are frozen in time, always suggest training-era versions
        detection_difficulty="easy",  # Version scanners catch this
        example="LLM suggests 'langchain==0.0.171' which has CVE-2023-29374",
    ),
    AISupplyChainRisk(
        vector=SupplyChainVector.DEPRECATED_PACKAGE,
        description="LLM suggests a package that has been deprecated or abandoned",
        ai_specific=True,
        detection_difficulty="medium",
        example="LLM suggests 'pycrypto' instead of 'pycryptodome'",
    ),
    AISupplyChainRisk(
        vector=SupplyChainVector.TRAINING_DATA_POISONING,
        description="Attacker poisons training data to make LLMs consistently suggest a malicious package",
        ai_specific=True,
        detection_difficulty="hard",
        example="Attacker creates many GitHub repos importing 'evil-utils', LLM learns the pattern",
    ),
    AISupplyChainRisk(
        vector=SupplyChainVector.COMPROMISED_TRANSITIVE,
        description="LLM suggests a package whose transitive dependencies include compromised packages",
        ai_specific=False,
        detection_difficulty="hard",
        example="Suggested package has deep dependency tree with compromised leaf node",
    ),
]

Package Hallucination Attacks
The Hallucination-to-Exploitation Pipeline
Research has demonstrated that LLMs consistently hallucinate specific package names. An attacker can study which fake packages LLMs suggest, register those names on PyPI or npm, and wait for developers to install them.
The attack pipeline works as follows:
- Discovery: Query multiple LLMs with common coding tasks and collect the package names they suggest.
- Verification: Check which suggested names do not correspond to real packages.
- Registration: Register the hallucinated package names on public registries.
- Payload: Include malicious code in the registered package (data exfiltration, reverse shell, crypto mining).
- Waiting: Developers using AI coding tools will be directed to install the attacker's package.
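The discovery and verification steps can be sketched as a scraper over model output. This is a minimal illustration, not a real tool: the regexes and the helper name `extract_candidate_packages` are assumptions, and a practical harness would also handle scoped npm names and multi-package install commands.

```python
import re

# Candidate package names from install commands (pip or npm) in generated text.
INSTALL_RE = re.compile(r"(?:pip install|npm install)\s+([A-Za-z0-9][A-Za-z0-9._-]*)")
# Candidate module names from top-level import statements.
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_][A-Za-z0-9_]*)", re.MULTILINE)


def extract_candidate_packages(llm_output: str) -> set[str]:
    """Collect package names an LLM suggested installing or importing."""
    candidates = set(INSTALL_RE.findall(llm_output))
    candidates.update(IMPORT_RE.findall(llm_output))
    return candidates


sample = "First run pip install flask-security-utils, then:\nimport flask_security_utils"
print(sorted(extract_candidate_packages(sample)))
# → ['flask-security-utils', 'flask_security_utils']
```

Each extracted name would then be fed to a registry check like `check_package_exists` below; names that come back as nonexistent are exactly the ones an attacker could register.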
import subprocess


def check_package_exists(package_name: str, registry: str = "pypi") -> dict:
    """Check if a package exists on a public registry."""
    if registry == "pypi":
        result = subprocess.run(
            ["pip", "index", "versions", package_name],
            capture_output=True, text=True,
        )
        exists = result.returncode == 0
        return {
            "package": package_name,
            "registry": "pypi",
            "exists": exists,
            "versions": result.stdout.strip() if exists else None,
        }
    elif registry == "npm":
        result = subprocess.run(
            ["npm", "view", package_name, "version"],
            capture_output=True, text=True,
        )
        exists = result.returncode == 0
        return {
            "package": package_name,
            "registry": "npm",
            "exists": exists,
            "version": result.stdout.strip() if exists else None,
        }
    return {"package": package_name, "error": f"Unknown registry: {registry}"}
# Known examples of hallucinated package names from research
# Source: "Can LLMs Generate Malicious Code?" - security researchers
# have documented consistent hallucination patterns
KNOWN_HALLUCINATION_CATEGORIES = [
    {
        "pattern": "Plausible compound names",
        "examples": [
            "flask-security-utils",
            "django-auth-helpers",
            "react-data-validator",
        ],
        "why_llms_hallucinate": (
            "LLMs combine real package naming patterns "
            "(flask-*, django-*, react-*) with common suffixes"
        ),
    },
    {
        "pattern": "Import-name/package-name confusion",
        "examples": [
            "sklearn",  # Distribution name is scikit-learn
            "cv2",      # Distribution name is opencv-python
            "yaml",     # Distribution name is pyyaml
        ],
        "why_llms_hallucinate": (
            "LLMs confuse import names with package names "
            "(import cv2 vs pip install opencv-python)"
        ),
    },
    {
        "pattern": "Platform-specific packages",
        "examples": [
            "python-dotenv-cli",
            "node-fetch-polyfill",
        ],
        "why_llms_hallucinate": (
            "LLMs merge concepts from different ecosystems"
        ),
    },
]
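The import-name confusion category is mechanical enough to guard against with a lookup table. A minimal sketch follows; the mapping covers only a handful of well-known cases, and a production tool would use a maintained dataset rather than this hand-written dict:

```python
# Import names whose PyPI distribution name differs. Running
# "pip install cv2" installs the wrong thing (or an attacker's package);
# the real distribution is opencv-python.
IMPORT_TO_DISTRIBUTION = {
    "sklearn": "scikit-learn",
    "cv2": "opencv-python",
    "yaml": "pyyaml",
    "PIL": "pillow",
    "bs4": "beautifulsoup4",
}


def resolve_distribution(import_name: str) -> str:
    """Return the PyPI distribution that provides a given import name."""
    return IMPORT_TO_DISTRIBUTION.get(import_name, import_name)


print(resolve_distribution("cv2"))       # → opencv-python
print(resolve_distribution("requests"))  # → requests
```

Rewriting an AI-suggested install command through `resolve_distribution` before running it closes off this entire hallucination category for the names in the table.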
def scan_ai_generated_requirements(requirements_file: str) -> list[dict]:
    """Scan a requirements file for potentially hallucinated packages."""
    findings = []
    with open(requirements_file) as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # Extract package name (before any extras or version specifier)
            package_name = (
                line.split("==")[0].split(">=")[0].split("<=")[0]
                .split("~=")[0].split("[")[0].strip()
            )
            # Check if package exists
            result = check_package_exists(package_name)
            if not result.get("exists"):
                findings.append({
                    "line": line_num,
                    "package": package_name,
                    "status": "NOT FOUND on PyPI",
                    "risk": "high",
                    "recommendation": "Verify package name - may be hallucinated by AI",
                })
            else:
                # Package exists; still flag it for manual verification
                findings.append({
                    "line": line_num,
                    "package": package_name,
                    "status": "exists",
                    "risk": "low",
                    "note": "Verify this is the intended package",
                })
    return findings

Proactive Defense: Package Name Monitoring
Organizations can proactively monitor for hallucinated package registrations:
from datetime import datetime


class PackageHallucinationMonitor:
    """Monitor package registries for hallucinated package registrations."""

    def __init__(self, organization_packages: list[str]):
        self.org_packages = organization_packages
        self.watch_patterns = self._generate_watch_patterns()

    def _generate_watch_patterns(self) -> list[str]:
        """Generate package names that LLMs might hallucinate based on org packages."""
        patterns = []
        common_suffixes = [
            "-utils", "-helpers", "-tools", "-core", "-cli",
            "-common", "-extra", "-plus", "-pro", "-lite",
        ]
        common_prefixes = [
            "python-", "py-", "node-", "go-",
        ]
        for pkg in self.org_packages:
            # Typosquats: adjacent-character swaps
            for i in range(len(pkg) - 1):
                swapped = list(pkg)
                swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
                candidate = "".join(swapped)
                if candidate != pkg:  # skip no-op swaps of identical characters
                    patterns.append(candidate)
            # Compound names
            for suffix in common_suffixes:
                patterns.append(f"{pkg}{suffix}")
            for prefix in common_prefixes:
                patterns.append(f"{prefix}{pkg}")
        return list(set(patterns))

    def check_new_registrations(self) -> list[dict]:
        """Check if any watched package names have been newly registered."""
        alerts = []
        for pattern in self.watch_patterns:
            result = check_package_exists(pattern)
            if result.get("exists"):
                alerts.append({
                    "package": pattern,
                    "status": "REGISTERED",
                    "action": "Investigate - may be a supply chain attack",
                    "timestamp": datetime.utcnow().isoformat(),
                })
        return alerts

Vulnerable Version Suggestion
The Training Data Cutoff Problem
LLMs have a training data cutoff. They suggest dependency versions that were current during their training window. This means they systematically recommend versions with known CVEs that were discovered after the cutoff.
# Real-world examples of LLMs suggesting vulnerable versions
VULNERABLE_VERSION_EXAMPLES = [
    {
        "package": "langchain",
        "version_suggested": "0.0.171",
        "cve": "CVE-2023-29374",
        "vulnerability": "Arbitrary code execution via LLMMathChain",
        "fixed_in": "0.0.172",
        "cvss": 9.8,
        "cwe": "CWE-94",
    },
    {
        "package": "langchain",
        "version_suggested": "0.0.194",
        "cve": "CVE-2023-36258",
        "vulnerability": "Arbitrary code execution via PALChain",
        "fixed_in": "0.0.235",
        "cvss": 9.8,
        "cwe": "CWE-94",
    },
    {
        "package": "pytorch",
        "version_suggested": "1.13.1",
        "cve": "CVE-2023-45802",
        "vulnerability": "Deserialization of untrusted data in torch.load",
        "fixed_in": "2.0.0",
        "cvss": 7.8,
        "cwe": "CWE-502",
    },
    {
        "package": "transformers",
        "version_suggested": "4.28.0",
        "cve": "CVE-2023-40590",
        "vulnerability": "Arbitrary code execution via deserialization",
        "fixed_in": "4.30.0",
        "cvss": 8.8,
        "cwe": "CWE-502",
    },
]
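A static table like the one above goes stale; the same lookup can be done against live advisory data. The sketch below uses the OSV.dev query API (`POST /v1/query`), whose request shape is taken from OSV's public documentation; error handling and batching are omitted for brevity:

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(package: str, version: str, ecosystem: str = "PyPI") -> dict:
    """Build the request body for OSV's /v1/query endpoint."""
    return {"version": version, "package": {"name": package, "ecosystem": ecosystem}}


def query_osv(package: str, version: str) -> list[dict]:
    """Return known vulnerabilities for a package version (makes a network call)."""
    body = json.dumps(build_osv_query(package, version)).encode()
    req = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        # OSV returns {"vulns": [...]} when matches exist, {} otherwise.
        return json.load(resp).get("vulns", [])
```

For example, `query_osv("langchain", "0.0.171")` would surface CVE-2023-29374 among the returned advisories.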
def check_suggested_versions(
    suggestions: list[dict],
) -> list[dict]:
    """Check AI-suggested package versions against known vulnerabilities.

    In production, this would query the OSV database or similar.
    """
    findings = []
    # Simulated check against known vulnerable versions
    vuln_db = {
        (v["package"], v["version_suggested"]): v
        for v in VULNERABLE_VERSION_EXAMPLES
    }
    for suggestion in suggestions:
        key = (suggestion["package"], suggestion["version"])
        if key in vuln_db:
            vuln = vuln_db[key]
            findings.append({
                "package": suggestion["package"],
                "suggested_version": suggestion["version"],
                "vulnerability": vuln["cve"],
                "description": vuln["vulnerability"],
                "fixed_in": vuln["fixed_in"],
                "severity": "critical" if vuln["cvss"] >= 9.0 else "high",
                "action": f"Update to >= {vuln['fixed_in']}",
            })
    return findings

Training Data Poisoning for Supply Chain Attacks
Attack Methodology
An attacker can influence what packages LLMs suggest by contaminating the training data. This is a slow, strategic attack:
- Create malicious package: Register a package with a plausible name on PyPI/npm.
- Generate training signal: Create many GitHub repositories, blog posts, and Stack Overflow answers that import or recommend the package.
- Wait for training: The next generation of LLMs will learn the pattern and suggest the package.
- Harvest: Developers using AI coding tools receive suggestions to install the malicious package.
# Modeling the training data poisoning attack economics
from dataclasses import dataclass


@dataclass
class PoisoningCampaign:
    """Model of a supply chain poisoning campaign via training data."""

    target_package_name: str
    legitimate_sounding_purpose: str
    github_repos_created: int
    blog_posts_created: int
    stackoverflow_answers: int
    estimated_cost_usd: float
    estimated_time_months: float
    expected_installs_per_month: int


# Example campaigns and their economics
EXAMPLE_CAMPAIGNS = [
    PoisoningCampaign(
        target_package_name="flask-session-encrypt",
        legitimate_sounding_purpose="Flask session encryption middleware",
        github_repos_created=50,
        blog_posts_created=10,
        stackoverflow_answers=20,
        estimated_cost_usd=500,  # Mostly time cost
        estimated_time_months=3,
        expected_installs_per_month=100,
    ),
]


def estimate_campaign_reach(campaign: PoisoningCampaign) -> dict:
    """Estimate the reach and impact of a poisoning campaign."""
    # Rough per-artifact weights: each GitHub repo ~1 training signal,
    # each blog post ~5 (indexed by multiple crawlers),
    # each Stack Overflow answer ~10 (high crawl frequency)
    total_signals = (
        campaign.github_repos_created * 1
        + campaign.blog_posts_created * 5
        + campaign.stackoverflow_answers * 10
    )
    return {
        "total_training_signals": total_signals,
        "cost_per_signal": campaign.estimated_cost_usd / total_signals,
        "estimated_monthly_victims": campaign.expected_installs_per_month,
        "cost_per_victim": (
            campaign.estimated_cost_usd / campaign.expected_installs_per_month
        ),
        "assessment": "economically viable" if campaign.estimated_cost_usd < 1000 else "expensive",
    }

CI/CD Pipeline Defenses
Automated Detection
import json
from datetime import datetime
from pathlib import Path


class AISupplyChainScanner:
    """CI/CD scanner for AI-introduced supply chain risks."""

    def __init__(self, project_path: str):
        self.project_path = Path(project_path)

    def scan_requirements(self) -> list[dict]:
        """Scan Python requirements for supply chain risks."""
        findings = []
        # Note: setup.cfg and pyproject.toml are checked line-by-line with the
        # same heuristics; a production scanner would parse them properly.
        req_files = list(self.project_path.glob("**/requirements*.txt"))
        req_files.extend(self.project_path.glob("**/setup.cfg"))
        req_files.extend(self.project_path.glob("**/pyproject.toml"))
        for req_file in req_files:
            findings.extend(self._check_file(req_file))
        return findings

    def _check_file(self, filepath: Path) -> list[dict]:
        findings = []
        content = filepath.read_text()
        for line_num, line in enumerate(content.split("\n"), 1):
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # Check for unpinned versions
            if ">=" in line and "==" not in line:
                findings.append({
                    "file": str(filepath),
                    "line": line_num,
                    "issue": "unpinned_version",
                    "detail": f"'{line}' uses >= instead of ==",
                    "risk": "AI may have suggested latest-at-training-time version",
                    "fix": "Pin to exact version with hash verification",
                })
            # Check for no version at all
            pkg_name = line.split("==")[0].split(">=")[0].split("[")[0].strip()
            if pkg_name and "==" not in line and ">=" not in line and "<" not in line:
                findings.append({
                    "file": str(filepath),
                    "line": line_num,
                    "issue": "no_version_specified",
                    "detail": f"'{pkg_name}' has no version constraint",
                    "risk": "Any version will be installed, including malicious ones",
                    "fix": f"Pin version: {pkg_name}==<specific_version>",
                })
        return findings

    def scan_package_json(self) -> list[dict]:
        """Scan package.json for npm supply chain risks."""
        findings = []
        pkg_files = list(self.project_path.glob("**/package.json"))
        for pkg_file in pkg_files:
            if "node_modules" in str(pkg_file):
                continue
            with open(pkg_file) as f:
                try:
                    pkg = json.load(f)
                except json.JSONDecodeError:
                    continue
            for dep_type in ("dependencies", "devDependencies"):
                deps = pkg.get(dep_type, {})
                for name, version in deps.items():
                    # Check for wildcard versions
                    if version in ("*", "latest", ""):
                        findings.append({
                            "file": str(pkg_file),
                            "package": name,
                            "version": version,
                            "issue": "wildcard_version",
                            "risk": "critical",
                        })
                    # Check for suspicious package names
                    if len(name) <= 2:
                        findings.append({
                            "file": str(pkg_file),
                            "package": name,
                            "issue": "suspiciously_short_name",
                            "risk": "medium",
                        })
        return findings

    def generate_report(self) -> dict:
        """Generate a complete supply chain risk report."""
        python_findings = self.scan_requirements()
        npm_findings = self.scan_package_json()
        return {
            "scan_date": datetime.utcnow().isoformat(),
            "project_path": str(self.project_path),
            "python_findings": python_findings,
            "npm_findings": npm_findings,
            "total_issues": len(python_findings) + len(npm_findings),
            "critical_issues": sum(
                1 for f in python_findings + npm_findings
                if f.get("risk") == "critical"
            ),
        }

Semgrep Rules for AI-Generated Dependencies
# Semgrep rules to detect common AI code generation supply chain patterns
SEMGREP_SUPPLY_CHAIN_RULES = """
rules:
  - id: ai-hallucinated-import-pattern
    patterns:
      - pattern: import $PACKAGE
      - metavariable-regex:
          metavariable: $PACKAGE
          regex: ^(flask_security_utils|django_auth_helpers|react_data_validator)$
    message: >
      This import matches a known AI-hallucinated package pattern.
      Verify the package exists and is the intended dependency.
    languages: [python]
    severity: WARNING
  - id: ai-suggested-vulnerable-langchain
    patterns:
      - pattern: langchain==$VERSION
      - metavariable-regex:
          metavariable: $VERSION
          regex: ^0\\.0\\.(1[0-6][0-9]|17[01]|[0-9]{1,2})$
    message: >
      LangChain versions before 0.0.172 are vulnerable to CVE-2023-29374
      (arbitrary code execution). AI tools commonly suggest outdated versions.
    languages: [generic]
    severity: ERROR
  - id: ai-unpinned-dependency
    pattern: pip install $PACKAGE
    message: >
      Unpinned pip install detected. AI-generated install commands often
      omit version pins. Pin to a specific verified version.
    languages: [bash]
    severity: WARNING
"""Mitigation Framework
| Layer | Control | Implementation |
|---|---|---|
| Development | Dependency review | Human review of all AI-suggested dependencies |
| Development | .aiderignore / .cursorignore | Exclude sensitive dependency files from AI context |
| CI | Package verification | Verify all packages exist and match expected checksums |
| CI | Version pinning enforcement | Reject unpinned or range-specified dependencies |
| CI | Vulnerability scanning | Run pip-audit, npm audit, or Snyk on every PR |
| Registry | Namespace monitoring | Monitor for registrations similar to your package names |
| Training | Data poisoning awareness | Track package suggestion patterns across model versions |
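The version-pinning control in the table can be enforced with a small CI gate. The sketch below is an illustration, not a standard tool: the regex, the `gate` helper, and the exit-code convention are all assumptions, and real projects may prefer pip-tools or hash-pinned requirements instead.

```python
import re
import sys

# Matches exactly-pinned requirements: name==1.2.3, optionally with extras.
PINNED_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==\S+$")


def unpinned_lines(requirements_text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    bad = []
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # skip blanks, comments, and pip options like -r / -e
        if not PINNED_RE.match(line):
            bad.append(line)
    return bad


def gate(requirements_path: str) -> int:
    """Exit-code-style result for CI: 0 = all pinned, 1 = unpinned lines found."""
    with open(requirements_path) as f:
        offenders = unpinned_lines(f.read())
    for line in offenders:
        print(f"UNPINNED: {line}", file=sys.stderr)
    return 1 if offenders else 0
```

Wiring `sys.exit(gate("requirements.txt"))` into a CI step rejects any pull request that introduces range-specified or unversioned dependencies, which covers the most common shape of AI-suggested installs.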
References
- "An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models" — Yan et al., 2024 — https://arxiv.org/abs/2402.00187
- CVE-2023-29374 — LangChain arbitrary code execution via LLMMathChain — https://nvd.nist.gov/vuln/detail/CVE-2023-29374
- CVE-2023-36258 — LangChain arbitrary code execution via PALChain — https://nvd.nist.gov/vuln/detail/CVE-2023-36258
- OWASP Top 10 for LLM Applications 2025 — LLM03: Supply Chain — https://genai.owasp.org/llmrisk/
- "Typosquatting in Package Registries" — MITRE ATLAS Technique AML.T0010 — https://atlas.mitre.org/
- CWE-494: Download of Code Without Integrity Check — https://cwe.mitre.org/data/definitions/494.html