Simulation: Open Source AI Project Audit
Security audit simulation for an open-source AI application, covering code review, dependency analysis, model supply chain verification, and deployment configuration review.
Engagement Brief
Project: OpenChat-Plus -- an open-source AI-powered chat application with 15K GitHub stars
Repository: Fictional GitHub repository with documented architecture
Components: FastAPI backend, React frontend, LangChain orchestration, multiple model backends (OpenAI, Anthropic, local Ollama), PostgreSQL + pgvector for RAG, Redis for sessions
Scope: Full source code review, dependency audit, model supply chain analysis, deployment configuration review
Duration: 2 hours
Deliverable: Security audit report with prioritized findings and recommended fixes
Project Structure
openchat-plus/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI application
│ │ ├── config.py # Configuration management
│ │ ├── auth/ # Authentication module
│ │ ├── chat/ # Chat endpoint handlers
│ │ ├── rag/ # RAG pipeline
│ │ ├── models/ # Model integration layer
│ │ └── utils/ # Utility functions
│ ├── requirements.txt
│ └── Dockerfile
├── frontend/
│ ├── src/
│ ├── package.json
│ └── Dockerfile
├── docker-compose.yml
├── .env.example
├── docs/
│ └── deployment.md
└── models/
└── download_models.sh
Phase 1: Code Review (45 minutes)
Configuration and secrets management
Review how the application handles sensitive configuration. This is the most common source of security issues in open-source projects.
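One item worth checking early is whether configuration validation fails safely: the application should refuse to start when required secrets are missing rather than fall back to insecure defaults. A minimal sketch of that pattern, assuming illustrative variable names (`OPENAI_API_KEY`, `DATABASE_URL`) rather than the project's actual settings:

```python
import os

class ConfigError(Exception):
    """Raised when required configuration is missing."""

REQUIRED_VARS = ["OPENAI_API_KEY", "DATABASE_URL"]

def load_settings(env=os.environ):
    # Fail closed: missing secrets abort startup instead of
    # silently falling back to a hardcoded default value.
    missing = [v for v in REQUIRED_VARS if not env.get(v)]
    if missing:
        raise ConfigError(f"Missing required settings: {', '.join(missing)}")
    return {
        "openai_api_key": env["OPENAI_API_KEY"],
        # Secure default: debug stays off unless explicitly enabled.
        "debug": env.get("DEBUG", "false").lower() == "true",
    }
```

The key design choice is the default direction: absent configuration yields the *more* restrictive behavior, the opposite of the findings below.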
```python
# config.py - Review for security issues
config_review_checklist = [
    "Are API keys stored in environment variables (good) or hardcoded (bad)?",
    "Does .env.example contain real credentials or obvious defaults?",
    "Is there a .gitignore entry for .env files?",
    "Are secrets logged during startup or error conditions?",
    "Is there a config validation step that fails safely on missing values?",
    "Are default values secure (e.g., debug=False, CORS restricted)?",
]
```

Common findings in open-source AI projects:
```python
# FINDING: Hardcoded default API key
# Severity: High
# File: backend/app/config.py
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "sk-default-key-for-testing")

# FINDING: Debug mode enabled by default
# Severity: Medium
# File: backend/app/config.py
DEBUG = os.getenv("DEBUG", "true").lower() == "true"

# FINDING: Overly permissive CORS
# Severity: Medium
# File: backend/app/main.py
app.add_middleware(CORSMiddleware, allow_origins=["*"])
```

Input handling and prompt construction
Review how user input flows into LLM prompts. This is the primary attack surface for prompt injection.
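A safer pattern keeps the system prompt structurally separate from user text and enforces a length cap before any model call. A sketch under assumed limits (the 4000-character cap and chat-message format are illustrative, not from the project):

```python
MAX_INPUT_CHARS = 4000

class InputTooLong(ValueError):
    pass

def build_messages(system_prompt: str, user_message: str) -> list[dict]:
    # Enforce a hard length limit before the model is ever called.
    if len(user_message) > MAX_INPUT_CHARS:
        raise InputTooLong(f"Input exceeds {MAX_INPUT_CHARS} characters")
    # Keep roles structurally separate instead of f-string concatenation,
    # so user text cannot masquerade as system instructions.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]
```

Structural separation does not stop prompt injection outright, but it removes the trivial case where user input rewrites the system prompt via string interpolation.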
```python
# Review the chat handler for injection vulnerabilities
input_review_checklist = [
    "Is user input sanitized before being included in prompts?",
    "Is the system prompt separated from user input?",
    "Are there input length limits enforced?",
    "Is user input escaped in structured prompts (JSON, XML)?",
    "Are file uploads validated before processing?",
    "Is there rate limiting on AI endpoints?",
]
```

Common vulnerable patterns:
```python
# FINDING: Direct string interpolation in prompt
# Severity: Critical
# File: backend/app/chat/handler.py
prompt = f"System: {system_prompt}\nUser: {user_message}\nAssistant:"

# FINDING: No input length validation
# Severity: Medium
# File: backend/app/chat/handler.py
@app.post("/chat")
async def chat(message: str):  # No max_length constraint
    response = await llm.generate(message)

# FINDING: User-controlled system prompt component
# Severity: High
# File: backend/app/chat/handler.py
system = f"You are a helpful assistant for {user.company_name}."
# user.company_name is user-supplied and unvalidated
```

Authentication and authorization review
Review the auth system for weaknesses that could be exploited alongside AI vulnerabilities.
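The per-user access-control items can be illustrated with a framework-agnostic ownership check (the `Conversation` model, in-memory store, and `Forbidden` error are assumptions for the sketch, not the project's actual types):

```python
from dataclasses import dataclass, field

class Forbidden(Exception):
    pass

@dataclass
class Conversation:
    id: str
    owner_id: str
    messages: list = field(default_factory=list)

CONVERSATIONS: dict[str, Conversation] = {}  # stand-in for the real database

def get_history(conversation_id: str, current_user_id: str) -> Conversation:
    convo = CONVERSATIONS.get(conversation_id)
    # Return the same error for "missing" and "not yours" so the
    # endpoint does not leak which conversation IDs exist.
    if convo is None or convo.owner_id != current_user_id:
        raise Forbidden("Conversation not found")
    return convo
```

The point of the collapsed error case is enumeration resistance: an attacker iterating over conversation IDs learns nothing from the response shape.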
```python
auth_review_checklist = [
    "What authentication mechanism is used (JWT, session, API key)?",
    "Are tokens properly validated on every request?",
    "Is there role-based access control for admin functions?",
    "Are AI-specific endpoints behind authentication?",
    "Is rate limiting applied per-user or globally?",
    "Are conversation histories access-controlled per-user?",
]
```

Common findings:
```python
# FINDING: API key authentication with no rotation mechanism
# Severity: Medium
# File: backend/app/auth/api_key.py

# FINDING: Admin endpoints accessible without authentication
# Severity: Critical
# File: backend/app/main.py
@app.get("/admin/config")  # No auth decorator
async def get_config():
    return {"model": settings.MODEL, "api_key": settings.API_KEY[:8] + "..."}

# FINDING: Conversation history accessible by ID without ownership check
# Severity: High
# File: backend/app/chat/history.py
@app.get("/chat/history/{conversation_id}")
async def get_history(conversation_id: str):
    return db.get_conversation(conversation_id)  # No user ownership check
```

RAG pipeline security review
Review the RAG implementation for injection and data leakage risks.
```python
rag_review_checklist = [
    "Are uploaded documents sanitized before embedding?",
    "Is there access control on the vector store (per-user/per-role)?",
    "Are retrieved documents filtered before being included in prompts?",
    "Is there a maximum number of retrieved chunks?",
    "Are metadata fields sanitized (filename, author, etc.)?",
    "Is the embedding model downloaded from a trusted source?",
]
```
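Two of the checklist items above -- filtering retrieved documents per user and capping the number of chunks -- can be sketched as a single selection step (the chunk schema with `owner_id`, `visibility`, and `text` fields is an assumption for illustration):

```python
MAX_CHUNKS = 5

def select_context(chunks: list[dict], user_id: str) -> list[str]:
    # Enforce access control at retrieval time: only chunks the
    # requesting user owns, or public chunks, may enter the prompt.
    allowed = [
        c for c in chunks
        if c.get("owner_id") == user_id or c.get("visibility") == "public"
    ]
    # Cap chunk count so a poisoned or bloated vector store cannot
    # flood the context window with attacker-controlled text.
    return [c["text"] for c in allowed[:MAX_CHUNKS]]
```

Filtering after retrieval is a fallback; where the vector store supports it, the access-control predicate belongs in the similarity query itself.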
Phase 2: Dependency Analysis (30 minutes)
Python dependency audit
Analyze the Python dependency tree for known vulnerabilities and suspicious packages.
```python
dependency_audit_steps = [
    # Step 1: Check for known vulnerabilities
    "pip-audit -r requirements.txt",
    "safety check -r requirements.txt",
    # Step 2: Check for pinned vs unpinned versions
    #   Unpinned: langchain (vulnerable to supply chain attacks)
    #   Pinned: langchain==0.1.12 (reproducible, auditable)
    # Step 3: Check for typosquatting risks
    #   langchain vs langchainn vs lang-chain
    #   openai vs open-ai vs openaii
    # Step 4: Check for abandoned or unmaintained packages
    #   Last commit date, open security issues, maintainer activity
]
```

Common AI-specific dependency risks:
| Package | Risk | Severity |
|---------|------|----------|
| langchain (unpinned) | Rapid release cycle, breaking changes, new vulnerabilities | High |
| transformers | Large attack surface; model loading can execute arbitrary code | High |
| pickle-based serialization | Arbitrary code execution on deserialization | Critical |
| gradio (if present) | Known SSRF and path traversal vulnerabilities | High |

Frontend dependency audit
Analyze JavaScript dependencies for supply chain risks.
```python
frontend_audit_steps = [
    "npm audit --production",
    "npx lockfile-lint --path package-lock.json --type npm --allowed-hosts npm",
    # Check for dependencies that handle AI responses in the frontend
    #   XSS risk: if AI responses are rendered as HTML
    #   Markdown rendering: check for sanitization
]
```

Transitive dependency analysis
Check the full dependency tree, not just direct dependencies.
```python
transitive_audit = [
    "pip install pipdeptree && pipdeptree --warn silence",
    # Look for:
    # - Deep dependency chains (fragile supply chain)
    # - Dependencies with few maintainers
    # - Dependencies that pull in native code (C extensions)
    # - Dependencies that require network access during installation
]
```
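The pinned-version check from Step 2 earlier in this phase is easy to automate. A rough heuristic (not a full PEP 508 parser; it simply flags any requirement line without an exact `==` pin):

```python
def find_unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for line in requirements_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and pip options like "-r" or "--index-url".
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned
```

Range specifiers such as `>=` are flagged too, deliberately: anything short of an exact pin lets a compromised upstream release flow into the next build.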
Phase 3: Model Supply Chain (30 minutes)
Model provenance verification
Verify that models used by the application come from trusted sources.
```python
model_supply_chain = {
    "download_script_review": [
        "Does download_models.sh download from official sources (HuggingFace, official APIs)?",
        "Are model checksums verified after download?",
        "Is HTTPS used for all downloads?",
        "Are model versions pinned (specific commit hash, not 'latest')?",
    ],
    "model_loading_review": [
        "Does the code use pickle to load models (arbitrary code execution risk)?",
        "Are safetensors used instead of pytorch .bin files?",
        "Is there integrity verification before loading?",
        "Can users specify arbitrary model paths or URLs?",
    ],
    "api_model_review": [
        "Are API-based models (OpenAI, Anthropic) accessed through official SDKs?",
        "Is the API base URL configurable? (Proxy/MITM risk)",
        "Are API responses validated before processing?",
    ],
}
```

Common model supply chain findings:
```python
# FINDING: Model downloaded without integrity verification
# Severity: High
# File: models/download_models.sh
#   wget https://huggingface.co/model/resolve/main/model.bin -O model.bin
#   No checksum verification

# FINDING: Pickle deserialization for model loading
# Severity: Critical
# File: backend/app/models/local.py
#   model = torch.load("model.bin")  # Executes arbitrary code in the pickle

# FINDING: User-configurable model source
# Severity: High
# File: backend/app/config.py
#   MODEL_URL = os.getenv("MODEL_URL", "https://huggingface.co/...")
#   Users can point to any URL, including malicious model files
```

Model file analysis
If the project includes or downloads model files, analyze them for tampering.
```python
model_file_checks = [
    "Verify file sizes match expected values from official sources",
    "Compare SHA-256 hashes against published checksums",
    "Check for unexpected files in model archives (code files, scripts)",
    "Scan model config files for unusual settings",
    "Check if model files contain embedded code (pickle exploit)",
]
```
Phase 4: Deployment Configuration Review (15 minutes)
Docker and container security
Review containerization for security misconfigurations.
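One quick heuristic for the secrets-in-compose check below is to scan environment entries for values that look like live credentials. A sketch with illustrative patterns (the OpenAI-style `sk-` and AWS-style `AKIA` regexes are examples; real audits would cover the project's actual providers):

```python
import re

# Values that look like literal secrets rather than ${VAR} references.
SECRET_VALUE = re.compile(r"(sk-[A-Za-z0-9]{8,}|AKIA[0-9A-Z]{16})")

def find_inline_secrets(compose_text: str) -> list[str]:
    """Return lines from a docker-compose file that embed literal secrets."""
    hits = []
    for line in compose_text.splitlines():
        if "=" in line and SECRET_VALUE.search(line):
            hits.append(line.strip())
    return hits
```

A hit means a credential is baked into a file that is typically committed to the repository; the fix is an env file excluded by .gitignore or a proper secrets mechanism.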
```python
docker_review = {
    "dockerfile_checks": [
        "Is the container running as root? (should use non-root user)",
        "Are secrets passed as build args? (visible in image layers)",
        "Is the base image pinned to a specific digest?",
        "Are unnecessary tools installed (curl, wget in production)?",
        "Is multi-stage build used to minimize attack surface?",
    ],
    "compose_checks": [
        "Are ports exposed unnecessarily?",
        "Are volumes mounted with write access that should be read-only?",
        "Are environment variables with secrets in docker-compose.yml?",
        "Is the network configuration appropriate (internal vs external)?",
        "Are resource limits set (CPU, memory)?",
    ],
}
```

Common findings:
```python
# FINDING: Container running as root
# Severity: Medium
# File: backend/Dockerfile
#   No USER directive -- runs as root by default

# FINDING: Secrets in docker-compose.yml
# Severity: High
# File: docker-compose.yml
#   environment:
#     - OPENAI_API_KEY=sk-real-key-here

# FINDING: Database exposed to host network
# Severity: High
# File: docker-compose.yml
#   ports:
#     - "5432:5432"  # PostgreSQL accessible from host
```

Deployment documentation review
Check whether the deployment guide leads users toward secure or insecure configurations.
```python
docs_review = [
    "Does the deployment guide mention HTTPS configuration?",
    "Are security-critical environment variables documented?",
    "Does the guide warn against using default credentials?",
    "Is there guidance on network security (firewall, VPN)?",
    "Are backup and update procedures documented?",
    "Is there a security policy (SECURITY.md) for vulnerability reporting?",
]
```
Audit Report Template
# Security Audit Report: OpenChat-Plus
## Project Information
- **Repository:** [URL]
- **Version audited:** [commit hash]
- **Audit date:** 2026-03-15
- **Auditor:** [Name]
## Executive Summary
[Overall security posture and critical risks]
## Findings Summary
| # | Finding | Severity | Category | File(s) |
|---|---------|----------|----------|---------|
| 1 | [Title] | Critical | Code | [path] |
## Detailed Findings
[Standard finding format with code snippets and fix recommendations]
## Dependency Audit Results
[Vulnerability scan results, supply chain risks]
## Model Supply Chain Assessment
[Model provenance, integrity, loading security]
## Deployment Security
[Container, configuration, and documentation findings]
## Recommendations
### For Maintainers
[Code fixes, CI/CD pipeline improvements, security processes]
### For Users/Deployers
[Secure deployment checklist, configuration hardening]

Related Topics
- Supply Chain Security for AI -- AI supply chain threat models
- Code Review for AI Applications -- AI-focused code review techniques
- Dependency Security -- Dependency analysis methods
- Container Security -- Container hardening for AI workloads
Why is pickle-based model loading considered a critical security risk in open-source AI projects?