Methodology for Auditing AI-Generated Code
Structured audit methodology for evaluating the security of AI-generated code, covering static analysis, dynamic testing, and organizational assessment.
Overview
Auditing AI-generated code requires a different approach than traditional code review. AI-generated code has characteristic vulnerability patterns, lacks the contextual understanding that human developers bring, and is produced at a volume that makes manual review impractical. Organizations need a structured methodology that combines automated tooling with targeted manual review, applied at the right points in the development lifecycle.
This article presents a comprehensive audit methodology for AI-generated code, covering identification, static analysis, dynamic testing, organizational assessment, and reporting. The methodology is designed to be repeatable, scalable, and integrated into existing security programs.
Audit Framework Overview
Phase Model
The AI code generation audit methodology consists of five phases:
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional
class AuditPhase(Enum):
IDENTIFICATION = "1_identification"
STATIC_ANALYSIS = "2_static_analysis"
DYNAMIC_TESTING = "3_dynamic_testing"
ORGANIZATIONAL = "4_organizational"
REPORTING = "5_reporting"
@dataclass
class AuditPhaseDetail:
phase: AuditPhase
objective: str
activities: list[str]
tools: list[str]
outputs: list[str]
estimated_hours: float
AUDIT_PHASES = [
AuditPhaseDetail(
phase=AuditPhase.IDENTIFICATION,
objective="Identify which code was AI-generated and which tools were used",
activities=[
"Interview developers about AI tool usage patterns",
"Analyze git history for AI-generated commits",
"Review CI/CD logs for AI tool integration",
"Map AI tool configurations (.cursorrules, CLAUDE.md, .aiderignore)",
"Identify high-risk code areas (auth, crypto, input handling)",
],
tools=["git", "custom scripts", "developer interviews"],
outputs=[
"AI code generation inventory",
"Tool usage map",
"High-risk area identification",
],
estimated_hours=4.0,
),
AuditPhaseDetail(
phase=AuditPhase.STATIC_ANALYSIS,
objective="Detect vulnerability patterns characteristic of AI-generated code",
activities=[
"Run Semgrep with AI-specific rule sets",
"Run CodeQL for dataflow analysis",
"Run Bandit for Python security issues",
"Check for known AI vulnerability patterns",
"Scan for license compliance issues",
],
tools=["Semgrep", "CodeQL", "Bandit", "ScanCode"],
outputs=[
"Static analysis findings report",
"Vulnerability inventory with severity ratings",
"License compliance report",
],
estimated_hours=8.0,
),
AuditPhaseDetail(
phase=AuditPhase.DYNAMIC_TESTING,
objective="Verify that identified vulnerabilities are exploitable",
activities=[
"Test SQL injection points identified in static analysis",
"Test XSS sinks with crafted payloads",
"Test authentication and authorization boundaries",
"Test command injection in AI-generated shell interactions",
"Fuzz API endpoints generated by AI",
],
tools=["Burp Suite", "sqlmap", "custom scripts", "pytest"],
outputs=[
"Confirmed vulnerability list",
"Proof-of-concept exploits",
"False positive analysis",
],
estimated_hours=12.0,
),
AuditPhaseDetail(
phase=AuditPhase.ORGANIZATIONAL,
objective="Assess organizational controls around AI code generation",
activities=[
"Review AI tool governance policies",
"Assess developer training on AI code security",
"Review code review processes for AI-generated code",
"Evaluate CI/CD security gates for AI code",
"Check data classification compliance",
],
tools=["Policy documents", "interviews", "process review"],
outputs=[
"Organizational maturity assessment",
"Policy gap analysis",
"Process improvement recommendations",
],
estimated_hours=6.0,
),
AuditPhaseDetail(
phase=AuditPhase.REPORTING,
objective="Produce actionable findings with prioritized recommendations",
activities=[
"Consolidate technical and organizational findings",
"Risk-rank findings by exploitability and impact",
"Develop remediation recommendations",
"Create executive summary",
"Present findings to stakeholders",
],
tools=["Report templates", "risk frameworks"],
outputs=[
"Technical audit report",
"Executive summary",
"Remediation roadmap",
],
estimated_hours=4.0,
),
]
Phase 1: AI Code Identification
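Scoping the engagement is part of identification. The hour estimates in the phase model above can be totalled into a simple time budget before fieldwork starts; the sketch below is standalone (the numbers are copied from AUDIT_PHASES, and plan_engagement is an illustrative helper, not part of the methodology code):

```python
# Phase names and hour estimates copied from the AUDIT_PHASES table above.
PHASE_HOURS = {
    "identification": 4.0,
    "static_analysis": 8.0,
    "dynamic_testing": 12.0,
    "organizational": 6.0,
    "reporting": 4.0,
}

def plan_engagement(hours: dict[str, float], day_length: float = 8.0) -> dict:
    """Turn per-phase hour estimates into a simple time budget."""
    total = sum(hours.values())
    return {
        "total_hours": total,
        "total_days": round(total / day_length, 1),
        "share": {phase: round(h / total * 100) for phase, h in hours.items()},
    }

plan = plan_engagement(PHASE_HOURS)
print(f"{plan['total_hours']}h over ~{plan['total_days']} days")
```

Dynamic testing dominates the budget, which matches the phase model: confirming exploitability is the most labor-intensive step.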
Git History Analysis
The first challenge is identifying which code was generated by AI. Several signals help:
import subprocess
import re
import json
from pathlib import Path
from datetime import datetime
class AICodeIdentifier:
"""Identify AI-generated code in a repository."""
# Patterns that indicate AI-generated commits
AI_COMMIT_PATTERNS = [
r"(?i)generated\s+by\s+(copilot|cursor|claude|aider|gpt|ai)",
r"(?i)co-authored-by:.*\b(copilot|claude|aider)\b",
r"(?i)aider:", # Aider commit message prefix
r"(?i)\[ai\]|\[generated\]|\[copilot\]",
]
# Code patterns characteristic of AI generation
AI_CODE_PATTERNS = [
r"# Generated by",
r"// Auto-generated",
r"# TODO: Add error handling", # Common AI placeholder
r"# TODO: Add tests", # Common AI placeholder
r"pass\s+# placeholder", # AI placeholder pattern
]
def __init__(self, repo_path: str):
self.repo_path = repo_path
    def identify_ai_commits(self, since: str = "6 months ago") -> list[dict]:
        """Identify commits likely generated by AI tools.

        Uses NUL-separated records so the commit body, where
        Co-authored-by trailers live, can be searched as well.
        """
        result = subprocess.run(
            ["git", "log", "-z", f"--since={since}",
             "--format=%H|%an|%ae|%aI|%s%n%b"],
            capture_output=True, text=True, cwd=self.repo_path,
        )
        ai_commits = []
        for record in result.stdout.split("\0"):
            if not record.strip():
                continue
            header, _, body = record.partition("\n")
            parts = header.split("|", 4)
            if len(parts) < 5:
                continue
            commit_hash, author, email, date, subject = parts
            haystack = f"{author}\n{subject}\n{body}"
            for pattern in self.AI_COMMIT_PATTERNS:
                if re.search(pattern, haystack):
                    ai_commits.append({
                        "hash": commit_hash,
                        "author": author,
                        "email": email,
                        "subject": subject,
                        "date": date,
                        "detection_method": "commit_pattern",
                    })
                    break
        return ai_commits
def identify_ai_code_patterns(self) -> list[dict]:
"""Scan codebase for patterns characteristic of AI-generated code."""
findings = []
code_extensions = {".py", ".js", ".ts", ".jsx", ".tsx", ".java", ".go"}
for filepath in Path(self.repo_path).rglob("*"):
if filepath.suffix not in code_extensions:
continue
if any(skip in str(filepath) for skip in ["node_modules", ".git", "venv"]):
continue
try:
content = filepath.read_text()
for i, line in enumerate(content.split("\n"), 1):
for pattern in self.AI_CODE_PATTERNS:
if re.search(pattern, line):
findings.append({
"file": str(filepath.relative_to(self.repo_path)),
"line": i,
"pattern": pattern,
"content": line.strip()[:100],
})
            except (UnicodeDecodeError, OSError):  # skip unreadable or binary files
pass
return findings
    def analyze_coding_velocity(self, since: str = "3 months ago") -> dict:
        """Analyze commit velocity for signs of AI-assisted development.

        Heuristic: commits that add many lines while deleting few often
        correspond to accepted AI-generated blocks.
        """
        result = subprocess.run(
            ["git", "log", f"--since={since}", "--format=%aI|%an", "--shortstat"],
            capture_output=True, text=True, cwd=self.repo_path,
        )
        total_commits = 0
        large_addition_commits = 0
        for line in result.stdout.splitlines():
            if "|" in line:  # commit header line: "<date>|<author>"
                total_commits += 1
            elif "insertion" in line:
                # e.g. " 3 files changed, 412 insertions(+), 5 deletions(-)"
                stats = {}
                for part in line.strip().split(", "):
                    count = int(part.split()[0])
                    if "insertion" in part:
                        stats["insertions"] = count
                    elif "deletion" in part:
                        stats["deletions"] = count
                if stats.get("insertions", 0) > 300 and stats.get("deletions", 0) < 10:
                    large_addition_commits += 1
        return {
            "total_commits": total_commits,
            "large_addition_commits": large_addition_commits,
            "note": "Sudden velocity increases may indicate AI tool adoption",
        }
Developer Interview Guide
# Structured interview questions for AI code generation audit
DEVELOPER_INTERVIEW = {
"tool_usage": [
"Which AI coding tools do you use? (Copilot, Cursor, Claude Code, Aider, other)",
"How frequently do you accept AI suggestions without modification?",
"Do you use AI for generating security-sensitive code (auth, crypto, input validation)?",
"What model do you typically use? (GPT-4, Claude, Codex, local models)",
"Do you use agent/autonomous mode, or only completion/suggestion mode?",
],
"code_review": [
"How do you review AI-generated code before accepting it?",
"Do you distinguish between human-written and AI-generated code in reviews?",
"Have you ever caught a security issue in AI-generated code?",
"Are AI-generated changes reviewed differently in pull requests?",
],
"configuration": [
"Is there a .cursorrules, CLAUDE.md, or .aiderignore in your projects?",
"Who maintains these configuration files?",
"Are there project-level security instructions for AI tools?",
"Do you use .cursorignore or similar to exclude sensitive files?",
],
"incidents": [
"Have you experienced any security issues related to AI-generated code?",
"Has AI-generated code ever introduced a bug that was hard to diagnose?",
"Have you seen AI suggest dependencies that didn't exist?",
],
}
Phase 2: Static Analysis
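Before running the scanners, interview answers from Phase 1 can be reduced to a per-team priority score so the heaviest analysis lands on the riskiest code first. A minimal sketch; the answer fields and weights are illustrative assumptions, not part of any standard:

```python
# Hypothetical interview summary -> scan priority. Field names and weights
# are illustrative; adapt them to the interview guide above.
def scan_priority(answers: dict) -> int:
    """Score 0-100: higher means this team's code gets scanned first."""
    score = 0
    if answers.get("uses_agent_mode"):        # autonomous edits see the least review
        score += 40
    if answers.get("ai_for_auth_or_crypto"):  # AI used for security-sensitive code
        score += 30
    if answers.get("accepts_unmodified"):     # suggestions merged as-is
        score += 20
    if not answers.get("reviews_ai_diffs"):   # no AI-aware review step
        score += 10
    return min(score, 100)

print(scan_priority({"uses_agent_mode": True, "ai_for_auth_or_crypto": True}))
```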
AI-Specific Semgrep Configuration
# Semgrep rules targeting AI-generated code patterns
AI_CODE_SEMGREP_RULES = """
rules:
# SQL Injection patterns common in AI-generated code
- id: ai-sql-fstring
patterns:
- pattern: |
$CURSOR.execute(f"...{$VAR}...")
message: >
SQL injection via f-string. AI coding assistants commonly generate
this pattern. Use parameterized queries.
languages: [python]
severity: ERROR
metadata:
cwe: CWE-89
category: ai-generated
# Hardcoded secrets (AI often generates placeholder secrets)
- id: ai-hardcoded-secret
patterns:
- pattern: |
$KEY = "sk-..."
- pattern: |
api_key = "..."
- pattern: |
password = "..."
message: >
Hardcoded secret detected. AI coding assistants often generate
placeholder credentials that developers forget to replace.
languages: [python]
severity: ERROR
metadata:
cwe: CWE-798
category: ai-generated
# Missing input validation (AI often skips validation)
- id: ai-missing-input-validation
patterns:
- pattern: |
@app.route("...", methods=["POST"])
def $FUNC():
$DATA = request.json
...
$DB.insert($DATA)
- pattern-not: |
@app.route("...", methods=["POST"])
def $FUNC():
$DATA = request.json
...
validate(...)
...
$DB.insert($DATA)
message: >
POST endpoint inserts request data without validation.
AI-generated endpoints often skip input validation.
languages: [python]
severity: WARNING
metadata:
cwe: CWE-20
category: ai-generated
# Insecure deserialization
- id: ai-pickle-load
pattern: pickle.load(...)
message: >
pickle.load() on untrusted data allows arbitrary code execution.
AI assistants frequently suggest pickle for serialization.
languages: [python]
severity: ERROR
metadata:
cwe: CWE-502
category: ai-generated
# Missing error handling
- id: ai-bare-except
pattern: |
except:
pass
message: >
Bare except with pass suppresses all errors including security-relevant
ones. AI tools generate this as a placeholder.
languages: [python]
severity: WARNING
metadata:
cwe: CWE-755
category: ai-generated
"""Running the Static Analysis Suite
#!/bin/bash
# Comprehensive static analysis for AI-generated code audit
set -euo pipefail
PROJECT_DIR="${1:-.}"
REPORT_DIR="${2:-/tmp/ai-audit-report}"
mkdir -p "$REPORT_DIR"
echo "=== AI-Generated Code Security Audit - Static Analysis ==="
echo "Project: $PROJECT_DIR"
echo "Reports: $REPORT_DIR"
echo ""
# Step 1: Semgrep with AI-specific rules
echo "--- Step 1: Semgrep Analysis ---"
if command -v semgrep &>/dev/null; then
semgrep --config "p/owasp-top-ten" \
--config "p/python-sql-injection" \
--config "p/xss" \
--config "p/python-security-audit" \
"$PROJECT_DIR" \
--json \
--output "$REPORT_DIR/semgrep-results.json" \
--exclude "node_modules" --exclude ".git" --exclude "venv" \
2>/dev/null || true
# Summarize
python3 -c "
import json
with open('$REPORT_DIR/semgrep-results.json') as f:
data = json.load(f)
results = data.get('results', [])
print(f'Total findings: {len(results)}')
by_severity = {}
for r in results:
sev = r.get('extra', {}).get('severity', 'unknown')
by_severity[sev] = by_severity.get(sev, 0) + 1
for sev, count in sorted(by_severity.items()):
print(f' {sev}: {count}')
"
else
echo "Semgrep not installed. Install: pip install semgrep"
fi
# Step 2: Bandit (Python security linter)
echo ""
echo "--- Step 2: Bandit Analysis ---"
if command -v bandit &>/dev/null; then
bandit -r "$PROJECT_DIR" \
-f json \
-o "$REPORT_DIR/bandit-results.json" \
--exclude ".git,node_modules,venv,__pycache__" \
-ll \
2>/dev/null || true
python3 -c "
import json
with open('$REPORT_DIR/bandit-results.json') as f:
data = json.load(f)
results = data.get('results', [])
print(f'Total findings: {len(results)}')
by_severity = {}
for r in results:
sev = r.get('issue_severity', 'unknown')
by_severity[sev] = by_severity.get(sev, 0) + 1
for sev, count in sorted(by_severity.items()):
print(f' {sev}: {count}')
"
else
echo "Bandit not installed. Install: pip install bandit"
fi
# Step 3: Pattern-based checks for AI-specific issues
echo ""
echo "--- Step 3: AI-Specific Pattern Checks ---"
# Check for hardcoded credentials (common in AI-generated code).
# Note: grep -c combined with -r prints per-file counts, and under
# `set -o pipefail` a no-match grep would abort the script, so pipe
# match lines through wc -l for a single total instead.
echo "Hardcoded credentials:"
grep -rn "api_key\s*=\s*['\"]" "$PROJECT_DIR" --include="*.py" \
    --exclude-dir=".git" --exclude-dir="node_modules" 2>/dev/null | wc -l || true

# Check for eval/exec usage
echo "eval/exec usage:"
grep -rn "\beval\b\|\bexec\b" "$PROJECT_DIR" --include="*.py" \
    --exclude-dir=".git" --exclude-dir="node_modules" 2>/dev/null | wc -l || true

# Check for missing error handling patterns
echo "Bare except clauses:"
grep -rn "except:" "$PROJECT_DIR" --include="*.py" \
    --exclude-dir=".git" --exclude-dir="node_modules" 2>/dev/null | wc -l || true
echo ""
echo "=== Static Analysis Complete ==="
echo "Results saved to: $REPORT_DIR"Phase 3: Dynamic Testing
Targeted Testing Strategy
Dynamic testing for AI-generated code focuses on the vulnerability classes AI is most likely to introduce:
import requests
from urllib.parse import quote
class AICodeDynamicTester:
"""Dynamic testing focused on common AI-generated vulnerability patterns."""
def __init__(self, base_url: str):
self.base_url = base_url.rstrip("/")
self.findings: list[dict] = []
def test_sql_injection_endpoints(
self, endpoints: list[dict]
) -> list[dict]:
"""Test endpoints for SQL injection vulnerabilities."""
sqli_payloads = [
"' OR '1'='1",
"' OR '1'='1' --",
"1; DROP TABLE users --",
"' UNION SELECT NULL, username, password FROM users --",
"1' AND SLEEP(5) --",
]
findings = []
for endpoint in endpoints:
url = f"{self.base_url}{endpoint['path']}"
for param in endpoint.get("params", []):
for payload in sqli_payloads:
try:
params = {param: payload}
response = requests.get(url, params=params, timeout=10)
# Check for SQL error messages in response
sql_errors = [
"syntax error",
"mysql_fetch",
"sqlite3.OperationalError",
"psycopg2.errors",
"ORA-",
"SQLSTATE",
]
for error in sql_errors:
if error.lower() in response.text.lower():
findings.append({
"type": "sql_injection",
"endpoint": endpoint["path"],
"parameter": param,
"payload": payload,
"evidence": error,
"severity": "critical",
})
break
# Check for time-based blind SQLi
if "SLEEP" in payload and response.elapsed.total_seconds() > 4:
findings.append({
"type": "blind_sql_injection",
"endpoint": endpoint["path"],
"parameter": param,
"payload": payload,
"evidence": f"Response time: {response.elapsed.total_seconds():.1f}s",
"severity": "critical",
})
except requests.exceptions.Timeout:
if "SLEEP" in payload:
findings.append({
"type": "blind_sql_injection",
"endpoint": endpoint["path"],
"parameter": param,
"payload": payload,
"evidence": "Request timed out (possible sleep injection)",
"severity": "critical",
})
except requests.exceptions.RequestException:
pass
return findings
def test_xss_endpoints(self, endpoints: list[dict]) -> list[dict]:
"""Test endpoints for reflected XSS vulnerabilities."""
xss_payloads = [
'<script>alert("XSS")</script>',
'<img src=x onerror=alert(1)>',
'" onmouseover="alert(1)"',
"javascript:alert(1)",
'<svg/onload=alert(1)>',
]
findings = []
for endpoint in endpoints:
url = f"{self.base_url}{endpoint['path']}"
for param in endpoint.get("params", []):
for payload in xss_payloads:
try:
params = {param: payload}
response = requests.get(url, params=params, timeout=10)
# Check if payload is reflected without encoding
if payload in response.text:
findings.append({
"type": "reflected_xss",
"endpoint": endpoint["path"],
"parameter": param,
"payload": payload,
"evidence": "Payload reflected unencoded in response",
"severity": "high",
})
except requests.exceptions.RequestException:
pass
return findings
def test_authentication_bypass(
self, protected_endpoints: list[str]
) -> list[dict]:
"""Test for authentication bypass in AI-generated auth code."""
findings = []
bypass_techniques = [
{"name": "No auth header", "headers": {}},
{"name": "Empty bearer", "headers": {"Authorization": "Bearer "}},
{"name": "Invalid token", "headers": {"Authorization": "Bearer invalid"}},
{"name": "Admin role claim", "headers": {"X-User-Role": "admin"}},
]
for endpoint in protected_endpoints:
url = f"{self.base_url}{endpoint}"
for technique in bypass_techniques:
try:
response = requests.get(
url, headers=technique["headers"], timeout=10
)
if response.status_code == 200:
findings.append({
"type": "auth_bypass",
"endpoint": endpoint,
"technique": technique["name"],
"evidence": f"Got 200 OK with {technique['name']}",
"severity": "critical",
})
except requests.exceptions.RequestException:
pass
        return findings
Phase 4: Organizational Assessment
Maturity Model
from dataclasses import dataclass
@dataclass
class MaturityDimension:
dimension: str
level_1: str # Initial
level_2: str # Developing
level_3: str # Defined
level_4: str # Managed
level_5: str # Optimizing
AI_CODE_SECURITY_MATURITY = [
MaturityDimension(
dimension="Tool Governance",
level_1="No policy on AI tool usage",
level_2="Informal guidance on approved tools",
level_3="Formal policy with approved tool list",
level_4="Policy enforced via technical controls",
level_5="Continuous assessment of new tools",
),
MaturityDimension(
dimension="Code Review",
level_1="No distinction between AI and human code",
level_2="Awareness that AI code needs extra review",
level_3="Documented review checklist for AI code",
level_4="Automated checks in CI/CD for AI patterns",
level_5="Metrics-driven continuous improvement",
),
MaturityDimension(
dimension="Developer Training",
level_1="No training on AI code security",
level_2="Ad hoc awareness communications",
level_3="Formal training program for AI tool users",
level_4="Regular training with assessments",
level_5="Threat modeling exercises with AI scenarios",
),
MaturityDimension(
dimension="Static Analysis",
level_1="No static analysis",
level_2="Generic SAST tools without AI-specific rules",
level_3="SAST with AI-specific rule sets",
level_4="Custom rules for organization's AI patterns",
level_5="ML-enhanced detection of AI code issues",
),
MaturityDimension(
dimension="Incident Response",
level_1="No AI-specific incident procedures",
level_2="AI code issues handled ad hoc",
level_3="Documented playbook for AI code incidents",
level_4="Practiced playbook with tabletop exercises",
level_5="Automated detection and response",
),
]
Phase 5: Reporting
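The maturity assessment from Phase 4 feeds the report as a per-dimension score. A small sketch that reduces assessed levels to the headline numbers the report needs (the example scores are illustrative; summarize_maturity is a hypothetical helper):

```python
def summarize_maturity(scores: dict[str, int]) -> dict:
    """Summarize per-dimension maturity levels (1-5) for the report."""
    average = round(sum(scores.values()) / len(scores), 1)
    weakest = min(scores, key=scores.get)
    return {"average": average, "weakest_dimension": weakest}

# Illustrative scores for the five dimensions defined in Phase 4.
print(summarize_maturity({
    "Tool Governance": 2,
    "Code Review": 3,
    "Developer Training": 1,
    "Static Analysis": 3,
    "Incident Response": 2,
}))
```

The weakest dimension is a natural anchor for the remediation roadmap's medium-term items.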
Report Template
from datetime import datetime

def generate_audit_report(
project_name: str,
ai_commits: list,
static_findings: list,
dynamic_findings: list,
maturity_scores: dict,
) -> str:
"""Generate the final audit report."""
critical = sum(1 for f in static_findings + dynamic_findings if f.get("severity") == "critical")
high = sum(1 for f in static_findings + dynamic_findings if f.get("severity") == "high")
report = f"""
# AI-Generated Code Security Audit Report
## Project: {project_name}
## Date: {datetime.utcnow().strftime('%Y-%m-%d')}
---
## Executive Summary
This audit assessed the security of AI-generated code in the {project_name} project.
The assessment identified {len(ai_commits)} commits attributed to AI coding tools,
{len(static_findings)} static analysis findings, and {len(dynamic_findings)} confirmed
vulnerabilities through dynamic testing.
**Critical findings: {critical} | High findings: {high}**
## Key Findings
### 1. AI Code Generation Scope
- {len(ai_commits)} commits identified as AI-generated
- AI coding tools in use: [list from identification phase]
- Estimated percentage of codebase AI-generated: [X]%
### 2. Vulnerability Summary
| Severity | Static | Dynamic | Total |
|---|---|---|---|
| Critical | {sum(1 for f in static_findings if f.get('severity') == 'critical')} | {sum(1 for f in dynamic_findings if f.get('severity') == 'critical')} | {critical} |
| High | {sum(1 for f in static_findings if f.get('severity') == 'high')} | {sum(1 for f in dynamic_findings if f.get('severity') == 'high')} | {high} |
### 3. Top Vulnerability Categories
[Summarize by CWE]
### 4. Organizational Maturity
[Summarize maturity assessment]
## Recommendations
1. **Immediate**: Remediate all critical and high findings
2. **Short-term**: Deploy AI-specific Semgrep rules in CI/CD
3. **Medium-term**: Establish AI code generation governance policy
4. **Long-term**: Build organizational maturity to Level 3+
## Detailed Findings
[Individual findings with evidence, impact, and remediation]
"""
    return report
References
- OWASP Code Review Guide — https://owasp.org/www-project-code-review-guide/
- Semgrep Documentation — https://semgrep.dev/docs/
- CodeQL Documentation — https://codeql.github.com/docs/
- "Do Users Write More Insecure Code with AI Assistants?" — Perry et al., 2023 — https://arxiv.org/abs/2211.03622
- OWASP Top 10 for LLM Applications 2025 — https://genai.owasp.org/llmrisk/
- NIST AI Risk Management Framework — https://www.nist.gov/artificial-intelligence/ai-risk-management-framework
- CWE Top 25 Most Dangerous Software Weaknesses — https://cwe.mitre.org/top25/