SQL Injection via LLM Code Generation
How LLMs generate SQL injection vulnerabilities through string formatting, improper parameterization, and ORM misuse, with detection and prevention strategies.
Overview
SQL injection remains one of the most common vulnerability classes in web applications (CWE-89), and AI code generation tools make the problem worse. Research from Stanford and other institutions has shown that developers using AI coding assistants produce more security vulnerabilities than those coding without AI assistance, with SQL injection being one of the most frequently introduced flaw types.
The root cause is straightforward: LLMs learn from the vast corpus of code on the internet, and a significant portion of that code uses string formatting for SQL queries rather than parameterized queries. The model reproduces the patterns it has seen most frequently, and insecure patterns are often more common in tutorials, Stack Overflow answers, and introductory code examples that dominate training data.
This article examines the specific SQL injection patterns that LLMs generate, why they generate them, and how to detect and prevent these vulnerabilities in AI-assisted development workflows.
Why LLMs Generate SQL Injection
The Training Data Problem
LLMs learn code patterns from their training corpus. Consider the relative prevalence of these two approaches in publicly available code:
# Pattern 1: String formatting (INSECURE but COMMON in training data)
# This pattern appears frequently in tutorials, blog posts, and Stack Overflow
query = f"SELECT * FROM users WHERE username = '{username}'"
cursor.execute(query)
# Pattern 2: Parameterized query (SECURE but less common in tutorials)
query = "SELECT * FROM users WHERE username = %s"
cursor.execute(query, (username,))

The insecure pattern appears more frequently in training data for several reasons:
- Tutorials prioritize readability and simplicity over security
- Quick examples and prototypes skip parameterization
- Stack Overflow answers often use the simplest approach
- Older code (before modern security awareness) uses string formatting
The LLM does not understand security. It predicts the most likely next token based on patterns. When a developer asks "write a function to look up a user by name," the model generates the pattern it has seen most often.
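To make the difference concrete, here is a minimal, self-contained sqlite3 sketch (the table, rows, and attacker input are hypothetical) showing how the f-string version lets input rewrite the query's structure while the parameterized version treats the same input as inert data:

```python
import sqlite3

# Hypothetical demo schema: one user row, plus an attacker-controlled input
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

attacker_input = "nobody' OR '1'='1"

# Insecure: the input becomes part of the SQL, so the OR clause matches every row
insecure = conn.execute(
    f"SELECT * FROM users WHERE name = '{attacker_input}'"
).fetchall()

# Secure: the input is bound as data; no user is literally named "nobody' OR '1'='1"
secure = conn.execute(
    "SELECT * FROM users WHERE name = ?", (attacker_input,)
).fetchall()

print(len(insecure))  # 1 - injection leaked alice's row
print(len(secure))    # 0 - parameterized query matched nothing
```

Both functions are "functionally correct" for benign input, which is exactly why the insecure version survives review.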
Empirical Evidence
The 2023 Stanford study "Do Users Write More Insecure Code with AI Assistants?" found that participants using AI coding assistants were more likely to produce insecure code, including SQL injection. The researchers observed that AI suggestions often provided functionally correct but security-flawed implementations that developers accepted without modification.
# Common LLM-generated SQL injection patterns categorized by severity
SQL_INJECTION_PATTERNS = {
    "f_string_formatting": {
        "severity": "critical",
        "frequency": "very_common",
        "example": '''
def get_user(username: str):
    query = f"SELECT * FROM users WHERE name = '{username}'"
    return db.execute(query).fetchone()
''',
        "cwe": "CWE-89",
        "fix": '''
def get_user(username: str):
    query = "SELECT * FROM users WHERE name = ?"
    return db.execute(query, (username,)).fetchone()
''',
    },
    "format_method": {
        "severity": "critical",
        "frequency": "common",
        "example": '''
def search_products(category: str, min_price: float):
    query = "SELECT * FROM products WHERE category = '{}' AND price >= {}".format(
        category, min_price
    )
    return db.execute(query).fetchall()
''',
        "cwe": "CWE-89",
        "fix": '''
def search_products(category: str, min_price: float):
    query = "SELECT * FROM products WHERE category = ? AND price >= ?"
    return db.execute(query, (category, min_price)).fetchall()
''',
    },
    "percent_formatting": {
        "severity": "critical",
        "frequency": "common",
        "example": '''
def find_orders(customer_id: str, status: str):
    query = "SELECT * FROM orders WHERE customer_id = '%s' AND status = '%s'" % (
        customer_id, status
    )
    return db.execute(query).fetchall()
''',
        "cwe": "CWE-89",
        "fix": '''
def find_orders(customer_id: str, status: str):
    query = "SELECT * FROM orders WHERE customer_id = %s AND status = %s"
    return db.execute(query, (customer_id, status)).fetchall()
''',
    },
    "string_concatenation": {
        "severity": "critical",
        "frequency": "common",
        "example": '''
def delete_record(table: str, record_id: str):
    query = "DELETE FROM " + table + " WHERE id = " + record_id
    db.execute(query)
''',
        "cwe": "CWE-89",
        "fix": '''
def delete_record(table: str, record_id: int):
    # Table names cannot be parameterized - use allowlist
    allowed_tables = {"users", "orders", "products"}
    if table not in allowed_tables:
        raise ValueError(f"Invalid table: {table}")
    query = f"DELETE FROM {table} WHERE id = ?"
    db.execute(query, (record_id,))
''',
    },
    "orm_raw_query_misuse": {
        "severity": "high",
        "frequency": "moderate",
        "example": '''
from django.db import connection

def custom_report(start_date: str, end_date: str):
    with connection.cursor() as cursor:
        cursor.execute(
            f"SELECT * FROM reports WHERE date BETWEEN '{start_date}' AND '{end_date}'"
        )
        return cursor.fetchall()
''',
        "cwe": "CWE-89",
        "fix": '''
from django.db import connection

def custom_report(start_date: str, end_date: str):
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT * FROM reports WHERE date BETWEEN %s AND %s",
            [start_date, end_date],
        )
        return cursor.fetchall()
''',
    },
}

Advanced LLM SQL Injection Patterns
Dynamic Table and Column Names
One of the trickiest patterns LLMs generate involves dynamic table or column names. Parameterized queries cannot handle identifiers (table names, column names), so the LLM falls back to string formatting. Without an allowlist, this creates injection vulnerabilities:
# Pattern LLMs commonly generate (INSECURE)
def get_column_value(table: str, column: str, record_id: int):
    """LLMs often generate this pattern for dynamic queries."""
    query = f"SELECT {column} FROM {table} WHERE id = {record_id}"
    return db.execute(query).fetchone()

# Secure alternative with allowlisting
ALLOWED_TABLES = {"users", "products", "orders", "categories"}
ALLOWED_COLUMNS = {
    "users": {"id", "name", "email", "created_at"},
    "products": {"id", "name", "price", "category_id"},
    "orders": {"id", "user_id", "total", "status", "created_at"},
    "categories": {"id", "name", "description"},
}

def get_column_value_safe(table: str, column: str, record_id: int):
    """Secure version with allowlist validation."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"Invalid table: {table}")
    if column not in ALLOWED_COLUMNS.get(table, set()):
        raise ValueError(f"Invalid column: {column} for table {table}")
    # Safe because table and column are validated against allowlist
    query = f"SELECT {column} FROM {table} WHERE id = ?"
    return db.execute(query, (record_id,)).fetchone()

ORM Bypass Patterns
LLMs sometimes generate code that bypasses ORM protections by dropping to raw SQL, often because the developer asks for "better performance" or "a complex query":
# SQLAlchemy ORM bypass patterns LLMs generate
from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session

engine = create_engine("sqlite:///app.db")

# INSECURE: LLM-generated raw SQL bypass
def search_users_insecure(search_term: str):
    """LLMs generate this when asked for 'fast search' or 'complex query'."""
    with Session(engine) as session:
        # This bypasses SQLAlchemy's parameterization
        result = session.execute(
            text(f"SELECT * FROM users WHERE name LIKE '%{search_term}%'")
        )
        return result.fetchall()

# SECURE: Proper parameterized approach
def search_users_secure(search_term: str):
    """Correct parameterized version."""
    with Session(engine) as session:
        result = session.execute(
            text("SELECT * FROM users WHERE name LIKE :pattern"),
            {"pattern": f"%{search_term}%"},
        )
        return result.fetchall()

# SECURE: Using ORM properly (preferred)
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from sqlalchemy import String, select

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(100))

def search_users_orm(search_term: str):
    """Correct ORM-based approach - no raw SQL needed."""
    with Session(engine) as session:
        stmt = select(User).where(User.name.contains(search_term))
        return session.scalars(stmt).all()

LIKE Clause Injection
LLMs frequently mishandle the LIKE clause, which requires special attention because the % and _ characters have special meaning:
# INSECURE: LLM-generated LIKE query (very common pattern)
def search_products(query: str):
    sql = f"SELECT * FROM products WHERE name LIKE '%{query}%'"
    return db.execute(sql).fetchall()

# Still flawed: parameterized, but LIKE wildcards not escaped
def search_products_partial_fix(query: str):
    sql = "SELECT * FROM products WHERE name LIKE %s"
    return db.execute(sql, (f"%{query}%",)).fetchall()
# User can still inject LIKE wildcards: "%" matches everything

# SECURE: proper LIKE with wildcard escaping
def search_products_secure(query: str):
    # Escape LIKE wildcards in user input
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    sql = "SELECT * FROM products WHERE name LIKE %s ESCAPE '\\'"
    return db.execute(sql, (f"%{escaped}%",)).fetchall()

Detection with Static Analysis
Semgrep Rules
Semgrep is effective at catching LLM-generated SQL injection patterns:
# Semgrep rules for detecting LLM-generated SQL injection
SEMGREP_SQL_INJECTION_RULES = """
rules:
  - id: python-sql-fstring-injection
    patterns:
      - pattern: |
          $CURSOR.execute(f"...{$VAR}...")
    message: >
      SQL query uses f-string with variable interpolation.
      Use parameterized queries instead. This pattern is commonly
      generated by AI coding assistants.
    languages: [python]
    severity: ERROR
    metadata:
      cwe: CWE-89
      owasp: A03:2021 - Injection
  - id: python-sql-format-injection
    patterns:
      - pattern: |
          $CURSOR.execute("...".format(...))
    message: >
      SQL query uses .format() string interpolation.
      Use parameterized queries instead.
    languages: [python]
    severity: ERROR
    metadata:
      cwe: CWE-89
  - id: python-sql-concat-injection
    patterns:
      - pattern: |
          $CURSOR.execute("..." + $VAR + "...")
    message: >
      SQL query uses string concatenation.
      Use parameterized queries instead.
    languages: [python]
    severity: ERROR
    metadata:
      cwe: CWE-89
  - id: sqlalchemy-raw-fstring
    patterns:
      - pattern: |
          text(f"...{$VAR}...")
    message: >
      SQLAlchemy text() with f-string bypasses ORM parameterization.
      Use text("... :param ...") with named parameters.
    languages: [python]
    severity: ERROR
    metadata:
      cwe: CWE-89
"""

CodeQL Queries
# CodeQL query for detecting SQL injection in AI-generated Python code
CODEQL_SQL_INJECTION = """
/**
 * @name SQL injection from AI-generated code patterns
 * @description Detects common SQL injection patterns that AI coding
 *              assistants frequently generate.
 * @kind path-problem
 * @problem.severity error
 * @precision high
 * @id py/ai-generated-sql-injection
 * @tags security
 *       cwe-89
 */

import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.Concepts
import semmle.python.dataflow.new.RemoteFlowSources

module SqlInjectionConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source instanceof RemoteFlowSource
  }

  predicate isSink(DataFlow::Node sink) {
    exists(SqlExecution exec | sink = exec.getSql())
  }
}

module SqlInjectionFlow = TaintTracking::Global<SqlInjectionConfig>;

import SqlInjectionFlow::PathGraph

from SqlInjectionFlow::PathNode source, SqlInjectionFlow::PathNode sink
where SqlInjectionFlow::flowPath(source, sink)
select sink.getNode(), source, sink,
  "SQL injection vulnerability: user input from $@ flows to SQL query.",
  source.getNode(), "user input"
"""

CI/CD Integration
Automated SQL Injection Detection Pipeline
#!/bin/bash
# CI script to detect SQL injection in AI-generated code
# Run this in your CI/CD pipeline on every PR
set -euo pipefail

echo "=== SQL Injection Detection for AI-Generated Code ==="
PROJECT_DIR="${1:-.}"

# Step 1: Run Semgrep with SQL injection rules
echo ""
echo "--- Semgrep SQL Injection Scan ---"
if command -v semgrep &>/dev/null; then
    semgrep --config "p/python-sql-injection" \
        --config "p/owasp-top-ten" \
        "$PROJECT_DIR" \
        --json \
        --output /tmp/semgrep-sql-results.json \
        || true
    # Count findings
    FINDINGS=$(python3 -c "
import json
with open('/tmp/semgrep-sql-results.json') as f:
    data = json.load(f)
results = data.get('results', [])
sql_findings = [r for r in results if 'sql' in r.get('check_id', '').lower() or 'injection' in r.get('check_id', '').lower()]
print(len(sql_findings))
")
    echo "SQL injection findings: $FINDINGS"
else
    echo "Semgrep not installed. Install with: pip install semgrep"
fi

# Step 2: Check for common patterns with grep
echo ""
echo "--- Pattern-Based Detection ---"
# Note: grep -c prints 0 itself when nothing matches, so use || true here.
# Appending || echo "0" would emit a second line and break the arithmetic below.
echo "Checking for f-string SQL patterns..."
FSTRING_COUNT=$(grep -rn 'execute(f"' "$PROJECT_DIR" --include="*.py" | grep -ic "select\|insert\|update\|delete\|where" || true)
echo "f-string SQL patterns found: $FSTRING_COUNT"

echo "Checking for .format() SQL patterns..."
FORMAT_COUNT=$(grep -rn '\.format(' "$PROJECT_DIR" --include="*.py" | grep -ic "select\|insert\|update\|delete\|where" || true)
echo ".format() SQL patterns found: $FORMAT_COUNT"

# Step 3: Summary
echo ""
echo "=== Summary ==="
TOTAL=$((FSTRING_COUNT + FORMAT_COUNT))
if [ "$TOTAL" -gt 0 ]; then
    echo "FAIL: Found $TOTAL potential SQL injection patterns"
    echo "Review findings and convert to parameterized queries"
    exit 1
else
    echo "PASS: No obvious SQL injection patterns detected"
fi

Python Pre-Commit Hook
#!/usr/bin/env python3
"""Pre-commit hook to detect SQL injection in AI-generated code."""
import ast
import sys
from pathlib import Path

class SQLInjectionDetector(ast.NodeVisitor):
    """AST-based detector for SQL injection patterns."""

    def __init__(self, filename: str):
        self.filename = filename
        self.findings: list[dict] = []

    def visit_Call(self, node: ast.Call):
        """Check for cursor.execute() calls with unsafe arguments."""
        # Check if this is a .execute() call
        if isinstance(node.func, ast.Attribute) and node.func.attr == "execute":
            if node.args:
                first_arg = node.args[0]
                self._check_sql_argument(first_arg, node.lineno)
        self.generic_visit(node)

    def _check_sql_argument(self, node: ast.expr, lineno: int):
        """Check if a SQL argument uses unsafe string formatting."""
        # f-string detection
        if isinstance(node, ast.JoinedStr):
            # Check if it looks like SQL
            for value in node.values:
                if isinstance(value, ast.Constant) and isinstance(value.value, str):
                    sql_keywords = ["select", "insert", "update", "delete", "where", "from"]
                    if any(kw in value.value.lower() for kw in sql_keywords):
                        self.findings.append({
                            "file": self.filename,
                            "line": lineno,
                            "type": "f-string SQL",
                            "severity": "ERROR",
                        })
                        return
        # .format() detection
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Attribute) and node.func.attr == "format":
                if isinstance(node.func.value, ast.Constant):
                    sql_keywords = ["select", "insert", "update", "delete", "where"]
                    if any(kw in str(node.func.value.value).lower() for kw in sql_keywords):
                        self.findings.append({
                            "file": self.filename,
                            "line": lineno,
                            "type": ".format() SQL",
                            "severity": "ERROR",
                        })
        # String concatenation with + operator
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            self.findings.append({
                "file": self.filename,
                "line": lineno,
                "type": "concatenation SQL (check manually)",
                "severity": "WARNING",
            })

def main():
    exit_code = 0
    for filepath in sys.argv[1:]:
        if not filepath.endswith(".py"):
            continue
        try:
            source = Path(filepath).read_text()
            tree = ast.parse(source)
            detector = SQLInjectionDetector(filepath)
            detector.visit(tree)
            for finding in detector.findings:
                print(
                    f"{finding['severity']}: {finding['file']}:{finding['line']} "
                    f"- {finding['type']} detected in .execute() call"
                )
                if finding["severity"] == "ERROR":
                    exit_code = 1
        except SyntaxError:
            pass
    sys.exit(exit_code)

if __name__ == "__main__":
    main()

Developer Education
Quick Reference: Safe SQL Patterns
| Framework | Insecure (LLM often generates) | Secure |
|---|---|---|
| sqlite3 | execute(f"... {var} ...") | execute("... ? ...", (var,)) |
| psycopg2 | execute(f"... {var} ...") | execute("... %s ...", (var,)) |
| MySQL Connector | execute(f"... {var} ...") | execute("... %s ...", (var,)) |
| SQLAlchemy Core | text(f"... {var} ...") | text("... :param ..."), {"param": var} |
| Django ORM | raw(f"... {var} ...") | raw("... %s ...", [var]) |
| Flask-SQLAlchemy | db.engine.execute(f"...") | Use ORM methods or parameterized text() |
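The LIKE caveat from earlier can be verified end-to-end with sqlite3, which uses ? placeholders rather than the psycopg2-style %s shown above; the escape_like helper and product rows here are hypothetical illustrations:

```python
import sqlite3

def escape_like(term: str) -> str:
    # Escape the backslash first, then the LIKE wildcards % and _
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("widget",), ("gadget",), ("100% cotton shirt",)],
)

def search(term: str, escape: bool) -> list:
    pattern = f"%{escape_like(term) if escape else term}%"
    return conn.execute(
        "SELECT name FROM products WHERE name LIKE ? ESCAPE '\\'",
        (pattern,),
    ).fetchall()

# Unescaped "%" is a wildcard and matches every product
print(len(search("%", escape=False)))  # 3
# Escaped "%" matches only names containing a literal percent sign
print(len(search("%", escape=True)))   # 1 - "100% cotton shirt"
```

The same escaping rule applies to every driver in the table; only the placeholder syntax changes.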
Prompt Engineering for Safer Code
When using AI coding assistants, developers can reduce SQL injection risk by being explicit in their prompts:
# Instead of: "write a function to search users by name"
# Use: "write a function to search users by name using parameterized queries"
# Instead of: "create a database query for the report"
# Use: "create a parameterized database query using SQLAlchemy ORM for the report"
# Add to project .cursorrules or CLAUDE.md:
PROJECT_SECURITY_RULES = """
## Database Access Rules
- NEVER use f-strings, .format(), or string concatenation in SQL queries
- ALWAYS use parameterized queries with placeholders
- For dynamic table/column names, ALWAYS use an allowlist
- Prefer ORM methods over raw SQL
- When raw SQL is necessary, use framework-specific parameterization
"""References
- CWE-89: Improper Neutralization of Special Elements used in an SQL Command — https://cwe.mitre.org/data/definitions/89.html
- "Do Users Write More Insecure Code with AI Assistants?" — Perry et al., Stanford University, 2023 — https://arxiv.org/abs/2211.03622
- OWASP Top 10 for LLM Applications 2025 — LLM05: Improper Output Handling — https://genai.owasp.org/llmrisk/
- Semgrep SQL Injection Rules — https://semgrep.dev/p/python-sql-injection
- OWASP SQL Injection Prevention Cheat Sheet — https://cheatsheetseries.owasp.org/cheatsheets/SQL_Injection_Prevention_Cheat_Sheet.html
- CodeQL Python SQL Injection Queries — https://codeql.github.com/codeql-query-help/python/py-sql-injection/