CWE Mapping for AI-Generated Vulnerabilities
Common AI-generated vulnerabilities mapped to CWE identifiers with real examples: SQL injection (CWE-89), XSS (CWE-79), path traversal (CWE-22), command injection (CWE-78), and hardcoded credentials (CWE-798).
This page maps the most prevalent vulnerabilities found in AI-generated code to their CWE (Common Weakness Enumeration) identifiers. Each entry includes examples of how AI models typically generate the vulnerability, why the model produces this pattern, and how to detect it. These mappings are based on published research analyzing thousands of AI-generated code samples.
CWE-89: SQL Injection
Prevalence in AI-generated code: High
SQL injection is the most frequently observed vulnerability in AI-generated code. Models consistently suggest string concatenation or f-string formatting for SQL queries because these patterns are overwhelmingly common in their training data.
How AI Generates This Vulnerability
```python
# AI-generated: vulnerable to SQL injection
def get_user(username):
    query = f"SELECT * FROM users WHERE username = '{username}'"
    cursor.execute(query)
    return cursor.fetchone()

# Secure alternative the model should suggest
def get_user(username):
    query = "SELECT * FROM users WHERE username = %s"
    cursor.execute(query, (username,))
    return cursor.fetchone()
```

```javascript
// AI-generated: vulnerable to SQL injection
app.get('/users/:id', async (req, res) => {
  const result = await db.query(
    `SELECT * FROM users WHERE id = ${req.params.id}`
  );
  res.json(result.rows);
});

// Secure alternative
app.get('/users/:id', async (req, res) => {
  const result = await db.query(
    'SELECT * FROM users WHERE id = $1',
    [req.params.id]
  );
  res.json(result.rows);
});
```

Why Models Produce This Pattern
The training data contains far more examples of string-formatted SQL queries than parameterized queries. Tutorial code, Stack Overflow answers, and quick prototypes overwhelmingly use string formatting because it is simpler to write and explain. The model learns that string formatting is the "default" way to construct SQL queries.
Additionally, models optimize for code that is syntactically similar to the surrounding context. If any existing code in the project uses string formatting for SQL, the model will follow that pattern.
Detection Approach
- Static analysis: Search for string concatenation or interpolation in arguments to execute(), query(), or ORM raw query methods
- Pattern matching: Flag any SQL-like string that contains variable interpolation markers (${}, %s outside a parameterized context, f"...{}")
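The pattern-matching approach can be sketched in a few lines of Python. This is a minimal illustration, not a vetted ruleset: the regexes and the function name flag_sql_interpolation are invented for this example, and a real SAST rule would parse the AST rather than scan lines.

```python
import re

# Illustrative line-level patterns for string-built SQL (hypothetical ruleset)
SQL_INTERPOLATION_PATTERNS = [
    re.compile(r'execute\(\s*f["\']'),                  # f-string passed to execute()
    re.compile(r'execute\(\s*["\'][^"\']*["\']\s*%'),   # %-formatting applied to the query string
    re.compile(r'execute\(\s*.+\+'),                    # string concatenation inside execute()
]

def flag_sql_interpolation(source: str) -> list[int]:
    """Return 1-based line numbers that look like string-built SQL calls."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(p.search(line) for p in SQL_INTERPOLATION_PATTERNS):
            hits.append(lineno)
    return hits
```

Note that the parameterized call with %s as a placeholder is deliberately not flagged: the % marker only counts when it is used as a formatting operator outside the query string.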
CWE-79: Cross-Site Scripting (XSS)
Prevalence in AI-generated code: High
AI models frequently generate code that renders user-supplied content without encoding or sanitization. This is especially common in server-side rendering and API response construction.
How AI Generates This Vulnerability
```python
# AI-generated: reflected XSS
@app.route('/search')
def search():
    query = request.args.get('q', '')
    results = search_engine.search(query)
    return f"<h1>Results for: {query}</h1>" + render_results(results)

# Secure alternative
from markupsafe import escape

@app.route('/search')
def search():
    query = request.args.get('q', '')
    results = search_engine.search(query)
    return f"<h1>Results for: {escape(query)}</h1>" + render_results(results)
```

```javascript
// AI-generated: DOM-based XSS
function displayMessage(message) {
  document.getElementById('output').innerHTML = message;
}

// Secure alternative
function displayMessage(message) {
  document.getElementById('output').textContent = message;
}
```

Why Models Produce This Pattern
Models see innerHTML far more frequently than textContent in training data because innerHTML is more versatile and commonly used in tutorials. Similarly, server-side code that directly embeds variables in HTML templates is more common in training data than code using proper output encoding.
Modern frameworks like React mitigate this through automatic escaping, but AI models often generate code that bypasses these protections through dangerouslySetInnerHTML or raw HTML string construction.
CWE-22: Path Traversal
Prevalence in AI-generated code: Medium
AI models generate file access code that does not validate paths against traversal sequences (../). This is because path validation is rarely included in training data examples of file handling.
How AI Generates This Vulnerability
```python
# AI-generated: path traversal
@app.route('/files/<filename>')
def serve_file(filename):
    return send_file(os.path.join(UPLOAD_DIR, filename))

# Secure alternative
@app.route('/files/<filename>')
def serve_file(filename):
    filename = secure_filename(filename)
    file_path = os.path.join(UPLOAD_DIR, filename)
    if not os.path.abspath(file_path).startswith(os.path.abspath(UPLOAD_DIR)):
        abort(403)
    return send_file(file_path)
```

```javascript
// AI-generated: path traversal in Node.js
app.get('/download', (req, res) => {
  const filePath = path.join(__dirname, 'uploads', req.query.file);
  res.sendFile(filePath);
});

// Secure alternative
app.get('/download', (req, res) => {
  const fileName = path.basename(req.query.file); // Strip path components
  const filePath = path.join(__dirname, 'uploads', fileName);
  const resolved = path.resolve(filePath);
  if (!resolved.startsWith(path.resolve(path.join(__dirname, 'uploads')))) {
    return res.status(403).send('Forbidden');
  }
  res.sendFile(resolved);
});
```

Why Models Produce This Pattern
File serving code in tutorials and examples almost never includes path traversal validation because it is not relevant to demonstrating file serving functionality. The model learns the "happy path" pattern of joining paths and serving files without understanding the security implications.
CWE-78: Command Injection
Prevalence in AI-generated code: Medium
AI models generate code that constructs shell commands using string formatting rather than using safe subprocess APIs with argument arrays.
How AI Generates This Vulnerability
```python
# AI-generated: command injection
def convert_image(input_path, output_format):
    os.system(f"convert {input_path} output.{output_format}")

# Secure alternative
def convert_image(input_path, output_format):
    allowed_formats = {'png', 'jpg', 'gif', 'webp'}
    if output_format not in allowed_formats:
        raise ValueError(f"Unsupported format: {output_format}")
    subprocess.run(
        ['convert', input_path, f'output.{output_format}'],
        check=True,
        capture_output=True
    )
```

```go
// AI-generated: command injection in Go
func runDiagnostic(host string) (string, error) {
	cmd := exec.Command("sh", "-c", "ping -c 4 "+host)
	output, err := cmd.Output()
	return string(output), err
}

// Secure alternative
func runDiagnostic(host string) (string, error) {
	cmd := exec.Command("ping", "-c", "4", host)
	output, err := cmd.Output()
	return string(output), err
}
```

Why Models Produce This Pattern
os.system() and shell string construction are more concise and appear more frequently in training data than subprocess.run() with argument arrays. Models optimize for conciseness and pattern frequency, both of which favor the insecure approach.
The Go example is particularly notable: models frequently use sh -c with string concatenation instead of passing arguments directly to exec.Command, because many training examples use shell invocation for convenience.
CWE-798: Hardcoded Credentials
Prevalence in AI-generated code: Medium
AI models reproduce credential patterns from their training data, including API keys, database connection strings, and default passwords.
How AI Generates This Vulnerability
```python
# AI-generated: hardcoded credentials
import boto3

client = boto3.client(
    's3',
    aws_access_key_id='AKIAIOSFODNN7EXAMPLE',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    region_name='us-east-1'
)

# Secure alternative
import boto3

client = boto3.client('s3')  # Uses default credential chain
```

```javascript
// AI-generated: hardcoded database credentials
const pool = new Pool({
  host: 'localhost',
  port: 5432,
  user: 'admin',
  password: 'password123',
  database: 'myapp'
});

// Secure alternative
const pool = new Pool({
  connectionString: process.env.DATABASE_URL
});
```

Why Models Produce This Pattern
Training data is full of example credentials in documentation, tutorials, and code that was never meant for production. The model does not distinguish between placeholder values and real credentials — it reproduces patterns it has seen. The AWS example key above (AKIAIOSFODNN7EXAMPLE) is AWS's own documented example key and appears frequently in training data.
Additional CWE Patterns
Beyond the five primary patterns above, AI models commonly generate code exhibiting:
| CWE | Vulnerability | AI Generation Pattern |
|---|---|---|
| CWE-327 | Broken Cryptography | Using MD5/SHA1 for password hashing instead of bcrypt/argon2 |
| CWE-502 | Unsafe Deserialization | Deserializing untrusted data with pickle.loads(), or yaml.load() without a safe loader |
| CWE-611 | XXE Processing | Parsing XML without disabling external entities |
| CWE-918 | SSRF | Fetching user-supplied URLs without validation |
| CWE-209 | Error Information Leak | Returning full stack traces in API responses |
| CWE-330 | Insufficient Randomness | Using the random module instead of secrets for security-sensitive operations |
| CWE-295 | Improper Certificate Validation | Setting verify=False in HTTP requests |
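For concreteness, two of the table's patterns (CWE-327 and CWE-330) can be sketched side by side in Python using only the standard library; the weak variants are shown purely for contrast with the stronger ones:

```python
import hashlib
import random
import secrets

# CWE-327: models often suggest MD5 for password hashing. MD5 is fast and
# unsalted, so offline cracking is cheap; a dedicated password-hashing
# function (bcrypt/argon2) should be used instead -- plain hashlib is not enough.
weak_hash = hashlib.md5(b"hunter2").hexdigest()

# CWE-330: the random module is a predictable PRNG (Mersenne Twister) and is
# unsuitable for tokens; secrets draws from the OS CSPRNG.
weak_token = ''.join(random.choices('0123456789abcdef', k=32))  # predictable
strong_token = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
```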
Systematic Detection
To systematically detect AI-generated vulnerabilities, combine:
- SAST tools configured for AI patterns — Standard SAST tools detect these vulnerabilities but may need tuning to reduce false positives on AI-generated code, which tends to have different characteristics than human-written code
- Custom rules for AI-specific patterns — Create detection rules for patterns that are uniquely common in AI output (e.g., the specific string formatting styles models prefer)
- CWE-focused code review checklists — Provide reviewers with checklists based on these CWE mappings when reviewing AI-generated pull requests
- Pre-commit hooks — Implement hooks that flag the most common AI vulnerability patterns before code enters version control
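A pre-commit check along these lines can be a short script run by the pre-commit framework, which passes staged filenames as arguments. This is a sketch under stated assumptions: the rule names, regexes, and scan function are illustrative, and a production hook would use a maintained ruleset rather than three hand-written patterns.

```python
import re
import sys

# Illustrative signatures for a few common AI-generated vulnerability patterns
# (hypothetical ruleset, not taken from any real tool)
RULES = {
    "CWE-89 string-built SQL": re.compile(r'execute\(\s*f["\']'),
    "CWE-78 shell string": re.compile(r'os\.system\(\s*f["\']'),
    "CWE-330 random for secrets": re.compile(
        r'random\.(choice|choices|randint)\(.*(token|secret|password)', re.I),
}

def scan(text: str) -> list[tuple[int, str]]:
    """Return (line number, rule name) pairs for suspicious lines."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings

if __name__ == "__main__" and sys.argv[1:]:
    # Staged file paths arrive as arguments; a nonzero exit blocks the commit
    failed = False
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="ignore") as f:
            for lineno, rule in scan(f.read()):
                print(f"{path}:{lineno}: {rule}")
                failed = True
    sys.exit(1 if failed else 0)
```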
Related Topics
- Language-Specific Risks — How these patterns manifest differently across languages
- AI-Generated Vulnerability Patterns — Why models produce vulnerable code
- Code Suggestion Poisoning — How attackers amplify these patterns