Law Enforcement AI Security
Security testing for AI in law enforcement: predictive policing manipulation, facial recognition evasion and bias, evidence analysis AI exploitation, and surveillance system attacks.
Law enforcement AI operates in one of the highest-stakes environments for any AI deployment. These systems influence decisions about who police investigate, who gets arrested, how bail is set, how sentences are determined, and who is surveilled. Errors in law enforcement AI can cause wrongful arrests, civil liberties violations, discriminatory policing, and erosion of community trust. The security of these systems is therefore both a cybersecurity concern and a civil rights concern.
Red team testing of law enforcement AI must grapple with a fundamental tension: the same adversarial techniques that reveal security weaknesses can also be used by criminals to evade detection. This page addresses that tension by focusing on defensive testing methodologies that improve system security without creating a playbook for evasion.
Predictive Policing AI
Feedback Loop Exploitation
Predictive policing systems (such as PredPol/Geolitica, HunchLab, and custom agency systems) use historical crime data to predict where and when crimes are likely to occur. These systems are vulnerable to feedback loops: increased police presence in predicted areas leads to more arrests, which generates more crime data for that area, which reinforces the prediction, regardless of the actual crime rate.
An adversary can exploit these feedback loops to manipulate policing patterns:
```python
# Modeling feedback loop exploitation in predictive policing
# (predictive_system, generate_artificial_signals, and
# estimate_presence_effect are hypothetical interfaces)
def simulate_feedback_attack(predictive_system, target_area, duration_days):
    """
    Simulate how an adversary can amplify policing in a
    target area by generating artificial crime signals.
    """
    results = []
    for day in range(duration_days):
        # Step 1: Generate artificial crime reports in target area
        # (filing false reports, staging visible "suspicious activity")
        artificial_signals = generate_artificial_signals(
            area=target_area,
            count=3,  # small number to avoid detection
            crime_types=["suspicious_activity", "noise_complaint",
                         "loitering"],
        )
        # Step 2: System ingests signals and updates predictions
        predictive_system.ingest(artificial_signals)
        predictions = predictive_system.predict(day)
        # Step 3: Increased police presence generates real data
        # (more stops, more minor arrests in the area)
        real_data_from_presence = estimate_presence_effect(
            patrol_hours=predictions[target_area].recommended_hours
        )
        predictive_system.ingest(real_data_from_presence)
        # Step 4: Real data reinforces the artificial signal
        results.append({
            "day": day,
            "artificial_signals": len(artificial_signals),
            "predicted_risk_score": predictions[target_area].risk_score,
            "estimated_patrol_hours": predictions[target_area].recommended_hours,
            "feedback_amplification": (
                predictions[target_area].risk_score
                / results[0]["predicted_risk_score"]
                if results else 1.0
            ),
        })
    return results
```

Geographic Bias Amplification
Predictive policing systems trained on historical arrest data inherit and amplify historical policing biases. A red team assessment should measure:
- Baseline geographic distribution — Where does the system predict crime? How does this map to demographic data?
- Sensitivity to data manipulation — How many additional reports in an area are needed to significantly change predictions?
- Decay rate — How long do artificial signals persist in predictions after the signal source is removed?
- Cross-area contamination — Can manipulation in one area affect predictions in adjacent areas?
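The decay-rate measurement above can be reduced to a simple check: given the daily risk scores recorded after the artificial signal source is removed, count how long the prediction stays elevated above its pre-manipulation baseline. The `measure_signal_decay` helper and the default 5% tolerance are illustrative assumptions, not part of any real system's API.

```python
def measure_signal_decay(risk_scores, baseline, tolerance=0.05):
    """
    risk_scores: daily risk scores for the target area, recorded
    starting the day the artificial signal source is removed.
    Returns the number of days until the score returns to within
    `tolerance` (relative) of the pre-manipulation baseline, or
    None if it never recovers within the observation window.
    """
    for day, score in enumerate(risk_scores):
        if abs(score - baseline) <= tolerance * baseline:
            return day
    return None
```

A long or unbounded decay time is the red flag: it means a handful of false reports can durably redirect patrols.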
Facial Recognition Systems
Adversarial Evasion Techniques
Facial recognition systems used by law enforcement for identification, surveillance, and access control are vulnerable to adversarial evasion at multiple levels:
Physical adversarial attacks:
| Technique | Description | Effectiveness | Detectability |
|---|---|---|---|
| Adversarial makeup | Geometric patterns that disrupt facial landmark detection | Moderate — effectiveness varies by system | Low — patterns can be subtle |
| Adversarial accessories | Glasses, hats, or scarves with printed adversarial patterns | High against targeted systems | Medium — unusual appearance |
| IR illumination | IR LEDs that overwhelm camera sensors without visible light | High against IR-based systems | Low — invisible to humans |
| Projection attacks | Projecting adversarial patterns onto the face | High in controlled settings | High — requires equipment |
| 3D-printed masks | Masks designed to impersonate another identity | Variable — depends on liveness detection | High — detectable by humans |
```python
# Testing facial recognition robustness
# NOTE: Use synthetic face data only — never real individuals
def test_facial_recognition_robustness(fr_system, synthetic_faces):
    """
    Assess facial recognition robustness against adversarial
    modifications using synthetic face data.
    """
    results = {
        "baseline_accuracy": None,
        "evasion_results": {},
    }
    # Baseline accuracy on unmodified synthetic faces
    baseline_correct = 0
    for face in synthetic_faces:
        match = fr_system.identify(face.image)
        if match.identity == face.identity:
            baseline_correct += 1
    results["baseline_accuracy"] = baseline_correct / len(synthetic_faces)
    # Test each evasion technique
    evasion_techniques = [
        "gaussian_noise",
        "adversarial_patch",
        "landmark_perturbation",
        "color_channel_shift",
        "geometric_transform",
    ]
    for technique in evasion_techniques:
        evaded = 0
        for face in synthetic_faces:
            modified = apply_evasion(face.image, technique)
            match = fr_system.identify(modified)
            if match.identity != face.identity:
                evaded += 1
        results["evasion_results"][technique] = {
            "evasion_rate": evaded / len(synthetic_faces),
            "technique": technique,
        }
    return results
```

Demographic Bias Testing
Facial recognition bias is well-documented: systems consistently perform worse on darker-skinned faces, women, and older individuals. Red team testing must quantify this bias:
```python
# Demographic bias assessment for facial recognition
def test_demographic_bias(fr_system, demographic_dataset):
    """
    Measure accuracy across demographic groups using the
    intersectional analysis methodology from Buolamwini & Gebru.
    """
    accuracy_by_group = {}
    for face in demographic_dataset:
        group_key = f"{face.skin_tone}_{face.gender}_{face.age_range}"
        if group_key not in accuracy_by_group:
            accuracy_by_group[group_key] = {"correct": 0, "total": 0,
                                            "false_positive": 0,
                                            "false_negative": 0}
        result = fr_system.identify(face.image)
        accuracy_by_group[group_key]["total"] += 1
        if result.identity == face.identity:
            accuracy_by_group[group_key]["correct"] += 1
        elif result.identity is not None:
            accuracy_by_group[group_key]["false_positive"] += 1
        else:
            accuracy_by_group[group_key]["false_negative"] += 1
    # Report disparities
    accuracies = {k: v["correct"] / v["total"]
                  for k, v in accuracy_by_group.items()}
    best = max(accuracies.values())
    worst = min(accuracies.values())
    return {
        "accuracy_by_group": accuracies,
        "best_group_accuracy": best,
        "worst_group_accuracy": worst,
        "accuracy_gap": best - worst,
        "disparity_ratio": worst / best if best > 0 else 0,
    }
```

False Identification Risk
The most dangerous failure mode of law enforcement facial recognition is false identification — incorrectly matching an individual to a suspect or wanted person. Multiple cases of wrongful arrest based on facial recognition misidentification have been documented in the United States.
Red team testing should measure:
- False positive rate at operational thresholds — What is the false match rate at the confidence threshold the agency uses for investigative leads?
- Gallery size effects — How does accuracy degrade as the reference database grows?
- Image quality sensitivity — How does accuracy degrade with surveillance-quality images (low resolution, poor lighting, oblique angles)?
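A minimal sketch of the first measurement: among candidate matches the system returns at or above a given confidence threshold, what fraction are false? The `match_records` format and the `false_match_rate_at_threshold` name are assumptions for illustration; real systems log match scores in vendor-specific formats.

```python
def false_match_rate_at_threshold(match_records, threshold):
    """
    match_records: list of (similarity_score, is_true_match) pairs
    from identification trials against the gallery.
    Returns the fraction of returned candidates (score >= threshold)
    that are false matches — the rate that matters when the output
    is used as an investigative lead.
    """
    returned = [(s, t) for s, t in match_records if s >= threshold]
    if not returned:
        return 0.0
    false_matches = sum(1 for _, t in returned if not t)
    return false_matches / len(returned)
```

Sweeping this over a range of thresholds, and repeating at several gallery sizes, quantifies the other two measurements: the rate typically worsens as the gallery grows, because more lookalikes compete for each probe.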
Evidence Analysis AI
Digital Evidence AI Manipulation
Law enforcement increasingly uses AI to process digital evidence: analyzing communications for criminal activity indicators, processing financial records for fraud patterns, and identifying child exploitation material. These systems can be manipulated to either miss evidence or fabricate indicators.
```python
# Testing evidence analysis AI for manipulation
evidence_ai_tests = {
    "evidence_suppression": {
        "description": "Can communications be structured to evade "
                       "AI detection of criminal activity indicators?",
        "technique": "Use code words, steganography, or adversarial "
                     "text that fools the AI while conveying meaning "
                     "to the human recipient",
        "test_cases": [
            "Substitution cipher in casual conversation",
            "Meaning conveyed through sentence structure rather "
            "than vocabulary",
            "Information embedded in message metadata or timing",
        ],
    },
    "false_indicator_injection": {
        "description": "Can an adversary plant communications that "
                       "trigger AI criminal activity indicators "
                       "against an innocent target?",
        "technique": "Send messages to the target that contain "
                     "criminal activity language, causing AI to "
                     "flag the target's communications",
        "severity": "Critical — could lead to wrongful investigation",
    },
}
```

Chain of Custody for AI Evidence
When AI processes evidence, the chain of custody must include the AI system itself. Red team assessments should verify:
- Model versioning — Is the exact model version used to process evidence recorded?
- Reproducibility — Can the AI analysis be reproduced with the same inputs and model version?
- Tampering detection — Can the AI system or its outputs be modified without detection?
- Audit trail completeness — Does the audit trail capture all AI processing steps, parameters, and intermediate results?
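These requirements can be made concrete with a tamper-evident processing record. The sketch below is one way to do it, with hypothetical field names: hash the model version, input, parameters, and output together over a canonical JSON serialization, so that any later modification changes the digest. A production system would additionally need signed timestamps and secure key management.

```python
import hashlib
import json

def make_custody_record(model_version, input_digest, parameters, output_digest):
    """
    Build a tamper-evident chain-of-custody record for one AI
    evidence-processing step. Any change to a recorded field
    invalidates the record digest.
    """
    body = {
        "model_version": model_version,
        "input_sha256": input_digest,
        "parameters": parameters,
        "output_sha256": output_digest,
    }
    # sort_keys gives a canonical serialization to hash
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    return {**body, "record_sha256": digest}

def verify_custody_record(record):
    """Recompute the digest over the recorded fields; False means tampering."""
    body = {k: v for k, v in record.items() if k != "record_sha256"}
    expected = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    return expected == record["record_sha256"]
```

Recording the model version this way also answers the reproducibility question: an auditor can rerun the same version on the same input and compare output digests.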
Surveillance System Attacks
Scope Expansion Attacks
AI-powered surveillance systems (automatic license plate readers, video analytics, communications monitoring) have defined legal scopes — they are authorized to collect specific data for specific purposes. An adversary (or a system error) can cause scope expansion where the AI collects or analyzes data beyond its authorized scope.
Red team testing should probe whether:
- The AI can be configured (or misconfigured) to monitor communications or individuals outside its legal authority
- Alerting thresholds can be manipulated to generate false alerts that justify expanded surveillance
- Data retention policies are enforced by the AI or can be bypassed
- Cross-system data sharing is properly restricted
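The retention probe in particular can be run against collected data rather than policy documents: pull the stored records and check whether any predate the authorized window. A minimal sketch, assuming each record carries a `collected_at` timestamp (the field name and record shape are illustrative):

```python
from datetime import datetime, timedelta

def find_retention_violations(records, max_retention_days, now=None):
    """
    Flag surveillance records (e.g. license plate reads) retained
    past the authorized window — evidence that retention policy is
    enforced on paper but not in the data store.
    """
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=max_retention_days)
    return [r for r in records if r["collected_at"] < cutoff]
```

An empty result on a full data pull is weak evidence of enforcement; any hit is strong evidence of a bypass.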
Defensive Recommendations
Bias auditing as a security requirement
Require regular demographic bias audits of all law enforcement AI. Treat disparate accuracy across demographic groups as a security vulnerability, not just a fairness concern.
Human decision authority
Ensure that no law enforcement action (arrest, search, surveillance initiation) is taken solely on the basis of AI output. AI should inform human decision-making, not replace it.
Feedback loop monitoring
Implement monitoring to detect and break feedback loops in predictive systems. Regularly audit whether predictions reflect actual crime patterns or policing patterns.
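One way to operationalize this audit, sketched below: compare, per area, the share of incidents that are officer-initiated (stops, patrol observations) versus victim- or witness-reported. A high-prediction area whose incidents are mostly officer-initiated suggests the model is tracking patrol activity rather than underlying crime. The cutoffs used here (risk above 0.7, officer-initiated share above 0.6) are illustrative, not validated thresholds.

```python
def feedback_loop_indicator(area_stats, risk_cutoff=0.7, share_cutoff=0.6):
    """
    area_stats: dict mapping area -> {"risk_score", "officer_initiated",
    "victim_reported"} incident counts. Returns (area, share) pairs
    where high predicted risk coincides with a mostly
    officer-initiated incident mix — a feedback-loop warning sign.
    """
    flagged = []
    for area, s in area_stats.items():
        total = s["officer_initiated"] + s["victim_reported"]
        if total == 0:
            continue
        share = s["officer_initiated"] / total
        if s["risk_score"] > risk_cutoff and share > share_cutoff:
            flagged.append((area, round(share, 2)))
    return flagged
```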
Evidence handling standards
Establish chain-of-custody requirements for AI-processed evidence that include model version, processing parameters, and reproducibility verification.
Further Reading
- Government AI Security Overview — Broader government AI context
- Public Services AI — Citizen-facing government AI
- FedRAMP for AI — Federal compliance framework
- AI Incident Classification — How to classify AI security incidents