EU AI Act Compliance Testing
This page covers the EU AI Act's risk categories, the testing requirements for high-risk AI systems, conformity assessment procedures, and how red teaming supports EU AI Act compliance.
The EU AI Act is the most comprehensive AI regulation globally and creates binding testing requirements that directly affect AI red teaming practice. Understanding these requirements is essential for red teamers serving clients that deploy AI in the EU market.
Risk Classification System
The Four Tiers
| Legal Basis | Risk Level | Regulatory Approach | Examples |
|---|---|---|---|
| Article 5 (Chapter II) | Unacceptable | Prohibited | Social scoring, untargeted scraping of facial images, emotion recognition in workplaces/schools, manipulative AI |
| Annex III | High | Mandatory requirements + conformity assessment | AI in hiring, credit scoring, law enforcement, critical infrastructure, education, migration |
| Article 50 | Limited | Transparency obligations | Chatbots (must disclose AI nature), deepfake generators (must label), emotion recognition |
| Unregulated | Minimal | No specific requirements | Spam filters, AI in video games, inventory management |
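To make the triage logic concrete, here is a minimal sketch of a tier-classification helper. The category tags are illustrative simplifications of Article 5, Annex III, and Article 50; real classification requires case-by-case legal analysis.

```python
# Minimal sketch of a risk-tier triage helper. The category sets are
# illustrative simplifications, not exhaustive legal definitions.

PROHIBITED = {"social_scoring", "untargeted_face_scraping", "workplace_emotion_recognition"}
HIGH_RISK = {"hiring", "credit_scoring", "law_enforcement", "critical_infrastructure",
             "education", "migration", "biometrics", "justice"}
LIMITED_RISK = {"chatbot", "deepfake_generation", "emotion_recognition"}

def classify_risk_tier(use_case: str) -> str:
    """Map a (simplified) use-case tag to an EU AI Act risk tier."""
    if use_case in PROHIBITED:
        return "unacceptable: prohibited under Article 5"
    if use_case in HIGH_RISK:
        return "high-risk: Annex III requirements + conformity assessment"
    if use_case in LIMITED_RISK:
        return "limited: Article 50 transparency obligations"
    return "minimal: no specific AI Act requirements"

print(classify_risk_tier("hiring"))  # high-risk: Annex III requirements + conformity assessment
```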
High-Risk Classification Criteria
An AI system is high-risk if it falls into one of the categories in Annex III (AI systems that are safety components of products covered by Annex I harmonisation legislation are also high-risk, via Article 6(1)):
| Annex III Area | Examples | Red Team Focus |
|---|---|---|
| Biometrics | Facial recognition, emotion detection | Accuracy, bias, adversarial robustness |
| Critical infrastructure | Energy grid management, water treatment, traffic control | Safety, reliability, adversarial resilience |
| Education | Automated grading, admission decisions | Fairness, bias, manipulation resistance |
| Employment | CV screening, interview analysis, performance monitoring | Bias, fairness, system prompt security |
| Essential services | Credit scoring, insurance pricing, emergency dispatch | Discrimination testing, accuracy validation |
| Law enforcement | Predictive policing, evidence analysis | Bias, accuracy, adversarial manipulation |
| Migration | Visa processing, border control | Fairness, accuracy, adversarial robustness |
| Justice | Sentencing recommendations, recidivism prediction | Bias, fairness, manipulation resistance |
Testing Requirements by Risk Tier
High-Risk Systems (Article 9 - Risk Management)
Providers of high-risk AI systems must implement a risk management system that includes:
| Requirement | Article | Red Team Relevance |
|---|---|---|
| Known and foreseeable risk identification | Art. 9(2)(a) | Threat modeling and vulnerability assessment |
| Testing with "previously unknown inputs" | Art. 9(6) | Adversarial testing, fuzzing, edge case exploration |
| Reasonably foreseeable misuse evaluation | Art. 9(2)(b) | Red teaming for misuse scenarios |
| Residual risk assessment | Art. 9(2)(d) | Post-mitigation validation testing |
| Testing against impacts on health, safety, rights | Art. 9(7) | Safety-focused red teaming |
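A risk management system under Article 9 is easier to audit when each identified risk is tracked as a structured record. The sketch below shows one hypothetical way to align red team artifacts with the Article 9 steps; the schema and field names are assumptions, not anything prescribed by the Act.

```python
# Hypothetical risk-register entry aligning red-team artifacts with the
# Article 9 risk management steps; field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class RiskEntry:
    risk_id: str
    description: str           # Art. 9(2)(a): known/foreseeable risk
    misuse_scenario: str       # Art. 9(2)(b): reasonably foreseeable misuse
    test_evidence: list[str]   # Art. 9(6): adversarial / unknown-input testing
    mitigations: list[str] = field(default_factory=list)
    residual_risk: str = "unassessed"  # Art. 9(2)(d): post-mitigation rating

entry = RiskEntry(
    risk_id="R-001",
    description="Prompt injection alters CV screening scores",
    misuse_scenario="Applicant embeds hidden instructions in resume text",
    test_evidence=["fuzzing run 2025-11-03", "indirect injection suite v2"],
    mitigations=["input sanitization on PDF extraction"],
    residual_risk="medium",
)
print(entry.risk_id, entry.residual_risk)
```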
High-Risk Systems (Article 15 - Accuracy, Robustness, Cybersecurity)
| Requirement | What It Means | Testing Approach |
|---|---|---|
| Appropriate accuracy levels | System performs as intended | Accuracy benchmarking under normal and adversarial conditions |
| Resilience to errors | System handles unexpected inputs gracefully | Fuzzing, boundary testing, error injection |
| Resilience to unauthorized access | System resists attacks | Security red teaming, prompt injection, data extraction |
| Adversarial robustness | System resists adversarial manipulation | Adversarial example testing, evasion attacks |
| Cybersecurity measures | System has appropriate security controls | Infrastructure security assessment, API security |
Article 15 Testing in Practice
Article 15 is the most directly actionable article for red teamers. It requires that high-risk AI systems achieve "appropriate levels of accuracy, robustness, and cybersecurity" and perform consistently throughout their lifecycle. Here is how to structure a test plan against Article 15 requirements:
| Art. 15 Paragraph | Requirement | Red Team Test Cases |
|---|---|---|
| 15(1) | Appropriate accuracy, robustness, and cybersecurity; consistent performance throughout the lifecycle | Benchmark on representative test set, measure degradation under adversarial conditions |
| 15(3) | Accuracy levels and metrics declared in the instructions for use | Validate that declared metrics hold under realistic and adversarial conditions |
| 15(4) | Resilience to errors, faults, and inconsistencies; technical redundancy (backup, fail-safe plans) | Input validation testing, malformed data handling, failover testing, degraded mode operation |
| 15(5) | Resilience against unauthorized third-party attempts; cybersecurity measures addressing data poisoning, adversarial examples, model evasion, confidentiality attacks | Prompt injection, data exfiltration, model extraction, jailbreaking, API security assessment |
(Article 15(2) tasks the Commission with encouraging the development of benchmarks and measurement methodologies, so it creates no direct provider-side testing obligation.)
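One way to operationalize this mapping is to key the test plan directly to the Article 15 paragraphs, as in the sketch below; the test-case names are illustrative, not a standardized taxonomy.

```python
# Sketch of a test plan keyed to Article 15 paragraphs; test-case names
# are illustrative examples, not a mandated taxonomy.
ARTICLE_15_TEST_PLAN = {
    "15(1) accuracy/robustness/cybersecurity": [
        "benchmark_representative_test_set",
        "measure_degradation_under_adversarial_inputs",
    ],
    "15(3) declared accuracy metrics": [
        "verify_instructions_for_use_metrics_hold_under_adversarial_conditions",
    ],
    "15(4) resilience to errors + redundancy": [
        "malformed_input_handling",
        "failover_and_degraded_mode_operation",
    ],
    "15(5) resilience to unauthorized third parties": [
        "prompt_injection_suite",
        "model_extraction_probe",
        "data_exfiltration_probe",
    ],
}

for paragraph, cases in ARTICLE_15_TEST_PLAN.items():
    print(f"{paragraph}: {len(cases)} test case(s)")
```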
General-Purpose AI (GPAI) Models with Systemic Risk
Models exceeding 10^25 FLOPs (or designated by the AI Office) face additional obligations under Articles 51-55. As of early 2026, this threshold captures frontier models from major AI labs. The AI Office can also designate models below this threshold if they present comparable systemic risks.
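For a rough sense of where a model falls relative to the threshold, training compute is often approximated as 6 × parameters × training tokens. The sketch below applies that approximation; both the heuristic and the example figures are illustrative, not an official measurement methodology.

```python
# Rough systemic-risk threshold check using the common ~6 * params * tokens
# approximation for training compute. The heuristic and the example model
# size are illustrative assumptions, not an official methodology.
SYSTEMIC_RISK_THRESHOLD_FLOPS = 1e25  # Article 51 presumption threshold

def estimated_training_flops(params: float, tokens: float) -> float:
    """Approximate total training compute in FLOPs."""
    return 6.0 * params * tokens

flops = estimated_training_flops(params=7e10, tokens=1.5e13)  # hypothetical 70B model
print(f"{flops:.2e} FLOPs -> systemic risk presumed: {flops > SYSTEMIC_RISK_THRESHOLD_FLOPS}")
```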
Adversarial testing
Conduct adversarial testing, including red teaming, to identify and mitigate systemic risks. This is an explicit red teaming mandate under Article 55(1)(a).
Model evaluation
Evaluate the model in accordance with standardised protocols and tools reflecting the state of the art. Under Article 55(1)(a), model evaluation and adversarial testing form a single obligation.
Systemic risk assessment
Assess and mitigate possible systemic risks at Union level, including their sources, that may stem from the development, placing on the market, or use of the model (Article 55(1)(b)).
Incident reporting
Keep track of, document, and report serious incidents and possible corrective measures to the AI Office and, as appropriate, national competent authorities without undue delay (Article 55(1)(c)).
Cybersecurity
Ensure an adequate level of cybersecurity protection for the model, its weights, and its physical infrastructure (Article 55(1)(d)). This includes protecting against model theft, unauthorized access, and adversarial manipulation.
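A simple tracker keyed to the Article 55(1) obligations can help keep an engagement's compliance evidence organized. The structure and status labels below are assumptions for illustration.

```python
# Illustrative compliance tracker for the Article 55(1) obligations;
# the status labels and structure are assumptions, not mandated formats.
OBLIGATIONS = {
    "55(1)(a) model evaluation incl. adversarial testing / red teaming": "in_progress",
    "55(1)(b) systemic risk assessment and mitigation": "planned",
    "55(1)(c) serious incident reporting to the AI Office": "process_defined",
    "55(1)(d) cybersecurity for model, weights, infrastructure": "in_progress",
}

open_items = [k for k, v in OBLIGATIONS.items() if v != "done"]
print(f"{len(open_items)} obligation(s) not yet closed out")
```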
What "Adversarial Testing Including Red Teaming" Means
The EU AI Act does not define a specific red teaming methodology, but the supporting guidance from CEN/CENELEC and the AI Office points to several expectations:
| Expectation | What It Implies for Red Teams |
|---|---|
| "State of the art" testing methods | Use current techniques (GCG attacks, multi-step jailbreaks, indirect injection), not just basic prompt testing |
| Coverage of systemic risks | Test for CBRN content generation, disinformation capabilities, cyber-offense capabilities, and discrimination |
| Documented methodology | Maintain detailed test logs, tool configurations, and reproducible test cases |
| Regular cadence | Testing is not a one-time event; it must be repeated as the model is updated |
| Independent evaluation | While self-assessment is the default, the AI Office can request third-party evaluations |
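The "documented methodology" expectation implies machine-readable, reproducible test logs. Below is a minimal sketch of one possible log record; the schema, field names, and tool tag are hypothetical.

```python
# Minimal sketch of a reproducible red-team test-log record; the schema
# and the tool version tag are hypothetical, for illustration only.
import json
from datetime import datetime, timezone

def log_test_case(technique: str, target: str, prompt: str, outcome: str) -> str:
    """Serialize one test case as a JSON log line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "technique": technique,        # e.g. "GCG suffix", "indirect injection"
        "target_model": target,
        "input": prompt,               # exact input, for reproducibility
        "outcome": outcome,            # "blocked" / "partial" / "bypassed"
        "tool_version": "redteam-harness 0.4.1",  # hypothetical tool tag
    }
    return json.dumps(record)

print(log_test_case("indirect injection", "hiring-screener-v2",
                    "IGNORE PREVIOUS INSTRUCTIONS...", "bypassed"))
```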
GPAI Transparency Obligations (All GPAI Models)
Even GPAI models without systemic risk must comply with Article 53 transparency requirements:
| Obligation | Description | Red Team Relevance |
|---|---|---|
| Technical documentation | Detailed model documentation following Annex XI | Red team reports contribute to this documentation |
| Training data summary | Summary of training data including copyrighted material | Relates to data extraction testing |
| EU AI Office cooperation | Provide information on request | Red team findings may be requested |
| Downstream notification | Inform downstream providers of capabilities and limitations | Findings about model limitations inform this |
Conformity Assessment
Self-Assessment vs. Third-Party Assessment
| System Type | Assessment Type | Who Performs It |
|---|---|---|
| Most high-risk AI (Annex III) | Self-assessment (internal) | The provider, following Annex VI procedures |
| Biometric AI (Annex III, point 1) | Third-party assessment, unless harmonised standards are applied in full (then internal control is permitted) | Notified body (Annex VII procedure) |
| GPAI with systemic risk | Model evaluation + codes of practice | Provider + AI Office oversight |
What a Conformity Assessment Requires
For self-assessment under Annex VI, providers must document:
| Documentation Element | Red Team Contribution |
|---|---|
| Technical documentation (Annex IV) | Red team methodology, test cases, results |
| Quality management system | Testing processes, tool validation, team qualifications |
| Risk management system documentation | Threat models, vulnerability assessments, risk ratings |
| Testing and validation results | Red team findings, benchmarks, metrics |
| Corrective actions taken | Remediation evidence, retest results |
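In practice, it helps to maintain a manifest that ties red team artifacts to each Annex VI documentation element. The manifest below is a hypothetical example; the paths and element names are illustrative.

```python
# Hypothetical evidence-bundle manifest tying red-team artifacts to the
# Annex VI documentation elements above; paths and names are illustrative.
EVIDENCE_BUNDLE = {
    "technical_documentation": ["reports/methodology.pdf", "reports/test_cases.csv"],
    "quality_management_system": ["qms/testing_process.md", "qms/tool_validation.md"],
    "risk_management_system": ["risk/threat_model.md", "risk/register.csv"],
    "testing_validation_results": ["results/findings.json", "results/benchmarks.csv"],
    "corrective_actions": ["remediation/fixes.md", "remediation/retest_results.json"],
}

REQUIRED_ELEMENTS = {
    "technical_documentation", "quality_management_system",
    "risk_management_system", "testing_validation_results", "corrective_actions",
}

missing = [e for e in sorted(REQUIRED_ELEMENTS) if not EVIDENCE_BUNDLE.get(e)]
print("bundle complete" if not missing else f"missing evidence: {missing}")
```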
Implementation Timeline
| Date | Milestone | Red Team Impact |
|---|---|---|
| Aug 2024 | AI Act enters into force | Start preparing testing methodologies |
| Feb 2025 | Prohibited practices apply | Ensure clients' AI does not fall into prohibited categories |
| Aug 2025 | GPAI obligations apply | Systemic risk model testing required |
| Aug 2026 | High-risk obligations (Annex III) apply | Full conformity assessment testing required |
| Aug 2027 | Remaining obligations (Annex I) apply | Complete framework in effect |
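Because engagements often span these milestones, a small helper that reports which obligations are in force on a given date can keep scoping honest. A minimal sketch, using the Act's staggered application dates:

```python
# Minimal sketch: which EU AI Act obligations apply as of a given date.
from datetime import date

MILESTONES = [
    (date(2025, 2, 2), "prohibited practices apply"),
    (date(2025, 8, 2), "GPAI obligations apply"),
    (date(2026, 8, 2), "high-risk (Annex III) obligations apply"),
    (date(2027, 8, 2), "remaining (Annex I) obligations apply"),
]

def obligations_in_force(as_of: date) -> list[str]:
    """Return the milestones that have taken effect by the given date."""
    return [label for d, label in MILESTONES if d <= as_of]

print(obligations_in_force(date(2026, 1, 15)))
# ['prohibited practices apply', 'GPAI obligations apply']
```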
Penalties and Enforcement
The EU AI Act includes significant penalties that motivate compliance investment. For undertakings, each ceiling is whichever is higher of the fixed amount and the percentage of global annual turnover:
| Violation | Maximum Fine | Applies To |
|---|---|---|
| Prohibited AI practices (Article 5) | 35 million EUR or 7% of global annual turnover | Deploying banned AI systems |
| High-risk non-compliance | 15 million EUR or 3% of global annual turnover | Failing to meet high-risk requirements |
| Incorrect information to authorities | 7.5 million EUR or 1% of global annual turnover | Providing false data in conformity assessments |
| GPAI non-compliance | Up to 15 million EUR or 3% of global annual turnover | Failing to meet GPAI obligations |
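The "whichever is higher" rule is easy to see with a worked example; the turnover figure is illustrative.

```python
# Worked example of the "whichever is higher" fine ceiling for an
# undertaking; the turnover figure is illustrative.
def max_fine_eur(fixed_eur: float, pct: float, global_turnover_eur: float) -> float:
    """Fine ceiling: the higher of the fixed amount and the turnover share."""
    return max(fixed_eur, pct * global_turnover_eur)

# Prohibited-practice violation for a firm with EUR 2 billion turnover:
print(f"{max_fine_eur(35e6, 0.07, 2e9):,.0f}")  # 140,000,000 -> the 7% ceiling dominates
```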
Interaction with Other Regulations
The EU AI Act does not operate in isolation. Red teamers serving EU clients must understand how it interacts with existing regulations:
| Regulation | Interaction with EU AI Act | Red Team Impact |
|---|---|---|
| GDPR | AI processing personal data must comply with both | Data extraction testing must consider GDPR Article 5 principles |
| NIS2 Directive | Critical infrastructure AI subject to both cybersecurity frameworks | Infrastructure security assessment scope expands |
| General Product Safety Regulation | AI in consumer products must meet both safety frameworks | Safety-critical AI testing requires domain expertise |
| Sector regulations | Financial services (MiFID II), medical devices (MDR), etc. | Sector-specific testing requirements layer on top |
| Copyright Directive | Training data copyright compliance | Training data extraction testing has copyright implications |
Structuring Compliance-Oriented Reports
When conducting red teaming for EU AI Act compliance, structure your report to map directly to regulatory requirements:
| Report Section | Maps to | Content |
|---|---|---|
| System description | Annex IV (Technical documentation) | Architecture, intended use, risk classification justification |
| Threat model | Art. 9(2)(a) (Known/foreseeable risks) | ATLAS-mapped threats, OWASP categories |
| Testing methodology | Art. 9(6) (Testing with unknown inputs) | Attack categories, tools, procedures |
| Findings | Art. 9(2)(b) (Misuse evaluation) | Vulnerabilities with severity, exploitability, OWASP mapping |
| Risk assessment | Art. 9(2)(d) (Residual risk) | Risk ratings with likelihood and impact |
| Remediation recommendations | Art. 9(4) (Appropriate risk measures) | Specific mitigations with priority |
| Retesting results | Art. 15 (Accuracy/robustness) | Post-remediation validation |
Example Report Finding Mapped to EU AI Act
Finding: Adversarial Robustness Failure in Hiring AI System
EU AI Act Reference: Article 15(5) - high-risk AI systems must be resilient against attempts by unauthorized third parties to alter their use, outputs, or performance
Risk Classification: High-Risk (Annex III, Area 4 - Employment)
OWASP LLM Mapping: LLM01 (Prompt Injection)
ATLAS Mapping: AML.T0051 (Prompt Injection)
Description: The AI-powered CV screening system is susceptible to adversarial prompt injection via specially crafted text in resume fields. An applicant embedding specific instruction strings in white-on-white text within a PDF resume can manipulate the scoring algorithm.
Conformity Assessment Impact: This finding indicates non-compliance with the Article 15(5) robustness requirements. The system cannot demonstrate resilience to foreseeable misuse as required by Article 9(2)(b).
Remediation: Implement input sanitization for resume text extraction, add adversarial robustness testing to the CI/CD pipeline, and document the residual risk per Article 9(2)(d).
Related Topics
- International AI Security Law -- broader international regulatory context
- NIST AI RMF & ISO 42001 -- complementary risk management frameworks
- Cross-Framework Mapping Reference -- EU AI Act mapped to OWASP and ATLAS
- AI Security Frameworks Overview -- how the EU AI Act fits in the framework landscape
References
- "Regulation (EU) 2024/1689: Artificial Intelligence Act" - European Parliament and Council (2024) - Full text of the EU AI Act including Articles 9, 15, and 55 on adversarial testing requirements
- "EU AI Act Compliance Checker" - European Commission (2025) - Official tool for determining AI system risk classification under the EU AI Act
- "Harmonised Standards for AI: Technical Documentation and Conformity Assessment" - European Committee for Standardization (CEN/CENELEC) (2025) - Standards supporting EU AI Act conformity assessment
- "Guidelines on High-Risk AI System Classification" - European AI Office (2025) - Official guidance on determining whether AI systems fall under Annex III high-risk categories