Capstone: Implement an AI Compliance Framework
Build a comprehensive AI compliance framework that maps security testing to regulatory requirements including the EU AI Act, NIST AI RMF, and ISO 42001.
Overview
As AI regulation accelerates globally, organizations deploying AI systems need a structured way to connect their security testing activities to compliance requirements. The EU AI Act requires risk-based testing of high-risk AI systems. The NIST AI Risk Management Framework calls for ongoing red teaming and adversarial testing. ISO 42001 mandates documented risk assessments. Yet most organizations perform these activities in isolation — security testing happens in one team, compliance documentation in another, and the two rarely connect.
This capstone project tasks you with building an AI compliance framework that bridges the gap between technical red teaming and regulatory compliance. The framework takes outputs from AI security tools (Garak, PyRIT, Promptfoo) and maps findings to specific regulatory requirements, generating compliance evidence that auditors can review. It also works in the other direction: given a regulatory requirement, it determines which security tests must be performed and whether existing test results provide sufficient evidence.
This is not a compliance checklist tool — it is an engineering project that automates the translation between technical security findings and regulatory language, making compliance a natural byproduct of good security testing rather than a separate documentation burden.
Project Requirements
Functional Requirements
- Regulatory Requirement Database — Structured representation of requirements from EU AI Act, NIST AI RMF, and ISO 42001, with cross-references between frameworks.
- Compliance Mapping Engine — Maps AI security test results to regulatory requirements. Each mapping includes the requirement, the test that provides evidence, the strength of the evidence, and any gaps.
- Risk Classification Module — Classifies AI systems according to the EU AI Act risk tiers (unacceptable, high-risk, limited, minimal) and determines required testing based on classification.
- Evidence Management — Stores and organizes compliance evidence (test results, reports, attestations) with version control and audit trail.
- Gap Analysis — Identifies regulatory requirements that lack sufficient test evidence and recommends specific tests to fill gaps.
- Reporting — Generates compliance status reports for different audiences (regulators, auditors, executives, engineering teams).
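Taken together, these components form a pipeline from raw test output to compliance evidence: classify the system, select the applicable requirements, map evidence onto them, and report what remains uncovered. A minimal standalone sketch of that data flow, using toy records rather than the schemas built later in this guide:

```python
# Illustrative pipeline: classify, select applicable requirements, map evidence,
# then report the gaps. All names and records here are toy stand-ins.
REQUIREMENTS = [
    {"req_id": "R1", "risk_levels": ["high"], "evidence": ["test_results"]},
    {"req_id": "R2", "risk_levels": ["high"], "evidence": ["documentation"]},
    {"req_id": "R3", "risk_levels": ["limited"], "evidence": ["test_results"]},
]

EVIDENCE = [{"test": "garak_scan", "covers": ["test_results"], "pass_rate": 0.95}]

def pipeline(risk_level: str) -> dict:
    """Split the requirements for one risk level into covered vs. gaps."""
    applicable = [r for r in REQUIREMENTS if risk_level in r["risk_levels"]]
    covered, gaps = [], []
    for req in applicable:
        # A requirement is covered if any evidence supplies a type it accepts.
        if any(set(req["evidence"]) & set(ev["covers"]) for ev in EVIDENCE):
            covered.append(req["req_id"])
        else:
            gaps.append(req["req_id"])
    return {"covered": covered, "gaps": gaps}

print(pipeline("high"))  # R1 is covered by the scan; R2 still needs documentation.
```

The real engine adds evidence-strength grading and audit metadata on top of this skeleton, but the covered-versus-gap split is the core loop.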
Implementation Guide
Phase 1: Regulatory Requirement Model
# compliance/requirements.py
"""Structured representation of AI regulatory requirements."""
from __future__ import annotations
import enum
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
class Framework(str, enum.Enum):
EU_AI_ACT = "eu_ai_act"
NIST_AI_RMF = "nist_ai_rmf"
ISO_42001 = "iso_42001"
class RequirementCategory(str, enum.Enum):
RISK_MANAGEMENT = "risk_management"
TESTING_VALIDATION = "testing_validation"
TRANSPARENCY = "transparency"
DATA_GOVERNANCE = "data_governance"
HUMAN_OVERSIGHT = "human_oversight"
ROBUSTNESS_SAFETY = "robustness_safety"
ACCOUNTABILITY = "accountability"
DOCUMENTATION = "documentation"
MONITORING = "monitoring"
class EvidenceStrength(str, enum.Enum):
STRONG = "strong" # Automated test directly addresses the requirement.
MODERATE = "moderate" # Test partially addresses the requirement.
WEAK = "weak" # Test is tangentially related.
NONE = "none" # No evidence available.
@dataclass
class RegulatoryRequirement:
"""A single regulatory requirement from a compliance framework."""
req_id: str # e.g., "EU-AI-ACT-ART-9.2.a"
framework: Framework
article_section: str # e.g., "Article 9, Section 2(a)"
title: str
description: str
category: RequirementCategory
applies_to_risk_levels: list[str] # e.g., ["high", "limited"]
required_evidence_types: list[str] # e.g., ["test_results", "documentation"]
cross_references: list[str] = field(default_factory=list) # IDs of related reqs in other frameworks
testing_guidance: str = ""
metadata: dict[str, Any] = field(default_factory=dict)
@dataclass
class RequirementDatabase:
"""Database of regulatory requirements with cross-reference support."""
requirements: dict[str, RegulatoryRequirement] = field(default_factory=dict)
def add(self, req: RegulatoryRequirement) -> None:
self.requirements[req.req_id] = req
def get(self, req_id: str) -> RegulatoryRequirement | None:
return self.requirements.get(req_id)
def by_framework(self, framework: Framework) -> list[RegulatoryRequirement]:
return [r for r in self.requirements.values() if r.framework == framework]
def by_category(self, category: RequirementCategory) -> list[RegulatoryRequirement]:
return [r for r in self.requirements.values() if r.category == category]
def by_risk_level(self, level: str) -> list[RegulatoryRequirement]:
return [r for r in self.requirements.values() if level in r.applies_to_risk_levels]
def cross_references_for(self, req_id: str) -> list[RegulatoryRequirement]:
req = self.requirements.get(req_id)
if not req:
return []
return [
self.requirements[xref]
for xref in req.cross_references
if xref in self.requirements
]
def save(self, path: Path) -> None:
data = {
req_id: {
"req_id": r.req_id,
"framework": r.framework.value,
"article_section": r.article_section,
"title": r.title,
"description": r.description,
"category": r.category.value,
"applies_to_risk_levels": r.applies_to_risk_levels,
"required_evidence_types": r.required_evidence_types,
"cross_references": r.cross_references,
"testing_guidance": r.testing_guidance,
}
for req_id, r in self.requirements.items()
}
path.write_text(json.dumps(data, indent=2))
def build_default_database() -> RequirementDatabase:
"""Build a database with key requirements from major AI frameworks."""
db = RequirementDatabase()
# EU AI Act requirements (selected key articles).
db.add(RegulatoryRequirement(
req_id="EU-AIA-ART9-2a",
framework=Framework.EU_AI_ACT,
article_section="Article 9, Section 2(a)",
title="Risk management: identification and analysis of known risks",
description="The risk management system shall identify and analyse the known and the reasonably foreseeable risks that the high-risk AI system can pose to health, safety or fundamental rights.",
category=RequirementCategory.RISK_MANAGEMENT,
applies_to_risk_levels=["high"],
required_evidence_types=["risk_assessment", "test_results"],
cross_references=["NIST-MAP-1.1", "ISO42001-6.1.2"],
testing_guidance="Conduct comprehensive red teaming covering prompt injection, data extraction, and harmful output generation. Map findings to risk categories.",
))
db.add(RegulatoryRequirement(
req_id="EU-AIA-ART9-7",
framework=Framework.EU_AI_ACT,
article_section="Article 9, Section 7",
title="Testing to ensure appropriate and consistent performance",
description="Testing shall be made against prior defined metrics and probabilistic thresholds that are appropriate to the intended purpose of the high-risk AI system.",
category=RequirementCategory.TESTING_VALIDATION,
applies_to_risk_levels=["high"],
required_evidence_types=["test_results", "metrics", "thresholds"],
cross_references=["NIST-MEASURE-2.6"],
testing_guidance="Use Garak and Promptfoo to establish baseline safety metrics. Define pass/fail thresholds for each safety category. Run tests across model updates.",
))
db.add(RegulatoryRequirement(
req_id="EU-AIA-ART15-1",
framework=Framework.EU_AI_ACT,
article_section="Article 15, Section 1",
title="Accuracy, robustness and cybersecurity",
description="High-risk AI systems shall be designed and developed in such a way that they achieve an appropriate level of accuracy, robustness and cybersecurity.",
category=RequirementCategory.ROBUSTNESS_SAFETY,
applies_to_risk_levels=["high"],
required_evidence_types=["test_results", "security_audit"],
cross_references=["NIST-MEASURE-2.6", "ISO42001-A.6.2.5"],
testing_guidance="Test adversarial robustness using PyRIT. Assess cybersecurity posture including API security, authentication, and access controls.",
))
# NIST AI RMF requirements.
db.add(RegulatoryRequirement(
req_id="NIST-MAP-1.1",
framework=Framework.NIST_AI_RMF,
article_section="MAP 1.1",
title="Intended purpose and context of use are documented",
description="Document the intended purpose, context of use, and expected users of the AI system to inform risk identification.",
category=RequirementCategory.DOCUMENTATION,
applies_to_risk_levels=["high", "limited", "minimal"],
required_evidence_types=["documentation"],
cross_references=["EU-AIA-ART9-2a"],
testing_guidance="Document the system scope before testing. Ensure red team scope aligns with documented intended use.",
))
db.add(RegulatoryRequirement(
req_id="NIST-MEASURE-2.6",
framework=Framework.NIST_AI_RMF,
article_section="MEASURE 2.6",
title="AI system performance measured against safety requirements",
description="AI system performance or assurance criteria are measured qualitatively or quantitatively and demonstrated for conditions similar to deployment conditions, including adversarial conditions.",
category=RequirementCategory.TESTING_VALIDATION,
applies_to_risk_levels=["high", "limited"],
required_evidence_types=["test_results", "metrics"],
cross_references=["EU-AIA-ART9-7", "EU-AIA-ART15-1"],
testing_guidance="Run adversarial testing under deployment-realistic conditions. Use Promptfoo for continuous testing with defined thresholds.",
))
db.add(RegulatoryRequirement(
req_id="NIST-MANAGE-2.2",
framework=Framework.NIST_AI_RMF,
article_section="MANAGE 2.2",
title="Mechanisms for tracking risks over time",
description="Mechanisms are in place and applied to sustain the value of deployed AI systems and to track identified risks over time.",
category=RequirementCategory.MONITORING,
applies_to_risk_levels=["high", "limited"],
required_evidence_types=["monitoring_config", "incident_records"],
cross_references=["ISO42001-9.1"],
testing_guidance="Implement continuous red team pipeline. Monitor for regression in safety metrics across model updates.",
))
# ISO 42001 requirements.
db.add(RegulatoryRequirement(
req_id="ISO42001-6.1.2",
framework=Framework.ISO_42001,
article_section="6.1.2",
title="AI risk assessment",
description="The organization shall define and apply an AI risk assessment process that identifies AI risks and determines their significance.",
category=RequirementCategory.RISK_MANAGEMENT,
applies_to_risk_levels=["high", "limited", "minimal"],
required_evidence_types=["risk_assessment", "test_results"],
cross_references=["EU-AIA-ART9-2a", "NIST-MAP-1.1"],
testing_guidance="Conduct threat modeling for AI-specific risks. Use red team findings to populate the risk register.",
))
return db

Phase 2: Compliance Mapping Engine
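The mapping engine's central judgment call is grading evidence strength from a test's pass rate. A standalone sketch of that thresholding, with cut-offs (0.90 / 0.70 / 0.50) matching the engine's `STRENGTH_THRESHOLDS` table:

```python
# Sketch of evidence-strength thresholding: highest threshold met wins.
THRESHOLDS = [("strong", 0.90), ("moderate", 0.70), ("weak", 0.50)]

def grade(pass_rate: float) -> str:
    """Map a 0.0-1.0 pass rate onto an evidence-strength label."""
    for label, floor in THRESHOLDS:
        if pass_rate >= floor:
            return label
    # Below every threshold is still "weak", not "none": a failing test
    # is evidence too -- evidence of a problem rather than of compliance.
    return "weak"

print([grade(r) for r in (0.95, 0.75, 0.55, 0.10)])
# ['strong', 'moderate', 'weak', 'weak']
```

Note that "none" is reserved for the case where the evidence type does not match the requirement at all; any actual test run yields at least weak evidence.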
# compliance/mapping.py
"""Maps security test results to regulatory requirements."""
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any
from .requirements import (
EvidenceStrength,
Framework,
RequirementCategory,
RequirementDatabase,
RegulatoryRequirement,
)
@dataclass
class TestEvidence:
"""Evidence from a security test that may satisfy a regulatory requirement."""
evidence_id: str
source_tool: str # "garak", "pyrit", "promptfoo", "manual"
test_name: str
test_date: datetime
result_summary: str
findings_count: int
severity_distribution: dict[str, int] # severity -> count
pass_rate: float # 0.0 to 1.0
raw_data_path: str = ""
metadata: dict[str, Any] = field(default_factory=dict)
@dataclass
class ComplianceMapping:
"""A mapping between a test evidence and a regulatory requirement."""
requirement: RegulatoryRequirement
evidence: TestEvidence | None
strength: EvidenceStrength
gap_description: str = ""
recommendation: str = ""
assessed_at: datetime = field(default_factory=datetime.utcnow)
class ComplianceMappingEngine:
"""Maps security test results to regulatory requirements."""
# Mapping from test tool categories to requirement categories they can satisfy.
TOOL_CATEGORY_MAP: dict[str, list[RequirementCategory]] = {
"garak": [
RequirementCategory.TESTING_VALIDATION,
RequirementCategory.ROBUSTNESS_SAFETY,
RequirementCategory.RISK_MANAGEMENT,
],
"pyrit": [
RequirementCategory.TESTING_VALIDATION,
RequirementCategory.ROBUSTNESS_SAFETY,
RequirementCategory.RISK_MANAGEMENT,
],
"promptfoo": [
RequirementCategory.TESTING_VALIDATION,
RequirementCategory.MONITORING,
RequirementCategory.ROBUSTNESS_SAFETY,
],
"manual_audit": [
RequirementCategory.RISK_MANAGEMENT,
RequirementCategory.DOCUMENTATION,
RequirementCategory.HUMAN_OVERSIGHT,
RequirementCategory.ACCOUNTABILITY,
RequirementCategory.TRANSPARENCY,
],
}
# Minimum pass rate thresholds for evidence strength.
STRENGTH_THRESHOLDS = {
EvidenceStrength.STRONG: 0.90,
EvidenceStrength.MODERATE: 0.70,
EvidenceStrength.WEAK: 0.50,
}
def __init__(self, requirements_db: RequirementDatabase) -> None:
self._db = requirements_db
def map_evidence(
self,
evidence: TestEvidence,
risk_level: str = "high",
) -> list[ComplianceMapping]:
"""Map a piece of test evidence to all applicable requirements."""
mappings: list[ComplianceMapping] = []
applicable_categories = self.TOOL_CATEGORY_MAP.get(evidence.source_tool, [])
applicable_requirements = self._db.by_risk_level(risk_level)
for req in applicable_requirements:
if req.category not in applicable_categories:
continue
strength = self._assess_evidence_strength(evidence, req)
gap = ""
recommendation = ""
if strength == EvidenceStrength.NONE:
gap = f"Test '{evidence.test_name}' does not address requirement '{req.title}'."
recommendation = req.testing_guidance
elif strength == EvidenceStrength.WEAK:
gap = f"Test provides partial evidence. Pass rate ({evidence.pass_rate:.0%}) below moderate threshold."
recommendation = f"Expand testing to improve coverage. {req.testing_guidance}"
mappings.append(ComplianceMapping(
requirement=req,
evidence=evidence,
strength=strength,
gap_description=gap,
recommendation=recommendation,
))
return mappings
def _assess_evidence_strength(
self,
evidence: TestEvidence,
requirement: RegulatoryRequirement,
) -> EvidenceStrength:
"""Determine how strongly a piece of evidence addresses a requirement."""
# Automated tool output counts as "test_results" evidence; requirements
# that call for other evidence types get no credit from this engine.
has_matching_type = "test_results" in requirement.required_evidence_types
if not has_matching_type:
return EvidenceStrength.NONE
# Assess based on pass rate.
if evidence.pass_rate >= self.STRENGTH_THRESHOLDS[EvidenceStrength.STRONG]:
return EvidenceStrength.STRONG
elif evidence.pass_rate >= self.STRENGTH_THRESHOLDS[EvidenceStrength.MODERATE]:
return EvidenceStrength.MODERATE
elif evidence.pass_rate >= self.STRENGTH_THRESHOLDS[EvidenceStrength.WEAK]:
return EvidenceStrength.WEAK
else:
return EvidenceStrength.WEAK # Low pass rate is still evidence (of a problem).
def gap_analysis(
self,
all_evidence: list[TestEvidence],
risk_level: str = "high",
) -> list[ComplianceMapping]:
"""Identify requirements that lack sufficient evidence."""
requirements = self._db.by_risk_level(risk_level)
covered: dict[str, EvidenceStrength] = {}
# Build coverage map from all evidence.
for evidence in all_evidence:
mappings = self.map_evidence(evidence, risk_level)
for mapping in mappings:
req_id = mapping.requirement.req_id
current = covered.get(req_id, EvidenceStrength.NONE)
strength_order = [
EvidenceStrength.NONE,
EvidenceStrength.WEAK,
EvidenceStrength.MODERATE,
EvidenceStrength.STRONG,
]
if strength_order.index(mapping.strength) > strength_order.index(current):
covered[req_id] = mapping.strength
# Identify gaps.
gaps: list[ComplianceMapping] = []
for req in requirements:
strength = covered.get(req.req_id, EvidenceStrength.NONE)
if strength in (EvidenceStrength.NONE, EvidenceStrength.WEAK):
gaps.append(ComplianceMapping(
requirement=req,
evidence=None,
strength=strength,
gap_description=f"Requirement '{req.title}' has {strength.value} evidence coverage.",
recommendation=req.testing_guidance or "Conduct targeted testing for this requirement.",
))
return gaps

Phase 3: Risk Classification Module
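The classifier evaluates EU AI Act tiers in strict precedence order: prohibited capabilities (Article 5) short-circuit everything, then high-risk triggers (Annex III domains among them), then transparency obligations, then minimal risk. A standalone sketch of that first-match-wins precedence, with trigger sets abbreviated from the full classifier:

```python
# Sketch of tier precedence: the first matching rule decides the tier.
# Trigger sets are abbreviated stand-ins for the classifier's full lists.
PROHIBITED = {"social_scoring", "subliminal_manipulation"}
HIGH_RISK_DOMAINS = {"employment", "law_enforcement", "biometric_identification"}

def classify(domain: str, capabilities: set[str], customer_facing: bool) -> str:
    if capabilities & PROHIBITED:
        return "unacceptable"   # Article 5: cannot be deployed at all.
    if domain in HIGH_RISK_DOMAINS:
        return "high"           # Annex III domain.
    if customer_facing:
        return "limited"        # Transparency obligations apply.
    return "minimal"

print(classify("employment", {"text_generation"}, customer_facing=True))
# 'high' -- the Annex III domain outranks the transparency tier.
```

Getting this ordering right matters: a customer-facing hiring tool must land in the high-risk tier, not be downgraded to "limited" by its deployment context.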
# compliance/risk_classifier.py
"""EU AI Act risk classification for AI systems."""
from __future__ import annotations
from dataclasses import dataclass
from typing import Any
class RiskLevel:
UNACCEPTABLE = "unacceptable"
HIGH = "high"
LIMITED = "limited"
MINIMAL = "minimal"
@dataclass
class AISystemProfile:
"""Profile of an AI system for risk classification."""
name: str
description: str
domain: str # e.g., "healthcare", "finance", "education", "general"
capabilities: list[str] # e.g., ["text_generation", "decision_making", "biometric"]
deployment_context: str # e.g., "customer_facing", "internal", "embedded"
data_types_processed: list[str] # e.g., ["personal_data", "health_data", "financial_data"]
autonomy_level: str # "advisory", "semi_autonomous", "autonomous"
affected_population_size: str # "individual", "group", "large_scale"
reversibility: str # "fully_reversible", "partially_reversible", "irreversible"
metadata: dict[str, Any] | None = None
@dataclass
class ClassificationResult:
"""Result of risk classification with justification."""
risk_level: str
justification: list[str]
applicable_requirements: list[str] # requirement IDs
mandatory_testing: list[str] # required test types
documentation_requirements: list[str]
class EUAIActClassifier:
"""Classifies AI systems according to EU AI Act risk tiers."""
# Domains that trigger high-risk classification under Annex III.
HIGH_RISK_DOMAINS = {
"biometric_identification",
"critical_infrastructure",
"education_vocational_training",
"employment",
"essential_services",
"law_enforcement",
"migration_asylum",
"justice_democracy",
}
# Capabilities that trigger unacceptable risk.
UNACCEPTABLE_CAPABILITIES = {
"social_scoring",
"real_time_biometric_mass_surveillance",
"subliminal_manipulation",
"exploitation_of_vulnerabilities",
}
def classify(self, profile: AISystemProfile) -> ClassificationResult:
"""Classify an AI system according to EU AI Act risk tiers."""
justification: list[str] = []
# Check for unacceptable risk.
unacceptable_caps = set(profile.capabilities) & self.UNACCEPTABLE_CAPABILITIES
if unacceptable_caps:
justification.append(
f"System has capabilities classified as unacceptable risk: {unacceptable_caps}"
)
return ClassificationResult(
risk_level=RiskLevel.UNACCEPTABLE,
justification=justification,
applicable_requirements=[],
mandatory_testing=["prohibited_use_verification"],
documentation_requirements=["prohibition_notice"],
)
# Check for high-risk classification.
is_high_risk = False
if profile.domain in self.HIGH_RISK_DOMAINS:
is_high_risk = True
justification.append(
f"System operates in high-risk domain: {profile.domain} (Annex III)"
)
if "decision_making" in profile.capabilities and profile.autonomy_level != "advisory":
is_high_risk = True
justification.append(
"System makes autonomous or semi-autonomous decisions affecting individuals"
)
if any(dt in profile.data_types_processed for dt in ["health_data", "biometric_data"]):
is_high_risk = True
justification.append("System processes sensitive data categories (health/biometric)")
if profile.reversibility == "irreversible":
is_high_risk = True
justification.append("System actions are irreversible")
if is_high_risk:
return ClassificationResult(
risk_level=RiskLevel.HIGH,
justification=justification,
applicable_requirements=[
"EU-AIA-ART9-2a", "EU-AIA-ART9-7", "EU-AIA-ART15-1",
"NIST-MAP-1.1", "NIST-MEASURE-2.6", "ISO42001-6.1.2",
],
mandatory_testing=[
"adversarial_robustness_testing",
"safety_classification_testing",
"bias_fairness_testing",
"data_privacy_testing",
"security_penetration_testing",
],
documentation_requirements=[
"risk_management_system",
"technical_documentation",
"conformity_assessment",
"eu_declaration_of_conformity",
"quality_management_system",
],
)
# Limited risk (transparency obligations).
if profile.deployment_context == "customer_facing":
justification.append("Customer-facing system has transparency obligations")
return ClassificationResult(
risk_level=RiskLevel.LIMITED,
justification=justification,
applicable_requirements=["NIST-MAP-1.1", "NIST-MEASURE-2.6"],
mandatory_testing=[
"safety_classification_testing",
"transparency_verification",
],
documentation_requirements=["transparency_notice", "user_documentation"],
)
# Minimal risk.
justification.append("System does not trigger high-risk or limited-risk criteria")
return ClassificationResult(
risk_level=RiskLevel.MINIMAL,
justification=justification,
applicable_requirements=["NIST-MAP-1.1"],
mandatory_testing=["basic_safety_testing"],
documentation_requirements=["voluntary_code_of_conduct"],
)

Phase 4: Compliance Report Generator
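The headline number in the executive report is coverage: the share of applicable requirements backed by at least moderate evidence. A standalone sketch of that computation with toy data, including the guard against an empty requirement set:

```python
# Sketch: compute the coverage figure surfaced in the executive report.
# strengths maps requirement IDs to the best evidence strength achieved.
strengths = {"R1": "strong", "R2": "moderate", "R3": "weak", "R4": "none"}

def coverage_pct(strengths: dict[str, str], total_applicable: int) -> float:
    """Percent of applicable requirements with moderate or strong evidence."""
    covered = sum(1 for s in strengths.values() if s in ("strong", "moderate"))
    # max() guards against division by zero when no requirements apply.
    return covered / max(total_applicable, 1) * 100

print(coverage_pct(strengths, total_applicable=4))  # 50.0
```

Weak evidence deliberately does not count toward coverage, which is what keeps the executive number honest and the gap list actionable.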
# compliance/reporting.py
"""Compliance status reporting for different audiences."""
from __future__ import annotations
from datetime import datetime
from typing import Any
from .mapping import ComplianceMapping, ComplianceMappingEngine, TestEvidence
from .requirements import EvidenceStrength, Framework, RequirementDatabase
from .risk_classifier import ClassificationResult
class ComplianceReportGenerator:
"""Generates compliance status reports."""
def __init__(
self,
db: RequirementDatabase,
engine: ComplianceMappingEngine,
classification: ClassificationResult,
) -> None:
self.db = db
self.engine = engine
self.classification = classification
def executive_report(
self,
all_evidence: list[TestEvidence],
system_name: str,
) -> str:
"""Generate an executive compliance report."""
all_mappings: list[ComplianceMapping] = []
for ev in all_evidence:
all_mappings.extend(self.engine.map_evidence(ev, self.classification.risk_level))
gaps = self.engine.gap_analysis(all_evidence, self.classification.risk_level)
# Count coverage.
covered_strong = sum(1 for m in all_mappings if m.strength == EvidenceStrength.STRONG)
covered_moderate = sum(1 for m in all_mappings if m.strength == EvidenceStrength.MODERATE)
total_applicable = len(self.db.by_risk_level(self.classification.risk_level))
coverage_pct = (
(covered_strong + covered_moderate) / max(total_applicable, 1) * 100
)
lines = [
f"# AI Compliance Status Report: {system_name}",
f"**Generated**: {datetime.utcnow().strftime('%Y-%m-%d')}",
f"**Risk Classification**: {self.classification.risk_level.upper()}",
"",
"## Executive Summary",
"",
f"The AI system '{system_name}' has been classified as **{self.classification.risk_level}** risk "
f"under the EU AI Act.",
"",
f"**Compliance Coverage**: {coverage_pct:.0f}% of applicable requirements have moderate or strong evidence.",
f"- Strong evidence: {covered_strong} requirements",
f"- Moderate evidence: {covered_moderate} requirements",
f"- Gaps identified: {len(gaps)} requirements need additional testing",
"",
"## Classification Justification",
"",
]
for j in self.classification.justification:
lines.append(f"- {j}")
lines.extend([
"",
"## Required Actions",
"",
])
if gaps:
lines.append("### Testing Gaps")
lines.append("")
lines.append("| Requirement | Framework | Gap | Recommended Action |")
lines.append("|------------|-----------|-----|-------------------|")
for gap in gaps:
lines.append(
f"| {gap.requirement.title[:40]} "
f"| {gap.requirement.framework.value} "
f"| {gap.gap_description[:40]} "
f"| {gap.recommendation[:50]} |"
)
lines.extend([
"",
"## Mandatory Testing Checklist",
"",
])
for test in self.classification.mandatory_testing:
completed = any(
m.strength in (EvidenceStrength.STRONG, EvidenceStrength.MODERATE)
for m in all_mappings
if test.replace("_", " ") in m.requirement.title.lower()
)
status = "[x]" if completed else "[ ]"
lines.append(f"- {status} {test.replace('_', ' ').title()}")
return "\n".join(lines)
def auditor_report(
self,
all_evidence: list[TestEvidence],
system_name: str,
) -> str:
"""Generate a detailed report for compliance auditors."""
lines = [
f"# Detailed Compliance Audit Report: {system_name}",
f"**Report Date**: {datetime.utcnow().strftime('%Y-%m-%d')}",
f"**Risk Level**: {self.classification.risk_level}",
"",
"## Evidence Inventory",
"",
"| Evidence ID | Tool | Test Name | Date | Pass Rate | Findings |",
"|------------|------|-----------|------|-----------|----------|",
]
for ev in all_evidence:
lines.append(
f"| {ev.evidence_id} | {ev.source_tool} | {ev.test_name} "
f"| {ev.test_date.strftime('%Y-%m-%d')} | {ev.pass_rate:.0%} "
f"| {ev.findings_count} |"
)
lines.extend(["", "## Requirement-by-Requirement Assessment", ""])
for req in self.db.by_risk_level(self.classification.risk_level):
mappings = [
m for ev in all_evidence
for m in self.engine.map_evidence(ev, self.classification.risk_level)
if m.requirement.req_id == req.req_id
]
best_strength = max(
(m.strength for m in mappings),
default=EvidenceStrength.NONE,
key=lambda s: ["none", "weak", "moderate", "strong"].index(s.value),
)
lines.append(f"### {req.req_id}: {req.title}")
lines.append(f"**Framework**: {req.framework.value}")
lines.append(f"**Category**: {req.category.value}")
lines.append(f"**Evidence Strength**: {best_strength.value.upper()}")
lines.append("")
lines.append(f"*Requirement*: {req.description}")
lines.append("")
if mappings:
lines.append("*Supporting Evidence*:")
for m in mappings:
if m.evidence:
lines.append(f"- [{m.strength.value}] {m.evidence.source_tool}: {m.evidence.test_name}")
else:
lines.append("*No evidence mapped to this requirement.*")
lines.append("")
return "\n".join(lines)Evaluation Criteria
| Criterion | Weight | Excellent | Satisfactory | Needs Improvement |
|---|---|---|---|---|
| Requirement Coverage | 25% | 3+ frameworks with 15+ requirements, cross-references | 2 frameworks with 8+ requirements | Single framework or fewer than 8 requirements |
| Mapping Engine | 25% | Automated evidence mapping with strength assessment and gap analysis | Basic mapping with manual strength assignment | No automated mapping |
| Risk Classification | 20% | EU AI Act risk tier classification with justification and testing requirements | Basic high/low classification | No classification system |
| Reporting | 20% | Multiple audience-specific reports (executive, auditor, engineering) | Single report format | No structured reporting |
| Evidence Management | 10% | Versioned evidence store with audit trail | Basic file storage | No evidence management |
Stretch Goals
- Add support for sector-specific regulations (FDA guidance for AI in healthcare, financial services AI regulations).
- Build a compliance dashboard that visualizes coverage heatmaps across frameworks.
- Implement automated evidence collection from CI/CD pipelines (pull Garak/Promptfoo results automatically).
- Add a regulatory change tracker that monitors for framework updates and flags affected requirements.
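For the CI/CD stretch goal, the core task is an adapter that turns a tool's results file into the evidence shape the mapping engine consumes. The sketch below uses a hypothetical JSON layout; real Garak and Promptfoo output formats vary by version, so the field names here are assumptions to be adapted to the files your pipeline actually produces:

```python
# Sketch: convert a (hypothetical) tool results file into evidence records.
import json

RAW = json.loads("""
{"suite": "nightly_safety", "tool": "promptfoo",
 "results": [{"name": "jailbreak", "passed": 18, "failed": 2},
             {"name": "pii_leak", "passed": 20, "failed": 0}]}
""")

def to_evidence(raw: dict) -> list[dict]:
    """One evidence record per test, with a pass rate the engine can grade."""
    records = []
    for r in raw["results"]:
        total = r["passed"] + r["failed"]
        records.append({
            "source_tool": raw["tool"],
            "test_name": f'{raw["suite"]}:{r["name"]}',
            # Guard against empty test groups when computing the pass rate.
            "pass_rate": r["passed"] / total if total else 0.0,
            "findings_count": r["failed"],
        })
    return records

for ev in to_evidence(RAW):
    print(ev["test_name"], f'{ev["pass_rate"]:.0%}')
```

Run in CI after each test suite, an adapter like this lets compliance evidence accumulate automatically instead of being assembled by hand before an audit.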
References
- European Parliament. (2024). "Regulation (EU) 2024/1689 — Artificial Intelligence Act." https://eur-lex.europa.eu/eli/reg/2024/1689/oj
- NIST. (2023). "AI Risk Management Framework (AI RMF 1.0)." https://doi.org/10.6028/NIST.AI.100-1
- ISO/IEC. (2023). "ISO/IEC 42001:2023 — Information technology — Artificial intelligence — Management system." https://www.iso.org/standard/81230.html