Capstone: Implement an AI Compliance Framework
Build a comprehensive AI compliance framework that maps security testing to regulatory requirements including the EU AI Act, NIST AI RMF, and ISO 42001.
Overview
As AI regulation accelerates globally, organizations deploying AI systems need a structured way to connect their security testing activities to compliance requirements. The EU AI Act requires risk-based testing of high-risk AI systems. The NIST AI Risk Management Framework calls for ongoing red teaming and adversarial testing. ISO 42001 mandates documented risk assessments. Yet most organizations perform these activities in isolation — security testing happens in one team, compliance documentation in another, and the two rarely connect.
This capstone project tasks you with building an AI compliance framework that bridges the gap between technical red teaming and regulatory compliance. The framework takes outputs from AI security tools (Garak, PyRIT, Promptfoo) and maps findings to specific regulatory requirements, generating compliance evidence that auditors can review. It also works in the other direction: given a regulatory requirement, it determines which security tests must be performed and whether existing test results provide sufficient evidence.
This is not a compliance checklist tool — it is an engineering project that automates the translation between technical security findings and regulatory language, making compliance a natural byproduct of good security testing rather than a separate documentation burden.
Project Requirements
Functional Requirements
- Regulatory Requirement Database — Structured representation of requirements from the EU AI Act, NIST AI RMF, and ISO 42001, with cross-references between frameworks.
- Compliance Mapping Engine — Maps AI security test results to regulatory requirements. Each mapping includes the requirement, the test that provides evidence, the strength of the evidence, and any gaps.
- Risk Classification Module — Classifies AI systems according to the EU AI Act risk tiers (unacceptable, high-risk, limited, minimal) and determines required testing based on the classification.
- Evidence Management — Stores and organizes compliance evidence (test results, reports, attestations) with version control and an audit trail.
- Gap Analysis — Identifies regulatory requirements that lack sufficient test evidence and recommends specific tests to fill the gaps.
- Reporting — Generates compliance status reports for different audiences (regulators, auditors, executives, engineering teams).
Implementation Guide
Phase 1: Regulatory Requirement Model
# compliance/requirements.py
"""Structured representation of AI regulatory requirements."""
from __future__ import annotations
import enum
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
class Framework(str, enum.Enum):
EU_AI_ACT = "eu_ai_act"
NIST_AI_RMF = "nist_ai_rmf"
ISO_42001 = "iso_42001"
class RequirementCategory(str, enum.Enum):
RISK_MANAGEMENT = "risk_management"
TESTING_VALIDATION = "testing_validation"
TRANSPARENCY = "transparency"
DATA_GOVERNANCE = "data_governance"
HUMAN_OVERSIGHT = "human_oversight"
ROBUSTNESS_SAFETY = "robustness_safety"
ACCOUNTABILITY = "accountability"
DOCUMENTATION = "documentation"
    MONITORING = "monitoring"
class EvidenceStrength(str, enum.Enum):
    STRONG = "strong"      # Automated test directly addresses the requirement.
    MODERATE = "moderate"  # Test partially addresses the requirement.
    WEAK = "weak"          # Test is tangentially related.
NONE = "none" # No evidence available.
@dataclass
class RegulatoryRequirement:
"""A single regulatory requirement from a compliance framework."""
req_id: str # e.g., "EU-AI-ACT-ART-9.2.a"
framework: Framework
article_section: str # e.g., "Article 9, Section 2(a)"
title: str
description: str
category: RequirementCategory
applies_to_risk_levels: list[str] # e.g., ["high", "limited"]
required_evidence_types: list[str] # e.g., ["test_results", "documentation"]
cross_references: list[str] = field(default_factory=list) # IDs of related reqs in other frameworks
testing_guidance: str = ""
metadata: dict[str, Any] = field(default_factory=dict)
@dataclass
class RequirementDatabase:
    """Database of regulatory requirements with cross-reference support."""
requirements: dict[str, RegulatoryRequirement] = field(default_factory=dict)
def add(self, req: RegulatoryRequirement) -> None:
self.requirements[req.req_id] = req
def get(self, req_id: str) -> RegulatoryRequirement | None:
return self.requirements.get(req_id)
def by_framework(self, framework: Framework) -> list[RegulatoryRequirement]:
return [r for r in self.requirements.values() if r.framework == framework]
def by_category(self, category: RequirementCategory) -> list[RegulatoryRequirement]:
return [r for r in self.requirements.values() if r.category == category]
def by_risk_level(self, level: str) -> list[RegulatoryRequirement]:
return [r for r in self.requirements.values() if level in r.applies_to_risk_levels]
def cross_references_for(self, req_id: str) -> list[RegulatoryRequirement]:
req = self.requirements.get(req_id)
if not req:
return []
return [
self.requirements[xref]
for xref in req.cross_references
if xref in self.requirements
]
def save(self, path: Path) -> None:
data = {
req_id: {
"req_id": r.req_id,
"framework": r.framework.value,
"article_section": r.article_section,
"title": r.title,
"description": r.description,
"category": r.category.value,
"applies_to_risk_levels": r.applies_to_risk_levels,
"required_evidence_types": r.required_evidence_types,
"cross_references": r.cross_references,
"testing_guidance": r.testing_guidance,
}
for req_id, r in self.requirements.items()
}
path.write_text(json.dumps(data, indent=2))
def build_default_database() -> RequirementDatabase:
    """Build a database with key requirements from major AI frameworks."""
db = RequirementDatabase()
# EU AI Act requirements (selected key articles).
db.add(RegulatoryRequirement(
req_id="EU-AIA-ART9-2a",
framework=Framework.EU_AI_ACT,
article_section="Article 9, Section 2(a)",
title="Risk management: identification and analysis of known risks",
        description="The risk management system shall identify and analyse the known and the reasonably foreseeable risks that the high-risk AI system can pose to health, safety or fundamental rights.",
category=RequirementCategory.RISK_MANAGEMENT,
applies_to_risk_levels=["high"],
required_evidence_types=["risk_assessment", "test_results"],
cross_references=["NIST-MAP-1.1", "ISO42001-6.1.2"],
        testing_guidance="Conduct comprehensive red teaming covering prompt injection, data extraction, and harmful output generation. Map findings to risk categories.",
))
db.add(RegulatoryRequirement(
req_id="EU-AIA-ART9-7",
framework=Framework.EU_AI_ACT,
article_section="Article 9, Section 7",
        title="Testing to ensure appropriate and consistent performance",
        description="Testing shall be made against prior defined metrics and probabilistic thresholds that are appropriate to the intended purpose of the high-risk AI system.",
category=RequirementCategory.TESTING_VALIDATION,
applies_to_risk_levels=["high"],
required_evidence_types=["test_results", "metrics", "thresholds"],
cross_references=["NIST-MEASURE-2.6"],
        testing_guidance="Use Garak and Promptfoo to establish baseline security metrics. Define pass/fail thresholds for each security category. Run tests across model updates.",
))
db.add(RegulatoryRequirement(
req_id="EU-AIA-ART15-1",
framework=Framework.EU_AI_ACT,
article_section="Article 15, Section 1",
title="Accuracy, robustness and cybersecurity",
description="High-risk AI systems shall be designed and developed in such a way that they achieve an appropriate level of accuracy, robustness and cybersecurity.",
category=RequirementCategory.ROBUSTNESS_SAFETY,
applies_to_risk_levels=["high"],
required_evidence_types=["test_results", "security_audit"],
cross_references=["NIST-MEASURE-2.6", "ISO42001-A.6.2.5"],
        testing_guidance="Test adversarial robustness using PyRIT. Evaluate cybersecurity posture including API security, authentication, and access controls.",
))
# NIST AI RMF requirements.
db.add(RegulatoryRequirement(
req_id="NIST-MAP-1.1",
framework=Framework.NIST_AI_RMF,
article_section="MAP 1.1",
title="Intended purpose and context of use are documented",
description="Document the intended purpose, context of use, and expected users of the AI system to inform risk identification.",
category=RequirementCategory.DOCUMENTATION,
applies_to_risk_levels=["high", "limited", "minimal"],
required_evidence_types=["documentation"],
cross_references=["EU-AIA-ART9-2a"],
        testing_guidance="Document system scope before testing. Ensure red-team scope aligns with documented intended use.",
))
db.add(RegulatoryRequirement(
req_id="NIST-MEASURE-2.6",
framework=Framework.NIST_AI_RMF,
article_section="MEASURE 2.6",
        title="AI system performance measured against security requirements",
        description="AI system performance or assurance criteria are measured qualitatively or quantitatively and demonstrated for conditions similar to deployment conditions, including adversarial conditions.",
category=RequirementCategory.TESTING_VALIDATION,
applies_to_risk_levels=["high", "limited"],
required_evidence_types=["test_results", "metrics"],
cross_references=["EU-AIA-ART9-7", "EU-AIA-ART15-1"],
        testing_guidance="Run adversarial testing under deployment-realistic conditions. Use Promptfoo for continuous testing with defined thresholds.",
))
db.add(RegulatoryRequirement(
req_id="NIST-MANAGE-2.2",
framework=Framework.NIST_AI_RMF,
article_section="MANAGE 2.2",
title="Mechanisms for tracking risks over time",
description="Mechanisms are in place and applied to sustain the value of deployed AI systems and to track identified risks over time.",
category=RequirementCategory.MONITORING,
applies_to_risk_levels=["high", "limited"],
required_evidence_types=["monitoring_config", "incident_records"],
cross_references=["ISO42001-9.1"],
        testing_guidance="Implement a continuous red-teaming pipeline. Monitor for regression in security metrics across model updates.",
))
# ISO 42001 requirements.
db.add(RegulatoryRequirement(
req_id="ISO42001-6.1.2",
framework=Framework.ISO_42001,
article_section="6.1.2",
        title="AI risk assessment",
        description="The organization shall define and apply an AI risk assessment process that identifies AI risks and determines their significance.",
category=RequirementCategory.RISK_MANAGEMENT,
applies_to_risk_levels=["high", "limited", "minimal"],
required_evidence_types=["risk_assessment", "test_results"],
cross_references=["EU-AIA-ART9-2a", "NIST-MAP-1.1"],
        testing_guidance="Conduct threat modeling for AI-specific risks. Use red-team findings to populate the risk register.",
))
    return db

Phase 2: Compliance Mapping Engine
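Before building the mapping engine, it helps to exercise the Phase 1 database on its own. The snippet below uses a condensed stand-in for `RegulatoryRequirement` so it runs standalone; with the real module, the equivalent calls are `db.by_risk_level("high")` and `db.cross_references_for("EU-AIA-ART9-2a")`:

```python
from dataclasses import dataclass, field

@dataclass
class Req:  # condensed stand-in for RegulatoryRequirement
    req_id: str
    framework: str
    applies_to_risk_levels: list[str]
    cross_references: list[str] = field(default_factory=list)

# Three of the requirements from build_default_database, cross-linked.
db = {r.req_id: r for r in [
    Req("EU-AIA-ART9-2a", "eu_ai_act", ["high"], ["NIST-MAP-1.1", "ISO42001-6.1.2"]),
    Req("NIST-MAP-1.1", "nist_ai_rmf", ["high", "limited", "minimal"], ["EU-AIA-ART9-2a"]),
    Req("ISO42001-6.1.2", "iso_42001", ["high", "limited", "minimal"], ["EU-AIA-ART9-2a"]),
]}

# by_risk_level: everything that applies at the "high" tier.
high_risk_ids = sorted(r.req_id for r in db.values() if "high" in r.applies_to_risk_levels)

# cross_references_for: hop from one EU AI Act article to its counterparts.
xref_frameworks = [db[x].framework for x in db["EU-AIA-ART9-2a"].cross_references if x in db]

print(high_risk_ids)    # ['EU-AIA-ART9-2a', 'ISO42001-6.1.2', 'NIST-MAP-1.1']
print(xref_frameworks)  # ['nist_ai_rmf', 'iso_42001']
```

The cross-reference hop is what lets one test result count as evidence under multiple frameworks at once.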
# compliance/mapping.py
"""Maps security test results to regulatory requirements."""
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any
from .requirements import (
EvidenceStrength,
Framework,
RequirementCategory,
RequirementDatabase,
RegulatoryRequirement,
)
@dataclass
class TestEvidence:
    """Evidence from a security test that may satisfy a regulatory requirement."""
evidence_id: str
source_tool: str # "garak", "pyrit", "promptfoo", "manual"
test_name: str
test_date: datetime
result_summary: str
findings_count: int
severity_distribution: dict[str, int] # severity -> count
pass_rate: float # 0.0 to 1.0
raw_data_path: str = ""
metadata: dict[str, Any] = field(default_factory=dict)
@dataclass
class ComplianceMapping:
    """A mapping between test evidence and a regulatory requirement."""
requirement: RegulatoryRequirement
evidence: TestEvidence | None
strength: EvidenceStrength
gap_description: str = ""
recommendation: str = ""
assessed_at: datetime = field(default_factory=datetime.utcnow)
class ComplianceMappingEngine:
    """Maps security test results to regulatory requirements."""
    # Mapping from test tool categories to the requirement categories they can satisfy.
TOOL_CATEGORY_MAP: dict[str, list[RequirementCategory]] = {
"garak": [
RequirementCategory.TESTING_VALIDATION,
RequirementCategory.ROBUSTNESS_SAFETY,
RequirementCategory.RISK_MANAGEMENT,
],
"pyrit": [
RequirementCategory.TESTING_VALIDATION,
RequirementCategory.ROBUSTNESS_SAFETY,
RequirementCategory.RISK_MANAGEMENT,
],
"promptfoo": [
RequirementCategory.TESTING_VALIDATION,
RequirementCategory.MONITORING,
RequirementCategory.ROBUSTNESS_SAFETY,
],
"manual_audit": [
RequirementCategory.RISK_MANAGEMENT,
RequirementCategory.DOCUMENTATION,
RequirementCategory.HUMAN_OVERSIGHT,
RequirementCategory.ACCOUNTABILITY,
RequirementCategory.TRANSPARENCY,
],
}
# Minimum pass rate thresholds for evidence strength.
STRENGTH_THRESHOLDS = {
EvidenceStrength.STRONG: 0.90,
EvidenceStrength.MODERATE: 0.70,
EvidenceStrength.WEAK: 0.50,
}
def __init__(self, requirements_db: RequirementDatabase) -> None:
self._db = requirements_db
def map_evidence(
self,
evidence: TestEvidence,
risk_level: str = "high",
) -> list[ComplianceMapping]:
        """Map a piece of test evidence to all applicable requirements."""
mappings: list[ComplianceMapping] = []
applicable_categories = self.TOOL_CATEGORY_MAP.get(evidence.source_tool, [])
applicable_requirements = self._db.by_risk_level(risk_level)
for req in applicable_requirements:
if req.category not in applicable_categories:
continue
strength = self._assess_evidence_strength(evidence, req)
gap = ""
recommendation = ""
if strength == EvidenceStrength.NONE:
                gap = f"Test '{evidence.test_name}' does not address requirement '{req.title}'."
                recommendation = req.testing_guidance
            elif strength == EvidenceStrength.WEAK:
                gap = f"Test provides partial evidence. Pass rate ({evidence.pass_rate:.0%}) is below the moderate threshold."
                recommendation = f"Expand testing to improve coverage. {req.testing_guidance}"
mappings.append(ComplianceMapping(
requirement=req,
evidence=evidence,
strength=strength,
gap_description=gap,
recommendation=recommendation,
))
return mappings
def _assess_evidence_strength(
self,
evidence: TestEvidence,
requirement: RegulatoryRequirement,
) -> EvidenceStrength:
"""Determine how strongly a piece of evidence addresses a requirement."""
# Check if the evidence type matches what the requirement expects.
has_matching_type = "test_results" in requirement.required_evidence_types
if not has_matching_type:
return EvidenceStrength.NONE
        # Assess strength based on pass rate.
if evidence.pass_rate >= self.STRENGTH_THRESHOLDS[EvidenceStrength.STRONG]:
return EvidenceStrength.STRONG
elif evidence.pass_rate >= self.STRENGTH_THRESHOLDS[EvidenceStrength.MODERATE]:
return EvidenceStrength.MODERATE
elif evidence.pass_rate >= self.STRENGTH_THRESHOLDS[EvidenceStrength.WEAK]:
return EvidenceStrength.WEAK
else:
return EvidenceStrength.WEAK # Low pass rate is still evidence (of a problem).
def gap_analysis(
self,
all_evidence: list[TestEvidence],
risk_level: str = "high",
) -> list[ComplianceMapping]:
        """Identify requirements that lack sufficient evidence."""
requirements = self._db.by_risk_level(risk_level)
covered: dict[str, EvidenceStrength] = {}
# Build coverage map from all evidence.
for evidence in all_evidence:
mappings = self.map_evidence(evidence, risk_level)
for mapping in mappings:
req_id = mapping.requirement.req_id
current = covered.get(req_id, EvidenceStrength.NONE)
strength_order = [
EvidenceStrength.NONE,
EvidenceStrength.WEAK,
EvidenceStrength.MODERATE,
EvidenceStrength.STRONG,
]
if strength_order.index(mapping.strength) > strength_order.index(current):
covered[req_id] = mapping.strength
        # Identify gaps.
gaps: list[ComplianceMapping] = []
for req in requirements:
strength = covered.get(req.req_id, EvidenceStrength.NONE)
if strength in (EvidenceStrength.NONE, EvidenceStrength.WEAK):
gaps.append(ComplianceMapping(
requirement=req,
evidence=None,
strength=strength,
gap_description=f"Requirement '{req.title}' has {strength.value} evidence coverage.",
                    recommendation=req.testing_guidance or "Conduct targeted testing for this requirement.",
))
        return gaps

Phase 3: Risk Classification Module
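Before layering risk classification on top, it is worth seeing the Phase 2 coverage logic in isolation. This condensed sketch (standalone stand-ins, not the classes above) applies the same upgrade rule as `gap_analysis`: keep the strongest strength seen per requirement, then flag anything still below moderate as a gap:

```python
# Condensed sketch of gap_analysis: take (requirement_id, strength) pairs as
# map_evidence would emit them, keep the strongest per requirement, then flag
# anything still below "moderate" as a gap.
ORDER = ["none", "weak", "moderate", "strong"]

observed = [
    ("EU-AIA-ART9-7", "weak"),
    ("EU-AIA-ART9-7", "strong"),   # a later, broader run upgrades coverage
    ("EU-AIA-ART15-1", "weak"),
]
all_reqs = ["EU-AIA-ART9-2a", "EU-AIA-ART9-7", "EU-AIA-ART15-1"]

best: dict[str, str] = {}
for req_id, strength in observed:
    if ORDER.index(strength) > ORDER.index(best.get(req_id, "none")):
        best[req_id] = strength

gaps = [r for r in all_reqs
        if ORDER.index(best.get(r, "none")) < ORDER.index("moderate")]

print(best)  # {'EU-AIA-ART9-7': 'strong', 'EU-AIA-ART15-1': 'weak'}
print(gaps)  # ['EU-AIA-ART9-2a', 'EU-AIA-ART15-1']
```

Note that a requirement with no evidence at all ("none") and one with only weak evidence both end up in the gap list, which matches the engine's behavior.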
# compliance/risk_classifier.py
"""EU AI Act risk classification for AI systems."""
from __future__ import annotations
from dataclasses import dataclass
from typing import Any
class RiskLevel:
UNACCEPTABLE = "unacceptable"
HIGH = "high"
LIMITED = "limited"
MINIMAL = "minimal"
@dataclass
class AISystemProfile:
"""Profile of an AI system for risk classification."""
name: str
description: str
domain: str # e.g., "healthcare", "finance", "education", "general"
capabilities: list[str] # e.g., ["text_generation", "decision_making", "biometric"]
deployment_context: str # e.g., "customer_facing", "internal", "embedded"
data_types_processed: list[str] # e.g., ["personal_data", "health_data", "financial_data"]
autonomy_level: str # "advisory", "semi_autonomous", "autonomous"
affected_population_size: str # "individual", "group", "large_scale"
reversibility: str # "fully_reversible", "partially_reversible", "irreversible"
metadata: dict[str, Any] | None = None
@dataclass
class ClassificationResult:
"""Result of risk classification with justification."""
risk_level: str
justification: list[str]
applicable_requirements: list[str] # requirement IDs
    mandatory_testing: list[str]  # required test types
documentation_requirements: list[str]
class EUAIActClassifier:
"""Classifies AI systems according to EU AI Act risk tiers."""
# Domains that trigger high-risk classification under Annex III.
HIGH_RISK_DOMAINS = {
"biometric_identification",
"critical_infrastructure",
"education_vocational_training",
"employment",
"essential_services",
"law_enforcement",
"migration_asylum",
"justice_democracy",
}
# Capabilities that trigger unacceptable risk.
UNACCEPTABLE_CAPABILITIES = {
"social_scoring",
"real_time_biometric_mass_surveillance",
"subliminal_manipulation",
"exploitation_of_vulnerabilities",
}
def classify(self, profile: AISystemProfile) -> ClassificationResult:
"""Classify an AI system according to EU AI Act risk tiers."""
justification: list[str] = []
# Check for unacceptable risk.
unacceptable_caps = set(profile.capabilities) & self.UNACCEPTABLE_CAPABILITIES
if unacceptable_caps:
justification.append(
f"System has capabilities classified as unacceptable risk: {unacceptable_caps}"
)
return ClassificationResult(
risk_level=RiskLevel.UNACCEPTABLE,
justification=justification,
applicable_requirements=[],
mandatory_testing=["prohibited_use_verification"],
documentation_requirements=["prohibition_notice"],
)
# Check for high-risk classification.
is_high_risk = False
if profile.domain in self.HIGH_RISK_DOMAINS:
is_high_risk = True
justification.append(
f"System operates in high-risk domain: {profile.domain} (Annex III)"
)
if "decision_making" in profile.capabilities and profile.autonomy_level != "advisory":
is_high_risk = True
justification.append(
"System makes autonomous or semi-autonomous decisions affecting individuals"
)
if any(dt in profile.data_types_processed for dt in ["health_data", "biometric_data"]):
is_high_risk = True
justification.append("System processes sensitive data categories (health/biometric)")
if profile.reversibility == "irreversible":
is_high_risk = True
justification.append("System actions are irreversible")
if is_high_risk:
return ClassificationResult(
risk_level=RiskLevel.HIGH,
justification=justification,
applicable_requirements=[
"EU-AIA-ART9-2a", "EU-AIA-ART9-7", "EU-AIA-ART15-1",
"NIST-MAP-1.1", "NIST-MEASURE-2.6", "ISO42001-6.1.2",
],
mandatory_testing=[
"adversarial_robustness_testing",
"safety_classification_testing",
"bias_fairness_testing",
"data_privacy_testing",
"security_penetration_testing",
],
documentation_requirements=[
"risk_management_system",
"technical_documentation",
"conformity_assessment",
"eu_declaration_of_conformity",
"quality_management_system",
],
)
# Limited risk (transparency obligations).
if profile.deployment_context == "customer_facing":
justification.append("Customer-facing system has transparency obligations")
return ClassificationResult(
risk_level=RiskLevel.LIMITED,
justification=justification,
applicable_requirements=["NIST-MAP-1.1", "NIST-MEASURE-2.6"],
mandatory_testing=[
"safety_classification_testing",
"transparency_verification",
],
documentation_requirements=["transparency_notice", "user_documentation"],
)
# Minimal risk.
justification.append("System does not trigger high-risk or limited-risk criteria")
return ClassificationResult(
risk_level=RiskLevel.MINIMAL,
justification=justification,
applicable_requirements=["NIST-MAP-1.1"],
mandatory_testing=["basic_safety_testing"],
documentation_requirements=["voluntary_code_of_conduct"],
        )

Phase 4: Compliance Report Generator
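The report generator consumes a `ClassificationResult` from Phase 3. A condensed stand-in for the classifier's rule order (prohibited capabilities first, then Annex III domains, then autonomous decision-making) shows how a profile resolves to a tier; the domains and capabilities below are hypothetical examples:

```python
# Condensed stand-in for EUAIActClassifier.classify: same rule order,
# checked against hypothetical system profiles.
HIGH_RISK_DOMAINS = {"employment", "law_enforcement", "critical_infrastructure"}
UNACCEPTABLE_CAPS = {"social_scoring", "subliminal_manipulation"}

def classify(domain: str, capabilities: set[str], autonomy_level: str) -> str:
    if capabilities & UNACCEPTABLE_CAPS:  # prohibited uses checked first
        return "unacceptable"
    if domain in HIGH_RISK_DOMAINS:       # Annex III domains
        return "high"
    if "decision_making" in capabilities and autonomy_level != "advisory":
        return "high"                     # autonomous decisions affecting people
    return "minimal"

r1 = classify("employment", {"text_generation"}, "advisory")
r2 = classify("general", {"decision_making"}, "semi_autonomous")
r3 = classify("general", {"text_generation"}, "advisory")
print(r1, r2, r3)  # high high minimal
```

The rule order matters: a system in a prohibited category must short-circuit to "unacceptable" before any high-risk logic runs.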
# compliance/reporting.py
"""Compliance status reporting for different audiences."""
from __future__ import annotations
from datetime import datetime
from typing import Any
from .mapping import ComplianceMapping, ComplianceMappingEngine, TestEvidence
from .requirements import EvidenceStrength, Framework, RequirementDatabase
from .risk_classifier import ClassificationResult
class ComplianceReportGenerator:
"""Generates compliance status reports."""
def __init__(
self,
db: RequirementDatabase,
engine: ComplianceMappingEngine,
classification: ClassificationResult,
) -> None:
self.db = db
self.engine = engine
self.classification = classification
def executive_report(
self,
all_evidence: list[TestEvidence],
system_name: str,
) -> str:
"""Generate an executive compliance report."""
all_mappings: list[ComplianceMapping] = []
for ev in all_evidence:
all_mappings.extend(self.engine.map_evidence(ev, self.classification.risk_level))
gaps = self.engine.gap_analysis(all_evidence, self.classification.risk_level)
        # Count coverage over unique requirements, keeping the strongest
        # evidence seen for each so repeated mappings are not double-counted.
        order = ["none", "weak", "moderate", "strong"]
        best: dict[str, EvidenceStrength] = {}
        for m in all_mappings:
            current = best.get(m.requirement.req_id, EvidenceStrength.NONE)
            if order.index(m.strength.value) > order.index(current.value):
                best[m.requirement.req_id] = m.strength
        covered_strong = sum(1 for s in best.values() if s == EvidenceStrength.STRONG)
        covered_moderate = sum(1 for s in best.values() if s == EvidenceStrength.MODERATE)
        total_applicable = len(self.db.by_risk_level(self.classification.risk_level))
        coverage_pct = (covered_strong + covered_moderate) / max(total_applicable, 1) * 100
lines = [
f"# AI Compliance Status Report: {system_name}",
f"**Generated**: {datetime.utcnow().strftime('%Y-%m-%d')}",
f"**Risk Classification**: {self.classification.risk_level.upper()}",
"",
            "## Executive Summary",
"",
f"The AI system '{system_name}' has been classified as **{self.classification.risk_level}** risk "
f"under the EU AI Act.",
"",
f"**Compliance Coverage**: {coverage_pct:.0f}% of applicable requirements have moderate or strong evidence.",
f"- Strong evidence: {covered_strong} requirements",
f"- Moderate evidence: {covered_moderate} requirements",
            f"- Gaps identified: {len(gaps)} requirements need additional testing",
"",
"## Classification Justification",
"",
]
for j in self.classification.justification:
lines.append(f"- {j}")
lines.extend([
"",
"## Required Actions",
"",
])
if gaps:
            lines.append("### Testing Gaps")
lines.append("")
lines.append("| Requirement | Framework | Gap | Recommended Action |")
lines.append("|------------|-----------|-----|-------------------|")
for gap in gaps:
lines.append(
f"| {gap.requirement.title[:40]} "
f"| {gap.requirement.framework.value} "
f"| {gap.gap_description[:40]} "
f"| {gap.recommendation[:50]} |"
)
lines.extend([
"",
            "## Mandatory Testing Checklist",
"",
])
        for test_name in self.classification.mandatory_testing:
            # Simple substring heuristic: treat a mandatory test as completed if
            # any sufficiently strong mapping's requirement title mentions it.
            completed = any(
                m.strength in (EvidenceStrength.STRONG, EvidenceStrength.MODERATE)
                for m in all_mappings
                if test_name.replace("_", " ") in m.requirement.title.lower()
            )
            status = "[x]" if completed else "[ ]"
            lines.append(f"- {status} {test_name.replace('_', ' ').title()}")
return "\n".join(lines)
def auditor_report(
self,
all_evidence: list[TestEvidence],
system_name: str,
) -> str:
"""Generate a detailed report for compliance auditors."""
lines = [
f"# Detailed Compliance Audit Report: {system_name}",
f"**Report Date**: {datetime.utcnow().strftime('%Y-%m-%d')}",
f"**Risk Level**: {self.classification.risk_level}",
"",
"## Evidence Inventory",
"",
            "| Evidence ID | Tool | Test Name | Date | Pass Rate | Findings |",
"|------------|------|-----------|------|-----------|----------|",
]
for ev in all_evidence:
lines.append(
f"| {ev.evidence_id} | {ev.source_tool} | {ev.test_name} "
f"| {ev.test_date.strftime('%Y-%m-%d')} | {ev.pass_rate:.0%} "
f"| {ev.findings_count} |"
)
        lines.extend(["", "## Requirement-by-Requirement Assessment", ""])
for req in self.db.by_risk_level(self.classification.risk_level):
mappings = [
m for ev in all_evidence
for m in self.engine.map_evidence(ev, self.classification.risk_level)
if m.requirement.req_id == req.req_id
]
best_strength = max(
(m.strength for m in mappings),
default=EvidenceStrength.NONE,
key=lambda s: ["none", "weak", "moderate", "strong"].index(s.value),
)
lines.append(f"### {req.req_id}: {req.title}")
lines.append(f"**Framework**: {req.framework.value}")
lines.append(f"**Category**: {req.category.value}")
lines.append(f"**Evidence Strength**: {best_strength.value.upper()}")
lines.append("")
lines.append(f"*Requirement*: {req.description}")
lines.append("")
if mappings:
lines.append("*Supporting Evidence*:")
for m in mappings:
if m.evidence:
lines.append(f"- [{m.strength.value}] {m.evidence.source_tool}: {m.evidence.test_name}")
else:
lines.append("*No evidence mapped to this requirement.*")
lines.append("")
        return "\n".join(lines)

Evaluation Criteria
| Criterion | Weight | Excellent | Satisfactory | Needs Improvement |
|---|---|---|---|---|
| Requirement Coverage | 25% | 3+ frameworks with 15+ requirements, cross-references | 2 frameworks with 8+ requirements | Single framework or fewer than 8 requirements |
| Mapping Engine | 25% | Automated evidence mapping with strength assessment and gap analysis | Basic mapping with manual strength assignment | No automated mapping |
| Risk Classification | 20% | EU AI Act risk tier classification with justification and testing requirements | Basic high/low classification | No classification system |
| Reporting | 20% | Multiple audience-specific reports (executive, auditor, engineering) | Single report format | No structured reporting |
| Evidence Management | 10% | Versioned evidence store with audit trail | Basic file storage | No evidence management |
Stretch Goals
- Add support for sector-specific regulations (FDA guidance for AI in healthcare, financial services AI regulations).
- Build a compliance dashboard that visualizes coverage heatmaps across frameworks.
- Implement automated evidence collection from CI/CD pipelines (pull Garak/Promptfoo results automatically).
- Add a regulatory change tracker that monitors for framework updates and flags affected requirements.
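To see how the four phases compose, here is the pipeline in miniature: classify, select applicable requirements, grade one evidence run, and summarize. It uses condensed stand-ins rather than the real `compliance` package, and the pass rate is fabricated for illustration:

```python
# Miniature end-to-end pipeline: classify -> select requirements ->
# grade evidence -> summarize. All names are condensed stand-ins.
REQS = {  # req_id -> risk tiers at which it applies
    "EU-AIA-ART9-7": ["high"],
    "NIST-MEASURE-2.6": ["high", "limited"],
    "NIST-MAP-1.1": ["high", "limited", "minimal"],
}

def classify(domain: str) -> str:
    return "high" if domain in {"employment", "law_enforcement"} else "limited"

def strength(pass_rate: float) -> str:
    if pass_rate >= 0.90:
        return "strong"
    return "moderate" if pass_rate >= 0.70 else "weak"

risk = classify("employment")          # hypothetical hiring assistant
applicable = [r for r, tiers in REQS.items() if risk in tiers]
grade = strength(0.93)                 # e.g., a Promptfoo suite at a 93% pass rate

summary = f"risk={risk}; applicable={len(applicable)}; latest evidence={grade}"
print(summary)  # risk=high; applicable=3; latest evidence=strong
```

In the full framework, the last line is replaced by `ComplianceReportGenerator.executive_report`, which also folds in gap analysis and the mandatory testing checklist.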
References
- European Parliament. (2024). "Regulation (EU) 2024/1689 — Artificial Intelligence Act." https://eur-lex.europa.eu/eli/reg/2024/1689/oj
- NIST. (2023). "AI Risk Management Framework (AI RMF 1.0)." https://doi.org/10.6028/NIST.AI.100-1
- ISO/IEC. (2023). "ISO/IEC 42001:2023 — Information technology — Artificial intelligence — Management system." https://www.iso.org/standard/81230.html