AI Audit Methodology
Comprehensive methodology for auditing AI systems including planning, evidence collection, testing procedures, report templates, and integration with red team assessments.
An AI audit systematically evaluates whether an organization's AI systems meet defined standards, regulatory requirements, and internal policies. While distinct from 紅隊演練 -- which focuses on 對抗性 利用 -- auditing and 紅隊演練 are complementary activities. Red team findings provide critical evidence for auditors, and audit frameworks help red teamers 理解 which controls to 測試 and how to report findings.
Audit Planning
Pre-Audit Activities
| Activity | Purpose | Deliverable |
|---|---|---|
| Scope definition | Determine which AI systems, controls, and standards are in scope | Scope document signed by stakeholders |
| Standards identification | 識別 applicable standards and regulations | Compliance matrix |
| Team composition | Assemble audit team with required competencies | Team roster with qualifications |
| Timeline and logistics | Plan audit activities, schedule interviews, arrange access | Audit schedule |
| Risk-based prioritization | Focus audit effort on highest-risk areas | Risk-prioritized audit plan |
Audit Team Competencies
AI audits require a mix of skills not typically found in traditional audit teams:
| Role | Competency | Responsibility |
|---|---|---|
| Lead auditor | Audit methodology, standards knowledge, stakeholder management | Overall audit leadership, report quality |
| AI/ML specialist | Model development, 訓練, deployment practices | Technical 評估 of AI systems |
| Data specialist | Data governance, privacy, quality 評估 | Data-related control 評估 |
| 安全 specialist | AI 安全, 對抗性 測試, 漏洞 評估 | 安全 control 評估 (紅隊 integration) |
| Domain expert | Industry-specific requirements (healthcare, finance, etc.) | Sector-specific regulatory compliance |
| Ethics/fairness specialist | Bias 評估, fairness metrics, ethical AI | Fairness and bias control 評估 |
Scope Definition Framework
| Scope Element | Questions to Answer |
|---|---|
| AI systems | Which specific AI systems are being audited? Production only, or development and staging too? |
| Lifecycle stages | Which stages are in scope (development, 訓練, deployment, 監控, retirement)? |
| Control objectives | Which standard's controls apply (ISO 42001 Annex A, SOC 2 TSC, custom controls)? |
| Time period | Point-in-time 評估 or period-of-time 評估? |
| Third parties | Are third-party AI components in scope (foundation models, APIs, data providers)? |
| Exclusions | What is explicitly out of scope and why? |
Evidence Collection
Types of Audit Evidence
| Evidence Type | Description | Reliability | 範例 |
|---|---|---|---|
| Documentary | Policies, procedures, design documents | Medium (may not reflect practice) | AI governance policy, model cards, impact assessments |
| Observational | Direct observation of processes and controls | High (real-time verification) | Watching a model deployment go through change management |
| Testimonial | Interviews with personnel | Medium (subject to bias) | Interviews with AI engineers, data scientists, risk managers |
| Analytical | 測試 and analysis of system behavior | High (objective results) | Red team 測試 results, bias analysis, performance metrics |
| System-generated | Logs, metrics, automated reports | High (objective, timestamped) | Model 監控 logs, access logs, drift 偵測 alerts |
Evidence Collection Procedures
Document request
Issue a formal document request list to the organization covering all in-scope control areas. Allow adequate time for collection (typically 2-4 weeks).
Standard document requests for AI audits:
- AI governance policies and procedures
- AI system inventory with risk classifications
- Model development documentation (model cards, 訓練 procedures)
- Data governance documentation (data lineage, quality procedures)
- Risk assessments and impact assessments
- Incident reports and response records
- Change management records for AI systems
- 監控 and alerting configurations
- Red team and 安全 評估 reports
- Training records for AI development and operations staff
Interview planning
Schedule interviews with key personnel across AI governance, development, operations, and risk management functions.
Key interview subjects:
- AI governance lead / responsible AI officer
- AI/ML engineering leads
- Data engineering leads
- 安全 and 紅隊 leads
- Risk management and compliance
- Business stakeholders using AI outputs
Technical 測試
Plan and execute technical 測試 activities. 這是 where 紅隊 capabilities integrate directly.
測試 areas:
- Control effectiveness 測試 (does the control work as documented?)
- Configuration review (are systems configured according to policy?)
- 對抗性 測試 (do controls hold under attack?)
- Performance validation (does the AI system perform as documented?)
Evidence preservation
Maintain a structured evidence repository with clear chain of custody.
Evidence management requirements:
- Unique identifier 對每個 evidence item
- Date and time of collection
- Source of evidence
- Collector name and role
- Control objective the evidence supports
- Integrity verification (hash for digital evidence)
Evidence Quality Criteria
| Criterion | Description | How to Ensure |
|---|---|---|
| Sufficient | Enough evidence to support the conclusion | Collect evidence from multiple sources 對每個 control |
| Appropriate | Evidence is relevant to the control being assessed | Map each evidence item to a specific control objective |
| Reliable | Evidence source is trustworthy and verifiable | Prefer system-generated evidence over testimonial |
| Timely | Evidence reflects the audit period, not outdated information | Verify dates, request current documentation |
| Complete | Evidence covers the full scope of the control | Verify coverage across all in-scope AI systems |
測試 Procedures
Control 測試 Approaches
| Approach | Description | When to Use | Confidence Level |
|---|---|---|---|
| Inquiry | Ask personnel how a control works | Initial 理解, low-risk controls | Low |
| Observation | Watch the control operate in real time | Process-based controls | Medium |
| Inspection | Examine documentation and artifacts | Documentation-based controls | Medium |
| Re-performance | Independently execute the control process | Critical controls, high-risk areas | High |
| 對抗性 測試 | Attempt to bypass or break the control | 安全 controls, 安全 controls | Highest |
AI-Specific 測試 Procedures
Model governance 測試:
| 測試 | Procedure | Expected Evidence |
|---|---|---|
| Model inventory completeness | Compare documented inventory against discovered AI systems | All AI systems accounted for in the inventory |
| Change management adherence | Select a sample of model deployments and verify change management records | Approved change requests 對每個 deployment |
| Model documentation accuracy | Compare model cards against actual model behavior | Documentation matches tested behavior |
| Version control | Verify model versions in production match approved versions | Version 對齊 confirmed |
Data governance 測試:
| 測試 | Procedure | Expected Evidence |
|---|---|---|
| 訓練資料 provenance | Trace 訓練資料 to its source and verify 授權 | Documented data lineage with consent records |
| Data quality controls | Review data validation procedures and 測試 with sample data | Quality checks operating effectively |
| Data retention compliance | Verify data retention periods match policy and regulatory requirements | Data aged off per schedule |
| Privacy controls | 測試 for PII in 訓練資料 and model outputs | PII 偵測 and handling procedures effective |
安全 control 測試 (紅隊 integration):
| 測試 | Procedure | Expected Evidence |
|---|---|---|
| Prompt injection resistance | Execute 提示詞注入 測試 suite against production endpoints | Injection attempts blocked or detected |
| Data extraction prevention | Attempt 訓練資料 and 系統提示詞 extraction | Extraction attempts fail or trigger alerts |
| Access control effectiveness | 測試 API 認證, 授權, and rate limiting | Unauthorized access prevented |
| 監控 effectiveness | Execute attacks and verify 偵測 | 攻擊 detected within defined SLA |
Fairness and bias 測試:
| 測試 | Procedure | Expected Evidence |
|---|---|---|
| Demographic parity | Compare model outcomes across protected groups | Outcomes within acceptable disparity thresholds |
| Equal opportunity | Compare error rates across protected groups | Error rates consistent across groups |
| Calibration 測試 | Verify prediction confidence aligns with actual outcomes across groups | Calibration consistent across groups |
| Intersectional analysis | 測試 for bias at intersections of protected characteristics | No significant intersectional disparities |
Audit Reporting
Report Structure
| Section | Content | Audience |
|---|---|---|
| Executive summary | Overall 評估, critical findings, key recommendations | Board, C-suite |
| Scope and methodology | What was audited, how, and against which standards | All stakeholders |
| Findings summary | Categorized findings by severity and control area | Management, risk committee |
| Detailed findings | Individual finding descriptions with evidence and recommendations | Technical teams, compliance |
| Control effectiveness matrix | Control-by-control 評估 results | Auditors, compliance, risk management |
| Recommendations | Prioritized recommendations with 實作 guidance | Technical teams, management |
| Management response | Organization's response to findings and remediation plans | All stakeholders |
Finding Classification
| Severity | Definition | Timeline |
|---|---|---|
| Critical | Control failure that could result in significant harm, regulatory action, or material misstatement | Immediate remediation (within 30 days) |
| High | Control deficiency that significantly increases risk exposure | Remediation within 60 days |
| Medium | Control weakness that should be addressed but does not pose immediate significant risk | Remediation within 90 days |
| Low | Observation or improvement opportunity | Address during next review cycle |
| Advisory | Best practice recommendation with no current deficiency | 考慮 for 實作 |
Finding Documentation Template
| Field | Content |
|---|---|
| Finding ID | Unique identifier |
| Title | Concise description of the finding |
| Severity | Critical / High / Medium / Low / Advisory |
| Control reference | Applicable standard and control (e.g., ISO 42001 A.6.2.4) |
| Condition | What was found (the current state) |
| Criteria | What was expected (the standard or requirement) |
| Cause | Why the gap exists (root cause analysis) |
| Effect | What could happen if unaddressed (risk statement) |
| Recommendation | What should be done to remediate |
| Evidence | 參考文獻 to supporting evidence |
| Management response | Organization's planned remediation |
紅隊 Integration Points
Where 紅隊演練 Fits in the Audit
| Audit Phase | 紅隊 Contribution |
|---|---|
| Planning | Provide threat intelligence to inform risk-based audit scope |
| Evidence collection | Red team reports serve as analytical evidence of control effectiveness |
| 測試 | 對抗性 測試 provides the highest-confidence 評估 of 安全 controls |
| Reporting | Red team findings directly map to audit findings |
| Follow-up | Red team re-測試 validates remediation effectiveness |
Translating 紅隊 Findings to Audit Language
Red teams and auditors use different terminology. This translation table helps:
| 紅隊 Term | Audit Term |
|---|---|
| 漏洞 | Control deficiency |
| 利用 | Control failure |
| 攻擊面 | Risk exposure area |
| Severity rating | Finding classification |
| Proof of concept | Supporting evidence |
| Remediation guidance | Recommendation |
| Re-測試 | Remediation verification |
Continuous Audit Considerations
Traditional audits are point-in-time assessments. For AI systems that change frequently, 考慮 continuous audit approaches:
| Traditional Audit | Continuous Audit |
|---|---|
| Annual or periodic 評估 | Ongoing 評估 with periodic reporting |
| Manual evidence collection | Automated evidence collection and 監控 |
| Point-in-time control 測試 | Continuous control 監控 |
| Findings reported at audit completion | Findings reported in real time |
| Manual follow-up on remediation | Automated remediation verification |
Continuous Audit Architecture
AI System → 監控 Layer → Evidence Repository → Analysis Engine → Dashboard
│ │ │ │ │
├── Logs ├── Automated tests ├── Evidence store ├── Compliance ├── Alerts
├── Metrics ├── Drift 偵測 ├── Chain of custody ├── Trend ├── Reports
└── Events └── Red team tests └── Audit trail └── Gap detect └── KPIs
This architecture enables organizations to maintain audit readiness at all times rather than scrambling to prepare for periodic assessments. Red team 測試 results feed continuously into the evidence repository, providing ongoing assurance of control effectiveness.