AI Risk Assessment Methodology
Structured approaches to evaluating AI system risks including identification, scoring frameworks, treatment planning, and templates for conducting comprehensive AI risk assessments.
AI risk assessment is the foundation of any AI governance program. It determines which systems need the most scrutiny, what controls are required, and where red team resources should be focused. Unlike traditional IT risk assessment, AI risk assessment must account for emergent behaviors, algorithmic bias, safety failures, and the probabilistic nature of AI outputs.
Risk Assessment Process
The Five-Phase Approach
Phase 1: System identification and classification
Inventory all AI systems and classify them by risk level. This classification determines the depth of assessment required.
Activities:
- Enumerate all AI systems (production, development, experimental)
- Classify by AI type (generative, predictive, classification, recommendation)
- Identify the decision domain (healthcare, finance, employment, general)
- Assign an initial risk tier based on use case and impact potential
Phase 2: Threat identification
Identify threats specific to each AI system, using a combination of threat intelligence, framework-based enumeration, and adversarial thinking.
Threat sources:
- OWASP LLM Top 10 vulnerability categories
- MITRE ATLAS tactics and techniques
- NIST AI 600-1 risk categories (for generative AI)
- Industry-specific threat profiles
- Historical incident analysis
Phase 3: Risk analysis and scoring
Assess each identified threat against the specific AI system context. Score based on likelihood and impact using a consistent methodology.
Phase 4: Risk treatment planning
For each scored risk, determine the appropriate treatment: accept, mitigate, transfer, or avoid. Map treatments to specific controls and testing activities.
Phase 5: Documentation and communication
Document assessment results and communicate them to stakeholders. Feed results into compliance reporting and red team scoping.
AI System Classification
Risk Tier Framework
| Tier | Risk Level | Criteria | Assessment Depth | Red Team Frequency |
|---|---|---|---|---|
| Tier 1 | Critical | Autonomous decisions affecting health, safety, liberty, or financial standing | Full assessment with adversarial testing | Quarterly |
| Tier 2 | High | AI significantly influences human decisions in regulated domains | Comprehensive assessment | Semi-annually |
| Tier 3 | Medium | Customer-facing AI with moderate impact potential | Standard assessment | Annually |
| Tier 4 | Low | Internal tools, non-consequential applications | Lightweight assessment | Every 2 years or event-driven |
Classification Decision Tree
Is the AI system making autonomous decisions?
├── Yes → Does the decision affect health, safety, liberty, or financial standing?
│ ├── Yes → Tier 1 (Critical)
│ └── No → Tier 2 (High)
└── No → Does the AI significantly influence human decisions?
├── Yes → Is the domain regulated (healthcare, finance, employment)?
│ ├── Yes → Tier 2 (High)
│ └── No → Tier 3 (Medium)
└── No → Tier 4 (Low)
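The decision tree above can be sketched as a small classifier. This is an illustrative helper, not part of any standard; the function name and boolean parameters are assumptions chosen to mirror the tree's questions.

```python
def classify_tier(autonomous: bool, consequential: bool,
                  influences_decisions: bool, regulated: bool) -> int:
    """Return the risk tier (1-4) implied by the decision tree.

    autonomous           -- the system makes decisions without a human in the loop
    consequential        -- decisions affect health, safety, liberty, or financial standing
    influences_decisions -- the AI significantly influences human decisions
    regulated            -- the domain is regulated (healthcare, finance, employment)
    """
    if autonomous:
        return 1 if consequential else 2
    if influences_decisions:
        return 2 if regulated else 3
    return 4
```

Encoding the tree as code makes tier assignments reproducible during system inventory and easy to re-run when a system's characteristics change.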
Risk Scoring Frameworks
Quantitative Scoring Matrix
Use this scoring matrix to assess AI risks consistently across systems:
Likelihood Score (L):
| Score | Level | Description |
|---|---|---|
| 1 | Rare | Requires nation-state resources and novel techniques |
| 2 | Unlikely | Requires significant expertise and targeted effort |
| 3 | Possible | Achievable by skilled practitioners with available tools |
| 4 | Likely | Achievable by moderately skilled attackers with public techniques |
| 5 | Almost certain | Trivially exploitable, public exploits available |
Impact Score (I):
| Score | Level | Description |
|---|---|---|
| 1 | Negligible | No measurable harm, minor inconvenience |
| 2 | Minor | Limited data exposure, temporary service disruption |
| 3 | Moderate | Personal data breach, financial loss under $100K, reputational harm |
| 4 | Major | Large-scale data breach, financial loss over $1M, regulatory action |
| 5 | Catastrophic | Physical harm, loss of life, existential business risk, mass data breach |
Risk Score = L x I
| Risk Score | Rating | Treatment Approach |
|---|---|---|
| 1-4 | Low | Accept or monitor |
| 5-9 | Medium | Mitigate within standard timelines |
| 10-15 | High | Mitigate with priority |
| 16-25 | Critical | Immediate action required |
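A minimal sketch of the scoring rule, using the band boundaries from the table above:

```python
def risk_rating(likelihood: int, impact: int) -> tuple[int, str]:
    """Multiply likelihood (1-5) by impact (1-5) and map the product to a rating band."""
    score = likelihood * impact
    if score <= 4:
        rating = "Low"
    elif score <= 9:
        rating = "Medium"
    elif score <= 15:
        rating = "High"
    else:
        rating = "Critical"
    return score, rating

# Example: a likely (4) threat with moderate (3) impact scores 12 -> High
```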
AI-Specific Risk Dimensions
Beyond likelihood and impact, AI systems require assessment across additional dimensions:
| Dimension | Score Range | What It Measures |
|---|---|---|
| Autonomy | 1-5 | Degree of autonomous decision-making without human oversight |
| Opacity | 1-5 | Difficulty of explaining or interpreting AI decisions |
| Scale | 1-5 | Number of people or decisions affected |
| Reversibility | 1-5 | Difficulty of undoing harm caused by AI errors |
| Bias potential | 1-5 | Likelihood of discriminatory outcomes across protected groups |
Composite AI Risk Score = (L x I) + Autonomy + Opacity + Scale + Reversibility + Bias potential
This produces a score range of 6-50, providing finer granularity for prioritization:
| Composite Score | Priority | Action |
|---|---|---|
| 6-15 | Standard | Routine assessment and monitoring |
| 16-25 | Elevated | Enhanced assessment, periodic red teaming |
| 26-35 | High | Comprehensive assessment, regular red teaming, continuous monitoring |
| 36-50 | Critical | Immediate comprehensive assessment, frequent red teaming, real-time monitoring |
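The composite formula can be sketched the same way, with the priority bands taken from the table above:

```python
def composite_priority(likelihood: int, impact: int, autonomy: int,
                       opacity: int, scale: int, reversibility: int,
                       bias_potential: int) -> tuple[int, str]:
    """Compute (L x I) plus the five AI-specific dimensions (each scored 1-5)."""
    score = (likelihood * impact
             + autonomy + opacity + scale + reversibility + bias_potential)
    if score <= 15:
        priority = "Standard"
    elif score <= 25:
        priority = "Elevated"
    elif score <= 35:
        priority = "High"
    else:
        priority = "Critical"
    return score, priority

# Example: L=4, I=3 (base 12) with dimensions 3+2+4+3+2 gives 26 -> High
```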
Threat Identification for AI Systems
AI-Specific Threat Categories
| Threat Category | Description | Example Attacks | Assessment Method |
|---|---|---|---|
| Input manipulation | Adversarial modification of AI inputs to alter outputs | Prompt injection, adversarial examples, data poisoning | Red team testing, automated scanning |
| Data extraction | Unauthorized extraction of training data or model information | Training data extraction, model stealing, membership inference | Data extraction testing, privacy audits |
| Output manipulation | Causing the AI to produce harmful, biased, or incorrect outputs | Jailbreaking, hallucination exploitation, bias amplification | Safety testing, bias assessment |
| System compromise | Traditional security attacks on AI infrastructure | API exploitation, model weight theft, supply chain attacks | Penetration testing, infrastructure assessment |
| Operational degradation | Attacks that degrade AI performance over time | Model drift poisoning, feedback loop manipulation, resource exhaustion | Performance monitoring, drift detection |
Threat Modeling Template
For each AI system, document threats using this structure:
| Field | Description |
|---|---|
| Threat ID | Unique identifier (e.g., T-GenAI-001) |
| Category | From the threat categories above |
| Description | Detailed description of the threat scenario |
| Attack vector | How the threat would be realized (e.g., API input, training pipeline, RAG data source) |
| Preconditions | What must be true for the attack to succeed |
| Impact | Specific consequences if the threat is realized |
| Existing controls | Current mitigations in place |
| Residual risk | Risk level after considering existing controls |
| Testing approach | How red teaming would validate this threat |
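One way to keep these records machine-checkable is a small dataclass whose fields mirror the template. The class name and the category validation against the five-category taxonomy above are assumptions for illustration, not a mandated schema.

```python
from dataclasses import dataclass

# The five threat categories from the table above
THREAT_CATEGORIES = {
    "Input manipulation", "Data extraction", "Output manipulation",
    "System compromise", "Operational degradation",
}

@dataclass
class ThreatRecord:
    threat_id: str         # unique identifier, e.g. "T-GenAI-001"
    category: str          # one of THREAT_CATEGORIES
    description: str
    attack_vector: str
    preconditions: str
    impact: str
    existing_controls: str
    residual_risk: str
    test_approach: str

    def __post_init__(self) -> None:
        # Reject records whose category falls outside the taxonomy
        if self.category not in THREAT_CATEGORIES:
            raise ValueError(f"unknown threat category: {self.category!r}")
```

Validating the category at record creation keeps the threat inventory consistent with the taxonomy, so later roll-ups by category do not silently drop misspelled entries.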
Risk Treatment Planning
Treatment Options
| Treatment | When to Apply | Documentation Required |
|---|---|---|
| Accept | Risk score is within organizational risk appetite and regulatory tolerance | Risk acceptance form signed by risk owner |
| Mitigate | Controls can reduce risk to acceptable levels at reasonable cost | Control implementation plan with timeline and validation criteria |
| Transfer | Risk can be shifted to a third party (insurance, contract provisions) | Insurance policy or contractual provisions documented |
| Avoid | Risk is unacceptable and no adequate mitigation exists | Decision to discontinue the AI system or use case |
Control Selection Framework
When selecting controls to mitigate AI risks, map to applicable frameworks:
| Risk Category | Technical Controls | Process Controls | Framework Reference |
|---|---|---|---|
| Prompt injection | Input filtering, output scanning, prompt hardening | Security review, change management | OWASP LLM01, ISO 42001 A.6.2.4 |
| Data extraction | Differential privacy, output filtering, rate limiting | Data classification, access management | NIST AI 600-1 (Privacy), ISO 42001 A.7.4 |
| Bias and fairness | Bias testing, fairness constraints, diverse training data | Impact assessments, human oversight | EU AI Act Art. 10, NIST AI 600-1 (Bias) |
| Model security | Access controls, encryption, integrity verification | Model governance, lifecycle management | ISO 42001 A.6.2.5, SOC 2 CC6.1 |
| Safety failures | Safety testing, guardrails, kill switches | Incident response, escalation procedures | EU AI Act Art. 9, ISO 42001 A.5.4 |
Risk Assessment Templates
Template: GenAI Application Risk Assessment
| Section | Content |
|---|---|
| System identification | System name, owner, purpose, users, deployment environment |
| AI characteristics | Model type, training data, inference method, update frequency |
| Regulatory applicability | Applicable regulations (EU AI Act tier, state laws, sector regulations) |
| Threat inventory | Enumerated threats with scores from the threat modeling template |
| Control inventory | Current controls mapped to threats |
| Gap analysis | Threats without adequate controls |
| Risk register | Scored and prioritized risk list |
| Treatment plan | Selected treatment for each risk with timeline |
| Testing requirements | Red team testing activities required to validate controls |
| Review schedule | When the assessment will be updated |
Template: AI System Risk Register Entry
| Field | Example Value |
|---|---|
| Risk ID | R-2026-042 |
| AI System | Customer Service Chatbot v3.2 |
| Risk Description | System prompt extraction revealing internal business logic and API schemas |
| Threat Category | Data extraction |
| Likelihood | 4 (Likely -- public techniques available) |
| Impact | 3 (Moderate -- business logic exposure, potential for further exploitation) |
| Risk Score | 12 (High) |
| Existing Controls | Input filtering for known extraction patterns |
| Control Effectiveness | Partial -- bypassed in 3 of 10 red team tests |
| Treatment | Mitigate -- implement multi-layer prompt protection and monitoring |
| Timeline | 30 days for implementation, 45 days for validation |
| Risk Owner | Head of AI Engineering |
| Last Assessed | 2026-03-15 |
| Next Review | 2026-06-15 |
Integrating Red Team Results
Feeding Red Team Findings into Risk Assessments
Red team assessments produce evidence that directly updates risk assessments:
| Red Team Finding | Risk Assessment Update |
|---|---|
| New vulnerability discovered | Add to threat inventory, score, and add to risk register |
| Existing control bypassed | Update control effectiveness rating, increase residual risk score |
| Control validated as effective | Document as evidence of control operating effectiveness |
| New attack technique used | Add technique to threat model, assess applicability to other systems |
| Severity rating from red team | Cross-reference with risk score to validate scoring methodology |
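As a sketch of how red team bypass counts might update a control effectiveness rating: the 0.9 and 0.5 thresholds below are illustrative assumptions, not values from any framework.

```python
def effectiveness_rating(blocked: int, attempts: int) -> tuple[float, str]:
    """Rate a control from red team results as Effective / Partial / Ineffective.

    blocked  -- number of red team attempts the control stopped
    attempts -- total red team attempts against the control
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    effectiveness = blocked / attempts
    if effectiveness >= 0.9:        # assumed threshold
        label = "Effective"
    elif effectiveness >= 0.5:      # assumed threshold
        label = "Partial"
    else:
        label = "Ineffective"
    return effectiveness, label

# From the register entry above: bypassed in 3 of 10 tests means 7 of 10 blocked
```

Tracking this ratio over successive assessments turns the "Control Effectiveness" field of the risk register into a trend rather than a one-time judgment.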
Continuous Risk Assessment
Risk assessments should not be static documents. Integrate continuous inputs:
| Input Source | Update Frequency | Risk Assessment Action |
|---|---|---|
| Automated red team tests | Daily/weekly | Update control effectiveness metrics |
| Manual red team assessments | Quarterly | Comprehensive risk re-assessment |
| Threat intelligence | Ongoing | Update threat landscape and likelihood scores |
| Regulatory changes | As published | Update regulatory applicability and required controls |
| AI system changes | Each deployment | Trigger reassessment of affected risk areas |
| Incident reports | As they occur | Update likelihood scores, validate impact assessments |
Common Assessment Mistakes
| Mistake | Consequence | Correction |
|---|---|---|
| Assessing AI risk with IT risk frameworks only | AI-specific risks (bias, hallucination, emergent behavior) are missed | Add AI-specific risk dimensions to existing frameworks |
| Scoring all AI risks equally | Resources spread thin across low-priority items | Use the composite scoring methodology to differentiate |
| Ignoring supply chain risk | Third-party AI components create unassessed risk | Include all AI components in scope, including APIs and foundation models |
| Conflating model risk with system risk | Model may be safe but system integration creates vulnerabilities | Assess the complete system, not just the model |
| No re-assessment trigger | Assessments become stale after AI system updates | Define triggers for re-assessment (model change, data change, regulatory change) |