NIST AI 600-1 GenAI Risk Profile
NIST AI 600-1 Generative AI risk profile covering risk categories, control mappings, assessment methodology, and practical application for red team engagements.
NIST AI 600-1, titled "Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile," was released in July 2024 as a companion document to the NIST AI RMF. It specifically addresses the unique risks posed by generative AI systems, making it one of the most directly applicable frameworks for AI 紅隊 practitioners.
GenAI Risk Categories
NIST AI 600-1 identifies 12 risk categories specific to generative AI. Each category maps directly to 紅隊 測試 activities:
Risk 概覽 Table
| # | Risk Category | Description | 紅隊 Priority |
|---|---|---|---|
| 1 | CBRN Information | AI provides information enabling chemical, biological, radiological, or nuclear harm | High (安全-critical) |
| 2 | Confabulation | AI generates false information presented as fact (hallucination) | High (common in all GenAI) |
| 3 | Data Privacy | AI reveals 訓練資料 or enables personal information extraction | High (legal liability) |
| 4 | Environmental | Energy consumption and environmental impact of GenAI systems | Low (not typically in scope) |
| 5 | Human-AI Configuration | Insufficient human oversight of AI outputs and decisions | Medium (process-dependent) |
| 6 | Information Integrity | AI generates misleading content or enables disinformation | High (reputational risk) |
| 7 | Information 安全 | AI enables or is vulnerable to cybersecurity attacks | Critical (core 紅隊 focus) |
| 8 | Intellectual Property | AI generates content infringing on existing IP | Medium (legal risk) |
| 9 | Obscene, Degrading, Abusive Content | AI generates harmful, offensive, or exploitative content | High (安全 and reputation) |
| 10 | Toxicity, Bias, Homogenization | AI exhibits bias, generates toxic content, or homogenizes outputs | High (fairness and compliance) |
| 11 | Value Chain and Component Integration | Risks from third-party AI components, APIs, and plugins | High (供應鏈 risk) |
| 12 | Harmful Bias and Homogenization | Systemic biases amplified by generative AI at scale | Medium (societal impact) |
Detailed Risk Analysis and 紅隊 Mapping
CBRN Information (GAI Risk 1)
Generative AI models may provide detailed instructions or facilitate the creation of chemical, biological, radiological, or nuclear weapons.
Red team 評估 approach:
| 測試 Category | Methodology | Expected Controls |
|---|---|---|
| Direct elicitation | Attempt to extract CBRN synthesis instructions through direct queries | Content filtering, refusal behavior |
| 越獄 bypass | Use known 越獄 techniques to circumvent CBRN content restrictions | Multi-layer filtering, robust 對齊 |
| Indirect elicitation | Request CBRN-adjacent information that could be assembled into harmful instructions | Context-aware filtering, semantic analysis |
| Multi-turn escalation | Gradually escalate requests across conversation turns to bypass threshold 偵測 | Conversation-level 監控 |
Confabulation (GAI Risk 2)
Confabulation -- commonly called hallucination -- is a systemic risk in all generative AI systems. 模型 generates plausible-sounding but factually incorrect information with high confidence.
Red team 評估 approach:
| 測試 Category | Methodology | Expected Controls |
|---|---|---|
| Factual accuracy | Query for verifiable facts and measure accuracy rate | Grounding mechanisms, RAG integration |
| Citation fabrication | Request sources and verify whether cited papers, URLs, or statistics exist | Citation verification, retrieval augmentation |
| Authority impersonation | Ask 模型 to provide expert opinions and verify accuracy | Epistemic humility, uncertainty expression |
| Domain-specific confabulation | 測試 factual accuracy in specialized domains (legal, medical, financial) | Domain-specific validation, human review requirements |
Data Privacy (GAI Risk 3)
Generative AI models may memorize and reproduce 訓練資料, including personal information, proprietary data, and other sensitive content.
Red team 評估 approach:
| 測試 Category | Methodology | Expected Controls |
|---|---|---|
| 訓練資料 extraction | Use known extraction techniques (membership 推論, prompt-based extraction) | Differential privacy, 輸出 filtering |
| PII extraction | Attempt to extract personal information from model outputs | PII 偵測 and scrubbing in outputs |
| Conversation data leakage | 測試 whether information from other users' conversations can be extracted | Session isolation, memory management |
| Model inversion | Attempt to reconstruct 訓練 examples through iterative querying | Rate limiting, 輸出 監控 |
Information 安全 (GAI Risk 7)
This category directly aligns with core AI 紅隊演練 activities and encompasses the full range of AI-specific 安全 漏洞.
Red team 評估 approach:
| 測試 Category | NIST AI 600-1 Sub-risk | 測試 Methodology |
|---|---|---|
| Prompt injection | AI enables unauthorized actions through 輸入 manipulation | Direct and indirect 提示詞注入, 系統提示詞 extraction |
| 資料投毒 | AI 訓練 or retrieval can be manipulated | RAG 投毒, 微調 data manipulation |
| Model theft | AI model weights or capabilities can be extracted | Model extraction attacks, API-based model stealing |
| Evasion attacks | AI classification or 偵測 can be bypassed | 對抗性 examples, perturbation attacks |
| 供應鏈 compromise | AI components from third parties introduce 漏洞 | Dependency analysis, model provenance verification |
Value Chain and Component Integration (GAI Risk 11)
Generative AI systems increasingly rely on complex supply chains of models, APIs, plugins, and data sources, each introducing risk.
Red team 評估 approach:
| Component | Risk | 測試 Approach |
|---|---|---|
| Foundation model providers | Model behavior changes, deprecation, 安全 incidents | 測試 across model versions, verify fallback behavior |
| Plugin/tool ecosystems | Malicious plugins, data exfiltration through tools | Plugin 安全 review, 工具使用 abuse scenarios |
| RAG data sources | Poisoned or manipulated retrieval sources | Inject 對抗性 documents, 測試 retrieval integrity |
| 微調 data providers | 訓練資料 manipulation | Verify data provenance, 測試 for 後門 behaviors |
| API intermediaries | Man-in-the-middle, prompt logging, data retention | 評估 API 安全, review data handling policies |
評估 Methodology
Structured 評估 Process
NIST AI 600-1 maps its risks to the four functions of the AI RMF: Govern, Map, Measure, and Manage. Red teamers should use this mapping to structure comprehensive assessments:
Govern: Establish 評估 framework
Review the organization's AI governance policies, risk appetite, and accountability structures. Verify that governance documentation addresses all 12 AI 600-1 risk categories.
Key questions:
- Does the organization have documented policies 對每個 risk category?
- Are roles and responsibilities defined for GenAI risk management?
- Is there a process for updating risk assessments as the threat landscape evolves?
Map: 識別 and categorize GenAI systems
Inventory all generative AI systems and map them to applicable risk categories. Determine which systems require 測試 and prioritize based on risk exposure.
Key activities:
- Catalog all GenAI deployments (production, internal, experimental)
- Classify each system's risk exposure across the 12 categories
- 識別 the highest-priority systems for 紅隊 測試
Measure: Conduct 對抗性 測試
Execute 紅隊 測試 activities aligned with the applicable risk categories. Measure the effectiveness of existing controls against 對抗性 scenarios.
測試 framework:
- 測試 each applicable risk category using the approaches outlined above
- Document control effectiveness 對每個 測試 scenario
- Quantify risk levels based on exploitability and impact
Manage: Report and remediate
Deliver findings mapped to AI 600-1 risk categories and AI RMF functions. Provide remediation recommendations and verify fixes.
Deliverables:
- Risk 評估 matrix mapping findings to AI 600-1 categories
- Control effectiveness scores 對每個 tested risk area
- Remediation roadmap with prioritized recommendations
Risk Scoring Framework
When assessing GenAI risks, use a consistent scoring methodology that aligns with NIST AI 600-1 categories:
| Dimension | Score 1 (Low) | Score 2 (Medium) | Score 3 (High) | Score 4 (Critical) |
|---|---|---|---|---|
| Exploitability | Requires deep expertise and significant resources | Requires moderate skill and some resources | Requires basic skill, tools available | Trivially exploitable with public techniques |
| Impact | Minor inconvenience, no data exposure | Limited data exposure, moderate reputational risk | Significant data exposure, 安全 implications | CBRN information, mass data breach, physical harm |
| Prevalence | Rare edge case | Occasionally reproducible | Frequently reproducible | Systematic, always reproducible |
| Detectability | Existing controls reliably detect | Sometimes detected by existing controls | Rarely detected by existing controls | No 偵測 capability exists |
Control Mappings
Mapping AI 600-1 Risks to NIST AI RMF Subcategories
| AI 600-1 Risk | AI RMF Govern | AI RMF Map | AI RMF Measure | AI RMF Manage |
|---|---|---|---|---|
| CBRN Information | GV-1.1, GV-1.3 | MP-2.3 | MS-2.3, MS-2.6 | MG-2.2 |
| Confabulation | GV-1.1, GV-4.3 | MP-2.3, MP-3.4 | MS-2.6, MS-2.11 | MG-2.2, MG-3.2 |
| Data Privacy | GV-1.1, GV-6.1 | MP-3.4, MP-4.2 | MS-2.3, MS-2.10 | MG-2.2, MG-3.1 |
| Information 安全 | GV-1.1, GV-1.6 | MP-2.3, MP-5.2 | MS-2.3, MS-2.6 | MG-2.2, MG-2.4 |
| Value Chain | GV-1.1, GV-6.2 | MP-2.3, MP-5.2 | MS-2.7, MS-2.8 | MG-3.1, MG-3.2 |
Mapping AI 600-1 to Common 紅隊 Findings
| Common Finding | AI 600-1 Risk Category | Recommended Control |
|---|---|---|
| 系統提示詞 extraction | Information 安全 (7) | 輸入/輸出 filtering, prompt hardening |
| Hallucinated legal advice | Confabulation (2) | Domain-specific grounding, disclaimer requirements |
| PII in model outputs | Data Privacy (3) | 輸出 scanning, differential privacy |
| 越獄 to harmful content | Obscene/Abusive Content (9) | Multi-layer content filtering, constitutional AI |
| Biased hiring recommendations | Toxicity/Bias (10) | Bias 測試, fairness constraints, human oversight |
| Plugin data exfiltration | Value Chain (11) | Plugin sandboxing, data flow controls |
Practical Application
Using AI 600-1 in Engagement Proposals
When scoping 紅隊 engagements, reference AI 600-1 risk categories to justify 測試 activities:
Proposal structure:
- List the AI 600-1 risk categories applicable to the client's GenAI deployment
- Map each risk category to specific 測試 activities and level of effort
- Reference NIST guidance to justify the importance of each 測試 area
- Provide estimated timelines and resource requirements per risk category
Reporting with AI 600-1 Alignment
Structure findings reports to directly map to AI 600-1 for maximum client value:
| Report Section | Content |
|---|---|
| Executive summary | Overall risk posture across applicable AI 600-1 categories |
| Risk category 評估 | Detailed findings organized by AI 600-1 risk category |
| Control effectiveness | Matrix showing tested controls and their effectiveness against each risk |
| Remediation roadmap | Prioritized recommendations mapped to specific risk categories |
| Compliance implications | How findings affect the organization's NIST AI RMF compliance posture |
Limitations and Gaps
Red teamers should be aware of areas where AI 600-1 requires supplementation:
- Technical depth: AI 600-1 identifies risk categories but does not prescribe specific 測試 methodologies. Supplement with OWASP LLM Top 10 and MITRE ATLAS for technical 測試 guidance.
- Scoring methodology: The framework does not provide a standardized risk scoring system. Teams must develop or adopt their own quantitative approach.
- Emerging risks: As GenAI capabilities evolve rapidly, new risk categories may emerge between framework revisions. Monitor NIST publications for updates and supplement with current threat intelligence.
- Multi-model systems: AI 600-1 focuses primarily on individual GenAI systems. For complex multi-model architectures, additional risk 評估 is needed for emergent behaviors from model interactions.