NIST AI 600-1 GenAI Risk Profile
NIST AI 600-1 Generative AI risk profile covering risk categories, control mappings, assessment methodology, and practical application for red team engagements.
NIST AI 600-1, titled "Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile," was released in July 2024 as a companion document to the NIST AI RMF. It specifically addresses the unique risks posed by generative AI systems, making it one of the most directly applicable frameworks for AI red team practitioners.
GenAI Risk Categories
NIST AI 600-1 identifies 12 risk categories specific to generative AI. Each category maps directly to red team testing activities:
Risk Overview Table
| # | Risk Category | Description | Red Team Priority |
|---|---|---|---|
| 1 | CBRN Information | AI provides information enabling chemical, biological, radiological, or nuclear harm | High (safety-critical) |
| 2 | Confabulation | AI generates false information presented as fact (hallucination) | High (common in all GenAI) |
| 3 | Data Privacy | AI reveals training data or enables personal information extraction | High (legal liability) |
| 4 | Environmental | Energy consumption and environmental impact of GenAI systems | Low (not typically in scope) |
| 5 | Human-AI Configuration | Insufficient human oversight of AI outputs and decisions | Medium (process-dependent) |
| 6 | Information Integrity | AI generates misleading content or enables disinformation | High (reputational risk) |
| 7 | Information Security | AI enables or is vulnerable to cybersecurity attacks | Critical (core red team focus) |
| 8 | Intellectual Property | AI generates content infringing on existing IP | Medium (legal risk) |
| 9 | Obscene, Degrading, Abusive Content | AI generates harmful, offensive, or exploitative content | High (safety and reputation) |
| 10 | Toxicity, Bias, Homogenization | AI exhibits bias, generates toxic content, or homogenizes outputs | High (fairness and compliance) |
| 11 | Value Chain and Component Integration | Risks from third-party AI components, APIs, and plugins | High (supply chain risk) |
| 12 | Harmful Bias and Homogenization | Systemic biases amplified by generative AI at scale | Medium (societal impact) |
Detailed Risk Analysis and Red Team Mapping
CBRN Information (GAI Risk 1)
Generative AI models may provide detailed instructions or facilitate the creation of chemical, biological, radiological, or nuclear weapons.
Red team assessment approach:
| Test Category | Methodology | Expected Controls |
|---|---|---|
| Direct elicitation | Attempt to extract CBRN synthesis instructions through direct queries | Content filtering, refusal behavior |
| Jailbreak bypass | Use known jailbreak techniques to circumvent CBRN content restrictions | Multi-layer filtering, robust alignment |
| Indirect elicitation | Request CBRN-adjacent information that could be assembled into harmful instructions | Context-aware filtering, semantic analysis |
| Multi-turn escalation | Gradually escalate requests across conversation turns to bypass threshold detection | Conversation-level monitoring |
Confabulation (GAI Risk 2)
Confabulation -- commonly called hallucination -- is a systemic risk in all generative AI systems. The model generates plausible-sounding but factually incorrect information with high confidence.
Red team assessment approach:
| Test Category | Methodology | Expected Controls |
|---|---|---|
| Factual accuracy | Query for verifiable facts and measure accuracy rate | Grounding mechanisms, RAG integration |
| Citation fabrication | Request sources and verify whether cited papers, URLs, or statistics exist | Citation verification, retrieval augmentation |
| Authority impersonation | Ask the model to provide expert opinions and verify accuracy | Epistemic humility, uncertainty expression |
| Domain-specific confabulation | Test factual accuracy in specialized domains (legal, medical, financial) | Domain-specific validation, human review requirements |
Data Privacy (GAI Risk 3)
Generative AI models may memorize and reproduce training data, including personal information, proprietary data, and other sensitive content.
Red team assessment approach:
| Test Category | Methodology | Expected Controls |
|---|---|---|
| Training data extraction | Use known extraction techniques (membership inference, prompt-based extraction) | Differential privacy, output filtering |
| PII extraction | Attempt to extract personal information from model outputs | PII detection and scrubbing in outputs |
| Conversation data leakage | Test whether information from other users' conversations can be extracted | Session isolation, memory management |
| Model inversion | Attempt to reconstruct training examples through iterative querying | Rate limiting, output monitoring |
Information Security (GAI Risk 7)
This category directly aligns with core AI red teaming activities and encompasses the full range of AI-specific security vulnerabilities.
Red team assessment approach:
| Test Category | NIST AI 600-1 Sub-risk | Testing Methodology |
|---|---|---|
| Prompt injection | AI enables unauthorized actions through input manipulation | Direct and indirect prompt injection, system prompt extraction |
| Data poisoning | AI training or retrieval can be manipulated | RAG poisoning, fine-tuning data manipulation |
| Model theft | AI model weights or capabilities can be extracted | Model extraction attacks, API-based model stealing |
| Evasion attacks | AI classification or detection can be bypassed | Adversarial examples, perturbation attacks |
| Supply chain compromise | AI components from third parties introduce vulnerabilities | Dependency analysis, model provenance verification |
Value Chain and Component Integration (GAI Risk 11)
Generative AI systems increasingly rely on complex supply chains of models, APIs, plugins, and data sources, each introducing risk.
Red team assessment approach:
| Component | Risk | Testing Approach |
|---|---|---|
| Foundation model providers | Model behavior changes, deprecation, security incidents | Test across model versions, verify fallback behavior |
| Plugin/tool ecosystems | Malicious plugins, data exfiltration through tools | Plugin security review, tool use abuse scenarios |
| RAG data sources | Poisoned or manipulated retrieval sources | Inject adversarial documents, test retrieval integrity |
| Fine-tuning data providers | Training data manipulation | Verify data provenance, test for backdoor behaviors |
| API intermediaries | Man-in-the-middle, prompt logging, data retention | Assess API security, review data handling policies |
Assessment Methodology
Structured Assessment Process
NIST AI 600-1 maps its risks to the four functions of the AI RMF: Govern, Map, Measure, and Manage. Red teamers should use this mapping to structure comprehensive assessments:
Govern: Establish assessment framework
Review the organization's AI governance policies, risk appetite, and accountability structures. Verify that governance documentation addresses all 12 AI 600-1 risk categories.
Key questions:
- Does the organization have documented policies for each risk category?
- Are roles and responsibilities defined for GenAI risk management?
- Is there a process for updating risk assessments as the threat landscape evolves?
Map: Identify and categorize GenAI systems
Inventory all generative AI systems and map them to applicable risk categories. Determine which systems require testing and prioritize based on risk exposure.
Key activities:
- Catalog all GenAI deployments (production, internal, experimental)
- Classify each system's risk exposure across the 12 categories
- Identify the highest-priority systems for red team testing
Measure: Conduct adversarial testing
Execute red team testing activities aligned with the applicable risk categories. Measure the effectiveness of existing controls against adversarial scenarios.
Testing framework:
- Test each applicable risk category using the approaches outlined above
- Document control effectiveness for each test scenario
- Quantify risk levels based on exploitability and impact
Manage: Report and remediate
Deliver findings mapped to AI 600-1 risk categories and AI RMF functions. Provide remediation recommendations and verify fixes.
Deliverables:
- Risk assessment matrix mapping findings to AI 600-1 categories
- Control effectiveness scores for each tested risk area
- Remediation roadmap with prioritized recommendations
Risk Scoring Framework
When assessing GenAI risks, use a consistent scoring methodology that aligns with NIST AI 600-1 categories:
| Dimension | Score 1 (Low) | Score 2 (Medium) | Score 3 (High) | Score 4 (Critical) |
|---|---|---|---|---|
| Exploitability | Requires deep expertise and significant resources | Requires moderate skill and some resources | Requires basic skill, tools available | Trivially exploitable with public techniques |
| Impact | Minor inconvenience, no data exposure | Limited data exposure, moderate reputational risk | Significant data exposure, safety implications | CBRN information, mass data breach, physical harm |
| Prevalence | Rare edge case | Occasionally reproducible | Frequently reproducible | Systematic, always reproducible |
| Detectability | Existing controls reliably detect | Sometimes detected by existing controls | Rarely detected by existing controls | No detection capability exists |
Control Mappings
Mapping AI 600-1 Risks to NIST AI RMF Subcategories
| AI 600-1 Risk | AI RMF Govern | AI RMF Map | AI RMF Measure | AI RMF Manage |
|---|---|---|---|---|
| CBRN Information | GV-1.1, GV-1.3 | MP-2.3 | MS-2.3, MS-2.6 | MG-2.2 |
| Confabulation | GV-1.1, GV-4.3 | MP-2.3, MP-3.4 | MS-2.6, MS-2.11 | MG-2.2, MG-3.2 |
| Data Privacy | GV-1.1, GV-6.1 | MP-3.4, MP-4.2 | MS-2.3, MS-2.10 | MG-2.2, MG-3.1 |
| Information Security | GV-1.1, GV-1.6 | MP-2.3, MP-5.2 | MS-2.3, MS-2.6 | MG-2.2, MG-2.4 |
| Value Chain | GV-1.1, GV-6.2 | MP-2.3, MP-5.2 | MS-2.7, MS-2.8 | MG-3.1, MG-3.2 |
Mapping AI 600-1 to Common Red Team Findings
| Common Finding | AI 600-1 Risk Category | Recommended Control |
|---|---|---|
| System prompt extraction | Information Security (7) | Input/output filtering, prompt hardening |
| Hallucinated legal advice | Confabulation (2) | Domain-specific grounding, disclaimer requirements |
| PII in model outputs | Data Privacy (3) | Output scanning, differential privacy |
| Jailbreak to harmful content | Obscene/Abusive Content (9) | Multi-layer content filtering, constitutional AI |
| Biased hiring recommendations | Toxicity/Bias (10) | Bias testing, fairness constraints, human oversight |
| Plugin data exfiltration | Value Chain (11) | Plugin sandboxing, data flow controls |
Practical Application
Using AI 600-1 in Engagement Proposals
When scoping red team engagements, reference AI 600-1 risk categories to justify testing activities:
Proposal structure:
- List the AI 600-1 risk categories applicable to the client's GenAI deployment
- Map each risk category to specific testing activities and level of effort
- Reference NIST guidance to justify the importance of each test area
- Provide estimated timelines and resource requirements per risk category
Reporting with AI 600-1 Alignment
Structure findings reports to directly map to AI 600-1 for maximum client value:
| Report Section | Content |
|---|---|
| Executive summary | Overall risk posture across applicable AI 600-1 categories |
| Risk category assessment | Detailed findings organized by AI 600-1 risk category |
| Control effectiveness | Matrix showing tested controls and their effectiveness against each risk |
| Remediation roadmap | Prioritized recommendations mapped to specific risk categories |
| Compliance implications | How findings affect the organization's NIST AI RMF compliance posture |
Limitations and Gaps
Red teamers should be aware of areas where AI 600-1 requires supplementation:
- Technical depth: AI 600-1 identifies risk categories but does not prescribe specific testing methodologies. Supplement with OWASP LLM Top 10 and MITRE ATLAS for technical testing guidance.
- Scoring methodology: The framework does not provide a standardized risk scoring system. Teams must develop or adopt their own quantitative approach.
- Emerging risks: As GenAI capabilities evolve rapidly, new risk categories may emerge between framework revisions. Monitor NIST publications for updates and supplement with current threat intelligence.
- Multi-model systems: AI 600-1 focuses primarily on individual GenAI systems. For complex multi-model architectures, additional risk assessment is needed for emergent behaviors from model interactions.