Loading...

NIST AI 600-1 GenAI Risk Profile

intermediate11 min readUpdated 2026-03-15

NIST AI 600-1 Generative AI risk profile covering risk categories, control mappings, assessment methodology, and practical application for red team engagements.

nist ai-600-1 genai risk-profile assessment

NIST AI 600-1, titled "Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile," was released in July 2024 as a companion document to the NIST AI RMF. It specifically addresses the unique risks posed by generative AI systems, making it one of the most directly applicable frameworks for AI red team practitioners.

GenAI Risk Categories

NIST AI 600-1 identifies 12 risk categories specific to generative AI. Each category maps directly to red team testing activities:

Risk Overview Table

#	Risk Category	Description	Red Team Priority
1	CBRN Information	AI provides information enabling chemical, biological, radiological, or nuclear harm	High (safety-critical)
2	Confabulation	AI generates false information presented as fact (hallucination)	High (common in all GenAI)
3	Data Privacy	AI reveals training data or enables personal information extraction	High (legal liability)
4	Environmental	Energy consumption and environmental impact of GenAI systems	Low (not typically in scope)
5	Human-AI Configuration	Insufficient human oversight of AI outputs and decisions	Medium (process-dependent)
6	Information Integrity	AI generates misleading content or enables disinformation	High (reputational risk)
7	Information Security	AI enables or is vulnerable to cybersecurity attacks	Critical (core red team focus)
8	Intellectual Property	AI generates content infringing on existing IP	Medium (legal risk)
9	Obscene, Degrading, Abusive Content	AI generates harmful, offensive, or exploitative content	High (safety and reputation)
10	Toxicity, Bias, Homogenization	AI exhibits bias, generates toxic content, or homogenizes outputs	High (fairness and compliance)
11	Value Chain and Component Integration	Risks from third-party AI components, APIs, and plugins	High (supply chain risk)
12	Harmful Bias and Homogenization	Systemic biases amplified by generative AI at scale	Medium (societal impact)

Detailed Risk Analysis and Red Team Mapping

CBRN Information (GAI Risk 1)

Generative AI models may provide detailed instructions or facilitate the creation of chemical, biological, radiological, or nuclear weapons.

Red team assessment approach:

Test Category	Methodology	Expected Controls
Direct elicitation	Attempt to extract CBRN synthesis instructions through direct queries	Content filtering, refusal behavior
Jailbreak bypass	Use known jailbreak techniques to circumvent CBRN content restrictions	Multi-layer filtering, robust alignment
Indirect elicitation	Request CBRN-adjacent information that could be assembled into harmful instructions	Context-aware filtering, semantic analysis
Multi-turn escalation	Gradually escalate requests across conversation turns to bypass threshold detection	Conversation-level monitoring

Confabulation (GAI Risk 2)

Confabulation -- commonly called hallucination -- is a systemic risk in all generative AI systems. The model generates plausible-sounding but factually incorrect information with high confidence.

Red team assessment approach:

Test Category	Methodology	Expected Controls
Factual accuracy	Query for verifiable facts and measure accuracy rate	Grounding mechanisms, RAG integration
Citation fabrication	Request sources and verify whether cited papers, URLs, or statistics exist	Citation verification, retrieval augmentation
Authority impersonation	Ask the model to provide expert opinions and verify accuracy	Epistemic humility, uncertainty expression
Domain-specific confabulation	Test factual accuracy in specialized domains (legal, medical, financial)	Domain-specific validation, human review requirements

Data Privacy (GAI Risk 3)

Generative AI models may memorize and reproduce training data, including personal information, proprietary data, and other sensitive content.

Red team assessment approach:

Test Category	Methodology	Expected Controls
Training data extraction	Use known extraction techniques (membership inference, prompt-based extraction)	Differential privacy, output filtering
PII extraction	Attempt to extract personal information from model outputs	PII detection and scrubbing in outputs
Conversation data leakage	Test whether information from other users' conversations can be extracted	Session isolation, memory management
Model inversion	Attempt to reconstruct training examples through iterative querying	Rate limiting, output monitoring

Information Security (GAI Risk 7)

This category directly aligns with core AI red teaming activities and encompasses the full range of AI-specific security vulnerabilities.

Red team assessment approach:

Test Category	NIST AI 600-1 Sub-risk	Testing Methodology
Prompt injection	AI enables unauthorized actions through input manipulation	Direct and indirect prompt injection, system prompt extraction
Data poisoning	AI training or retrieval can be manipulated	RAG poisoning, fine-tuning data manipulation
Model theft	AI model weights or capabilities can be extracted	Model extraction attacks, API-based model stealing
Evasion attacks	AI classification or detection can be bypassed	Adversarial examples, perturbation attacks
Supply chain compromise	AI components from third parties introduce vulnerabilities	Dependency analysis, model provenance verification

Value Chain and Component Integration (GAI Risk 11)

Generative AI systems increasingly rely on complex supply chains of models, APIs, plugins, and data sources, each introducing risk.

Red team assessment approach:

Component	Risk	Testing Approach
Foundation model providers	Model behavior changes, deprecation, security incidents	Test across model versions, verify fallback behavior
Plugin/tool ecosystems	Malicious plugins, data exfiltration through tools	Plugin security review, tool use abuse scenarios
RAG data sources	Poisoned or manipulated retrieval sources	Inject adversarial documents, test retrieval integrity
Fine-tuning data providers	Training data manipulation	Verify data provenance, test for backdoor behaviors
API intermediaries	Man-in-the-middle, prompt logging, data retention	Assess API security, review data handling policies

Assessment Methodology

Structured Assessment Process

NIST AI 600-1 maps its risks to the four functions of the AI RMF: Govern, Map, Measure, and Manage. Red teamers should use this mapping to structure comprehensive assessments:

Govern: Establish assessment framework
Review the organization's AI governance policies, risk appetite, and accountability structures. Verify that governance documentation addresses all 12 AI 600-1 risk categories.
Key questions:
- Does the organization have documented policies for each risk category?
- Are roles and responsibilities defined for GenAI risk management?
- Is there a process for updating risk assessments as the threat landscape evolves?
Map: Identify and categorize GenAI systems
Inventory all generative AI systems and map them to applicable risk categories. Determine which systems require testing and prioritize based on risk exposure.
Key activities:
- Catalog all GenAI deployments (production, internal, experimental)
- Classify each system's risk exposure across the 12 categories
- Identify the highest-priority systems for red team testing
Measure: Conduct adversarial testing
Execute red team testing activities aligned with the applicable risk categories. Measure the effectiveness of existing controls against adversarial scenarios.
Testing framework:
- Test each applicable risk category using the approaches outlined above
- Document control effectiveness for each test scenario
- Quantify risk levels based on exploitability and impact
Manage: Report and remediate
Deliver findings mapped to AI 600-1 risk categories and AI RMF functions. Provide remediation recommendations and verify fixes.
Deliverables:
- Risk assessment matrix mapping findings to AI 600-1 categories
- Control effectiveness scores for each tested risk area
- Remediation roadmap with prioritized recommendations

Risk Scoring Framework

When assessing GenAI risks, use a consistent scoring methodology that aligns with NIST AI 600-1 categories:

Dimension	Score 1 (Low)	Score 2 (Medium)	Score 3 (High)	Score 4 (Critical)
Exploitability	Requires deep expertise and significant resources	Requires moderate skill and some resources	Requires basic skill, tools available	Trivially exploitable with public techniques
Impact	Minor inconvenience, no data exposure	Limited data exposure, moderate reputational risk	Significant data exposure, safety implications	CBRN information, mass data breach, physical harm
Prevalence	Rare edge case	Occasionally reproducible	Frequently reproducible	Systematic, always reproducible
Detectability	Existing controls reliably detect	Sometimes detected by existing controls	Rarely detected by existing controls	No detection capability exists

Control Mappings

Mapping AI 600-1 Risks to NIST AI RMF Subcategories

AI 600-1 Risk	AI RMF Govern	AI RMF Map	AI RMF Measure	AI RMF Manage
CBRN Information	GV-1.1, GV-1.3	MP-2.3	MS-2.3, MS-2.6	MG-2.2
Confabulation	GV-1.1, GV-4.3	MP-2.3, MP-3.4	MS-2.6, MS-2.11	MG-2.2, MG-3.2
Data Privacy	GV-1.1, GV-6.1	MP-3.4, MP-4.2	MS-2.3, MS-2.10	MG-2.2, MG-3.1
Information Security	GV-1.1, GV-1.6	MP-2.3, MP-5.2	MS-2.3, MS-2.6	MG-2.2, MG-2.4
Value Chain	GV-1.1, GV-6.2	MP-2.3, MP-5.2	MS-2.7, MS-2.8	MG-3.1, MG-3.2

Mapping AI 600-1 to Common Red Team Findings

Common Finding	AI 600-1 Risk Category	Recommended Control
System prompt extraction	Information Security (7)	Input/output filtering, prompt hardening
Hallucinated legal advice	Confabulation (2)	Domain-specific grounding, disclaimer requirements
PII in model outputs	Data Privacy (3)	Output scanning, differential privacy
Jailbreak to harmful content	Obscene/Abusive Content (9)	Multi-layer content filtering, constitutional AI
Biased hiring recommendations	Toxicity/Bias (10)	Bias testing, fairness constraints, human oversight
Plugin data exfiltration	Value Chain (11)	Plugin sandboxing, data flow controls

Practical Application

Using AI 600-1 in Engagement Proposals

When scoping red team engagements, reference AI 600-1 risk categories to justify testing activities:

Proposal structure:

List the AI 600-1 risk categories applicable to the client's GenAI deployment
Map each risk category to specific testing activities and level of effort
Reference NIST guidance to justify the importance of each test area
Provide estimated timelines and resource requirements per risk category

Reporting with AI 600-1 Alignment

Structure findings reports to directly map to AI 600-1 for maximum client value:

Report Section	Content
Executive summary	Overall risk posture across applicable AI 600-1 categories
Risk category assessment	Detailed findings organized by AI 600-1 risk category
Control effectiveness	Matrix showing tested controls and their effectiveness against each risk
Remediation roadmap	Prioritized recommendations mapped to specific risk categories
Compliance implications	How findings affect the organization's NIST AI RMF compliance posture

Limitations and Gaps

Red teamers should be aware of areas where AI 600-1 requires supplementation:

Technical depth: AI 600-1 identifies risk categories but does not prescribe specific testing methodologies. Supplement with OWASP LLM Top 10 and MITRE ATLAS for technical testing guidance.
Scoring methodology: The framework does not provide a standardized risk scoring system. Teams must develop or adopt their own quantitative approach.
Emerging risks: As GenAI capabilities evolve rapidly, new risk categories may emerge between framework revisions. Monitor NIST publications for updates and supplement with current threat intelligence.
Multi-model systems: AI 600-1 focuses primarily on individual GenAI systems. For complex multi-model architectures, additional risk assessment is needed for emergent behaviors from model interactions.

Edit this page on GitHub