# capabilities
12 articles tagged "capabilities"
Scaling Laws, Emergence & Capability Jumps
How scaling laws predict model performance, why emergent capabilities create unpredictable security properties, and what sleeper capabilities and emergent misalignment mean for red teaming.
Security Implications of Emergent Capabilities
How emergent capabilities in frontier models create new and unpredictable security risks.
Tool-Augmented Model Risks
Security risks introduced when models gain access to external tools, APIs, and code execution.
LLM API Enumeration
Advanced techniques for enumerating LLM API capabilities, restrictions, hidden parameters, and undocumented features to build a comprehensive attack surface map.
The Alignment Tax
How safety training affects model capabilities: capability-safety tradeoffs, the cost of alignment, measuring alignment tax, and strategies for minimizing capability loss during safety training.
Capability-Based Access Control
Step-by-step walkthrough for implementing fine-grained capability controls for LLM features, covering capability token design, permission scoping, dynamic capability grants, and audit trails.
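Scaling Laws, Emergence & Capability Jumps
How scaling laws predict model performance, why emergent capabilities create unpredictable security properties, and what sleeper capabilities and emergent misalignment mean for red teaming.
Security Implications of Emergent Capabilities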
How emergent capabilities in frontier models create new and unpredictable security risks.
Tool-Augmented Model Risks
Security risks introduced when models gain access to external tools, APIs, and code execution.
LLM API Enumeration
Advanced techniques for enumerating LLM API capabilities, restrictions, hidden parameters, and undocumented features to build a comprehensive attack surface map.
The Alignment Tax
How safety training affects model capabilities: capability-safety tradeoffs, the cost of alignment, measuring alignment tax, and strategies for minimizing capability loss during safety training.
Capability-Based Access Control
Step-by-step walkthrough for implementing fine-grained capability controls for LLM features, covering capability token design, permission scoping, dynamic capability grants, and audit trails.