# scaling
12 articles tagged "scaling"
Scaling Laws, Emergence & Capability Jumps
How scaling laws predict model performance, why emergent capabilities create unpredictable security properties, and what sleeper capabilities and emergent misalignment mean for red teaming.
Injection Scaling Laws
Research into how injection effectiveness scales with model size, training compute, and safety training investment.
Red Team Automation Strategy
When and how to automate AI red teaming: tool selection, CI/CD integration, continuous automated red teaming (CART), human-in-the-loop design, and scaling assessment coverage through automation.
Scaling Red Team Programs
Growing AI red team programs from solo practitioner to full team: hiring strategies, process standardization, automation balance, and budget justification.
Emergence & Capability Jump Exploitation
How emergent capabilities create unpredictable security properties: testing for hidden capabilities, sleeper agent scenarios, deceptive alignment concerns, and capability elicitation.
Few-Shot Attack Scaling Analysis
Detailed analysis of how few-shot examples scale to influence model behavior, from the 2-shot to the many-shot regime.
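Scaling Laws, Emergence, and Capability Jumps
How scaling laws predict model performance, why emergent capabilities produce unpredictable security properties, and what sleeper capabilities and emergent misalignment mean for red teams.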
Injection Scaling Laws
Research into how injection effectiveness scales with model size, training compute, and safety training investment.
Red Team Automation Strategy
When and how to automate AI red teaming: tool selection, CI/CD integration, continuous automated red teaming (CART), human-in-the-loop design, and scaling assessment coverage through automation.
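Scaling Red Team Programs
Growing an AI red team program from a solo practitioner into a full team: hiring strategies, process standardization, automation balance, and budget justification.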
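Emergence and Capability Jump Exploitation
How emergent capabilities give rise to unpredictable security properties: testing for hidden capabilities, sleeper agent scenarios, deceptive alignment concerns, and capability elicitation.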
Few-Shot Attack Scaling Analysis
Detailed analysis of how few-shot examples scale to influence model behavior, from the 2-shot to the many-shot regime.