1 articletagged with “hidden-capability”
How emergent capabilities create unpredictable security properties: testing for hidden capabilities, sleeper agent scenarios, deceptive alignment concerns, and capability elicitation.