1 article tagged with “security-probing”
Design evaluations that discover security-relevant emergent capabilities in frontier language models.