1 article tagged with “exploit-discovery”
Use mechanistic interpretability tools to discover exploitable circuits and features in transformer models.