Loading...
1 artikelgetagd met “exploit-discovery”
Use mechanistic interpretability tools to discover exploitable circuits and features in transformer models.