1 articletagged with “representation-probing”
Probe model internal representations to discover exploitable features and latent vulnerability patterns.