1 article tagged with “intervention”
Modifying model behavior at inference time through activation patching, steering vectors, and attention manipulation.