1 articletagged with “information-routing”
How the self-attention mechanism in transformers can be leveraged to steer model behavior, hijack information routing, and bypass safety instructions.