# internal-representations

1 article tagged with “internal-representations”

Representation Engineering for Security

Reading and manipulating model internal representations for security: activation steering, concept probing, representation-level safety controls, and security applications of representation engineering.

representation-engineering, activation-steering, interpretability, internal-representations, safety

Expert