redteams.ai

# internal-representations

1 article tagged with “internal-representations”

Representation Engineering for Security

Reading and manipulating a model's internal representations for security: activation steering, concept probing, representation-level safety controls, and other security applications of representation engineering.

representation-engineering, activation-steering, interpretability, internal-representations, safety
Expert