redteams.ai

# sae

1 article tagged with “sae”

Sparse Autoencoders for Security Analysis

Using sparse autoencoders and mechanistic interpretability to identify and manipulate safety-relevant features.

frontier, sae, interpretability
Expert