redteams.ai

# features

1 article tagged with “features”

Mechanistic Interpretability for Security

Understanding model circuits to find vulnerabilities: feature identification, circuit analysis, attention pattern exploitation, and using mechanistic interpretability for offensive and defensive AI security.

mechanistic-interpretability, circuits, features, attention, security
Expert