# features
2 articles tagged "features"
Mechanistic Interpretability for Security
Understanding model circuits to find vulnerabilities: feature identification, circuit analysis, attention pattern exploitation, and using mechanistic interpretability for offensive and defensive AI security.
mechanistic-interpretability, circuits, features, attention, security
Mechanistic Interpretability for Security
Understanding model circuits to find vulnerabilities: feature identification, circuit analysis, attention pattern exploitation, and using mechanistic interpretability for offensive and defensive AI security.
mechanistic-interpretability, circuits, features, attention, security