# attention

15 articles tagged "attention"

Attention Mechanisms and Security

How attention mechanisms work and the role they play in enabling prompt injection attacks.

attention, mechanism, security, foundation

Intermediate

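Transformer Architecture for Attackers

A deep dive into the transformer architecture (attention, feed-forward layers, and residual connections) from the perspective of which components can be exploited.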

transformer, attention, architecture, intermediate

Intermediate

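Long-Context Security

Security implications of context windows of 100K+ tokens, including attention dilution, instruction forgetting, and exploitation.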

frontier, long-context, attention

Advanced

Mechanistic Interpretability for Security

Understanding model circuits to find vulnerabilities: feature identification, circuit analysis, attention pattern exploitation, and using mechanistic interpretability for offensive and defensive AI security.

mechanistic-interpretability, circuits, features, attention, security

Expert

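Attention Manipulation Research

Research into methods that directly manipulate attention patterns to achieve injection goals, informed by insights from mechanistic interpretability.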

research, attention, manipulation, mechanistic

Advanced

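Attention Pattern Analysis for Security

Using attention maps to understand and exploit model behavior, identify security-relevant attention patterns, and apply attention mechanisms in red-team operations.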

attention, transformers, interpretability, attention-patterns, security

Advanced

Attention Pattern Manipulation

Craft inputs that manipulate transformer attention patterns to prioritize adversarial content over safety instructions.

labs, attention, manipulation, transformer, advanced

Advanced

Lab: Context Overflow Attacks

Explore context window overflow attacks that push system instructions out of the model's attention by filling the context with padding content, and measure the resulting degradation in instruction following.

lab, context-overflow, prompt-injection, attention, beginner

Beginner

Lab: Context Window Overflow Attacks

A hands-on lab exploring how overflowing a model's context window with padding content can push safety instructions out of the attention window and enable injection attacks.

lab, context-overflow, attention, context-window

Intermediate