# activations
4 articles tagged "activations"
LLM Internals
Deep technical exploration of LLM internal mechanisms for exploit development, covering activation analysis, alignment bypass primitives, and embedding space exploitation.
internals activations alignment embeddings mechanistic-interpretability exploit-development
Activation Analysis & Hidden State Exploitation
Reading model internals via hidden state extraction, logprob probing, refusal direction analysis, and activation steering techniques.
activations hidden-states probing information-leakage mechanistic-interpretability
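Large Language Model Internals
A deep technical exploration of the internal mechanisms of large language models for exploit development, covering activation analysis, alignment bypass primitives, and embedding space exploitation.
internals activations alignment embeddings mechanistic-interpretability exploit-development
Activation Analysis & Hidden State Exploitation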
Reading model internals via hidden state extraction, logprob probing, refusal direction analysis, and activation steering techniques.
activations hidden-states probing information-leakage mechanistic-interpretability