# hidden-states
2 articles tagged with “hidden-states”
Activation Analysis & Hidden State Exploitation
Reading model internals via hidden state extraction, logprob probing, refusal direction analysis, and activation steering techniques.
activations, hidden-states, probing, information-leakage, mechanistic-interpretability
Activation Analysis & Hidden State Exploitation
Reading model internals via hidden state extraction, logprob probing, refusal direction analysis, and activation steering techniques.
activations, hidden-states, probing, information-leakage, mechanistic-interpretability