# activations
2 articles tagged with “activations”
LLM Internals
Deep technical exploration of LLM internal mechanisms for exploit development, covering activation analysis, alignment bypass primitives, and embedding space exploitation.
internals, activations, alignment, embeddings, mechanistic-interpretability, exploit-development
Activation Analysis & Hidden State Exploitation
Reading model internals via hidden state extraction, logprob probing, refusal direction analysis, and activation steering techniques.
activations, hidden-states, probing, information-leakage, mechanistic-interpretability