redteams.ai

# activations

2 articles tagged with “activations”

LLM Internals

Deep technical exploration of LLM internal mechanisms for exploit development, covering activation analysis, alignment bypass primitives, and embedding space exploitation.

internals, activations, alignment, embeddings, mechanistic-interpretability, exploit-development
Beginner

Activation Analysis & Hidden State Exploitation

Reading model internals via hidden state extraction, logprob probing, refusal direction analysis, and activation steering techniques.

activations, hidden-states, probing, information-leakage, mechanistic-interpretability
Expert