# probing

5 articles tagged with “probing”

Activation analysis and exploitation of hidden states

Read out a model's internal workings through hidden-state extraction, logprob probing, refusal-direction analysis, and activation-steering techniques.

activations, hidden-states, probing, information-leakage, mechanistic-interpretability

Expert

Representation probing for vulnerabilities

Probe internal model representations to identify exploitable features and develop representation-level attacks.

probing, advanced, lab, representation, labs

Advanced

Probing multi-turn conversations

Use multi-turn conversations to gradually escalate requests and probe LLM safety boundaries.

labs, multi-turn, probing, beginner

Beginner

Probing the boundaries of safety training

Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.

probing, safety, lab, beginner, training, labs

Beginner

Lab: investigating emergent capabilities

Systematically test large language models for undocumented capabilities including hidden knowledge, unreported skills, and behaviors that emerge only under specific conditions. Build a structured probing framework for capability discovery.

lab, expert, emergent, capability, probing, hands-on

Expert