redteams.ai

# guided

2 articles tagged with “guided”

Gradient-Guided Data Poisoning

Use gradient information from open-source models to craft optimally poisoned training examples.

Tags: advanced, lab, gradient, guided, poisoning, labs
Advanced

Interpretability-Guided Attack Design

Use mechanistic interpretability to identify exploitable circuits and design targeted attacks.

Tags: lab, expert, guided, attack, labs, interpretability
Expert