redteams.ai

# instruction-tuning

2 articles tagged with “instruction-tuning”

Instruction Tuning Manipulation

Techniques for manipulating instruction-tuned models by crafting adversarial training examples that alter the model's instruction-following behavior.

fine-tuning, instruction-tuning, manipulation, safety
Advanced

SFT Data Poisoning & Injection

Poisoning supervised fine-tuning (SFT) datasets by manipulating instruction-response pairs and embedding backdoor triggers in SFT data, and determining the minimum number of poisoned examples required.

SFT, supervised-fine-tuning, data-poisoning, instruction-tuning, backdoor, trigger
Expert