1 article tagged with “feedback”
Compromising reinforcement learning from human feedback by poisoning preference data, manipulating reward models, or exploiting annotator disagreement.