# feedback
2 articles tagged with “feedback”
Reinforcement Feedback Poisoning
Compromising reinforcement learning from human feedback by poisoning preference data, manipulating reward models, or exploiting annotator disagreement.
data-training, RLHF, feedback, manipulation
Reinforcement Feedback Poisoning
Compromising reinforcement learning from human feedback by poisoning preference data, manipulating reward models, or exploiting annotator disagreement.
data-training, RLHF, feedback, manipulation