redteams.ai

# feedback

1 article tagged with “feedback”

Reinforcement Feedback Poisoning

Compromising reinforcement learning from human feedback by poisoning preference data, manipulating reward models, or exploiting annotator disagreement.

data-training, RLHF, feedback, manipulation
Advanced