# labeler-attack
2 articles tagged "labeler-attack"
Preference Data Poisoning
How adversaries manipulate human preference data used in RLHF and DPO training -- compromising labelers, generating synthetic poisoned preferences, and attacking the preference data supply chain.
preference-poisoning, rlhf, dpo, data-poisoning, human-feedback, labeler-attack, alignment
Preference Data Poisoning
How adversaries manipulate human preference data used in RLHF and DPO training -- compromising labelers, generating synthetic poisoned preferences, and attacking the preference data supply chain.
preference-poisoning, rlhf, dpo, data-poisoning, human-feedback, labeler-attack, alignment