redteams.ai

# RLAIF

1 article tagged with “RLAIF”

Constitutional AI Training Attacks

Attacking Constitutional AI and RLAIF training pipelines by manipulating the constitutional principles, critique models, or self-improvement loops.

fine-tuning, constitutional-AI, RLAIF, attacks
Advanced