# constitutional-AI
2 articles tagged with “constitutional-AI”
Constitutional AI Training Attacks
Attacking Constitutional AI and RLAIF training pipelines by manipulating the constitutional principles, critique models, or self-improvement loops.
fine-tuning, constitutional-AI, RLAIF, attacks
Safety Training Methods
Overview of safety training methods including RLHF, Constitutional AI, DPO, and their limitations from a red team perspective.
foundations, safety, RLHF, constitutional-AI