# constitutional-AI
4 articles tagged with “constitutional-AI”
Constitutional AI Training Attacks
Attacking Constitutional AI and RLAIF training pipelines by manipulating the constitutional principles, critique models, or self-improvement loops.
fine-tuning, constitutional-AI, RLAIF, attacks
Safety Training Methods
Overview of safety training methods including RLHF, Constitutional AI, DPO, and their limitations from a red team perspective.
foundations, safety, RLHF, constitutional-AI
Constitutional AI Training Attacks
Attacking Constitutional AI and RLAIF training pipelines by manipulating the constitutional principles, critique models, or self-improvement loops.
fine-tuning, constitutional-AI, RLAIF, attacks
Safety Training Methods
Overview of safety training methods including RLHF, Constitutional AI, DPO, and their limitations from a red team perspective.
foundations, safety, RLHF, constitutional-AI