redteams.ai

# constitutional-AI

2 articles tagged with “constitutional-AI”

Constitutional AI Training Attacks

Attacking Constitutional AI and RLAIF training pipelines by manipulating the constitutional principles, critique models, or self-improvement loops.

fine-tuning, constitutional-AI, RLAIF, attacks
Advanced

Safety Training Methods

Overview of safety training methods, including RLHF, Constitutional AI, and DPO, and their limitations from a red-team perspective.

foundations, safety, RLHF, constitutional-AI
Beginner