# gaming
8 articles tagged with “gaming”
Benchmark Gaming Attacks
Techniques for gaming evaluation benchmarks to make poisoned or compromised models appear safe and capable during standard safety evaluations.
Similarity Search Gaming
Techniques for crafting adversarial content that games similarity search to ensure attacker-controlled documents rank highest in retrieval results.
Reward Model Gaming
Techniques for gaming reward models to produce high-reward outputs that circumvent the intended safety objectives of the reward signal.
Reward Model Attacks
How models learn to game reward signals through reward hacking, covering exploitation of reward model flaws, Goodhart's Law in RLHF, adversarial reward optimization, and practical examples of reward hacking in language model training.
Gaming and Entertainment AI Security
Security of AI in game design, content moderation, recommendation systems, and user behavior analysis.
Gaming and Entertainment AI Security (Industry Verticals)
Threat analysis for AI in gaming, including NPC behavior manipulation, content generation, and anti-cheat systems.
Reward Model Gaming (Advanced Lab)
Craft inputs that score highly on reward models while containing adversarial content.
Evaluation Benchmark Gaming
Techniques for gaming evaluation benchmarks to mask vulnerabilities or inflate safety scores.