# gaming
16 articles tagged "gaming"
Benchmark Gaming Attacks
Techniques for gaming evaluation benchmarks to make poisoned or compromised models appear safe and capable during standard safety evaluations.
Similarity Search Gaming
Techniques for crafting adversarial content that games similarity search to ensure attacker-controlled documents rank highest in retrieval results.
Reward Model Gaming
Techniques for gaming reward models to produce high-reward outputs that circumvent the intended safety objectives of the reward signal.
Reward Model Attacks
How models learn to game reward signals through reward hacking: exploiting reward model flaws, Goodhart's Law in RLHF, adversarial reward optimization, and practical examples of reward hacking in language model training.
Gaming and Entertainment AI Security
Security of AI in game design, content moderation, recommendation systems, and user behavior analysis.
Gaming and Entertainment AI Security (Industry Verticals)
Threat analysis for AI in gaming including NPC behavior manipulation, content generation, and anti-cheat systems.
Reward Model Gaming (Advanced Lab)
Craft inputs that score highly on reward models while containing adversarial content.
Evaluation Benchmark Gaming
Techniques for gaming evaluation benchmarks to mask vulnerabilities or inflate safety scores.
Benchmark Gaming Attacks
Techniques for gaming evaluation benchmarks to make poisoned or compromised models appear safe and capable during standard safety evaluations.
Similarity Search Gaming
Techniques for crafting adversarial content that games similarity search to ensure attacker-controlled documents rank highest in retrieval results.
Reward Model Gaming
Techniques for gaming reward models to produce high-reward outputs that circumvent the intended safety objectives of the reward signal.
Reward Model Attacks
How models learn to game reward signals through reward hacking: exploiting reward model flaws, Goodhart's Law in RLHF, adversarial reward optimization, and practical examples of reward hacking in language model training.
Gaming and Entertainment AI Security
Security of AI in game design, content moderation, recommendation systems, and user behavior analysis.
Gaming and Entertainment AI Security (Industry Verticals)
Threat analysis for AI in gaming including NPC behavior manipulation, content generation, and anti-cheat systems.
Reward Model Gaming (Advanced Lab)
Craft inputs that score highly on reward models while containing adversarial content.
Evaluation Benchmark Gaming
Techniques for gaming evaluation benchmarks to mask vulnerabilities or inflate safety scores.