# gaming
8 articles tagged with “gaming”
Benchmark Gaming Attacks
Techniques for gaming evaluation benchmarks to make poisoned or compromised models appear safe and capable during standard safety evaluations.
Similarity Search Gaming
Techniques for crafting adversarial content that manipulates similarity search so that attacker-controlled documents rank at the top of the retrieval results.
Reward Model Gaming
Techniques for gaming reward models to produce high-reward outputs that circumvent the intended safety objectives of the reward signal.
Reward Model Attacks
How models learn to game reward signals through reward hacking, covering exploitation of reward model flaws, Goodhart's Law in RLHF, adversarial reward optimization, and practical examples from language model training.
AI Security for Gaming and Entertainment
Security of AI in game design, content moderation, recommendation systems, and user behavior analysis.
AI Security for Gaming and Entertainment (Sectors)
Threat analysis for AI in gaming, including NPC behavior manipulation, content generation, and anti-cheat systems.
Reward Model Gaming (Advanced Lab)
Craft inputs that score highly on reward models while containing adversarial content.
Manipulating Evaluation Benchmarks
Techniques for manipulating evaluation benchmarks to mask vulnerabilities or inflate safety scores.