1 articletagged with “trust-safety”
Attacking AI-powered content moderation systems. Adversarial content that bypasses classifiers, evasion techniques for text and image filters, and the security implications of unreliable moderation at platform scale.