# trust-safety

1 article tagged with “trust-safety”

Content Moderation System Attacks

Attacking AI-powered content moderation systems. Adversarial content that bypasses classifiers, evasion techniques for text and image filters, and the security implications of unreliable moderation at platform scale.

content-moderation, trust-safety, bypass

Advanced