# trust-safety
2 articles tagged "trust-safety"
Content Moderation System Attacks
Attacking AI-powered content moderation systems. Adversarial content that bypasses classifiers, evasion techniques for text and image filters, and the security implications of unreliable moderation at platform scale.
content-moderation · trust-safety · bypass
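Content Moderation System Attacks
Attacks on AI-driven content moderation systems. Adversarial content that bypasses classifiers, evasion techniques for text and image filters, and the security implications of unreliable moderation at platform scale.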
content-moderation · trust-safety · bypass