# backdoors
10 articles tagged "backdoors"
Agent Memory Poisoning
Techniques for poisoning AI agent short-term and long-term memory systems to achieve persistent compromise, inject behavioral backdoors, and survive conversation resets.
Memory Poisoning Techniques
Advanced techniques for injecting persistent instructions into AI agent memory systems, including semantic trojans, self-reinforcing payloads, dormant backdoors, and cross-session persistence mechanisms.
Repository Poisoning for Code Models
Techniques for poisoning code repositories to influence code generation models, including training data poisoning through popular repositories, backdoor injection in open-source dependencies, and supply chain attacks targeting code model training pipelines.
Training Data Manipulation
Attacks that corrupt model behavior by poisoning training data, fine-tuning datasets, or RLHF preference data, including backdoor installation and safety alignment removal.
Chapter Assessment: Fine-Tuning Safety
A 15-question calibrated assessment testing your understanding of fine-tuning safety, covering alignment erosion, backdoor installation, and LoRA adapter risks.
Lessons from Fine-Tuning Safety Research
Key lessons from fine-tuning safety research, covering alignment erosion, backdoor installation, data poisoning, gaps in safety evaluation, and defense strategies for fine-tuning pipelines.
Training Data Attacks
Attacks that manipulate the data used to train or fine-tune models, covering data poisoning, backdoor installation, RLHF manipulation, and fine-tuning exploitation.