# model-security
23 articles tagged "model-security"
AI Supply Chain Incident Response
Incident response procedures for compromises in the AI supply chain, including model repositories, training pipelines, and dependency chains.
Model Backdoor Detection Forensics
Forensic techniques for detecting, analyzing, and attributing backdoors implanted in machine learning models through training-time or post-training attacks.
Backdoor Detection in Fine-Tuned Models
Detecting backdoors in fine-tuned AI models: activation analysis, trigger scanning techniques, behavioral probing strategies, and statistical methods for identifying hidden malicious functionality.
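The activation-analysis and statistical methods mentioned above can be illustrated with a minimal per-neuron z-score test. This is a sketch on synthetic data, not the article's method: real detectors operate on hidden states captured from the model under test, and the threshold of 4.0 is an arbitrary choice for the demo.

```python
import numpy as np

def activation_zscores(clean_acts, probe_acts):
    """Per-neuron z-scores of probe activations against a clean baseline.

    clean_acts: (n_clean, n_neurons) activations on known-benign prompts
    probe_acts: (n_probe, n_neurons) activations on prompts containing a
                candidate trigger phrase
    """
    mu = clean_acts.mean(axis=0)
    sigma = clean_acts.std(axis=0) + 1e-8  # avoid division by zero
    return (probe_acts.mean(axis=0) - mu) / sigma

def flag_suspect_neurons(clean_acts, probe_acts, threshold=4.0):
    """Indices of neurons whose mean activation shifts abnormally
    when the candidate trigger is present."""
    z = activation_zscores(clean_acts, probe_acts)
    return np.nonzero(np.abs(z) > threshold)[0]

# Synthetic demo: neuron 7 fires hard only when the "trigger" is present.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(200, 16))
probe = rng.normal(0.0, 1.0, size=(50, 16))
probe[:, 7] += 10.0  # simulated backdoor neuron
print(flag_suspect_neurons(clean, probe))  # -> [7]
```

In practice the probe set is generated by trigger scanning, sweeping candidate tokens or phrases and repeating a test like this for each.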
Capstone: Security Audit of an Open-Source LLM
Conduct a comprehensive security audit of an open-source large language model, covering model weights integrity, safety alignment evaluation, supply chain verification, and adversarial robustness testing.
Security Risks of Cloud AI Model Marketplaces
Assessing security risks in cloud AI model marketplaces including AWS Bedrock Model Garden, Azure AI Model Catalog, GCP Vertex AI Model Garden, and Hugging Face Hub, covering supply chain attacks, trojan models, and verification gaps.
Fine-Tuning Security
Comprehensive overview of how fine-tuning can compromise model safety -- attack taxonomy covering dataset poisoning, safety degradation, backdoor insertion, and reward hacking in the era of widely available fine-tuning APIs.
LoRA & Adapter Attack Surface
Overview of security vulnerabilities in parameter-efficient fine-tuning methods including LoRA, QLoRA, and adapter-based approaches -- how the efficiency and shareability of adapters create novel attack vectors.
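The core of the shareability risk is how little data an adapter needs to alter a large frozen weight matrix. A plain-NumPy sketch of the LoRA merge step (dimensions and `alpha` are illustrative stand-ins, not taken from the article):

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, rank = 1024, 1024, 8

W = rng.normal(size=(d_out, d_in))        # frozen base weight (not shipped with the adapter)
A = rng.normal(size=(rank, d_in)) * 0.01  # adapter down-projection
B = rng.normal(size=(d_out, rank)) * 0.01 # adapter up-projection

def merge_lora(W, A, B, alpha=1.0):
    """Fold the low-rank update B @ A into the base weights. After merging,
    a malicious update is indistinguishable from ordinary weights."""
    return W + alpha * (B @ A)

W_merged = merge_lora(W, A, B)

# The rank-8 adapter touches every entry of W while being 64x smaller.
print(W.size // (A.size + B.size))  # -> 64
```

This is why a few megabytes of shared adapter weights are a meaningful attack vector: once merged, the change disappears into the full weight matrix.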
Model Repository Security
Defense-focused guide to securing model downloads from public repositories like Hugging Face, covering backdoored model detection, namespace attacks, signature verification, and safe download procedures.
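One piece of the safe-download procedure, integrity checking against a published digest, can be sketched as follows. The filename is a placeholder; in practice the expected digest comes from the repository's metadata or a signed release note, never from the same channel as the file itself.

```python
import hashlib
import pathlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large weight files never need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_download(path, expected_sha256):
    """Refuse to proceed when a downloaded file's digest doesn't match the published one."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(f"digest mismatch for {path}: got {actual}")
    return True

# Demo with a throwaway file standing in for a downloaded weight shard.
p = pathlib.Path("demo_weights.bin")
p.write_bytes(b"not real weights")
digest = sha256_of(p)
print(verify_download(p, digest))  # -> True
p.unlink()
```

Digest verification catches tampering in transit and some namespace-attack swaps, but it is only one layer; it says nothing about whether the original upload was trustworthy.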
Model Deep Dives
Why model-specific knowledge matters for AI red teaming, how different architectures create different attack surfaces, and a systematic methodology for profiling any new model.
Open-Weight Model Security
Security analysis of open-weight models including Llama, Mistral, Qwen, and DeepSeek, covering unique risks from full weight access, fine-tuning attacks, and deployment security challenges.
Security of Training Checkpoints
Threat analysis of model checkpoint storage, serialization, and restoration including checkpoint poisoning, deserialization attacks, and integrity verification.
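The deserialization risk comes from pickle-based checkpoint formats, where loading can execute arbitrary callables. A minimal static scan that walks pickle opcodes without ever deserializing the bytes -- a heuristic sketch, since obfuscated reduce chains can evade it, and formats like safetensors sidestep the problem entirely:

```python
import pickle
import pickletools

# Import targets that commonly appear in malicious checkpoint payloads.
SUSPECT_GLOBALS = {
    ("os", "system"), ("posix", "system"), ("nt", "system"),
    ("subprocess", "Popen"), ("builtins", "exec"), ("builtins", "eval"),
}

def extract_globals(data: bytes):
    """List (module, name) pairs a pickle would import, found by statically
    walking its opcodes -- pickle.loads is never called on the bytes."""
    strings, found = [], []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
            strings.append(arg)
        elif opcode.name == "GLOBAL":  # protocols 0-3: "module name" in the arg
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:  # protocol 4+
            found.append((strings[-2], strings[-1]))
    return found

# Demo: a checkpoint-like object whose deserialization would call eval.
class Evil:
    def __reduce__(self):
        return (eval, ("1 + 1",))

payload = pickle.dumps(Evil())
hits = [g for g in extract_globals(payload) if g in SUSPECT_GLOBALS]
print(hits)  # -> [('builtins', 'eval')]
```

The same idea underlies real-world pickle scanners; the safer fix remains storing and restoring checkpoints in a format that carries no executable payloads.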
Just 250 Poisoned Documents: Anthropic's Data Poisoning Breakthrough
Anthropic, the UK AI Security Institute, and the Alan Turing Institute demonstrated that injecting just 250 malicious documents into pretraining data is enough to backdoor large language models ranging from 600 million to 13 billion parameters. This article examines the implications for model security.