# model-security
23 articles tagged "model-security"
AI Supply Chain Incident Response
Incident response procedures for compromises in the AI supply chain, including model repositories, training pipelines, and dependency chains.
Model Backdoor Detection Forensics
Forensic techniques for detecting, analyzing, and attributing backdoors implanted in machine learning models through training-time or post-training attacks.
Backdoor Detection in Fine-Tuned Models
Detecting backdoors in fine-tuned AI models: activation analysis, trigger scanning techniques, behavioral probing strategies, and statistical methods for identifying hidden malicious functionality.
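The activation-analysis and statistical methods mentioned above can be illustrated with a minimal per-neuron z-score test. This is a sketch on synthetic data, not the article's method: real detectors operate on hidden states captured from the model under test, and the threshold of 4.0 is an arbitrary choice for the demo.

```python
import numpy as np

def activation_zscores(clean_acts, probe_acts):
    """Per-neuron z-scores of probe activations against a clean baseline.

    clean_acts: (n_clean, n_neurons) activations on known-benign prompts
    probe_acts: (n_probe, n_neurons) activations on prompts containing a
                candidate trigger phrase
    """
    mu = clean_acts.mean(axis=0)
    sigma = clean_acts.std(axis=0) + 1e-8  # avoid division by zero
    return (probe_acts.mean(axis=0) - mu) / sigma

def flag_suspect_neurons(clean_acts, probe_acts, threshold=4.0):
    """Indices of neurons whose mean activation shifts abnormally
    when the candidate trigger is present."""
    z = activation_zscores(clean_acts, probe_acts)
    return np.nonzero(np.abs(z) > threshold)[0]

# Synthetic demo: neuron 7 fires hard only when the "trigger" is present.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(200, 16))
probe = rng.normal(0.0, 1.0, size=(50, 16))
probe[:, 7] += 10.0  # simulated backdoor neuron
print(flag_suspect_neurons(clean, probe))  # -> [7]
```

In practice the probe set is generated by trigger scanning, sweeping candidate tokens or phrases and repeating a test like this for each.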
Capstone: Security Audit of an Open-Source LLM
Conduct a comprehensive security audit of an open-source large language model, covering model weights integrity, safety alignment evaluation, supply chain verification, and adversarial robustness testing.
Security Risks of Cloud AI Model Marketplaces
Assessing security risks in cloud AI model marketplaces including AWS Bedrock Model Garden, Azure AI Model Catalog, GCP Vertex AI Model Garden, and Hugging Face Hub, covering supply chain attacks, trojan models, and verification gaps.
Fine-Tuning Security
Comprehensive overview of how fine-tuning can compromise model safety -- attack taxonomy covering dataset poisoning, safety degradation, backdoor insertion, and reward hacking in the era of widely available fine-tuning APIs.
LoRA & Adapter Attack Surface
Overview of security vulnerabilities in parameter-efficient fine-tuning methods including LoRA, QLoRA, and adapter-based approaches -- how the efficiency and shareability of adapters create novel attack vectors.
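The core of the shareability risk is how little data an adapter needs to alter a large frozen weight matrix. A plain-NumPy sketch of the LoRA merge step (dimensions and `alpha` are illustrative stand-ins, not taken from the article):

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, rank = 1024, 1024, 8

W = rng.normal(size=(d_out, d_in))        # frozen base weight (not shipped with the adapter)
A = rng.normal(size=(rank, d_in)) * 0.01  # adapter down-projection
B = rng.normal(size=(d_out, rank)) * 0.01 # adapter up-projection

def merge_lora(W, A, B, alpha=1.0):
    """Fold the low-rank update B @ A into the base weights. After merging,
    a malicious update is indistinguishable from ordinary weights."""
    return W + alpha * (B @ A)

W_merged = merge_lora(W, A, B)

# The rank-8 adapter touches every entry of W while being 64x smaller.
print(W.size // (A.size + B.size))  # -> 64
```

This is why a few megabytes of shared adapter weights are a meaningful attack vector: once merged, the change disappears into the full weight matrix.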
Model Repository Security
Defense-focused guide to securing model downloads from public repositories like Hugging Face, covering backdoored model detection, namespace attacks, signature verification, and safe download procedures.
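One piece of the safe-download procedure, integrity checking against a published digest, can be sketched as follows. The filename is a placeholder; in practice the expected digest comes from the repository's metadata or a signed release note, never from the same channel as the file itself.

```python
import hashlib
import pathlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large weight files never need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_download(path, expected_sha256):
    """Refuse to proceed when a downloaded file's digest doesn't match the published one."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(f"digest mismatch for {path}: got {actual}")
    return True

# Demo with a throwaway file standing in for a downloaded weight shard.
p = pathlib.Path("demo_weights.bin")
p.write_bytes(b"not real weights")
digest = sha256_of(p)
print(verify_download(p, digest))  # -> True
p.unlink()
```

Digest verification catches tampering in transit and some namespace-attack swaps, but it is only one layer; it says nothing about whether the original upload was trustworthy.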
Model Deep Dives
Why model-specific knowledge matters for AI red teaming, how different architectures create different attack surfaces, and a systematic methodology for profiling any new model.
Open-Weight Model Security
Security analysis of open-weight models including Llama, Mistral, Qwen, and DeepSeek, covering unique risks from full weight access, fine-tuning attacks, and deployment security challenges.
Security of Training Checkpoints
Threat analysis of model checkpoint storage, serialization, and restoration including checkpoint poisoning, deserialization attacks, and integrity verification.
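The deserialization risk comes from pickle-based checkpoint formats, where loading can execute arbitrary callables. A minimal static scan that walks pickle opcodes without ever deserializing the bytes -- a heuristic sketch, since obfuscated reduce chains can evade it, and formats like safetensors sidestep the problem entirely:

```python
import pickle
import pickletools

# Import targets that commonly appear in malicious checkpoint payloads.
SUSPECT_GLOBALS = {
    ("os", "system"), ("posix", "system"), ("nt", "system"),
    ("subprocess", "Popen"), ("builtins", "exec"), ("builtins", "eval"),
}

def extract_globals(data: bytes):
    """List (module, name) pairs a pickle would import, found by statically
    walking its opcodes -- pickle.loads is never called on the bytes."""
    strings, found = [], []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
            strings.append(arg)
        elif opcode.name == "GLOBAL":  # protocols 0-3: "module name" in the arg
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:  # protocol 4+
            found.append((strings[-2], strings[-1]))
    return found

# Demo: a checkpoint-like object whose deserialization would call eval.
class Evil:
    def __reduce__(self):
        return (eval, ("1 + 1",))

payload = pickle.dumps(Evil())
hits = [g for g in extract_globals(payload) if g in SUSPECT_GLOBALS]
print(hits)  # -> [('builtins', 'eval')]
```

The same idea underlies real-world pickle scanners; the safer fix remains storing and restoring checkpoints in a format that carries no executable payloads.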
Just 250 Poisoned Documents: Anthropic's Data Poisoning Breakthrough
Anthropic, the UK AI Security Institute, and the Alan Turing Institute demonstrated that injecting just 250 malicious documents into pretraining data is enough to backdoor large language models ranging from 600 million to 13 billion parameters. This article examines the implications for model security.