# models
46 articles tagged "models"
The AI Landscape: A Red Teamer's Map
Major models, providers, deployment patterns, and the AI stack from hardware to application — orienting red teamers in the current AI ecosystem.
CTF: Supply Chain Saboteur
Identify and exploit supply chain vulnerabilities in a model deployment pipeline. Find poisoned models, exploit malicious packages, and compromise the ML infrastructure.
Architecture Comparison for Safety Properties
Comparative analysis of how architectural choices (dense vs MoE, decoder-only vs encoder-decoder) affect safety properties and attack surfaces.
Command R Security Analysis
Security assessment of Cohere's Command R models with focus on RAG-specific attack surfaces and retrieval-augmented generation vulnerabilities.
Distillation Security Analysis
Security implications of knowledge distillation including backdoor transfer, capability extraction, and safety property degradation in student models.
Gemma Security Deep Dive
Comprehensive security analysis of Google's Gemma open-weight models including safety training effectiveness and fine-tuning attack surfaces.
Mistral NeMo Security Analysis
Security assessment of the Mistral-NVIDIA NeMo collaboration models examining enterprise deployment risks and instruction-following vulnerabilities.
MoE Routing Exploitation (Model Deep Dives)
Detailed analysis of how Mixture-of-Experts routing can be manipulated to bypass safety-critical expert paths and trigger unsafe generation.
Multimodal Model Security Comparison
Comparing security properties across multimodal models (GPT-4V, Claude, Gemini) with focus on cross-modal injection and vision-language attacks.
Open Weight vs API Security Comparison
Comparative analysis of security properties between open-weight deployments and API-based access, including unique attack surfaces for each.
Phi Model Security Analysis
Security analysis of Microsoft's Phi family of small language models, examining how reduced scale affects safety properties and attack surfaces.
Phi Models Security Analysis
Security analysis of Microsoft's Phi small language model family including safety vs capability tradeoffs.
Pruning Impact on Safety
How structured and unstructured pruning affects model safety properties, and techniques for exploiting pruning artifacts to bypass safety training.
Quantization Effects on Security Properties
Systematic study of how different quantization methods (GPTQ, AWQ, GGUF, SqueezeLLM) affect model safety properties and vulnerability to attacks.
Qwen Architecture Security
In-depth security assessment of Alibaba's Qwen model family including architecture-specific vulnerabilities and cross-language attack surfaces.
Qwen Models Security Analysis
Security analysis of Alibaba's Qwen model family including multilingual safety considerations.
Reasoning Model Security Analysis
Security analysis of reasoning-augmented models (o1, DeepSeek-R1) focusing on chain-of-thought manipulation and reasoning-specific attack vectors.
Tokenizer Attack Surface Analysis
Deep analysis of tokenizer vulnerabilities including token boundary exploitation, special token manipulation, and cross-tokenizer attacks.
Yi Model Security Assessment
Security analysis of 01.AI's Yi models focusing on bilingual capabilities, training data implications, and comparative safety properties.
Model API Comparison Table
Side-by-side comparison of major LLM API features, security controls, and rate limits for OpenAI, Anthropic, Google, and other providers.
Testing AI21 Labs Models
Red team testing guide for AI21 Labs Jamba models including long context and efficiency features.
Testing Cohere Models
Red team testing guide for Cohere's Command-R models including RAG and tool use features.
Testing Mistral AI Models
Complete red team testing guide for Mistral AI models including Mixtral MoE architecture and chat endpoints.