# foundations
39 articles tagged "foundations"
Foundations Comprehensive Assessment
Comprehensive assessment covering LLM architecture, tokenization, attention, and basic security concepts.
Foundations Assessment
Test your understanding of LLM fundamentals, core terminology, and the AI threat landscape with 15 intermediate-level questions.
Foundations Exam Preparation Guide
Study guide for the foundations assessment covering key concepts, study strategies, and review materials.
AI Attack Taxonomy Overview
Comprehensive overview of the AI attack taxonomy covering all major attack categories and their relationships.
Attack Taxonomy Overview
Comprehensive overview of the AI attack taxonomy from prompt injection through model theft, organized by attacker goals and required access.
Attention Mechanisms and Security
How attention mechanisms work and their role in enabling prompt injection attacks.
Deployment Patterns and Security
Common LLM deployment patterns (API, self-hosted, edge) and their distinct security properties and attack surfaces.
Evaluation and Benchmarking Basics
Introduction to LLM security evaluation including key metrics, benchmark suites, and the challenges of measuring safety properties.
Foundations
Essential building blocks for AI red teaming, covering red team methodology, the AI landscape, how LLMs work, embeddings and vector systems, AI system architecture, and adversarial machine learning concepts.
Instruction Following as Attack Surface
Why the instruction-following capability of LLMs is inherently an attack surface.
LLM Deployment Patterns and Security
Common LLM deployment patterns and their security implications including direct API, RAG, agent, and pipeline architectures.
LLM Security Threat Model
Comprehensive threat model for LLM-powered applications covering all attack surfaces and threat actors.
LLM Trust Boundaries
Understanding trust boundaries in LLM applications: where data crosses privilege levels and how the lack of native trust enforcement creates attack surfaces.
Red Teaming Fundamentals for AI
Fundamental concepts and methodology for AI red teaming including goal setting, scope definition, technique selection, and reporting.
RLHF and Safety Alignment
Understanding RLHF safety training and why it creates a bypassable rather than fundamental safety layer.
Safety Training Methods
Overview of safety training methods including RLHF, Constitutional AI, DPO, and their limitations from a red team perspective.
Tokenization and Its Security Implications
How tokenization works and why it creates security-relevant behaviors in language models.
Transformer Architecture Basics for Security
Understanding transformer architecture fundamentals through a security lens: how attention, embeddings, and generation create exploitable properties.
Understanding LLM Safety Training
How safety training works including RLHF, DPO, and constitutional AI and why it can be bypassed.
Foundations Comprehensive Assessment
Comprehensive assessment covering LLM architecture, tokenization, attention, and basic security concepts.
Chapter Assessment: Foundations
A calibrated 15-question assessment testing your understanding of AI security fundamentals.
Foundations Exam Preparation Guide
Study guide for the foundations assessment covering key concepts, study strategies, and review materials.
AI Attack Taxonomy Overview
Comprehensive overview of the AI attack taxonomy covering all major attack categories and their relationships.
Attack Taxonomy Overview
Comprehensive overview of the AI attack taxonomy from prompt injection through model theft, organized by attacker goals and required access.
Attention Mechanisms and Security
How attention mechanisms work and their role in enabling prompt injection attacks.
Deployment Patterns and Security
Common LLM deployment patterns (API, self-hosted, edge) and their distinct security properties and attack surfaces.
Evaluation and Benchmarking Basics
Introduction to LLM security evaluation including key metrics, benchmark suites, and the challenges of measuring safety properties.
Foundations
Essential building blocks for AI red teaming, covering red team methodology, the AI landscape, how LLMs work, embeddings and vector systems, AI system architecture, and adversarial machine learning concepts.
Instruction Following as Attack Surface
Why the instruction-following capability of LLMs is inherently an attack surface.
LLM Deployment Patterns and Security
Common LLM deployment patterns and their security implications including direct API, RAG, agent, and pipeline architectures.
LLM Security Threat Model
Comprehensive threat model for LLM-powered applications covering all attack surfaces and threat actors.
LLM Trust Boundaries
Understanding trust boundaries in LLM applications: where data crosses privilege levels and how the lack of native trust enforcement creates attack surfaces.
How LLMs Work
Understanding large language models through a security lens, covering transformer architecture, tokenization, attention, training pipelines, and safety alignment mechanisms.
Red Teaming Fundamentals for AI
Fundamental concepts and methodology for AI red teaming including goal setting, scope definition, technique selection, and reporting.
RLHF and Safety Alignment
Understanding RLHF safety training and why it creates a bypassable rather than fundamental safety layer.
Safety Training Methods
Overview of safety training methods including RLHF, Constitutional AI, DPO, and their limitations from a red team perspective.
Tokenization and Its Security Implications
How tokenization works and why it creates security-relevant behaviors in language models.
Transformer Architecture Basics for Security
Understanding transformer architecture fundamentals through a security lens: how attention, embeddings, and generation create exploitable properties.
Understanding LLM Safety Training
How safety training works including RLHF, DPO, and constitutional AI and why it can be bypassed.