# beginner
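164 articles tagged "beginner"
Beginner Certification Practice Exam
Full-length practice exam for the beginner-level AI security certification, covering foundations and basic techniques.
AI API Ecosystem
A red teamer's guide to the AI API ecosystem: OpenAI, Anthropic, Google, AWS, Azure, open-source APIs, authentication patterns, and common security misconfigurations.
AI Deployment Patterns and Security Implications
How API-based, self-hosted, edge, and hybrid deployment patterns each create distinct security considerations and attack surfaces for AI systems.
The AI Landscape: A Red Teamer's Map
Major models, providers, deployment patterns, and the AI stack from hardware to application, orienting red teamers in the current AI ecosystem.
Model Types and Their Attack Surfaces
How text, vision, multimodal, embedding, and code-generation models each present distinct vulnerabilities and attack surfaces to red teamers.
Open vs. Closed Models: Security Trade-offs
Security implications of open-weight vs. closed-source AI models: weight access, responsible deployment, fine-tuning risks, and the impact on red team strategy.
AI Attack Taxonomy
A complete taxonomy of AI attacks organized by target, technique, and impact, providing a shared vocabulary for red team planning and reporting.
AI Red Team Ethics
Responsible testing practices, avoiding real-world harm, navigating dual-use concerns, and professional standards for AI red team practitioners.
Red Team Methodology Fundamentals
What AI red teaming is, how it differs from traditional security testing, and the full engagement lifecycle from scoping to reporting.
The Legal Landscape of AI Testing
Authorization requirements, terms-of-service considerations, computer fraud statutes, and responsible disclosure frameworks for AI red teaming.
Threat Modeling for AI Systems
How to identify assets, threats, and attack vectors specific to AI systems using a simplified threat modeling framework adapted for machine learning.
Anatomy of an LLM API Call
Understand the HTTP request structure of OpenAI, Anthropic, and other LLM APIs: system messages, parameters, function calling, and common misconfigurations.
AI System Architecture from a Red Team Perspective
How AI systems are deployed in production (model APIs, prompt templates, orchestration, tools, memory, and guardrails), with attack surface analysis at each layer.
Getting Started with AI Red Team Labs
Introduction to the hands-on AI red team lab environment. Learn what resources you need, how the labs are structured, and get an overview of all beginner exercises.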
Lab: Anthropic Claude API Basics
Set up the Anthropic Claude API for red teaming, learn authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.
Lab: API Authentication Security Testing
Test LLM API authentication mechanisms for vulnerabilities including token leakage, replay attacks, and scope bypass.
Multi-Provider API Exploration
Explore and compare API behaviors across OpenAI, Anthropic, and Google AI to understand provider-specific security characteristics.
Lab: API Key Security
Learn common API key exposure vectors, secure key management with .env files, detect keys in git history, implement key rotation, and apply least-privilege principles.
API Rate Limit and Error Handling
Test LLM API rate limits and implement proper error handling for automated testing workflows.
Lab: API Response Parsing and Analysis
Parse and analyze LLM API responses to identify anomalies, safety filter triggers, and information leakage patterns.
Lab: Setting Up Automated LLM Testing
Configure automated testing pipelines for LLM red teaming using Python scripts, test runners, and result collection.
Lab: Crafting Basic Adversarial Examples
Create basic adversarial examples that cause LLMs to misclassify, misinterpret, or bypass safety checks on text input.
Basic Automated Testing Setup
Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.
Basic Classifier Evasion
Evade basic input/output classifiers using paraphrasing, synonym substitution, and formatting tricks.
Basic Data Exfiltration Techniques
Extract sensitive information from LLM applications using social engineering and misdirection.
Lab: Build Your First Defense
Hands-on lab for building a basic AI input defense system using keyword matching, regex filters, and simple classifiers to block prompt injection attempts.
Basic Defense Mechanism Testing
Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.
Lab: Build Your First Defense (Beginner Lab)
Create a simple input filter that blocks common prompt injection patterns, then test it against the attack techniques you have learned in previous labs.
Lab: Introduction to LLM Fuzzing
Learn the basics of fuzzing LLM inputs by mutating prompts to discover unexpected behaviors and edge cases.
Lab: Jailbreak Technique Taxonomy
Explore the major categories of jailbreak techniques and practice classifying attack payloads by technique type.
Lab: Basic RAG System Security Testing
Introduction to testing Retrieval-Augmented Generation systems for document injection and retrieval manipulation vulnerabilities.
Security Finding Documentation Practice
Practice documenting security findings in a professional format with reproducible steps and impact assessment.
Basic Role-Play Prompt Injection
Use role-play techniques to bypass basic LLM safety guardrails through character assumption.
Lab: Social Engineering LLM Applications
Practice social engineering techniques adapted for LLM applications including authority escalation and urgency injection.
Lab: Model Comparison
Test the same attack techniques against different language models and compare their safety behaviors, refusal patterns, and vulnerability profiles.
Lab: Compare Model Safety
Hands-on lab for running identical safety tests against GPT-4, Claude, Gemini, and Llama to compare how different models handle prompt injection, jailbreaks, and safety boundary enforcement.
Lab: Content Policy Boundary Mapping
Map the content policy boundaries of LLM applications to identify permitted and restricted topic areas systematically.
Lab: Context Manipulation
Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.
Lab: Context Overflow Attacks
Explore context window overflow attacks that push system instructions out of the model's attention by filling the context with padding content, and measure instruction-following degradation.
Basic Context Window Overflow
Fill the context window with padding content to push safety instructions out of the attention window.
Conversation History Analysis
Analyze how conversation history influences model behavior and identify opportunities for context manipulation.
Lab: Conversation History Manipulation
Test how LLM applications handle conversation history including truncation, injection, and context window management.
Lab: Defense Bypass Basics
Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.
Lab: Introduction to Defense Testing
Learn to systematically test LLM application defenses by probing input filters, output validators, and guardrails.
Lab: Delimiter Injection Attacks
Craft payloads that exploit XML, markdown, and special token delimiters to break instruction boundaries and manipulate how LLMs parse system prompts versus user input.
Lab: Delimiter Escape Attacks
Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.
Delimiter Escape Techniques
Practice escaping common delimiters used to separate system prompts from user input.
Embedding Basics for Security
Understand text embeddings and their security relevance by generating, comparing, and manipulating embedding vectors.
Lab: Embedding Fundamentals for Red Teamers
Learn embedding fundamentals including vector similarity, semantic search, and how embeddings enable RAG systems.
Encoding and Obfuscation Basics
Explore how Base64, ROT13, and Unicode encoding can bypass input filters in LLM applications.
Lab: Testing Environment Hardening
Secure your AI red team testing environment to prevent credential leaks, accidental production testing, and data exposure.
Lab: Error Message Analysis for Recon
Analyze LLM application error messages to extract information about system architecture, models, and defense layers.
Error Message Intelligence Gathering
Extract system architecture information from error messages and response patterns in LLM applications.
Lab: Ethical Red Teaming
Practice responsible AI red teaming with proper documentation, scope management, and ethical decision-making frameworks.
Lab: Ethical Red Teaming (Beginner Lab)
Hands-on lab for practicing responsible AI red teaming with proper documentation, scope management, ethical boundaries, and disclosure procedures.
Lab: Evidence Collection for LLM Testing
Set up evidence collection workflows including request/response logging, screenshots, and reproducibility documentation.
Few-Shot Injection Fundamentals
Craft few-shot examples that prime the model to follow attacker instructions in subsequent turns.
Lab: Few-Shot Manipulation Attacks
Craft fake few-shot examples that teach the model to bypass its safety training by demonstrating the desired adversarial behavior through fabricated conversation examples.
Your First Claude API Call
Set up the Anthropic SDK and make your first Claude API call with system prompts and messages.
Your First LLM API Call with OpenAI
Set up your Python environment and make your first LLM API call to understand request/response patterns.
Lab: Your First Prompt Injection
Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.
Lab: Your First Jailbreak Attempt
Try basic jailbreak techniques against a local model using Ollama, learning the difference between prompt injection and jailbreaking through hands-on experimentation.
Lab: Format String Injection in LLMs
Practice injecting format strings and template directives to manipulate LLM output structure and content.
Your First Garak Vulnerability Scan
Install and run NVIDIA's Garak tool to perform automated vulnerability scanning on an LLM endpoint.
Lab: Garak Setup and First Scan
Install and configure garak, the LLM vulnerability scanner, then run your first automated scan against a local model to discover common vulnerabilities.
Lab: Hallucination Detection Basics
Learn to detect and trigger hallucinations in LLM outputs including factual errors, fabricated citations, and invented APIs.
Hello World Prompt Injection
Write and test your first prompt injection payload against a simple chatbot to understand the fundamental attack mechanism.
Lab: Injection Detection Tool
Build a basic prompt injection detection tool using pattern matching, heuristics, and LLM-based classification to identify malicious inputs before they reach the target model.
Lab: Injection Attempt Log Analysis
Build a logging pipeline to capture and analyze injection attempts for pattern discovery and technique refinement.
Lab: Injection Techniques Overview
Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.
Lab: Instruction Following Priority
Test how language models prioritize conflicting instructions from system prompts, user messages, and embedded directives to understand the instruction hierarchy.
Instruction Hierarchy Testing
Test how models prioritize conflicting instructions between system, user, and assistant roles.
Lab: Basic Jailbreak Techniques
Hands-on exploration of jailbreak techniques including role-play, DAN-style prompts, and academic framing against multiple models.
JSON Output Exploitation Basics
Exploit LLM JSON mode to bypass content filters and extract information through structured output channels.
Lab: JSON Output Mode Security Testing
Test JSON output mode for injection vulnerabilities including schema manipulation and structured output exploitation.
Lab: Multi-Language Prompt Testing
Test LLM safety boundaries across multiple languages to identify inconsistencies in safety training coverage.
Lab: Local Model Setup for Testing
Set up local LLM instances using Ollama for safe, cost-free red team testing without API costs or rate limits.
Lab: Markdown Injection
Inject images, links, and formatting into LLM responses that exfiltrate data or alter display rendering in chat interfaces.
Model Security Comparison Lab
Compare the security posture of different LLM models by running identical test suites across providers.
Lab: Basic Model Fingerprinting
學習 to identify which LLM model powers an application through behavioral fingerprinting and response analysis.
Lab: Multilingual Injection
Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.
Lab: Multi-Turn Escalation Attacks
Build trust over several conversation messages, then exploit established rapport to bypass safety mechanisms through gradual, multi-turn escalation techniques.
Multi-Turn Conversation Probing
Use multi-turn conversations to gradually escalate requests and probe LLM safety boundaries.
Lab: Multimodal Input Testing Basics
Introduction to testing multimodal LLMs with image and text inputs to understand cross-modal behavior.
Lab: Setting Up Ollama for Local LLM Testing
Install and configure Ollama for local LLM red teaming, download models, perform basic interactions, and compare safety behavior between local and API-hosted models.
Lab: Detecting Output Filters
Identify and characterize output filtering mechanisms in LLM applications through systematic response analysis.
Output Format Control Lab
Practice controlling LLM output formats to understand how format constraints affect safety mechanism effectiveness.
Lab: Output Format Exploitation
Manipulate output formats like JSON, CSV, code blocks, and structured data to extract information that models would normally refuse to provide in natural language.
Output Format Manipulation
Force LLMs to output in specific formats (JSON, XML, code) to bypass output sanitization.
Lab: Output Steering
Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.
Lab: Payload Crafting
Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.
Lab: Encoding Bypasses
Test Base64, ROT13, hex, and Unicode encoding techniques to evade LLM content filters while preserving the semantic meaning of adversarial payloads.
Setting Up Payload Logging
Build a payload logging system to track prompt injection attempts and model responses.
Lab: LLM Playground Exploration
Explore LLM provider playgrounds to understand model behavior, parameter effects, and interactive testing workflows.
LLM Playground Security Testing
Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.
Lab: Testing Prompt Leaking Defenses
Test various prompt leaking defense configurations to evaluate their effectiveness against extraction techniques.
Lab: Prompt Leaking Basics
Learn to extract system prompts from chatbots using common leaking techniques including direct requests, indirect elicitation, and format manipulation.
Prompt Structure Analysis Lab
Reverse-engineer the prompt structure of target applications by analyzing response patterns and behavioral cues.
Lab: Prompt Template Vulnerability Testing
Test common prompt template patterns for injection vulnerabilities including string interpolation and placeholder abuse.
Setting Up Promptfoo for LLM Evaluation
Configure Promptfoo to create automated test suites for evaluating LLM safety and robustness.
Lab: Promptfoo Setup and First Eval
Install and configure promptfoo for systematic LLM evaluation, then run your first red team evaluation to test model safety boundaries.
Lab: PyRIT Setup and First Attack
Install and configure Microsoft's PyRIT (Python Risk Identification Toolkit) for automated red teaming, then run your first orchestrated attack against a local model.
Lab: Rate Limit Enumeration and Bypass
Enumerate API rate limits and test common bypass techniques including header manipulation and request distribution.
Rate Limit and Quota Mapping
Map the rate limits, quotas, and throttling behavior of LLM APIs to plan efficient testing strategies.
Lab: Analyzing Model Refusal Patterns
Systematically map model refusal patterns to understand safety training boundaries and identify potential bypasses.
Lab: Red Team Report Writing Basics
Practice writing clear, actionable red team findings reports with evidence, risk ratings, and remediation guidance.
Response Analysis Fundamentals
Learn to analyze LLM responses for information leakage, inconsistencies, and indicators of safety mechanism activation.
Lab: Analyzing LLM Responses
Learn to identify safety refusals, partial compliance, inconsistent behavior, and response fingerprinting techniques when analyzing LLM outputs during red team testing.
Lab: Response Consistency Testing
Test LLM response consistency across repeated queries to identify non-deterministic safety boundary behavior.
Lab: Role-Play Attacks
Use persona-based approaches to bypass AI safety measures by assigning alternate identities, characters, and scenarios that override the model's trained refusal behaviors.
Lab: Running Safety Benchmarks
Run standardized safety benchmarks against LLM models to establish baseline safety profiles for comparison.
Lab: Mapping Safety Boundaries
Systematically discover what a language model will and won't do by probing its safety boundaries across multiple categories and documenting the results.
Safety Boundary Mapping
Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics and documenting refusal patterns.
Lab: Introduction to Safety Testing
Learn the fundamentals of LLM safety testing including test case design, baseline measurement, and result documentation.
Safety Training Boundary Probing
Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.
Lab: Vulnerability Scoring Fundamentals
Learn vulnerability scoring frameworks adapted for LLM systems including severity, exploitability, and impact assessment.
Lab: Simple Payload Encoding Techniques
Practice encoding injection payloads using Base64, hex, URL encoding, and Unicode to bypass basic input filters.
Lab: Structured Output Manipulation
Explore how JSON mode, schema injection, and forced output formats can bypass LLM safety filters and manipulate structured responses.
Lab: System Prompt Enumeration Techniques
Practice systematic approaches to enumerating hidden system prompt content through varied probing strategies.
System Prompt Extraction Fundamentals
Learn to extract hidden system prompts from LLM applications using direct and indirect techniques.
Lab: System Prompt Override
Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.
Lab: System Prompt Reconstruction
Use various techniques to systematically extract and reconstruct the full system prompt of an LLM application, combining direct, indirect, and incremental extraction methods.
Lab: Temperature and Sampling Security Effects
Explore how temperature and sampling parameters affect model susceptibility to jailbreaks and safety boundary enforcement.
Lab: Designing LLM Red Team Test Cases
Design effective red team test cases with clear objectives, success criteria, and reproducible execution procedures.
Token Counting and Estimation
Understand tokenization by counting and estimating tokens across different models and encoders.
Lab: Token Manipulation Basics
Understand how tokenization affects LLM behavior and practice basic token-level manipulation techniques.
Lab: Comparing Red Team Testing Tools
Compare Garak, PyRIT, and Promptfoo capabilities through hands-on exercises using each tool against the same target.
Red Team Tool Installation and Configuration
Install and configure essential AI red teaming tools: Garak, PyRIT, Promptfoo, and custom Python harnesses.
API Key Scope and Permission Testing
Test API key scoping and permission boundaries to identify over-privileged access configurations.
API Response Header Analysis
Analyze HTTP response headers from LLM APIs to fingerprint providers, versions, and middleware.
Basic Indirect Prompt Injection
Plant and trigger a basic indirect prompt injection payload in content consumed by an LLM.
Basic Payload Mutation Techniques
Learn fundamental payload mutation strategies, including synonym replacement and structural transforms.
Basic RAG Query Injection
Craft user queries that manipulate RAG retrieval to surface unintended documents.
Basic Rate Limit Abuse Patterns
Test common rate-limit bypass patterns, including header manipulation and endpoint discovery.
Character Encoding Bypass Techniques
Bypass input filters using Unicode normalization, homoglyph substitution, and mixed-script encoding.
Chatbot Persona and Capability Mapping
Map the configured persona, capabilities, and restrictions of an LLM chatbot through systematic probing.
Completion Hijacking Fundamentals
Craft partial sentences that steer model completions toward attacker-desired outputs.
Conversation Reset Attacks
Exploit conversation resets and context clearing to weaken model adherence to safety instructions.
Emoji and Unicode Injection Techniques
Use emoji sequences and Unicode special characters to bypass text-based input filters.
Error Message Exploitation
Trigger and analyze error messages from LLM applications to reveal architecture and configuration details.
Your First HarmBench Evaluation
Run a standardized safety evaluation using the HarmBench framework against a target model.
Your First Inspect AI Evaluation
Set up and run a basic AI safety evaluation using the UK AISI Inspect framework.
JSON Injection Basics
Inject adversarial content through JSON-formatted inputs to exploit structured data processing.
Your First LLM Guard Scan
Install and configure LLM Guard to scan inputs and outputs, understanding detection capabilities and gaps.
Markdown Rendering Exfiltration
Exploit Markdown rendering in chat UIs to exfiltrate data through image tags and links.
Model Fingerprinting Basics
Identify which LLM model powers an application through behavioral fingerprinting techniques.
Introduction to NeMo Guardrails
Set up NVIDIA NeMo Guardrails and create your first Colang-based guardrail for an LLM application.
Prompt Injection via File Names
Embed prompt injection payloads in filenames and metadata of uploaded documents.
Prompt Leaking via Summarization Requests
Use summarization requests to trick an LLM into revealing its hidden system prompt through indirect disclosure.
Analyzing Refusal Messages for Intel
Extract useful information about model configuration and guardrails by systematically analyzing refusal messages.
Response Timing Side-Channel Analysis
Use response timing differences to infer information about model processing and guardrail activation.
Safety Boundary Mapping Practice
Systematically map the safety boundaries of an LLM application across multiple topic categories.
Simple Output Constraint Attacks
Force models to output in constrained formats that bypass output safety filters.
System Prompt Reconstruction from Clues
Piece together a hidden system prompt by collecting partial information from multiple probe techniques.
System Prompt Extraction via Error Injection
Trigger specific error conditions that cause the model to inadvertently reveal system prompt contents.
Temperature and Top-K Effects on Safety
Systematically test how temperature, top-k, and top-p parameters affect safety guardrail effectiveness.
Prompt Injection via Translation
Exploit LLM translation capabilities to smuggle instructions through language boundaries.
XML Injection in LLM Contexts
Exploit XML tag handling in LLM applications to manipulate instruction parsing.
Direct Injection Basics
Core concepts of injecting instructions directly into LLM prompts, including override techniques, simple payload crafting, and understanding how models resolve conflicting instructions.
Prompt Leaking Step by Step
Systematic approaches to extract system prompts from LLM applications, covering direct elicitation, indirect inference, differential analysis, and output-based reconstruction.
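Running Your First Garak Scan
Step-by-step beginner walkthrough for running your first garak vulnerability scan from scratch, covering installation, target setup, probe selection, and basic result interpretation.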
Running Your First Promptfoo Evaluation
Beginner walkthrough for running your first promptfoo evaluation from scratch, covering installation, configuration, test case creation, assertion writing, and result interpretation.
Running Your First PyRIT Red Team Campaign
Beginner walkthrough for running your first PyRIT red team campaign from scratch, covering installation, target configuration, orchestrator setup, and basic result analysis.
Using the PyRIT UI Frontend
Beginner tutorial: use the PyRIT web-based UI frontend to manage red team campaigns visually, including launching campaigns, monitoring progress, and reviewing results without writing code.