# beginner
標記為「beginner」的 328 篇文章
Beginner Certification Practice Exam
Full-length practice exam for beginner-level AI security certification covering foundations and basic techniques.
The AI API Ecosystem
A red teamer's guide to the AI API landscape — OpenAI, Anthropic, Google, AWS, Azure, open-source APIs, authentication patterns, and common security misconfigurations.
AI Deployment Patterns and Security Implications
How API-based, self-hosted, edge, and hybrid deployment patterns each create distinct security considerations and attack surfaces for AI systems.
The AI Landscape: A Red Teamer's Map
Major models, providers, deployment patterns, and the AI stack from hardware to application — orienting red teamers in the current AI ecosystem.
Model Types and Their Attack Surfaces
How text, vision, multimodal, embedding, and code generation models each present unique vulnerabilities and attack surfaces for red teamers.
Open vs Closed Models: Security Tradeoffs
Security implications of open-weight vs closed-source AI models — weight access, responsible deployment, fine-tuning risks, and the impact on red teaming strategy.
How LLMs Work: A Red Teamer's Guide
Understand the fundamentals of large language models — token prediction, context windows, roles, and temperature — through a security-focused lens.
AI Attack Taxonomy
A comprehensive classification of AI attacks organized by target, technique, and impact — providing a shared vocabulary for red team planning and reporting.
Ethics of AI Red Teaming
Responsible testing practices, avoiding real harm, navigating dual-use concerns, and professional standards for AI red team practitioners.
Red Team Methodology Fundamentals
What AI red teaming is, how it differs from traditional security testing, and the complete engagement lifecycle from scoping to reporting.
Legal Landscape for AI Testing
Authorization requirements, terms of service considerations, computer fraud laws, and responsible disclosure frameworks for AI red teaming.
Threat Modeling for AI Systems
How to identify assets, threats, and attack vectors specific to AI systems using simplified threat modeling frameworks adapted for machine learning.
Anatomy of an LLM API Call
Understand the HTTP request structure for OpenAI, Anthropic, and other LLM APIs — system messages, parameters, function calling, and common misconfigurations.
AI System Architecture for Red Teamers
How AI systems are deployed in production — model API, prompt templates, orchestration, tools, memory, and guardrails — with attack surface analysis at each layer.
Getting Started with AI Red Teaming Labs
Introduction to the hands-on lab environment for AI red teaming. Learn what you'll need, how labs are structured, and get an overview of all beginner exercises.
Lab: Anthropic Claude API Basics
Set up the Anthropic Claude API for red teaming, learn authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.
Lab: API Authentication Security Testing
Test LLM API authentication mechanisms for vulnerabilities including token leakage, replay attacks, and scope bypass.
Multi-Provider API Exploration
Explore and compare API behaviors across OpenAI, Anthropic, and Google AI to understand provider-specific security characteristics.
Lab: API Key Security
Learn common API key exposure vectors, secure key management with .env files, detect keys in git history, implement key rotation, and apply least-privilege principles.
API Rate Limit and Error Handling
Test LLM API rate limits and implement proper error handling for automated testing workflows.
Lab: API Response Parsing and Analysis
Parse and analyze LLM API responses to identify anomalies, safety filter triggers, and information leakage patterns.
Lab: Setting Up Automated LLM Testing
Configure automated testing pipelines for LLM red teaming using Python scripts, test runners, and result collection.
Lab: Crafting Basic Adversarial Examples
Create basic adversarial examples that cause LLMs to misclassify, misinterpret, or bypass safety checks on text input.
Basic Automated Testing Setup
Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.
Basic Classifier Evasion
Evade basic input/output classifiers using paraphrasing, synonym substitution, and formatting tricks.
Basic Data Exfiltration Techniques
Extract sensitive information from LLM applications using social engineering and misdirection.
Lab: Build Your First Defense
Hands-on lab for building a basic AI input defense system using keyword matching, regex filters, and simple classifiers to block prompt injection attempts.
Basic Defense Mechanism Testing
Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.
Lab: Build Your First Defense (Beginner Lab)
Create a simple input filter that blocks common prompt injection patterns, then test it against the attack techniques you have learned in previous labs.
Lab: Introduction to LLM Fuzzing
Learn the basics of fuzzing LLM inputs by mutating prompts to discover unexpected behaviors and edge cases.
Lab: Jailbreak Technique Taxonomy
Explore the major categories of jailbreak techniques and practice classifying attack payloads by technique type.
Lab: Basic RAG System Security Testing
Introduction to testing Retrieval-Augmented Generation systems for document injection and retrieval manipulation vulnerabilities.
Security Finding Documentation Exercise
Practice documenting security findings in a professional format with reproducible steps and impact assessment.
Basic Role-Play Prompt Injection
Use role-play techniques to bypass basic LLM safety guardrails through character assumption.
Lab: Social Engineering LLM Applications
Practice social engineering techniques adapted for LLM applications including authority escalation and urgency injection.
Lab: Model Comparison
Test the same attack techniques against different language models and compare their safety behaviors, refusal patterns, and vulnerability profiles.
Lab: Compare Model Safety
Hands-on lab for running identical safety tests against GPT-4, Claude, Gemini, and Llama to compare how different models handle prompt injection, jailbreaks, and safety boundary enforcement.
Lab: Content Policy Boundary Mapping
Map the content policy boundaries of LLM applications to identify permitted and restricted topic areas systematically.
Lab: Context Manipulation
Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.
Lab: Context Overflow Attacks
Explore context window overflow attacks that push system instructions out of the model's attention by filling the context with padding content, and measure instruction-following degradation.
Basic Context Window Overflow
Fill the context window with padding content to push safety instructions out of the attention window.
Conversation History Analysis
Analyze how conversation history influences model behavior and identify opportunities for context manipulation.
Lab: Conversation History Manipulation
Test how LLM applications handle conversation history including truncation, injection, and context window management.
Lab: Defense Bypass Basics
Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.
Lab: Introduction to Defense Testing
Learn to systematically test LLM application defenses by probing input filters, output validators, and guardrails.
Lab: Delimiter Injection Attacks
Craft payloads that exploit XML, markdown, and special token delimiters to break instruction boundaries and manipulate how LLMs parse system prompts versus user input.
Lab: Delimiter Escape Attacks
Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.
Delimiter Escape Techniques
Practice escaping common delimiters used to separate system prompts from user input.
Embedding Basics for Security
Understand text embeddings and their security relevance by generating, comparing, and manipulating embedding vectors.
Lab: Embedding Fundamentals for Red Teamers
Learn embedding fundamentals including vector similarity, semantic search, and how embeddings enable RAG systems.
Encoding and Obfuscation Basics
Explore how Base64, ROT13, and Unicode encoding can bypass input filters in LLM applications.
Lab: Testing Environment Hardening
Secure your AI red team testing environment to prevent credential leaks, accidental production testing, and data exposure.
Lab: Error Message Analysis for Recon
Analyze LLM application error messages to extract information about system architecture, models, and defensive layers.
Error Message Intelligence Gathering
Extract system architecture information from error messages and response patterns in LLM applications.
Lab: Ethical Red Teaming
Practice responsible AI red teaming with proper documentation, scope management, and ethical decision-making frameworks.
Lab: Ethical Red Teaming (Beginner Lab)
Hands-on lab for practicing responsible AI red teaming with proper documentation, scope management, ethical boundaries, and disclosure procedures.
Lab: Evidence Collection for LLM Testing
Set up evidence collection workflows including request/response logging, screenshots, and reproducibility documentation.
Few-Shot Injection Fundamentals
Craft few-shot examples that prime the model to follow attacker instructions in subsequent turns.
Lab: Few-Shot Manipulation Attacks
Craft fake few-shot examples that teach the model to bypass its safety training by demonstrating the desired adversarial behavior through fabricated conversation examples.
Your First Claude API Call
Set up the Anthropic SDK and make your first Claude API call with system prompts and messages.
Your First LLM API Call with OpenAI
Set up your Python environment and make your first LLM API call to understand request/response patterns.
Lab: Your First Prompt Injection
Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.
Lab: Your First Jailbreak
Try basic jailbreak techniques against a local model using Ollama, learning the difference between prompt injection and jailbreaking through hands-on experimentation.
Lab: Format String Injection in LLMs
Practice injecting format strings and template directives to manipulate LLM output structure and content.
Your First Garak Vulnerability Scan
Install and run NVIDIA's Garak tool to perform automated vulnerability scanning on an LLM endpoint.
Lab: Garak Setup and First Scan
Install and configure garak, the LLM vulnerability scanner, then run your first automated scan against a local model to discover common vulnerabilities.
Lab: Hallucination Detection Basics
Learn to detect and trigger hallucinations in LLM outputs including factual errors, fabricated citations, and invented APIs.
Hello World Prompt Injection
Write and test your first prompt injection payload against a simple chatbot to understand the fundamental attack mechanism.
Lab: Injection Detection Tool
Build a basic prompt injection detection tool using pattern matching, heuristics, and LLM-based classification to identify malicious inputs before they reach the target model.
Lab: Injection Attempt Log Analysis
Build a logging pipeline to capture and analyze injection attempts for pattern discovery and technique refinement.
Lab: Injection Techniques Survey
Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.
Lab: Instruction Following Priority
Test how language models prioritize conflicting instructions from system prompts, user messages, and embedded directives to understand the instruction hierarchy.
Instruction Hierarchy Testing
Test how models prioritize conflicting instructions between system, user, and assistant roles.
Lab: Basic Jailbreak Techniques
Hands-on exploration of jailbreak techniques including role-play, DAN-style prompts, and academic framing against multiple models.
JSON Output Exploitation Basics
Exploit LLM JSON mode to bypass content filters and extract information through structured output channels.
Lab: JSON Output Mode Security Testing
Test JSON output mode for injection vulnerabilities including schema manipulation and structured output exploitation.
Lab: Multi-Language Prompt Testing
Test LLM safety boundaries across multiple languages to identify inconsistencies in safety training coverage.
Lab: Local Model Setup for Testing
Set up local LLM instances using Ollama for safe, cost-free red team testing without API costs or rate limits.
Lab: Markdown Injection
Inject images, links, and formatting into LLM responses that exfiltrate data or alter display rendering in chat interfaces.
Model Security Comparison Lab
Compare the security posture of different LLM models by running identical test suites across providers.
Lab: Basic Model Fingerprinting
Learn to identify which LLM model powers an application through behavioral fingerprinting and response analysis.
Lab: Multi-Language Injection
Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.
Lab: Multi-Turn Escalation Attacks
Build trust over several conversation messages, then exploit established rapport to bypass safety mechanisms through gradual, multi-turn escalation techniques.
Multi-Turn Conversation Probing
Use multi-turn conversations to gradually escalate requests and probe LLM safety boundaries.
Lab: Multimodal Input Testing Basics
Introduction to testing multimodal LLMs with image and text inputs to understand cross-modal behavior.
Lab: Setting Up Ollama for Local LLM Testing
Install and configure Ollama for local LLM red teaming, download models, perform basic interactions, and compare safety behavior between local and API-hosted models.
Lab: Detecting Output Filters
Identify and characterize output filtering mechanisms in LLM applications through systematic response analysis.
Output Format Control Lab
Practice controlling LLM output formats to understand how format constraints affect safety mechanism effectiveness.
Lab: Output Format Exploitation
Manipulate output formats like JSON, CSV, code blocks, and structured data to extract information that models would normally refuse to provide in natural language.
Output Format Manipulation
Force LLMs to output in specific formats (JSON, XML, code) to bypass output sanitization.
Lab: Output Steering
Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.
Lab: Payload Crafting
Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.
Lab: Encoding Bypasses
Test Base64, ROT13, hex, and Unicode encoding techniques to evade LLM content filters while preserving the semantic meaning of adversarial payloads.
Setting Up Payload Logging
Build a payload logging system to track prompt injection attempts and model responses.
Lab: LLM Playground Exploration
Explore LLM provider playgrounds to understand model behavior, parameter effects, and interactive testing workflows.
LLM Playground Security Testing
Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.
Lab: Testing Prompt Leaking Defenses
Test various prompt leaking defense configurations to evaluate their effectiveness against extraction techniques.
Lab: Prompt Leaking Basics
Learn to extract system prompts from chatbots using common leaking techniques including direct requests, indirect elicitation, and format manipulation.
Prompt Structure Analysis Lab
Reverse-engineer the prompt structure of target applications by analyzing response patterns and behavioral cues.
Lab: Prompt Template Vulnerability Testing
Test common prompt template patterns for injection vulnerabilities including string interpolation and placeholder abuse.
Setting Up Promptfoo for LLM Evaluation
Configure Promptfoo to create automated test suites for evaluating LLM safety and robustness.
Lab: Promptfoo Setup and First Eval
Install and configure promptfoo for systematic LLM evaluation, then run your first red team evaluation to test model safety boundaries.
Lab: PyRIT Setup and First Attack
Install and configure Microsoft's PyRIT (Python Risk Identification Toolkit) for automated red teaming, then run your first orchestrated attack against a local model.
Lab: Rate Limit Enumeration and Bypass
Enumerate API rate limits and test common bypass techniques including header manipulation and request distribution.
Rate Limit and Quota Mapping
Map the rate limits, quotas, and throttling behavior of LLM APIs to plan efficient testing strategies.
Lab: Analyzing Model Refusal Patterns
Systematically map model refusal patterns to understand safety training boundaries and identify potential bypasses.
Lab: Red Team Report Writing Basics
Practice writing clear, actionable red team findings reports with evidence, risk ratings, and remediation guidance.
Response Analysis Fundamentals
Learn to analyze LLM responses for information leakage, inconsistencies, and indicators of safety mechanism activation.
Lab: Analyzing LLM Responses
Learn to identify safety refusals, partial compliance, inconsistent behavior, and response fingerprinting techniques when analyzing LLM outputs during red team testing.
Lab: Response Consistency Testing
Test LLM response consistency across repeated queries to identify non-deterministic safety boundary behavior.
Lab: Role-Play Attacks
Use persona-based approaches to bypass AI safety measures by assigning alternate identities, characters, and scenarios that override the model's trained refusal behaviors.
Lab: Running Safety Benchmarks
Run standardized safety benchmarks against LLM models to establish baseline safety profiles for comparison.
Lab: Mapping Safety Boundaries
Systematically discover what a language model will and won't do by probing its safety boundaries across multiple categories and documenting the results.
Safety Boundary Mapping
Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics and documenting refusal patterns.
Lab: Introduction to Safety Testing
Learn the fundamentals of LLM safety testing including test case design, baseline measurement, and result documentation.
Safety Training Boundary Probing
Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.
Lab: Vulnerability Scoring Fundamentals
Learn vulnerability scoring frameworks adapted for LLM systems including severity, exploitability, and impact assessment.
Lab: Simple Payload Encoding Techniques
Practice encoding injection payloads using Base64, hex, URL encoding, and Unicode to bypass basic input filters.
Lab: Structured Output Manipulation
Explore how JSON mode, schema injection, and forced output formats can bypass LLM safety filters and manipulate structured responses.
Lab: System Prompt Enumeration Techniques
Practice systematic approaches to enumerating hidden system prompt content through varied probing strategies.
System Prompt Extraction Fundamentals
Learn to extract hidden system prompts from LLM applications using direct and indirect techniques.
Lab: System Prompt Override
Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.
Lab: System Prompt Reconstruction
Use various techniques to systematically extract and reconstruct the full system prompt of an LLM application, combining direct, indirect, and incremental extraction methods.
Lab: Temperature and Sampling Security Effects
Explore how temperature and sampling parameters affect model susceptibility to jailbreaks and safety boundary enforcement.
Lab: Designing LLM Red Team Test Cases
Design effective red team test cases with clear objectives, success criteria, and reproducible execution procedures.
Token Counting and Estimation
Understand tokenization by counting and estimating tokens across different models and encoders.
Lab: Token Manipulation Basics
Understand how tokenization affects LLM behavior and practice basic token-level manipulation techniques.
Lab: Comparing Red Team Testing Tools
Compare Garak, PyRIT, and Promptfoo capabilities through hands-on exercises using each tool against the same target.
Red Team Tool Installation and Configuration
Install and configure essential AI red teaming tools: Garak, PyRIT, Promptfoo, and custom Python harnesses.
API Key Scope and Permission Testing
Test API key scoping and permission boundaries to identify over-privileged access configurations.
API Response Header Analysis
Analyze HTTP response headers from LLM APIs to fingerprint providers, versions, and middleware.
Basic Indirect Prompt Injection
Plant and trigger a basic indirect prompt injection payload in content consumed by an LLM.
Basic Payload Mutation Techniques
Learn fundamental payload mutation strategies including synonym replacement and structural transforms.
Basic RAG Query Injection
Craft user queries that manipulate RAG retrieval to surface unintended documents.
Basic Rate Limit Abuse Patterns
Test common rate-limit bypass patterns including header manipulation and endpoint discovery.
Character Encoding Bypass Techniques
Bypass input filters using Unicode normalization, homoglyph substitution, and mixed-script encoding.
Chatbot Persona and Capability Mapping
Map the configured persona, capabilities, and restrictions of an LLM chatbot through systematic probing.
Completion Hijacking Fundamentals
Craft partial sentences that steer model completions toward attacker-desired outputs.
Conversation Reset Attacks
Exploit conversation resets and context clearing to weaken model adherence to safety instructions.
Emoji and Unicode Injection Techniques
Use emoji sequences and Unicode special characters to bypass text-based input filters.
Error Message Exploitation
Trigger and analyze error messages from LLM applications to reveal architecture and configuration details.
Your First HarmBench Evaluation
Run a standardized safety evaluation using the HarmBench framework against a target model.
Your First Inspect AI Evaluation
Set up and run a basic AI safety evaluation using the UK AISI Inspect framework.
JSON Injection Basics
Inject adversarial content through JSON-formatted inputs to exploit structured data processing.
Your First LLM Guard Scan
Install and configure LLM Guard to scan inputs and outputs, understanding detection capabilities and gaps.
Markdown Rendering Exfiltration
Exploit Markdown rendering in chat UIs to exfiltrate data through image tags and links.
Model Fingerprinting Basics
Identify which LLM model powers an application through behavioral fingerprinting techniques.
Introduction to NeMo Guardrails
Set up NVIDIA NeMo Guardrails and create your first Colang-based guardrail for an LLM application.
Prompt Injection via File Names
Embed prompt injection payloads in filenames and metadata of uploaded documents.
Prompt Leaking via Summarization Requests
Use summarization requests to trick an LLM into revealing its hidden system prompt through indirect disclosure.
Analyzing Refusal Messages for Intel
Extract useful information about model configuration and guardrails by systematically analyzing refusal messages.
Response Timing Side-Channel Analysis
Use response timing differences to infer information about model processing and guardrail activation.
Safety Boundary Mapping Exercise
Systematically map the safety boundaries of an LLM application across multiple topic categories.
Simple Output Constraint Attacks
Force models to output in constrained formats that bypass output safety filters.
System Prompt Reconstruction from Clues
Piece together a hidden system prompt by collecting partial information from multiple probe techniques.
System Prompt Extraction via Error Injection
Trigger specific error conditions that cause the model to inadvertently reveal system prompt contents.
Temperature and Top-K Effects on Safety
Systematically test how temperature, top-k, and top-p parameters affect safety guardrail effectiveness.
Prompt Injection via Translation
Exploit LLM translation capabilities to smuggle instructions through language boundaries.
XML Injection in LLM Contexts
Exploit XML tag handling in LLM applications to manipulate instruction parsing.
Direct Injection Basics
Core concepts of directly injecting instructions into LLM prompts, including override techniques, simple payload crafting, and understanding how models parse conflicting instructions.
Prompt Leaking Step by Step
Systematic approaches to extract system prompts from LLM applications, covering direct elicitation, indirect inference, differential analysis, and output-based reconstruction.
Running Your First Garak Scan
Step-by-step beginner walkthrough for running your very first garak vulnerability scan from zero, covering installation, target setup, probe selection, and basic result interpretation.
Running Your First Promptfoo Evaluation
Beginner walkthrough for running your first promptfoo evaluation from scratch, covering installation, configuration, test case creation, assertion writing, and result interpretation.
Running Your First PyRIT Red Team Campaign
Beginner walkthrough for running your first PyRIT red team campaign from scratch, covering installation, target configuration, orchestrator setup, and basic result analysis.
Using the PyRIT UI Frontend
Beginner walkthrough on using PyRIT's web-based UI frontend for visual red team campaign management, including launching campaigns, monitoring progress, and reviewing results without writing code.
初階 Certification Practice Exam
Full-length practice exam for beginner-level AI security certification covering foundations and basic techniques.
AI API 生態系
紅隊員之 AI API 生態系指南——OpenAI、Anthropic、Google、AWS、Azure、開源 API、身分驗證模式,與常見安全錯誤組態。
AI Deployment Patterns and 安全 Implications
How API-based, self-hosted, edge, and hybrid deployment patterns each create distinct security considerations and attack surfaces for AI systems.
AI 地景:紅隊員地圖
主要模型、提供者、部署模式,與自硬體至應用之 AI 堆疊——為當前 AI 生態系中之紅隊員定向。
模型類型與其攻擊面
文字、視覺、多模態、embedding 與程式碼生成模型如何各呈現紅隊員獨特之漏洞與攻擊面。
開放 vs 封閉模型:安全權衡
開放權重 vs 封閉原始碼 AI 模型之安全意涵——權重存取、負責任部署、微調風險,與對紅隊策略之影響。
AI 攻擊分類
依目標、技術與影響組織之 AI 攻擊完整分類——為紅隊規劃與報告提供共享詞彙。
AI 紅隊倫理
負責任之測試實務、避免現實傷害、導航雙重用途關切,以及 AI 紅隊從業者之專業標準。
AI 測試之法律地景
AI 紅隊之授權要求、服務條款考量、電腦詐欺法規,與負責任揭露框架。
為 AI 系統之威脅建模
如何使用為機器學習適配之簡化威脅建模框架辨識 AI 系統特有之資產、威脅與攻擊向量。
LLM API 呼叫解剖
理解 OpenAI、Anthropic 及其他 LLM API 的 HTTP 請求結構——系統訊息、參數、函式呼叫與常見組態錯誤。
紅隊視角的 AI 系統架構
AI 系統如何於生產環境部署——模型 API、提示範本、編排、工具、記憶體與防護機制——並於每一層進行攻擊面分析。
開始學習 AI 紅隊實驗
AI 紅隊動手實驗環境介紹。瞭解你所需的資源、實驗的結構,並掌握所有初階練習的概觀。
實驗室: Anthropic Claude API Basics
Set up the Anthropic Claude API for red teaming, learn authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.
實驗室: API Authentication 安全 Testing
Test LLM API authentication mechanisms for vulnerabilities including token leakage, replay attacks, and scope bypass.
Multi-Provider API Exploration
Explore and compare API behaviors across OpenAI, Anthropic, and Google AI to understand provider-specific security characteristics.
實驗室: API Key 安全
Learn common API key exposure vectors, secure key management with .env files, detect keys in git history, implement key rotation, and apply least-privilege principles.
API Rate Limit and Error Handling
Test LLM API rate limits and implement proper error handling for automated testing workflows.
實驗室: API Response Parsing and Analysis
Parse and analyze LLM API responses to identify anomalies, safety filter triggers, and information leakage patterns.
實驗室: Setting Up Automated LLM Testing
Configure automated testing pipelines for LLM red teaming using Python scripts, test runners, and result collection.
實驗室: Crafting Basic Adversarial Examples
Create basic adversarial examples that cause LLMs to misclassify, misinterpret, or bypass safety checks on text input.
Basic Automated Testing Setup
Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.
Basic Classifier Evasion
Evade basic input/output classifiers using paraphrasing, synonym substitution, and formatting tricks.
Basic Data Exfiltration Techniques
Extract sensitive information from LLM applications using social engineering and misdirection.
實驗室: Build Your First 防禦
Hands-on lab for building a basic AI input defense system using keyword matching, regex filters, and simple classifiers to block prompt injection attempts.
Basic 防禦 Mechanism Testing
Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.
實驗室: Build Your First 防禦 (初階 實驗室)
Create a simple input filter that blocks common prompt injection patterns, then test it against the attack techniques you have learned in previous labs.
實驗室: 介紹 to LLM Fuzzing
Learn the basics of fuzzing LLM inputs by mutating prompts to discover unexpected behaviors and edge cases.
實驗室: 越獄 Technique Taxonomy
Explore the major categories of jailbreak techniques and practice classifying attack payloads by technique type.
實驗室: Basic RAG System 安全 Testing
介紹 to testing Retrieval-Augmented Generation systems for document injection and retrieval manipulation vulnerabilities.
安全 Finding Documentation 練習
Practice documenting security findings in a professional format with reproducible steps and impact assessment.
Basic Role-Play 提示詞注入
Use role-play techniques to bypass basic LLM safety guardrails through character assumption.
實驗室: Social Engineering LLM Applications
Practice social engineering techniques adapted for LLM applications including authority escalation and urgency injection.
實驗室: 模型 Comparison
Test the same attack techniques against different language models and compare their safety behaviors, refusal patterns, and vulnerability profiles.
實驗室: Compare 模型 Safety
Hands-on lab for running identical safety tests against GPT-4, Claude, Gemini, and Llama to compare how different models handle prompt injection, jailbreaks, and safety boundary enforcement.
實驗室: Content Policy Boundary Mapping
Map the content policy boundaries of LLM applications to identify permitted and restricted topic areas systematically.
實驗室: Context Manipulation
Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.
實驗室: Context Overflow 攻擊s
Explore context window overflow attacks that push system instructions out of the model's attention by filling the context with padding content, and measure instruction-following degradation.
Basic Context Window Overflow
Fill the context window with padding content to push safety instructions out of the attention window.
Conversation History Analysis
Analyze how conversation history influences model behavior and identify opportunities for context manipulation.
實驗室: Conversation History Manipulation
Test how LLM applications handle conversation history including truncation, injection, and context window management.
實驗室: 防禦 Bypass Basics
Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.
實驗室: 介紹 to 防禦 Testing
Learn to systematically test LLM application defenses by probing input filters, output validators, and guardrails.
實驗室: Delimiter Injection 攻擊s
Craft payloads that exploit XML, markdown, and special token delimiters to break instruction boundaries and manipulate how LLMs parse system prompts versus user input.
實驗室: Delimiter Escape 攻擊s
Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.
Delimiter Escape Techniques
Practice escaping common delimiters used to separate system prompts from user input.
Embedding Basics for 安全
Understand text embeddings and their security relevance by generating, comparing, and manipulating embedding vectors.
實驗室: Embedding 基礎 for 紅隊ers
Learn embedding fundamentals including vector similarity, semantic search, and how embeddings enable RAG systems.
Encoding and Obfuscation Basics
Explore how Base64, ROT13, and Unicode encoding can bypass input filters in LLM applications.
實驗室: Testing Environment Hardening
Secure your AI red team testing environment to prevent credential leaks, accidental production testing, and data exposure.
實驗室: Error Message Analysis for Recon
Analyze LLM application error messages to extract information about system architecture, models, and defensive layers.
Error Message Intelligence Gathering
Extract system architecture information from error messages and response patterns in LLM applications.
實驗室: Ethical 紅隊演練
Practice responsible AI red teaming with proper documentation, scope management, and ethical decision-making frameworks.
實驗室: Ethical 紅隊演練 (初階 實驗室)
Hands-on lab for practicing responsible AI red teaming with proper documentation, scope management, ethical boundaries, and disclosure procedures.
實驗室: Evidence Collection for LLM Testing
Set up evidence collection workflows including request/response logging, screenshots, and reproducibility documentation.
Few-Shot Injection 基礎
Craft few-shot examples that prime the model to follow attacker instructions in subsequent turns.
實驗室: Few-Shot Manipulation 攻擊s
Craft fake few-shot examples that teach the model to bypass its safety training by demonstrating the desired adversarial behavior through fabricated conversation examples.
Your First Claude API Call
Set up the Anthropic SDK and make your first Claude API call with system prompts and messages.
Your First LLM API Call with OpenAI
Set up your Python environment and make your first LLM API call to understand request/response patterns.
實驗室: Your First 提示詞注入
Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.
實驗室: Your First 越獄
Try basic jailbreak techniques against a local model using Ollama, learning the difference between prompt injection and jailbreaking through hands-on experimentation.
實驗室: Format String Injection in LLMs
Practice injecting format strings and template directives to manipulate LLM output structure and content.
Your First Garak 漏洞 Scan
Install and run NVIDIA's Garak tool to perform automated vulnerability scanning on an LLM endpoint.
實驗室: Garak Setup and First Scan
Install and configure garak, the LLM vulnerability scanner, then run your first automated scan against a local model to discover common vulnerabilities.
實驗室: Hallucination Detection Basics
Learn to detect and trigger hallucinations in LLM outputs including factual errors, fabricated citations, and invented APIs.
Hello World 提示詞注入
Write and test your first prompt injection payload against a simple chatbot to understand the fundamental attack mechanism.
實驗室: Injection Detection 工具
Build a basic prompt injection detection tool using pattern matching, heuristics, and LLM-based classification to identify malicious inputs before they reach the target model.
實驗室: Injection Attempt Log Analysis
Build a logging pipeline to capture and analyze injection attempts for pattern discovery and technique refinement.
實驗室: Injection Techniques Survey
Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.
實驗室: Instruction Following Priority
Test how language models prioritize conflicting instructions from system prompts, user messages, and embedded directives to understand the instruction hierarchy.
Instruction Hierarchy Testing
Test how models prioritize conflicting instructions between system, user, and assistant roles.
實驗室: Basic 越獄 Techniques
Hands-on exploration of jailbreak techniques including role-play, DAN-style prompts, and academic framing against multiple models.
JSON Output 利用ation Basics
利用 LLM JSON mode to bypass content filters and extract information through structured output channels.
實驗室: JSON Output Mode 安全 Testing
Test JSON output mode for injection vulnerabilities including schema manipulation and structured output exploitation.
實驗室: Multi-Language Prompt Testing
Test LLM safety boundaries across multiple languages to identify inconsistencies in safety training coverage.
實驗室: Local 模型 Setup for Testing
Set up local LLM instances using Ollama for safe, cost-free red team testing without API costs or rate limits.
實驗室: Markdown Injection
Inject images, links, and formatting into LLM responses that exfiltrate data or alter display rendering in chat interfaces.
模型 安全 Comparison 實驗室
Compare the security posture of different LLM models by running identical test suites across providers.
實驗室: Basic 模型 Fingerprinting
Learn to identify which LLM model powers an application through behavioral fingerprinting and response analysis.
實驗室: Multi-Language Injection
Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.
實驗室: Multi-Turn Escalation 攻擊s
Build trust over several conversation messages, then exploit established rapport to bypass safety mechanisms through gradual, multi-turn escalation techniques.
Multi-Turn Conversation Probing
Use multi-turn conversations to gradually escalate requests and probe LLM safety boundaries.
實驗室: Multimodal Input Testing Basics
介紹 to testing multimodal LLMs with image and text inputs to understand cross-modal behavior.
實驗室: Setting Up Ollama for Local LLM Testing
Install and configure Ollama for local LLM red teaming, download models, perform basic interactions, and compare safety behavior between local and API-hosted models.
實驗室: Detecting Output Filters
Identify and characterize output filtering mechanisms in LLM applications through systematic response analysis.
Output Format Control 實驗室
Practice controlling LLM output formats to understand how format constraints affect safety mechanism effectiveness.
實驗室: Output Format 利用ation
Manipulate output formats like JSON, CSV, code blocks, and structured data to extract information that models would normally refuse to provide in natural language.
Output Format Manipulation
Force LLMs to output in specific formats (JSON, XML, code) to bypass output sanitization.
實驗室: Output Steering
Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.
實驗室: Payload Crafting
Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.
實驗室: Encoding Bypasses
Test Base64, ROT13, hex, and Unicode encoding techniques to evade LLM content filters while preserving the semantic meaning of adversarial payloads.
Setting Up Payload Logging
Build a payload logging system to track prompt injection attempts and model responses.
實驗室: LLM Playground Exploration
Explore LLM provider playgrounds to understand model behavior, parameter effects, and interactive testing workflows.
LLM Playground 安全 Testing
Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.
實驗室: Testing Prompt Leaking 防禦s
Test various prompt leaking defense configurations to evaluate their effectiveness against extraction techniques.
實驗室: Prompt Leaking Basics
Learn to extract system prompts from chatbots using common leaking techniques including direct requests, indirect elicitation, and format manipulation.
Prompt Structure Analysis 實驗室
Reverse-engineer the prompt structure of target applications by analyzing response patterns and behavioral cues.
實驗室: Prompt Template 漏洞 Testing
Test common prompt template patterns for injection vulnerabilities including string interpolation and placeholder abuse.
Setting Up Promptfoo for LLM Evaluation
Configure Promptfoo to create automated test suites for evaluating LLM safety and robustness.
實驗室: Promptfoo Setup and First Eval
Install and configure promptfoo for systematic LLM evaluation, then run your first red team evaluation to test model safety boundaries.
實驗室: PyRIT Setup and First 攻擊
Install and configure Microsoft's PyRIT (Python Risk Identification Toolkit) for automated red teaming, then run your first orchestrated attack against a local model.
實驗室: Rate Limit Enumeration and Bypass
Enumerate API rate limits and test common bypass techniques including header manipulation and request distribution.
Rate Limit and Quota Mapping
Map the rate limits, quotas, and throttling behavior of LLM APIs to plan efficient testing strategies.
實驗室: Analyzing 模型 Refusal Patterns
Systematically map model refusal patterns to understand safety training boundaries and identify potential bypasses.
實驗室: 紅隊 Report Writing Basics
Practice writing clear, actionable red team findings reports with evidence, risk ratings, and remediation guidance.
Response Analysis 基礎
Learn to analyze LLM responses for information leakage, inconsistencies, and indicators of safety mechanism activation.
實驗室: Analyzing LLM Responses
Learn to identify safety refusals, partial compliance, inconsistent behavior, and response fingerprinting techniques when analyzing LLM outputs during red team testing.
實驗室: Response Consistency Testing
Test LLM response consistency across repeated queries to identify non-deterministic safety boundary behavior.
實驗室: Role-Play 攻擊s
Use persona-based approaches to bypass AI safety measures by assigning alternate identities, characters, and scenarios that override the model's trained refusal behaviors.
實驗室: Running Safety Benchmarks
Run standardized safety benchmarks against LLM models to establish baseline safety profiles for comparison.
實驗室: Mapping Safety Boundaries
Systematically discover what a language model will and won't do by probing its safety boundaries across multiple categories and documenting the results.
Safety Boundary Mapping
Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics and documenting refusal patterns.
實驗室: 介紹 to Safety Testing
Learn the fundamentals of LLM safety testing including test case design, baseline measurement, and result documentation.
Safety 訓練 Boundary Probing
Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.
實驗室: 漏洞 Scoring 基礎
Learn vulnerability scoring frameworks adapted for LLM systems including severity, exploitability, and impact assessment.
實驗室: Simple Payload Encoding Techniques
Practice encoding injection payloads using Base64, hex, URL encoding, and Unicode to bypass basic input filters.
實驗室: Structured Output Manipulation
Explore how JSON mode, schema injection, and forced output formats can bypass LLM safety filters and manipulate structured responses.
實驗室: System Prompt Enumeration Techniques
Practice systematic approaches to enumerating hidden system prompt content through varied probing strategies.
System Prompt Extraction 基礎
Learn to extract hidden system prompts from LLM applications using direct and indirect techniques.
實驗室: System Prompt Override
Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.
實驗室: System Prompt Reconstruction
Use various techniques to systematically extract and reconstruct the full system prompt of an LLM application, combining direct, indirect, and incremental extraction methods.
實驗室: Temperature and Sampling 安全 Effects
Explore how temperature and sampling parameters affect model susceptibility to jailbreaks and safety boundary enforcement.
實驗室: Designing LLM 紅隊 Test Cases
Design effective red team test cases with clear objectives, success criteria, and reproducible execution procedures.
Token Counting and Estimation
Understand tokenization by counting and estimating tokens across different models and encoders.
實驗室: Token Manipulation Basics
Understand how tokenization affects LLM behavior and practice basic token-level manipulation techniques.
實驗室: Comparing 紅隊 Testing 工具s
Compare Garak, PyRIT, and Promptfoo capabilities through hands-on exercises using each tool against the same target.
紅隊 工具 Installation and Configuration
Install and configure essential AI red teaming tools: Garak, PyRIT, Promptfoo, and custom Python harnesses.
API Key Scope and Permission Testing
Test API key scoping and permission boundaries to identify over-privileged access configurations.
API Response Header Analysis
Analyze HTTP response headers from LLM APIs to fingerprint providers, versions, and middleware.
Basic Indirect 提示詞注入
Plant and trigger a basic indirect prompt injection payload in content consumed by an LLM.
Basic Payload Mutation Techniques
Learn fundamental payload mutation strategies including synonym replacement and structural transforms.
Basic RAG Query Injection
Craft user queries that manipulate RAG retrieval to surface unintended documents.
Basic Rate Limit Abuse Patterns
Test common rate-limit bypass patterns including header manipulation and endpoint discovery.
Character Encoding Bypass Techniques
Bypass input filters using Unicode normalization, homoglyph substitution, and mixed-script encoding.
Chatbot Persona and Capability Mapping
Map the configured persona, capabilities, and restrictions of an LLM chatbot through systematic probing.
Completion Hijacking 基礎
Craft partial sentences that steer model completions toward attacker-desired outputs.
Conversation Reset 攻擊s
利用 conversation resets and context clearing to weaken model adherence to safety instructions.
Emoji and Unicode Injection Techniques
Use emoji sequences and Unicode special characters to bypass text-based input filters.
Error Message 利用ation
Trigger and analyze error messages from LLM applications to reveal architecture and configuration details.
Your First HarmBench Evaluation
Run a standardized safety evaluation using the HarmBench framework against a target model.
Your First Inspect AI Evaluation
Set up and run a basic AI safety evaluation using the UK AISI Inspect framework.
JSON Injection Basics
Inject adversarial content through JSON-formatted inputs to exploit structured data processing.
Your First LLM Guard Scan
Install and configure LLM Guard to scan inputs and outputs, understanding detection capabilities and gaps.
Markdown Rendering Exfiltration
利用 Markdown rendering in chat UIs to exfiltrate data through image tags and links.
模型 Fingerprinting Basics
Identify which LLM model powers an application through behavioral fingerprinting techniques.
介紹 to NeMo Guardrails
Set up NVIDIA NeMo Guardrails and create your first Colang-based guardrail for an LLM application.
提示詞注入 via File Names
Embed prompt injection payloads in filenames and metadata of uploaded documents.
Prompt Leaking via Summarization Requests
Use summarization requests to trick an LLM into revealing its hidden system prompt through indirect disclosure.
Analyzing Refusal Messages for Intel
Extract useful information about model configuration and guardrails by systematically analyzing refusal messages.
Response Timing Side-Channel Analysis
Use response timing differences to infer information about model processing and guardrail activation.
Safety Boundary Mapping 練習
Systematically map the safety boundaries of an LLM application across multiple topic categories.
Simple Output Constraint 攻擊s
Force models to output in constrained formats that bypass output safety filters.
System Prompt Reconstruction from Clues
Piece together a hidden system prompt by collecting partial information from multiple probe techniques.
System Prompt Extraction via Error Injection
Trigger specific error conditions that cause the model to inadvertently reveal system prompt contents.
Temperature and Top-K Effects on Safety
Systematically test how temperature, top-k, and top-p parameters affect safety guardrail effectiveness.
提示詞注入 via Translation
利用 LLM translation capabilities to smuggle instructions through language boundaries.
XML Injection in LLM Contexts
利用 XML tag handling in LLM applications to manipulate instruction parsing.
Direct Injection Basics
Core concepts of directly injecting instructions into LLM prompts, including override techniques, simple payload crafting, and understanding how models parse conflicting instructions.
Prompt Leaking Step by Step
Systematic approaches to extract system prompts from LLM applications, covering direct elicitation, indirect inference, differential analysis, and output-based reconstruction.
執行你之首次 Garak 掃描
自零執行你之第一個 garak 漏洞掃描之逐步初學者演練,涵蓋安裝、目標設置、探測選擇與基礎結果解讀。
Running Your First Promptfoo Evaluation
初階 walkthrough for running your first promptfoo evaluation from scratch, covering installation, configuration, test case creation, assertion writing, and result interpretation.
Running Your First PyRIT 紅隊 Campaign
初階 walkthrough for running your first PyRIT red team campaign from scratch, covering installation, target configuration, orchestrator setup, and basic result analysis.
使用 PyRIT UI 前端
初學者教學:使用 PyRIT 網頁式 UI 前端,以視覺化方式管理紅隊行動,包括啟動行動、監控進度,以及不需寫程式即可檢視結果。