# beginner
165 articles tagged with “beginner”
Beginner Certification Practice Exam
Full-length practice exam for beginner-level AI security certification covering foundations and basic techniques.
The AI API Ecosystem
A red teamer's guide to the AI API landscape — OpenAI, Anthropic, Google, AWS, Azure, open-source APIs, authentication patterns, and common security misconfigurations.
AI Deployment Patterns and Security Implications
How API-based, self-hosted, edge, and hybrid deployment patterns each create distinct security considerations and attack surfaces for AI systems.
The AI Landscape: A Red Teamer's Map
Major models, providers, deployment patterns, and the AI stack from hardware to application — orienting red teamers in the current AI ecosystem.
Model Types and Their Attack Surfaces
How text, vision, multimodal, embedding, and code generation models each present unique vulnerabilities and attack surfaces for red teamers.
Open vs Closed Models: Security Tradeoffs
Security implications of open-weight vs closed-source AI models — weight access, responsible deployment, fine-tuning risks, and the impact on red teaming strategy.
How LLMs Work: A Red Teamer's Guide
Understand the fundamentals of large language models — token prediction, context windows, roles, and temperature — through a security-focused lens.
AI Attack Taxonomy
A comprehensive classification of AI attacks organized by target, technique, and impact — providing a shared vocabulary for red team planning and reporting.
Ethics of AI Red Teaming
Responsible testing practices, avoiding real harm, navigating dual-use concerns, and professional standards for AI red team practitioners.
Red Team Methodology Fundamentals
What AI red teaming is, how it differs from traditional security testing, and the complete engagement lifecycle from scoping to reporting.
Legal Landscape for AI Testing
Authorization requirements, terms of service considerations, computer fraud laws, and responsible disclosure frameworks for AI red teaming.
Threat Modeling for AI Systems
How to identify assets, threats, and attack vectors specific to AI systems using simplified threat modeling frameworks adapted for machine learning.
Anatomy of an LLM API Call
Understand the HTTP request structure for OpenAI, Anthropic, and other LLM APIs — system messages, parameters, function calling, and common misconfigurations.
AI System Architecture for Red Teamers
How AI systems are deployed in production — model API, prompt templates, orchestration, tools, memory, and guardrails — with attack surface analysis at each layer.
Getting Started with AI Red Teaming Labs
Introduction to the hands-on lab environment for AI red teaming. Learn what you'll need, how labs are structured, and get an overview of all beginner exercises.
Lab: Anthropic Claude API Basics
Set up the Anthropic Claude API for red teaming, learn authentication, the Messages API, system prompts, and how temperature and top-p affect attack success rates.
Lab: API Authentication Security Testing
Test LLM API authentication mechanisms for vulnerabilities including token leakage, replay attacks, and scope bypass.
Multi-Provider API Exploration
Explore and compare API behaviors across OpenAI, Anthropic, and Google AI to understand provider-specific security characteristics.
Lab: API Key Security
Learn common API key exposure vectors, manage keys securely with .env files, detect keys in git history, implement key rotation, and apply least-privilege principles.
API Rate Limit and Error Handling
Test LLM API rate limits and implement proper error handling for automated testing workflows.
Lab: API Response Parsing and Analysis
Parse and analyze LLM API responses to identify anomalies, safety filter triggers, and information leakage patterns.
Lab: Setting Up Automated LLM Testing
Configure automated testing pipelines for LLM red teaming using Python scripts, test runners, and result collection.
Lab: Crafting Basic Adversarial Examples
Create basic adversarial examples that cause LLMs to misclassify, misinterpret, or bypass safety checks on text input.
Basic Automated Testing Setup
Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.
Basic Classifier Evasion
Evade basic input/output classifiers using paraphrasing, synonym substitution, and formatting tricks.
Basic Data Exfiltration Techniques
Extract sensitive information from LLM applications using social engineering and misdirection.
Lab: Build Your First Defense
Hands-on lab for building a basic AI input defense system using keyword matching, regex filters, and simple classifiers to block prompt injection attempts.
Basic Defense Mechanism Testing
Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.
Lab: Build Your First Defense (Beginner Lab)
Create a simple input filter that blocks common prompt injection patterns, then test it against the attack techniques you have learned in previous labs.
Lab: Introduction to LLM Fuzzing
Learn the basics of fuzzing LLM inputs by mutating prompts to discover unexpected behaviors and edge cases.
Lab: Jailbreak Technique Taxonomy
Explore the major categories of jailbreak techniques and practice classifying attack payloads by technique type.
Lab: Basic RAG System Security Testing
Introduction to testing Retrieval-Augmented Generation systems for document injection and retrieval manipulation vulnerabilities.
Security Finding Documentation Exercise
Practice documenting security findings in a professional format with reproducible steps and impact assessment.
Basic Role-Play Prompt Injection
Use role-play techniques to bypass basic LLM safety guardrails through character assumption.
Lab: Social Engineering LLM Applications
Practice social engineering techniques adapted for LLM applications including authority escalation and urgency injection.
Lab: Model Comparison
Test the same attack techniques against different language models and compare their safety behaviors, refusal patterns, and vulnerability profiles.
Lab: Compare Model Safety
Hands-on lab for running identical safety tests against GPT-4, Claude, Gemini, and Llama to compare how different models handle prompt injection, jailbreaks, and safety boundary enforcement.
Lab: Content Policy Boundary Mapping
Map the content policy boundaries of LLM applications to identify permitted and restricted topic areas systematically.
Lab: Context Manipulation
Manipulate the conversational context to change model behavior, using techniques like context flooding, false history injection, and persona shifting.
Lab: Context Overflow Attacks
Explore context window overflow attacks that push system instructions out of the model's attention by filling the context with padding content, and measure instruction-following degradation.
Basic Context Window Overflow
Fill the context window with padding content to push safety instructions out of the attention window.
Conversation History Analysis
Analyze how conversation history influences model behavior and identify opportunities for context manipulation.
Lab: Conversation History Manipulation
Test how LLM applications handle conversation history including truncation, injection, and context window management.
Lab: Defense Bypass Basics
Learn basic techniques to bypass simple LLM defenses including keyword filters, instruction reinforcement, and output validators using encoding, reformulation, and indirect approaches.
Lab: Introduction to Defense Testing
Learn to systematically test LLM application defenses by probing input filters, output validators, and guardrails.
Lab: Delimiter Injection Attacks
Craft payloads that exploit XML, markdown, and special token delimiters to break instruction boundaries and manipulate how LLMs parse system prompts versus user input.
Lab: Delimiter Escape Attacks
Craft payloads that escape delimiter boundaries separating system and user content, testing how models handle broken fences, nested delimiters, and format confusion.
Delimiter Escape Techniques
Practice escaping common delimiters used to separate system prompts from user input.
Embedding Basics for Security
Understand text embeddings and their security relevance by generating, comparing, and manipulating embedding vectors.
Lab: Embedding Fundamentals for Red Teamers
Learn embedding fundamentals including vector similarity, semantic search, and how embeddings enable RAG systems.
Encoding and Obfuscation Basics
Explore how Base64, ROT13, and Unicode encoding can bypass input filters in LLM applications.
Lab: Testing Environment Hardening
Secure your AI red team testing environment to prevent credential leaks, accidental production testing, and data exposure.
Lab: Error Message Analysis for Recon
Analyze LLM application error messages to extract information about system architecture, models, and defensive layers.
Error Message Intelligence Gathering
Extract system architecture information from error messages and response patterns in LLM applications.
Lab: Ethical Red Teaming
Practice responsible AI red teaming with proper documentation, scope management, and ethical decision-making frameworks.
Lab: Ethical Red Teaming (Beginner Lab)
Hands-on lab for practicing responsible AI red teaming with proper documentation, scope management, ethical boundaries, and disclosure procedures.
Lab: Evidence Collection for LLM Testing
Set up evidence collection workflows including request/response logging, screenshots, and reproducibility documentation.
Few-Shot Injection Fundamentals
Craft few-shot examples that prime the model to follow attacker instructions in subsequent turns.
Lab: Few-Shot Manipulation Attacks
Craft fake few-shot examples that teach the model to bypass its safety training by demonstrating the desired adversarial behavior through fabricated conversation examples.
Your First Claude API Call
Set up the Anthropic SDK and make your first Claude API call with system prompts and messages.
Your First LLM API Call with OpenAI
Set up your Python environment and make your first LLM API call to understand request/response patterns.
Lab: Your First Prompt Injection
Hands-on lab where you perform your first prompt injection attacks against a chatbot, progressing from basic overrides to more sophisticated techniques.
Lab: Your First Jailbreak
Try basic jailbreak techniques against a local model using Ollama, learning the difference between prompt injection and jailbreaking through hands-on experimentation.
Lab: Format String Injection in LLMs
Practice injecting format strings and template directives to manipulate LLM output structure and content.
Your First Garak Vulnerability Scan
Install and run NVIDIA's Garak tool to perform automated vulnerability scanning on an LLM endpoint.
Lab: Garak Setup and First Scan
Install and configure garak, the LLM vulnerability scanner, then run your first automated scan against a local model to discover common vulnerabilities.
Lab: Hallucination Detection Basics
Learn to detect and trigger hallucinations in LLM outputs including factual errors, fabricated citations, and invented APIs.
Hello World Prompt Injection
Write and test your first prompt injection payload against a simple chatbot to understand the fundamental attack mechanism.
Lab: Injection Detection Tool
Build a basic prompt injection detection tool using pattern matching, heuristics, and LLM-based classification to identify malicious inputs before they reach the target model.
Lab: Injection Attempt Log Analysis
Build a logging pipeline to capture and analyze injection attempts for pattern discovery and technique refinement.
Lab: Injection Techniques Survey
Survey and test ten fundamental prompt injection techniques against a local LLM, measuring effectiveness and cataloging behavioral patterns for each approach.
Lab: Instruction Following Priority
Test how language models prioritize conflicting instructions from system prompts, user messages, and embedded directives to understand the instruction hierarchy.
Instruction Hierarchy Testing
Test how models prioritize conflicting instructions between system, user, and assistant roles.
Lab: Basic Jailbreak Techniques
Hands-on exploration of jailbreak techniques including role-play, DAN-style prompts, and academic framing against multiple models.
JSON Output Exploitation Basics
Exploit LLM JSON mode to bypass content filters and extract information through structured output channels.
Lab: JSON Output Mode Security Testing
Test JSON output mode for injection vulnerabilities including schema manipulation and structured output exploitation.
Lab: Multi-Language Prompt Testing
Test LLM safety boundaries across multiple languages to identify inconsistencies in safety training coverage.
Lab: Local Model Setup for Testing
Set up local LLM instances using Ollama for safe, cost-free red team testing without API costs or rate limits.
Lab: Markdown Injection
Inject images, links, and formatting into LLM responses that exfiltrate data or alter display rendering in chat interfaces.
Model Security Comparison Lab
Compare the security posture of different LLM models by running identical test suites across providers.
Lab: Basic Model Fingerprinting
Learn to identify which LLM model powers an application through behavioral fingerprinting and response analysis.
Lab: Multi-Language Injection
Test prompt injection techniques using multiple natural languages, exploring how safety training coverage varies across languages and how translation requests can mask payloads.
Lab: Multi-Turn Escalation Attacks
Build trust over several conversation messages, then exploit established rapport to bypass safety mechanisms through gradual, multi-turn escalation techniques.
Multi-Turn Conversation Probing
Use multi-turn conversations to gradually escalate requests and probe LLM safety boundaries.
Lab: Multimodal Input Testing Basics
Introduction to testing multimodal LLMs with image and text inputs to understand cross-modal behavior.
Lab: Setting Up Ollama for Local LLM Testing
Install and configure Ollama for local LLM red teaming, download models, perform basic interactions, and compare safety behavior between local and API-hosted models.
Lab: Detecting Output Filters
Identify and characterize output filtering mechanisms in LLM applications through systematic response analysis.
Output Format Control Lab
Practice controlling LLM output formats to understand how format constraints affect safety mechanism effectiveness.
Lab: Output Format Exploitation
Manipulate output formats like JSON, CSV, code blocks, and structured data to extract information that models would normally refuse to provide in natural language.
Output Format Manipulation
Force LLMs to output in specific formats (JSON, XML, code) to bypass output sanitization.
Lab: Output Steering
Learn to steer LLM outputs into specific formats, tones, and content using injection techniques that manipulate how the model structures and presents its responses.
Lab: Payload Crafting
Learn to craft effective prompt injection payloads from scratch by understanding payload structure, testing iteratively, and optimizing for reliability against a local model.
Lab: Encoding Bypasses
Test Base64, ROT13, hex, and Unicode encoding techniques to evade LLM content filters while preserving the semantic meaning of adversarial payloads.
Setting Up Payload Logging
Build a payload logging system to track prompt injection attempts and model responses.
Lab: LLM Playground Exploration
Explore LLM provider playgrounds to understand model behavior, parameter effects, and interactive testing workflows.
LLM Playground Security Testing
Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.
Lab: Testing Prompt Leaking Defenses
Test various prompt leaking defense configurations to evaluate their effectiveness against extraction techniques.
Lab: Prompt Leaking Basics
Learn to extract system prompts from chatbots using common leaking techniques including direct requests, indirect elicitation, and format manipulation.
Prompt Structure Analysis Lab
Reverse-engineer the prompt structure of target applications by analyzing response patterns and behavioral cues.
Lab: Prompt Template Vulnerability Testing
Test common prompt template patterns for injection vulnerabilities including string interpolation and placeholder abuse.
Setting Up Promptfoo for LLM Evaluation
Configure Promptfoo to create automated test suites for evaluating LLM safety and robustness.
Lab: Promptfoo Setup and First Eval
Install and configure promptfoo for systematic LLM evaluation, then run your first red team evaluation to test model safety boundaries.
Lab: PyRIT Setup and First Attack
Install and configure Microsoft's PyRIT (Python Risk Identification Toolkit) for automated red teaming, then run your first orchestrated attack against a local model.
Lab: Rate Limit Enumeration and Bypass
Enumerate API rate limits and test common bypass techniques including header manipulation and request distribution.
Rate Limit and Quota Mapping
Map the rate limits, quotas, and throttling behavior of LLM APIs to plan efficient testing strategies.
Lab: Analyzing Model Refusal Patterns
Systematically map model refusal patterns to understand safety training boundaries and identify potential bypasses.
Lab: Red Team Report Writing Basics
Practice writing clear, actionable red team findings reports with evidence, risk ratings, and remediation guidance.
Response Analysis Fundamentals
Learn to analyze LLM responses for information leakage, inconsistencies, and indicators of safety mechanism activation.
Lab: Analyzing LLM Responses
Learn to identify safety refusals, partial compliance, inconsistent behavior, and response fingerprinting techniques when analyzing LLM outputs during red team testing.
Lab: Response Consistency Testing
Test LLM response consistency across repeated queries to identify non-deterministic safety boundary behavior.
Lab: Role-Play Attacks
Use persona-based approaches to bypass AI safety measures by assigning alternate identities, characters, and scenarios that override the model's trained refusal behaviors.
Lab: Running Safety Benchmarks
Run standardized safety benchmarks against LLM models to establish baseline safety profiles for comparison.
Lab: Mapping Safety Boundaries
Systematically discover what a language model will and won't do by probing its safety boundaries across multiple categories and documenting the results.
Safety Boundary Mapping
Systematically map the safety boundaries of an LLM by testing increasingly sensitive topics and documenting refusal patterns.
Lab: Introduction to Safety Testing
Learn the fundamentals of LLM safety testing including test case design, baseline measurement, and result documentation.
Safety Training Boundary Probing
Systematically probe the boundaries of RLHF safety training to understand where and how safety behaviors are enforced.
Lab: Vulnerability Scoring Fundamentals
Learn vulnerability scoring frameworks adapted for LLM systems including severity, exploitability, and impact assessment.
Lab: Simple Payload Encoding Techniques
Practice encoding injection payloads using Base64, hex, URL encoding, and Unicode to bypass basic input filters.
Lab: Structured Output Manipulation
Explore how JSON mode, schema injection, and forced output formats can bypass LLM safety filters and manipulate structured responses.
Lab: System Prompt Enumeration Techniques
Practice systematic approaches to enumerating hidden system prompt content through varied probing strategies.
System Prompt Extraction Fundamentals
Learn to extract hidden system prompts from LLM applications using direct and indirect techniques.
Lab: System Prompt Override
Explore techniques to override system-level instructions with user-level inputs, testing how models prioritize conflicting directives across different prompt architectures.
Lab: System Prompt Reconstruction
Use various techniques to systematically extract and reconstruct the full system prompt of an LLM application, combining direct, indirect, and incremental extraction methods.
Lab: Temperature and Sampling Security Effects
Explore how temperature and sampling parameters affect model susceptibility to jailbreaks and safety boundary enforcement.
Lab: Designing LLM Red Team Test Cases
Design effective red team test cases with clear objectives, success criteria, and reproducible execution procedures.
Token Counting and Estimation
Understand tokenization by counting and estimating tokens across different models and encoders.
Lab: Token Manipulation Basics
Understand how tokenization affects LLM behavior and practice basic token-level manipulation techniques.
Lab: Comparing Red Team Testing Tools
Compare Garak, PyRIT, and Promptfoo capabilities through hands-on exercises using each tool against the same target.
Red Team Tool Installation and Configuration
Install and configure essential AI red teaming tools: Garak, PyRIT, Promptfoo, and custom Python harnesses.
API Key Scope and Permission Testing
Test API key scoping and permission boundaries to identify over-privileged access configurations.
API Response Header Analysis
Analyze HTTP response headers from LLM APIs to fingerprint providers, versions, and middleware.
Basic Indirect Prompt Injection
Plant and trigger a basic indirect prompt injection payload in content consumed by an LLM.
Basic Payload Mutation Techniques
Learn fundamental payload mutation strategies including synonym replacement and structural transforms.
Basic RAG Query Injection
Craft user queries that manipulate RAG retrieval to surface unintended documents.
Basic Rate Limit Abuse Patterns
Test common rate-limit bypass patterns including header manipulation and endpoint discovery.
Character Encoding Bypass Techniques
Bypass input filters using Unicode normalization, homoglyph substitution, and mixed-script encoding.
Chatbot Persona and Capability Mapping
Map the configured persona, capabilities, and restrictions of an LLM chatbot through systematic probing.
Completion Hijacking Fundamentals
Craft partial sentences that steer model completions toward attacker-desired outputs.
Conversation Reset Attacks
Exploit conversation resets and context clearing to weaken model adherence to safety instructions.
Emoji and Unicode Injection Techniques
Use emoji sequences and Unicode special characters to bypass text-based input filters.
Error Message Exploitation
Trigger and analyze error messages from LLM applications to reveal architecture and configuration details.
Your First HarmBench Evaluation
Run a standardized safety evaluation using the HarmBench framework against a target model.
Your First Inspect AI Evaluation
Set up and run a basic AI safety evaluation using the UK AISI Inspect framework.
JSON Injection Basics
Inject adversarial content through JSON-formatted inputs to exploit structured data processing.
Your First LLM Guard Scan
Install and configure LLM Guard to scan inputs and outputs, understanding detection capabilities and gaps.
Markdown Rendering Exfiltration
Exploit Markdown rendering in chat UIs to exfiltrate data through image tags and links.
Model Fingerprinting Basics
Identify which LLM model powers an application through behavioral fingerprinting techniques.
Introduction to NeMo Guardrails
Set up NVIDIA NeMo Guardrails and create your first Colang-based guardrail for an LLM application.
Prompt Injection via File Names
Embed prompt injection payloads in filenames and metadata of uploaded documents.
Prompt Leaking via Summarization Requests
Use summarization requests to trick an LLM into revealing its hidden system prompt through indirect disclosure.
Analyzing Refusal Messages for Intel
Extract useful information about model configuration and guardrails by systematically analyzing refusal messages.
Response Timing Side-Channel Analysis
Use response timing differences to infer information about model processing and guardrail activation.
Safety Boundary Mapping Exercise
Systematically map the safety boundaries of an LLM application across multiple topic categories.
Simple Output Constraint Attacks
Force models to output in constrained formats that bypass output safety filters.
System Prompt Reconstruction from Clues
Piece together a hidden system prompt by collecting partial information from multiple probe techniques.
System Prompt Extraction via Error Injection
Trigger specific error conditions that cause the model to inadvertently reveal system prompt contents.
Temperature and Top-K Effects on Safety
Systematically test how temperature, top-k, and top-p parameters affect safety guardrail effectiveness.
Prompt Injection via Translation
Exploit LLM translation capabilities to smuggle instructions through language boundaries.
XML Injection in LLM Contexts
Exploit XML tag handling in LLM applications to manipulate instruction parsing.
Direct Injection Basics
Core concepts of directly injecting instructions into LLM prompts, including override techniques, simple payload crafting, and understanding how models parse conflicting instructions.
Prompt Leaking Step by Step
Systematic approaches to extract system prompts from LLM applications, covering direct elicitation, indirect inference, differential analysis, and output-based reconstruction.
Running Your First Garak Scan
Step-by-step beginner walkthrough for running your very first garak vulnerability scan from zero, covering installation, target setup, probe selection, and basic result interpretation.
Running Your First Promptfoo Evaluation
Beginner walkthrough for running your first promptfoo evaluation from scratch, covering installation, configuration, test case creation, assertion writing, and result interpretation.
Running Your First PyRIT Red Team Campaign
Beginner walkthrough for running your first PyRIT red team campaign from scratch, covering installation, target configuration, orchestrator setup, and basic result analysis.
Using the PyRIT UI Frontend
Beginner walkthrough on using PyRIT's web-based UI frontend for visual red team campaign management, including launching campaigns, monitoring progress, and reviewing results without writing code.