# testing

code-gen-securitytestingcoveragesecurity-testing

Security Gaps in AI-Generated Tests

Analyzing how AI-generated test suites systematically miss security-relevant test cases, creating dangerous coverage illusions.

code-gentestinggenerationmanipulation

Advanced Test Generation Manipulation

Advanced techniques for manipulating AI-generated tests to create false assurance by generating tests that pass but don't verify security properties.

content-safetyazureopenaigooglemoderation-apitaxonomytesting

Content Safety APIs (Azure, OpenAI, Google)

Detailed comparison of Azure Content Safety, OpenAI Moderation API, and Google Cloud safety offerings, including API structures, category taxonomies, severity levels, testing methodology, and common gaps.

defenseevaluationmethodologytesting

Defense Evaluation Methodology

Systematic methodology for evaluating the effectiveness of AI defenses against known attack categories.

automationcartfuzzingtestingexploit-dev

Red Teaming Automation

Frameworks and tools for automating AI red teaming at scale, including CART pipelines, jailbreak fuzzing, regression testing, and continuous monitoring.

exploit-devcoveragetrackingtesting

Coverage Tracking Systems

Implementing test coverage tracking for AI security assessments to ensure comprehensive evaluation across attack vectors and model behaviors.

exploit-devdefense-evaluationtoolkittesting

Defense Evaluation Toolkit

Building a toolkit for systematically evaluating the effectiveness of LLM defenses.

exploit-devfuzzingtestingdiscovery

Fuzzing LLM Applications

Applying fuzzing methodologies to LLM applications including grammar-based fuzzing, mutation-based fuzzing, and coverage-guided approaches.

exploit-devmulti-modelorchestrationtesting

Multi-Model Test Orchestrator

Orchestrating parallel security testing across multiple models and providers to identify cross-model vulnerabilities and transferable attacks.

frameworktargetdevexploittestingmulti

Multi-Target Testing Framework

Build a framework for testing the same attack suite across multiple model providers simultaneously.

promptfootestingevaluationtools

promptfoo for Red Teaming

Deep dive into promptfoo for AI red teaming: YAML configuration, assertion-based testing, red team plugins, custom evaluators, and regression testing workflows for LLM security.

exploit-devregressiontestingCI/CD

Regression Testing for AI Security

Implementing automated regression testing for AI security properties that integrates into CI/CD pipelines and catches safety regressions.

stabilityfinetuningalignmenttesting

Alignment Stability Under Fine-Tuning

Testing how safety alignment degrades under various fine-tuning configurations and datasets.

auditmethodologyevidencetestingreport-templates

AI Audit Methodology

Comprehensive methodology for auditing AI systems including planning, evidence collection, testing procedures, report templates, and integration with red team assessments.

eu-ai-actcomplianceregulationtesting

EU AI Act Compliance Testing

EU AI Act risk categories, testing requirements for high-risk AI systems, conformity assessment procedures, and how red teaming supports EU AI Act compliance.

api-securityauthenticationrate-limitingtestinginfrastructure

LLM API Security Testing

Security testing methodology for LLM APIs, covering authentication, rate limiting, input validation, output filtering, and LLM-specific API vulnerabilities.

labsapi-authenticationtestingbeginner

Lab: API Authentication Security Testing

Test LLM API authentication mechanisms for vulnerabilities including token leakage, replay attacks, and scope bypass.

Lab: API-Based Model Testing

Learn to test language models through their APIs including OpenAI, Anthropic, and local Ollama endpoints. Build reusable API testing functions with proper error handling.

labapitestingopenai

basiclabbeginnerlabsautomatedtesting

Basic Automated Testing Setup

Set up automated prompt testing with Python scripts to scale your security evaluation beyond manual testing.

testingdefensebasiclabbeginnerlabs

Basic Defense Mechanism Testing

Identify and categorize the defensive mechanisms present in a target LLM application through structured probing.

labsinstruction-hierarchytestingbeginner

Instruction Hierarchy Testing

Test how models prioritize conflicting instructions between system, user, and assistant roles.

labsmulti-languagetestingbeginner

Lab: Multi-Language Prompt Testing

Test LLM safety boundaries across multiple languages to identify inconsistencies in safety training coverage.

labbeginnerlabsplaygroundtesting

LLM Playground Security Testing

Use web-based LLM playgrounds to practice basic injection techniques without API setup requirements.

labsprompt-templatestestingbeginner

Lab: Prompt Template Vulnerability Testing

Test common prompt template patterns for injection vulnerabilities including string interpolation and placeholder abuse.

labsconsistencytestingbeginner

Lab: Response Consistency Testing

Test LLM response consistency across repeated queries to identify non-deterministic safety boundary behavior.

labssafety-benchmarkstestingbeginner

Lab: Running Safety Benchmarks

Run standardized safety benchmarks against LLM models to establish baseline safety profiles for comparison.

labexpertfuzzertestingadversarialhands-on

Lab: Build an AI Fuzzer

Build a comprehensive fuzzing framework for large language models that systematically generates malformed, unexpected, and adversarial inputs to discover crashes, unexpected behaviors, and safety bypasses.

Expert

Deceptive Alignment Testing Framework

Build a testing framework for detecting mesa-optimization and deceptive alignment in fine-tuned models.

labsdeceptive-alignmenttestingexpert

Expert

Lab: Agent Workflow Security Testing

Test multi-step agent workflows for injection points, state manipulation, and unauthorized action execution.

labsagent-workflowtestingintermediate

labscanary-tokenstestingintermediate

Lab: Canary Token Effectiveness Testing

Deploy and test canary tokens in system prompts to evaluate their effectiveness at detecting prompt extraction.

labscloud-platformtestingintermediate

Lab: Cloud AI Platform Security Testing

Test cloud AI platform configurations including AWS Bedrock guardrails, Azure content filters, and GCP safety settings.

labscompliancetestingintermediate

Lab: AI Compliance Testing Fundamentals

Test LLM applications against OWASP LLM Top 10 and NIST AI RMF requirements using structured test procedures.

labssession-isolationtestingintermediate

Lab: Session Isolation Testing

Test session isolation in multi-tenant LLM deployments to detect cross-session data leakage and context contamination.

claudetestingmethodologyapi-probingsafety-boundariesmodel-tiers

Claude Testing Methodology

Systematic methodology for red teaming Claude models, including API probing, model card analysis, safety boundary mapping, and comparative testing across Opus, Sonnet, and Haiku tiers.

geminitestingmethodologyvertex-aiai-studiomultimodal-testing

Gemini Testing Methodology

Systematic methodology for red teaming Gemini, including Vertex AI API probing, Google AI Studio testing, multimodal test case design, and grounding attack validation.

gpt-4testingmethodologyapi-probingsafety-boundariesred-teaming

GPT-4 Testing Methodology

Systematic methodology for red teaming GPT-4, including API-based probing techniques, rate limit considerations, content policy mapping, and safety boundary discovery.

referencepromptfooconfigurationtesting

Promptfoo Configuration Guide

Detailed guide to configuring Promptfoo for LLM security testing including provider setup, test assertions, and CI/CD integration.

automatedtestingwalkthroughsdefense

Automated Defense Regression Testing

Build automated regression tests for LLM security defenses using attack replay and continuous monitoring.

walkthroughsdefensetestingautomation

Automated Defense Testing Pipeline

Build an automated pipeline that continuously tests defensive measures against evolving attack techniques.

walkthroughsmethodologycompliancetesting

Compliance-Driven Testing Methodology

Map regulatory requirements to specific test cases for compliance-driven AI red team assessments.

eu-ai-actcomplianceregulationtestingmethodologywalkthrough

Testing for EU AI Act Compliance

Walkthrough for conducting red team assessments that evaluate compliance with the EU AI Act requirements, covering risk classification, mandatory testing obligations, and documentation requirements.

walkthroughsmethodologymulti-modeltesting

Multi-Model Testing Methodology

Structured methodology for testing applications that use multiple LLM models in their processing pipeline.

walkthroughsmethodologycompliancetesting

AI Compliance Testing Methodology

Methodology for testing AI systems against regulatory compliance requirements including EU AI Act and NIST.

ai21platformstestingmodelswalkthroughs

Testing AI21 Labs Models

Red team testing guide for AI21 Labs Jamba models including long context and efficiency features.

cohereplatformstestingmodelswalkthroughs

Testing Cohere Models

Red team testing guide for Cohere's Command-R models including RAG and tool use features.

fireworksplatformstestingwalkthroughs

Testing Fireworks AI Platform

Red team testing guide for Fireworks AI including function calling and compound AI systems.

inferencegroqplatformstestingwalkthroughs

Testing Groq Inference Platform

Red team testing guide for Groq's high-speed inference platform and its security characteristics.

mistralplatformstestingmodelswalkthroughs

Testing Mistral AI Models

Complete red team testing guide for Mistral AI models including Mixtral MoE architecture and chat endpoints.

localollamaplatformstestingwalkthroughs

Testing Ollama Local Deployments

Security testing guide for locally deployed models via Ollama including network exposure and API security.

replicateplatformstestingwalkthroughshosted

Testing Replicate-Hosted Models

Red team testing guide for models hosted on Replicate including open-source model deployments.

togetherplatformstestingwalkthroughs

Testing Together AI Platform

Red team testing guide for Together AI including fine-tuned model endpoints and custom deployments.

walkthroughscounterfitml-securitytesting

Counterfit ML Security Testing

Use Microsoft's Counterfit for adversarial ML testing of deployed model endpoints.

walkthroughsjailbreakbenchbenchmarktesting

JailbreakBench Usage and Submission

Use JailbreakBench to evaluate jailbreak techniques and submit results to the benchmark.