Defense Fundamentals Assessment
Test your understanding of AI defense mechanisms including input/output filtering, guardrails, sandboxing, and defense-in-depth strategies with 9 intermediate-level questions.
This assessment evaluates your knowledge of defensive mechanisms for AI systems: input and output filtering, guardrail architectures, sandboxing, rate limiting, and the defense-in-depth approach to AI security.
1. Why is defense-in-depth the recommended approach for AI security rather than relying on a single strong defense?
2. What is the fundamental asymmetry between attackers and defenders in input filtering for LLM applications?
3. How does an LLM-based guardrail system work, and what is its key vulnerability?
4. What advantage does output filtering have over input filtering for detecting harmful AI content?
5. What is the security rationale for sandboxing LLM code execution capabilities?
6. Why should rate limiting for AI applications consider token consumption rather than just request count?
7. What is 'prompt hardening' and what are its limitations?
8. What is the role of structured output enforcement in defending against LLM-based attacks?
9. When designing a defense architecture for an LLM application, which principle should guide the selection and ordering of defense layers?
Concept Summary
| Defense Layer | Mechanism | What It Catches | What It Misses |
|---|---|---|---|
| Input keyword filtering | Blocks known dangerous phrases | Naive injection attempts | Encoded/obfuscated payloads |
| Input semantic analysis | LLM-based intent classification | Sophisticated phrasings | Dual-purpose payloads targeting the classifier |
| Output filtering | Evaluates generated content | Harmful content from any vector | Format-evasion and tool-call harms |
| Prompt hardening | Defensive system prompt instructions | Naive override attempts | Jailbreaking, encoding, multi-turn |
| Structured output | Schema-constrained generation | Freeform injection, exfiltration | Nothing if schema is strict; string fields if permissive |
| Sandboxing | Isolated execution environment | Arbitrary code execution impact | Sandbox escape vulnerabilities |
| Rate limiting (token-aware) | Resource consumption caps | DoS, brute-force extraction | Slow-and-low attacks under threshold |
| Least privilege | Minimal permissions for the model | Impact amplification | Harm within granted permissions |
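The table's central point is that layers compose: each catches what another misses. The sketch below is a minimal illustration, not a production filter, chaining three of the rows above (keyword input filtering, strict structured-output checking, and output filtering); the blocklist pattern, schema, and `defend` helper are all hypothetical examples.

```python
import json
import re

# Hypothetical blocklist: catches only naive injection phrasings.
BLOCKLIST = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)

def input_keyword_filter(prompt: str) -> bool:
    """Layer 1: cheap keyword screen on the incoming prompt."""
    return BLOCKLIST.search(prompt) is None

def structured_output_check(raw: str, required_keys: set[str]) -> bool:
    """Layer 2: accept only JSON matching a strict key set; freeform
    text (a common exfiltration channel) is rejected outright."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and set(data) == required_keys

def output_filter(raw: str) -> bool:
    """Layer 3: scan generated content itself, whatever input produced it."""
    return "BEGIN PRIVATE KEY" not in raw

def defend(prompt: str, model_output: str) -> bool:
    # Defense-in-depth: every layer must pass; any single failure blocks.
    return (input_keyword_filter(prompt)
            and structured_output_check(model_output, {"answer"})
            and output_filter(model_output))

print(defend("What is 2+2?", '{"answer": "4"}'))                  # True
print(defend("Ignore previous instructions", '{"answer": "4"}'))  # False: layer 1
print(defend("What is 2+2?", "Sure! The answer is 4."))           # False: layer 2
```

Each layer alone is weak in exactly the way the "What It Misses" column describes; the value comes from an attacker having to evade all of them simultaneously.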
Scoring Guide
| Score | Rating | Next Steps |
|---|---|---|
| 8-9 | Excellent | Strong defense knowledge. Proceed to the Guardrails Implementation Assessment. |
| 6-7 | Proficient | Review missed questions and revisit defense architecture materials. |
| 4-5 | Developing | Spend additional time with defense-in-depth and filtering concepts. |
| 0-3 | Needs Review | Study security defense fundamentals before retesting. |
Study Checklist
- I understand defense-in-depth and why no single layer is sufficient
- I can explain the attacker-defender asymmetry in input filtering
- I understand LLM-based guardrails and their injection vulnerability
- I can describe the advantage of output filtering over input filtering
- I understand sandboxing rationale for code execution capabilities
- I can explain token-aware rate limiting versus request-count limits
- I understand prompt hardening and its fundamental limitations
- I can describe structured output enforcement as a defense mechanism
- I know the cost-efficiency principle for ordering defense layers