Defender for AI Bypass
Red team techniques for understanding and bypassing Microsoft Defender for AI: detection capabilities, alert analysis, bypass strategies, coverage gaps, and alert fatigue exploitation.
Defender for AI Bypass
Microsoft Defender for AI (part of Microsoft Defender for Cloud) provides threat detection for Azure AI services, focusing on Azure OpenAI and Azure ML workloads. For red teamers, understanding what Defender detects -- and what it misses -- is essential for planning stealthy operations. Defender for AI is relatively new, and its detection coverage has gaps that sophisticated attackers can exploit. This section maps Defender's capabilities, identifies those gaps, and provides bypass techniques.
Detection Capabilities
What Defender for AI Monitors
Defender for AI monitors Azure OpenAI and Azure ML through multiple telemetry sources:
| Telemetry Source | What It Captures | Detection Relevance |
|---|---|---|
| Azure Activity Log | Control plane operations (create, delete, modify resources) | Resource tampering, configuration changes |
| Azure Diagnostic Logs | Model invocations, content filter triggers, errors | Prompt injection, abuse patterns |
| Azure Monitor Metrics | Request rates, token usage, latency | Anomaly detection, abuse patterns |
| Microsoft Threat Intelligence | Known malicious IPs, indicators of compromise | External attacker identification |
| Entra ID Sign-in Logs | Authentication events for AI services | Credential compromise, anomalous access |
Alert Categories
Defender for AI generates alerts in several categories:
| Alert Category | Example Alerts | Severity |
|---|---|---|
| Credential access | Suspicious API key usage, anomalous authentication | High |
| Jailbreak attempts | Detected jailbreak prompt patterns | Medium |
| Sensitive data exposure | PII detected in model inputs/outputs | High |
| Resource abuse | Unusual model invocation volume | Medium |
| Configuration changes | Content filter modification, deployment changes | Medium |
| Suspicious access patterns | Access from unusual locations, off-hours activity | Low-Medium |
Jailbreak Detection
Defender for AI includes a jailbreak detection capability that monitors prompts for known jailbreak patterns. This detection uses:
- Pattern matching against known jailbreak templates (DAN, STAN, character roleplay patterns)
- Semantic analysis of prompt intent
- Content filter trigger correlation (repeated filter triggers suggest attack attempts)
Bypass Techniques
Avoiding Jailbreak Detection
Defender's jailbreak detection relies on recognizing known attack patterns. Bypass approaches:
Novel prompt construction
Avoid using published jailbreak templates verbatim. Defender's pattern matching is tuned to known templates. Construct original prompts that achieve the same goal through different phrasing.
Gradual escalation
Instead of a single aggressive jailbreak prompt, use multi-turn conversations that gradually shift the model's behavior. Each individual message appears benign; the cumulative effect achieves the jailbreak.
Encoded payloads
Deliver attack payloads through encoding (base64, Unicode, custom encoding schemes) that Defender's text analysis does not decode. The model processes the decoded content, but Defender sees only the encoded form.
Context window manipulation
Use long conversations with extensive benign content to push the jailbreak attempt into a position where contextual analysis is less effective. Defender may focus analysis on recent messages rather than the full conversation.
Evading Anomaly Detection
Defender uses behavioral baselines to detect anomalies. To blend in:
- Match normal usage patterns: Before launching an attack, observe the target application's normal request patterns (timing, volume, prompt length, model selection). Mirror these patterns during the attack.
- Distribute across time: Spread attack attempts across hours or days rather than concentrating them in a short window
- Use expected authentication methods: If the application normally uses managed identity, attacking through the same authentication path generates fewer anomaly signals than using a different authentication method
- Respect rate limits: Stay within normal request rates rather than hammering the endpoint
Exploiting Diagnostic Log Gaps
Not all organizations enable full diagnostic logging for Azure OpenAI:
# Check diagnostic settings on Azure OpenAI resource
az monitor diagnostic-settings list \
--resource /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<name>If diagnostic settings are absent or incomplete:
- No request/response logging: Defender cannot analyze prompt content for jailbreak patterns
- No content filter logging: Filter bypass attempts are not recorded
- No audit logging: Access patterns cannot be analyzed for anomalies
Coverage Gaps
What Defender Does Not Detect
| Gap | Description | Exploitation |
|---|---|---|
| Indirect prompt injection | Defender monitors user prompts but not RAG-retrieved content | Inject attack payloads through data sources, not user input |
| Slow-and-low attacks | Anomaly detection requires deviation from baseline | Stay within normal usage patterns |
| Cross-service pivoting | Defender monitors AI services independently | Use AI service compromise to pivot to non-AI services without triggering AI-specific alerts |
| Training data poisoning | Defender monitors inference, not training | Poison training data through storage access without AI-specific detection |
| Model artifact tampering | Defender does not validate model integrity | Replace model artifacts in storage without AI-specific alerts |
| Agent tool abuse | Limited detection for tool invocation anomalies | Use Bedrock Agents or Prompt Flow tools to access other services |
Structural Limitations
Defender for AI has inherent structural limitations:
- Reactive, not proactive: Defender generates alerts after events occur. It does not prevent malicious operations.
- Dependent on telemetry: If diagnostic settings are incomplete, Defender has no data to analyze.
- Model-agnostic: Defender does not understand model-specific vulnerability profiles. The same detection logic applies regardless of the underlying model.
- Single-service focus: Defender analyzes AI service telemetry in isolation. Attack chains spanning multiple services may not be detected because no single service shows anomalous behavior.
Alert Fatigue Exploitation
Noise Generation
Organizations with AI services generate high volumes of legitimate content filter triggers, especially during testing and development. Red teamers can exploit this noise:
- Blend with development traffic: Time attacks to coincide with known development/testing periods when content filter triggers are expected
- Generate benign alerts: Trigger low-severity alerts through normal-seeming activities to create noise, then execute the actual attack during the resulting alert fatigue window
- Exploit severity thresholds: Focus on attack techniques that generate medium or low-severity alerts, which SOC teams are more likely to deprioritize
Alert Suppression Patterns
Defender allows alert suppression rules to reduce noise. If these rules are overly broad:
- Suppressed alert categories provide cover for related attacks
- Organizations may suppress jailbreak alerts entirely if they experience high false positive rates
- Suppression rules for development subscriptions may inadvertently cover production resources
# Check for alert suppression rules
az security alert list --query "[?status=='Dismissed']" -o tableSOC Response Gaps
Even when Defender generates accurate alerts, SOC response may be inadequate:
| Gap | Description | Exploitation |
|---|---|---|
| No AI expertise | SOC analysts unfamiliar with AI attacks | Alert is investigated with traditional methods, AI context missed |
| No playbook | No incident response procedure for AI-specific alerts | Alert sits in queue without response |
| Shared on-call | AI alerts routed to general cloud security on-call | AI-specific context lost in general cloud security triage |
| Alert volume | High volume of AI alerts (legitimate and malicious) | Genuine alerts buried in noise |
Related Topics
- Azure AI Services Overview -- Service landscape and detection surface
- Azure OpenAI Attack Surface -- Attacks that Defender monitors
- Azure ML Exploitation -- ML-specific detection gaps
- Understanding AI Defenses -- Broader AI defense landscape
A red team operation needs to test jailbreak attacks against Azure OpenAI without triggering Defender for AI alerts. Which approach is most likely to succeed?
You discover that an Azure OpenAI resource has no diagnostic settings configured. What is the security impact for Defender for AI?
References
- Microsoft Defender for AI -- Official documentation
- Azure OpenAI Diagnostic Logging -- Logging configuration
- Microsoft AI Red Team -- Microsoft's AI red teaming approach