Shadow AI Discovery
Techniques for discovering unauthorized AI deployments within organizations, including network scanning, API traffic analysis, cloud resource enumeration, and SaaS inventory assessment.
Shadow AI refers to the use of AI tools, services, and models within an organization without the knowledge, approval, or oversight of IT, security, or governance teams. Just as shadow IT created unmanaged risk in previous decades, shadow AI introduces AI-specific risks including data leakage, compliance violations, and unmonitored attack surface expansion. For red teamers, shadow AI discovery is both a reconnaissance technique and a valuable assessment deliverable.
The Shadow AI Landscape
Categories of Shadow AI
| Category | Description | Risk Level | Examples |
|---|---|---|---|
| Direct AI service use | Employees using external AI services directly | High | ChatGPT, Claude, Gemini for work tasks |
| AI-embedded SaaS | Existing SaaS tools that have added AI features | Medium | Notion AI, Grammarly, Microsoft Copilot |
| Self-hosted models | Teams running their own models on company infrastructure | High | Open-weight models on internal GPU servers |
| API integrations | Developers integrating AI APIs into applications without approval | Critical | OpenAI API keys in production code |
| Browser extensions | AI-powered browser extensions with access to work content | Medium | AI writing assistants, code completion extensions |
| Mobile AI apps | AI applications on personal or company mobile devices | Medium | AI assistants on phones used for work |
Why Shadow AI Matters for Red Teams
| Risk Dimension | Impact |
|---|---|
| Data leakage | Proprietary data, source code, customer information sent to external AI services |
| Compliance violation | AI processing of regulated data (HIPAA, PCI, GDPR) without required controls |
| Expanded attack surface | Unmonitored AI endpoints vulnerable to prompt injection, data extraction |
| Supply chain risk | Unvetted AI providers with unknown security posture |
| Intellectual property | Trade secrets and competitive information potentially used for model training |
| Audit gaps | AI-influenced decisions with no documentation or oversight |
Network-Based Discovery
Traffic Analysis for AI Service Detection
Monitor network traffic for connections to known AI service endpoints:
| AI Provider | Domains to Monitor | Traffic Patterns |
|---|---|---|
| OpenAI | api.openai.com, chat.openai.com, cdn.oaiusercontent.com | HTTPS POST with JSON bodies, streaming SSE responses |
| Anthropic | api.anthropic.com, claude.ai | HTTPS POST with JSON bodies |
| Google | generativelanguage.googleapis.com, gemini.google.com, aistudio.google.com | HTTPS POST/GET |
| Azure OpenAI | *.openai.azure.com | HTTPS POST with API-key headers |
| AWS Bedrock | bedrock-runtime.*.amazonaws.com | AWS SigV4 authenticated requests |
| Hugging Face | api-inference.huggingface.co, huggingface.co | HTTPS POST/GET |
| Replicate | api.replicate.com | HTTPS POST with token auth |
| Together AI | api.together.xyz | HTTPS POST with API-key headers |
| Mistral | api.mistral.ai | HTTPS POST with API-key headers |
| Cohere | api.cohere.ai | HTTPS POST with API-key headers |
Detection Techniques
DNS query analysis
Monitor DNS queries from corporate networks for AI service domains. This works even when traffic content is encrypted.
Implementation:
- Configure DNS logging on corporate DNS resolvers
- Build watchlists of known AI service domains
- Alert on new domains matching AI service patterns
- Track query volume and source hosts
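The watchlist step above can be sketched as a small matcher over DNS query logs. The domains and the `timestamp source_ip query_name` log format are illustrative assumptions; adapt them to your resolver's actual export.

```python
# Watchlist of known AI service domains (illustrative subset, not exhaustive).
AI_DOMAIN_WATCHLIST = {
    "api.openai.com", "chat.openai.com",
    "api.anthropic.com", "claude.ai",
    "generativelanguage.googleapis.com",
    "api-inference.huggingface.co",
}

def is_ai_domain(qname: str, watchlist=AI_DOMAIN_WATCHLIST) -> bool:
    """True if the query name matches a watchlist entry exactly or as a subdomain."""
    qname = qname.rstrip(".").lower()
    return any(qname == d or qname.endswith("." + d) for d in watchlist)

def flag_queries(log_lines):
    """Assumed log format per line: '<timestamp> <source_ip> <query_name>'."""
    hits = []
    for line in log_lines:
        _ts, src, qname = line.split()
        if is_ai_domain(qname):
            hits.append((src, qname.rstrip(".").lower()))
    return hits
```

Suffix matching (rather than substring matching) avoids false positives such as `notclaude.ai` matching the `claude.ai` entry.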
TLS certificate inspection
On networks with TLS inspection capability, examine connection metadata for AI service indicators.
Indicators:
- SNI (Server Name Indication) values matching AI service domains
- Certificate subjects and issuers associated with AI providers
- Connection patterns (long-lived SSE connections typical of streaming AI responses)
Proxy log analysis
If the organization uses web proxies, analyze proxy logs for AI service connections.
Analysis dimensions:
- Volume of requests to AI service endpoints
- Source users and departments
- Request sizes (large requests suggest document or code submission)
- Time patterns (sustained use vs one-time exploration)
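A minimal aggregation over those dimensions might look like the following. The `(user, domain, bytes_sent)` record shape and the 100 KB upload threshold are assumed simplifications of real proxy log fields.

```python
from collections import defaultdict

# Assumed threshold: requests this large likely carry documents or code.
LARGE_UPLOAD_BYTES = 100_000

def summarize(records, ai_domains):
    """Aggregate proxy records per (user, AI domain): request count, bytes, large uploads."""
    stats = defaultdict(lambda: {"requests": 0, "bytes_sent": 0, "large_uploads": 0})
    for user, domain, bytes_sent in records:
        if domain not in ai_domains:
            continue  # only AI service destinations are of interest here
        s = stats[(user, domain)]
        s["requests"] += 1
        s["bytes_sent"] += bytes_sent
        if bytes_sent >= LARGE_UPLOAD_BYTES:
            s["large_uploads"] += 1
    return dict(stats)
```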
Traffic pattern recognition
Even without decrypting traffic, AI service usage creates distinctive patterns.
Distinguishing patterns:
- Long-held HTTP/2 connections with server-sent events (streaming responses)
- Asymmetric request/response sizes (small request, large response for generation)
- Periodic polling patterns (checking async generation results)
- Large upload followed by series of queries (document upload then Q&A)
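The asymmetric-streaming pattern can be expressed as a simple heuristic over per-connection metadata. The field names and thresholds below are illustrative assumptions, not tuned detection rules.

```python
from dataclasses import dataclass

@dataclass
class ConnRecord:
    duration_s: float  # connection lifetime
    bytes_out: int     # client -> server
    bytes_in: int      # server -> client

def looks_like_streaming_generation(c: ConnRecord) -> bool:
    """Long-lived connection with a strongly asymmetric response (SSE-style).

    Thresholds are assumptions: >=10 s lifetime, response at least 5x the request.
    """
    if c.duration_s < 10 or c.bytes_out == 0:
        return False
    return c.bytes_in / c.bytes_out >= 5  # small prompt, large generated response
```

In practice this heuristic would be one signal among several; short API calls and non-streaming usage will not trip it.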
Cloud Resource Enumeration
Discovering AI Resources in Cloud Environments
Organizations often discover that teams have provisioned AI resources in cloud accounts without governance team awareness:
| Cloud Provider | AI Services to Enumerate | Discovery Method |
|---|---|---|
| AWS | SageMaker, Bedrock, Comprehend, Rekognition, Lex, Polly | AWS Config, CloudTrail, resource inventory APIs |
| Azure | Azure OpenAI, Cognitive Services, ML Studio, AI Search | Azure Resource Graph, Activity Log |
| GCP | Vertex AI, AI Platform, Cloud Vision, Natural Language, Gemini API | Cloud Asset Inventory, Audit Logs |
Cloud Discovery Checklist
| Discovery Activity | What to Look For |
|---|---|
| Service enablement | Which AI services are enabled across all accounts/subscriptions/projects? |
| Resource inventory | Deployed AI models, endpoints, training jobs, datasets |
| API key inventory | Active API keys for AI services, key age, last usage |
| IAM analysis | Who has permissions to create and manage AI resources? |
| Cost analysis | AI service spending may reveal undiscovered usage |
| Network analysis | VPC endpoints or private links to AI services |
| Container analysis | Docker images containing ML frameworks or model weights |
| GPU allocation | GPU instances that may be running inference workloads |
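The resource-inventory activity above can be sketched as a filter over an exported AWS Config inventory. The dict shape and some of the resource-type prefixes are assumptions; verify the exact type strings against AWS Config's supported resource types for your accounts.

```python
# AI-related AWS Config resource-type prefixes (assumed list; extend as needed).
AI_RESOURCE_PREFIXES = (
    "AWS::SageMaker::",
    "AWS::Bedrock::",
    "AWS::Comprehend::",
    "AWS::Rekognition::",
    "AWS::Lex::",
)

def flag_ai_resources(inventory):
    """Return inventory entries whose resourceType matches an AI service prefix."""
    return [r for r in inventory
            if r["resourceType"].startswith(AI_RESOURCE_PREFIXES)]
```

The same prefix-filter pattern applies to Azure Resource Graph and GCP Cloud Asset Inventory exports, with provider-specific type strings.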
Identifying Self-Hosted Models
Teams may run open-weight models on company infrastructure without governance awareness:
| Indicator | Detection Method |
|---|---|
| GPU utilization on non-designated servers | Infrastructure monitoring, GPU metrics |
| ML framework processes (PyTorch, TensorFlow, vLLM, Ollama) | Process monitoring, container scanning |
| Large model weight files on shared storage | File system scanning for .safetensors, .gguf, .bin files |
| Inference server processes (vLLM, TGI, Ollama, LocalAI) | Port scanning, process enumeration |
| Internal API documentation referencing AI endpoints | Documentation search, wiki scanning |
| Hugging Face model cache directories | File system scanning for .cache/huggingface |
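Two of the file-system indicators above (weight files by extension and size, Hugging Face cache directories) can be combined in one scan. The extensions and the 500 MB default threshold are assumptions; `.bin` in particular needs the size filter to cut false positives.

```python
import os

MODEL_EXTENSIONS = (".safetensors", ".gguf", ".bin")

def find_model_files(root, min_size_bytes=500 * 1024 * 1024):
    """Walk root for likely model-weight files and Hugging Face cache directories."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".cache/huggingface" in dirpath.replace(os.sep, "/"):
            hits.append((dirpath, "hf-cache"))
            dirnames[:] = []  # the cache itself is evidence; no need to descend
            continue
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(MODEL_EXTENSIONS) and os.path.getsize(path) >= min_size_bytes:
                hits.append((path, "weights"))
    return hits
```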
SaaS and Application Discovery
AI-Embedded SaaS Inventory
Many SaaS tools have added AI features, sometimes enabled by default. Discovery requires systematic inventory:
| SaaS Category | AI Features to Check | Data Risk |
|---|---|---|
| Productivity (Google Workspace, Microsoft 365) | Gemini, Copilot integrations | Document content processed by AI |
| Communication (Slack, Teams) | AI summarization, search, bots | Message content processed by AI |
| Development (GitHub, GitLab) | Copilot, code suggestions | Source code processed by AI |
| CRM (Salesforce, HubSpot) | Einstein AI, ChatSpot | Customer data processed by AI |
| Project management (Notion, Asana, Monday) | AI assistants, auto-summaries | Project details processed by AI |
| Design (Figma, Canva) | AI generation features | Design assets processed by AI |
| HR (Workday, BambooHR) | AI screening, analytics | Employee data processed by AI |
| Support (Zendesk, Intercom) | AI chatbots, auto-responses | Customer interactions processed by AI |
Browser Extension Audit
| Discovery Method | Implementation |
|---|---|
| Endpoint management | Query MDM/endpoint management for installed browser extensions |
| Group policy | Review extension whitelists/blocklists for AI-related extensions |
| Network analysis | Monitor for traffic to known AI extension backend services |
| User survey | Ask users about AI tool usage (combine with technical discovery) |
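An MDM extension export can be triaged with a manifest-level check like the one below. The keyword pattern and the "broad permission" set are illustrative assumptions against Chrome-style manifests.

```python
import json
import re

# Assumed indicators: AI-ish names plus permissions that expose work content.
AI_NAME_PATTERN = re.compile(r"\b(ai|gpt|copilot|assistant|chatbot)\b", re.I)
BROAD_PERMISSIONS = {"<all_urls>", "tabs", "clipboardRead", "webRequest"}

def audit_manifest(manifest_json: str):
    """Flag an extension manifest for AI-assistant risk indicators."""
    m = json.loads(manifest_json)
    perms = set(m.get("permissions", [])) | set(m.get("host_permissions", []))
    return {
        "name": m.get("name", ""),
        "ai_related": bool(AI_NAME_PATTERN.search(m.get("name", ""))),
        "broad_permissions": sorted(perms & BROAD_PERMISSIONS),
    }
```

Name matching alone is noisy; treat the output as a triage list to correlate with the network-analysis findings, not a verdict.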
Developer Tool Discovery
Developers often integrate AI services without security review:
| What to Search For | Where to Search | Tool |
|---|---|---|
| API keys for AI services | Source code repositories, environment variables, configuration files | Secret scanning tools (GitLeaks, TruffleHog) |
| AI SDK imports | Source code (Python: openai, anthropic, langchain; JS: openai, @anthropic-ai/sdk) | Code search, dependency analysis |
| AI service URLs | Configuration files, environment variables, deployment scripts | Grep/search across repositories |
| Model files | Shared storage, artifact repositories | File system scanning by extension and size |
| AI-related dependencies | requirements.txt, package.json, Pipfile | Dependency analysis tools |
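The key-and-import search can be approximated with a few regexes. The key prefixes shown (`sk-` for OpenAI, `sk-ant-` for Anthropic) are commonly observed formats, not a complete or authoritative list; dedicated tools such as GitLeaks or TruffleHog maintain far broader rule sets.

```python
import re

KEY_PATTERNS = {
    "openai_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "anthropic_key": re.compile(r"\bsk-ant-[A-Za-z0-9_-]{20,}"),
}
# Python-style SDK imports; extend for JS require()/import as needed.
SDK_IMPORTS = re.compile(
    r"^\s*(?:import\s+(?:openai|anthropic|langchain)\b"
    r"|from\s+(?:openai|anthropic|langchain)\b)", re.M)

def scan_source(text):
    """Return finding labels for AI key patterns and SDK imports in source text."""
    findings = [name for name, pat in KEY_PATTERNS.items() if pat.search(text)]
    if SDK_IMPORTS.search(text):
        findings.append("ai_sdk_import")
    return findings
```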
Assessment Methodology
Shadow AI Discovery Engagement Framework
Scope and authorize
Define the scope of shadow AI discovery, including which networks, cloud accounts, and organizational units are included. Obtain authorization for all scanning and monitoring activities.
Passive discovery
Begin with non-intrusive techniques that analyze existing data sources: DNS logs, proxy logs, cloud billing, SaaS inventories, and code repositories. This phase identifies the most visible shadow AI usage.
Active discovery
Conduct active scanning and enumeration: cloud resource inventory, endpoint scanning, container analysis, and developer tool assessment. This phase identifies resources that passive monitoring may miss.
Risk assessment
For each discovered shadow AI deployment, assess the risk based on data sensitivity, regulatory implications, and security posture. Prioritize findings by risk level.
Reporting and remediation
Document all discoveries with risk assessments and remediation recommendations. Include both immediate actions (blocking high-risk usage) and strategic recommendations (building an approved AI toolkit).
Risk Scoring for Shadow AI Findings
| Factor | Score 1 (Low) | Score 3 (Medium) | Score 5 (High) |
|---|---|---|---|
| Data sensitivity | Non-sensitive internal data | Business confidential | Regulated data (PII, PHI, PCI) |
| Usage scale | Individual, occasional use | Team-level, regular use | Department or organization-wide |
| Service security | Reputable provider with enterprise security | Unknown provider, standard terms | No terms, data retention unclear |
| Governance gap | Not in inventory but low risk | No security review conducted | Violates existing policy |
| Regulatory exposure | No regulatory implications | Potential compliance concerns | Clear regulatory violation |
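The rubric above can be turned into a single number. Averaging the 1/3/5 factor scores, and the High/Medium cut points below, are one simple aggregation choice assumed for illustration, not a standard.

```python
FACTORS = ("data_sensitivity", "usage_scale", "service_security",
           "governance_gap", "regulatory_exposure")

def risk_score(scores: dict) -> float:
    """Average the five 1/3/5 factor scores into a 1-5 finding score."""
    assert set(scores) == set(FACTORS)
    assert all(v in (1, 3, 5) for v in scores.values())
    return sum(scores.values()) / len(FACTORS)

def risk_label(score: float) -> str:
    # Assumed cut points: >=4 High, >=2.5 Medium, else Low.
    return "High" if score >= 4 else "Medium" if score >= 2.5 else "Low"
```

Some teams instead take the maximum factor score so that a single regulatory violation cannot be averaged away; choose the aggregation to match how findings will be prioritized.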
Remediation Recommendations
Strategic Recommendations
| Recommendation | Purpose | Implementation |
|---|---|---|
| Approved AI catalog | Provide sanctioned alternatives to shadow AI | Evaluate and approve AI tools for common use cases |
| AI acceptable use policy | Set clear expectations for AI tool usage | Define what data can be processed by which AI services |
| Technical controls | Prevent unauthorized AI service access | URL filtering, DLP for AI services, API gateway controls |
| Monitoring program | Continuously detect new shadow AI usage | Ongoing network monitoring, periodic cloud enumeration |
| Training and awareness | Help employees understand shadow AI risks | Regular training on AI data risks, approved alternatives |
Immediate Actions for High-Risk Discoveries
| Finding | Immediate Action |
|---|---|
| Regulated data sent to unauthorized AI service | Assess breach notification obligations, block access, notify compliance |
| API keys for AI services embedded in source code | Rotate keys immediately, add to secret scanning rules |
| Self-hosted models processing customer data | Isolate from production data, assess data handling |
| AI browser extensions with broad permissions | Deploy removal via endpoint management |
| Unauthorized AI chatbots in customer-facing applications | Remove from production, assess customer data exposure |
Shadow AI discovery is becoming a standard component of AI red team engagements. Organizations that proactively discover and govern their AI usage are far better positioned to manage AI risk than those that wait for incidents to reveal unauthorized deployments.