Shadow AI Detection

intermediate · 8 min read · Updated 2026-03-15

Finding unauthorized AI deployments in organizations: detection methods, common shadow AI patterns, and assessment of unmanaged AI risks.

shadow-ai unauthorized detection governance risk

Shadow AI refers to AI systems deployed without organizational oversight. As AI tools become easier to access and integrate, developers and business users increasingly deploy AI capabilities outside official channels. These unmanaged deployments bypass security reviews, lack proper access controls, and often process sensitive data without appropriate safeguards -- creating significant and invisible risk.

Why Shadow AI Exists

Shadow AI emerges from the intersection of AI accessibility and organizational friction:

Factor	Driver	Result
Easy API access	Anyone with a credit card can get an API key	Developers integrate AI without approval
Slow approval processes	Official AI deployment takes weeks/months	Teams build unofficial solutions
Perceived low risk	"It's just a chatbot, what could go wrong?"	No security review for AI features
Rapid innovation pressure	Business demands for AI-powered features	Teams ship first, ask permission later
Personal accounts	Consumer AI subscriptions are free or cheap	Work data processed through personal tools

Common Shadow AI Patterns

Pattern 1: Developer API Key Integration

# Common shadow AI pattern: developer uses personal API key
# in production code

from openai import OpenAI  # client interface of openai>=1.0

# Anti-pattern: personal API key hardcoded or in local .env
client = OpenAI(api_key="sk-personal-key-here")

def process_customer_ticket(ticket_text):
    """Uses AI to classify and route support tickets.
    Deployed by a single developer without security review."""
    # Customer ticket text leaves the organization in this request
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Classify this support ticket: {ticket_text}"
        }]
    )
    return response.choices[0].message.content

Pattern 2: ChatGPT/Claude as Business Tool

Employees pasting confidential data into consumer AI chatbots:

Customer data for analysis
Source code for debugging
Internal documents for summarization
Financial data for report generation
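Where a web proxy records full URLs (which usually requires TLS inspection), this pattern can show up as large uploads to consumer chatbot sites. The following is a rough sketch, not a tuned detector: it assumes proxy entries are dictionaries with `url`, `method`, `user`, and `request_size` fields, and both the hostname list and the 10,000-byte threshold are illustrative assumptions.

```python
from urllib.parse import urlparse

# Illustrative, incomplete list of consumer chatbot web apps (assumption).
CONSUMER_AI_HOSTS = {"chatgpt.com", "chat.openai.com", "claude.ai", "gemini.google.com"}

# Assumed threshold: request bodies above this size suggest pasted documents or code.
LARGE_PASTE_BYTES = 10_000


def flag_large_pastes(proxy_log_entries):
    """Return POST requests to consumer AI chat apps with large request bodies."""
    flagged = []
    for entry in proxy_log_entries:
        host = (urlparse(entry.get("url", "")).hostname or "").lower()
        if host not in CONSUMER_AI_HOSTS:
            continue
        if entry.get("method") != "POST":
            continue
        size = entry.get("request_size") or 0
        if size >= LARGE_PASTE_BYTES:
            flagged.append({"user": entry.get("user"), "host": host, "request_size": size})
    return flagged


# Made-up sample entries for illustration only.
sample = [
    {"url": "https://claude.ai/", "method": "POST", "user": "alice", "request_size": 48_000},
    {"url": "https://claude.ai/", "method": "GET", "user": "alice", "request_size": 0},
    {"url": "https://example.com/upload", "method": "POST", "user": "bob", "request_size": 90_000},
]
print(flag_large_pastes(sample))
```

Request size alone says nothing about what was pasted, so a hit is a prompt for a conversation with the user rather than proof of a leak.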

Pattern 3: Embedded AI in SaaS Products

Third-party SaaS tools with AI features that process company data:

CRM tools with AI-powered insights
Collaboration tools with AI assistants
Analytics platforms with AI-generated recommendations

Pattern 4: Unauthorized Fine-Tuned Models

Teams fine-tuning models on company data without data governance review:

Customer interaction data used for fine-tuning
Proprietary documents embedded in RAG systems
Internal knowledge bases connected to AI tools
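Fine-tuning and RAG ingestion tend to leave different traces than ordinary chat traffic, because they hit file-upload, fine-tuning, and embedding endpoints. The sketch below classifies proxied URLs by path. It assumes full URLs are visible in proxy logs and covers only the path layout of OpenAI's public REST API; other providers use different paths, and the category names are hypothetical labels chosen for this example.

```python
from urllib.parse import urlparse

# Path prefixes from OpenAI's public REST API. Only api.openai.com traffic
# is classified here (assumption); other providers need their own tables.
PATH_CATEGORIES = [
    ("/v1/fine_tuning", "fine_tuning"),
    ("/v1/files", "file_upload"),
    ("/v1/embeddings", "embedding"),
    ("/v1/chat/completions", "chat"),
]


def classify_openai_request(url):
    """Map a proxied URL to a coarse usage category, or None if it is not OpenAI API traffic."""
    parsed = urlparse(url)
    if (parsed.hostname or "").lower() != "api.openai.com":
        return None
    for prefix, category in PATH_CATEGORIES:
        if parsed.path.startswith(prefix):
            return category
    return "other"


print(classify_openai_request("https://api.openai.com/v1/fine_tuning/jobs"))
print(classify_openai_request("https://api.openai.com/v1/embeddings"))
```

Hosts that generate sustained `file_upload`, `fine_tuning`, or `embedding` traffic are reasonable candidates for a data governance review, since those calls typically carry bulk company data.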

Detection Techniques

Network-Level Detection

class ShadowAINetworkDetector:
    """Detect shadow AI by monitoring network traffic."""
 
    AI_API_DOMAINS = [
        "api.openai.com",
        "api.anthropic.com",
        "generativelanguage.googleapis.com",
        "api.mistral.ai",
        "api.cohere.ai",
        "api.together.xyz",
        "api.fireworks.ai",
        "api.replicate.com",
        "api.huggingface.co",
    ]
 
    def analyze_dns_logs(self, dns_log_entries):
        """Identify DNS queries to known AI API endpoints."""
        findings = []
 
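        for entry in dns_log_entries:
            # Normalize the queried name: lowercase, no trailing root dot
            query = entry["query"].lower().rstrip(".")
            for domain in self.AI_API_DOMAINS:
                # Match the domain or its subdomains, not arbitrary substrings
                if query == domain or query.endswith("." + domain):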
                    findings.append({
                        "domain": domain,
                        "source_ip": entry["source_ip"],
                        "timestamp": entry["timestamp"],
                        "query_count": entry.get("count", 1)
                    })
 
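        return self.deduplicate_and_rank(findings)

    def deduplicate_and_rank(self, findings):
        """Merge findings per (source_ip, domain) and sort by query volume.

        One possible implementation; assumes timestamps are comparable
        (for example ISO 8601 strings or epoch numbers)."""
        merged = {}
        for finding in findings:
            key = (finding["source_ip"], finding["domain"])
            if key not in merged:
                merged[key] = dict(finding)
            else:
                merged[key]["query_count"] += finding["query_count"]
                # Keep the most recent timestamp seen for this pair
                merged[key]["timestamp"] = max(
                    merged[key]["timestamp"], finding["timestamp"])
        return sorted(merged.values(),
                      key=lambda f: f["query_count"], reverse=True)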
 
    def analyze_proxy_logs(self, proxy_log_entries):
        """Analyze HTTP proxy logs for AI API traffic."""
        ai_traffic = []
 
        for entry in proxy_log_entries:
            url = entry.get("url", "")
            for domain in self.AI_API_DOMAINS:
                if domain in url:
                    ai_traffic.append({
                        "url": url,
                        "method": entry.get("method"),
                        "user": entry.get("user"),
                        "source_ip": entry.get("source_ip"),
                        "content_type": entry.get("content_type"),
                        "request_size": entry.get("request_size"),
                        "timestamp": entry.get("timestamp")
                    })
 
        return ai_traffic
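The substring test `domain in url` used in `analyze_proxy_logs` also matches URLs that merely mention an AI domain in a path or query string, and hostnames that embed it as a prefix. A stricter variant compares parsed hostnames instead. This is a minimal sketch; the function name `matches_ai_domain` is hypothetical and the domain list is a shortened copy of the one above.

```python
from urllib.parse import urlparse

AI_API_DOMAINS = ["api.openai.com", "api.anthropic.com"]  # subset of the list above


def matches_ai_domain(url, domains=AI_API_DOMAINS):
    """True if the URL's hostname is a listed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in domains)


print(matches_ai_domain("https://api.openai.com/v1/chat/completions"))  # True
print(matches_ai_domain("https://example.com/?ref=api.openai.com"))     # False: domain only appears in the query string
print(matches_ai_domain("https://api.openai.com.example.net/"))         # False: different registered domain
```

Hostname matching reduces false positives, but it still misses traffic routed through unlisted proxies or gateways, so the domain list needs regular maintenance either way.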

Application-Level Detection

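import os
import re


class ShadowAICodeScanner:
    """Scan code repositories for shadow AI usage."""

    # Patterns are approximate; vendors change key formats over time.
    INDICATORS = {
        "api_key_patterns": [
            r"sk-[a-zA-Z0-9]{32,}",       # OpenAI key format
            r"sk-ant-[a-zA-Z0-9-]{32,}",   # Anthropic key format
            r"AIza[a-zA-Z0-9_-]{35}",       # Google API key
        ],
        "import_patterns": [
            r"import openai",
            r"from anthropic import",
            r"import google\.generativeai",
            r"from langchain",
            r"import llama_index",
            r"from transformers import",
        ],
        "api_url_patterns": [
            r"api\.openai\.com",
            r"api\.anthropic\.com",
            r"generativelanguage\.googleapis\.com",
        ]
    }

    # Source code plus common config formats
    SCAN_EXTENSIONS = ('.py', '.js', '.ts', '.go', '.java',
                       '.env', '.yaml', '.yml', '.json')
    SKIP_DIRS = {'.git', 'node_modules', '__pycache__', 'venv'}

    def scan_repository(self, repo_path):
        """Scan a code repository for shadow AI indicators."""
        findings = []

        for root, dirs, files in os.walk(repo_path):
            # Prune common non-code directories in place so os.walk skips them
            dirs[:] = [d for d in dirs if d not in self.SKIP_DIRS]

            for filename in files:
                if filename.endswith(self.SCAN_EXTENSIONS):
                    filepath = os.path.join(root, filename)
                    findings.extend(self.scan_file(filepath))

        return findings

    def scan_file(self, filepath):
        """Return one finding per line and pattern that matches.

        One possible implementation. The matched text is deliberately not
        stored, so findings do not copy secrets into reports."""
        findings = []
        try:
            with open(filepath, encoding="utf-8", errors="ignore") as f:
                lines = f.readlines()
        except OSError:
            return findings  # Unreadable file: skip rather than abort the scan

        for line_number, line in enumerate(lines, start=1):
            for category, patterns in self.INDICATORS.items():
                for pattern in patterns:
                    if re.search(pattern, line):
                        findings.append({
                            "file": filepath,
                            "line": line_number,
                            "category": category,
                            "pattern": pattern,
                        })
        return findings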
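The scanner above looks at source and config files, but AI libraries are often visible first in dependency manifests. As a complementary sketch, the function below checks `requirements.txt`-style text for AI SDK packages. The package set is illustrative rather than exhaustive, and the function name `find_ai_dependencies` is hypothetical.

```python
import re

# PyPI package names for common AI SDKs (illustrative, not exhaustive).
AI_PACKAGES = {
    "openai", "anthropic", "google-generativeai",
    "langchain", "llama-index", "transformers",
}


def find_ai_dependencies(requirements_text):
    """Return AI package names declared in requirements.txt-style text."""
    found = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line or line.startswith("-"):    # skip blanks and pip options
            continue
        # The package name ends at the first version specifier, extra, or marker.
        name = re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0]
        # Normalize the name the way PEP 503 describes (runs of -, _, . become -).
        normalized = re.sub(r"[-_.]+", "-", name).lower()
        if normalized in AI_PACKAGES:
            found.append(normalized)
    return found


requirements = """\
requests==2.31.0
openai>=1.0
Llama_Index
# anthropic
transformers[torch]==4.40.0
"""
print(find_ai_dependencies(requirements))  # ['openai', 'llama-index', 'transformers']
```

A manifest hit only shows that a library is installed, not that it sends data anywhere, so it works best as a pointer to repositories that deserve a closer scan.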

Risk Assessment

Shadow AI deployments carry specific security risks:

Risk	Description	Severity
Data leakage	Sensitive data sent to external AI APIs without DLP controls	Critical
No access control	Personal API keys lack organizational access management	High
No monitoring	No logging or auditing of AI interactions	High
No safety controls	No content filtering or safety measures	Medium
Compliance violations	Data processing may violate regulations (GDPR, HIPAA)	Critical
Prompt injection exposure	No injection defenses on shadow deployments	Medium
Supply chain risk	Unvetted third-party AI services processing data	Medium

Impact Assessment Framework

def assess_shadow_ai_risk(deployment):
    """Assess the risk of a discovered shadow AI deployment."""
    risk_factors = {
        "data_sensitivity": {
            "pii": 10,
            "financial": 9,
            "healthcare": 10,
            "internal_only": 5,
            "public": 1
        },
        "data_volume": {
            "high": 8,    # Thousands of records/day
            "medium": 5,  # Hundreds of records/day
            "low": 2      # Occasional use
        },
        "authorization": {
            "none": 10,        # No approval at all
            "informal": 7,     # Manager approved, no security review
            "partial": 4,      # Some review but incomplete
            "approved": 1      # Fully approved (not shadow AI)
        },
        "controls": {
            "none": 10,
            "basic": 6,
            "moderate": 3,
            "comprehensive": 1
        }
    }
 
    # Missing fields fall back to "none". Categories without a "none"
    # level (data_sensitivity, data_volume) then score a neutral 5.
    total_risk = sum(
        risk_factors[category].get(deployment.get(category, "none"), 5)
        for category in risk_factors
    )
    max_risk = sum(max(v.values()) for v in risk_factors.values())
    risk_score = total_risk / max_risk  # Normalized to the range (0, 1]

    return {
        "risk_score": risk_score,
        "risk_level": (
            "critical" if risk_score > 0.8 else
            "high" if risk_score > 0.6 else
            "medium" if risk_score > 0.4 else
            "low"
        ),
        "factors": deployment
    }
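To make the scoring concrete, here is the arithmetic for one assumed deployment resembling Pattern 1: PII in the tickets, medium volume, no authorization, and no controls. The inputs are illustrative assumptions, and the weights are copied from `assess_shadow_ai_risk` above.

```python
# Weights copied from assess_shadow_ai_risk; the chosen levels are assumptions.
example_scores = {
    "data_sensitivity": 10,  # "pii"
    "data_volume": 5,        # "medium"
    "authorization": 10,     # "none"
    "controls": 10,          # "none"
}
# Highest and lowest weight available in each category.
max_scores = {"data_sensitivity": 10, "data_volume": 8, "authorization": 10, "controls": 10}
min_scores = {"data_sensitivity": 1, "data_volume": 2, "authorization": 1, "controls": 1}

risk_score = sum(example_scores.values()) / sum(max_scores.values())  # 35 / 38
floor = sum(min_scores.values()) / sum(max_scores.values())           # 5 / 38

print(f"{risk_score:.3f}")  # 0.921, above 0.8, so "critical"
print(f"{floor:.3f}")       # 0.132, the lowest score the weights allow
```

Because every category has a minimum weight above zero, the score never reaches 0, and the four thresholds divide the range from roughly 0.13 to 1.0 rather than from 0 to 1.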

Governance Recommendations

Create an AI usage policy
Define acceptable use of AI tools, approved providers, data handling requirements, and the approval process for new AI deployments.
Provide sanctioned alternatives
Make approved AI tools easily accessible. Shadow AI thrives when official channels are slow or absent. Offer pre-approved AI APIs with proper security controls.
Implement network monitoring
Monitor DNS and proxy logs for traffic to known AI API endpoints. Alert on new or unauthorized usage patterns.
Scan code repositories
Regularly scan for API keys, AI library imports, and AI service URLs in code repositories. Integrate scanning into CI/CD pipelines.
Educate and enable
Train teams on AI security risks and provide a fast-track approval process for low-risk AI usage. The goal is safe adoption, not prohibition.
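The "alert on new usage" part of the network monitoring recommendation can be approached as a set difference against a known baseline. The sketch below assumes findings shaped like the DNS detector's output (dictionaries with `source_ip` and `domain`); the function names are hypothetical, and a real deployment would also need to handle DHCP churn and approved exceptions.

```python
def usage_pairs(findings):
    """Reduce detector findings to a set of (source_ip, domain) pairs."""
    return {(f["source_ip"], f["domain"]) for f in findings}


def new_ai_usage(baseline_findings, current_findings):
    """Return pairs present in the current window but not in the baseline."""
    return sorted(usage_pairs(current_findings) - usage_pairs(baseline_findings))


# Made-up sample data for illustration only.
baseline = [{"source_ip": "10.0.0.5", "domain": "api.openai.com"}]
current = [
    {"source_ip": "10.0.0.5", "domain": "api.openai.com"},
    {"source_ip": "10.0.0.9", "domain": "api.anthropic.com"},
]
print(new_ai_usage(baseline, current))  # [('10.0.0.9', 'api.anthropic.com')]
```

Alerting only on new pairs keeps the signal manageable once approved integrations are in the baseline.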

Related articles

Attack Surface Mapping: including shadow AI in attack surface assessment
Social Engineering for AI: shadow AI as a social engineering vector
OSINT for AI: discovering shadow AI from an external perspective

Knowledge Check

A red team discovers that a target company's engineering team uses personal ChatGPT accounts to debug production code, regularly pasting source code and error messages into the chatbot. What is the primary security concern?



