Model Garden Risks

advanced10 min readUpdated 2026-03-15

Security risks of deploying models from GCP Model Garden: third-party model trust, model provenance verification, deployment from untrusted sources, and supply chain attack vectors.

gcp model-garden third-party-models model-provenance supply-chain trust-model red-team

Model Garden Risks

GCP's Model Garden is a curated hub for discovering and deploying foundation models from Google, open-source communities, and third-party providers. While it simplifies model deployment, it introduces supply chain risks that are distinct from traditional software dependencies. Models are opaque executables -- you cannot review their "source code" (weights) the way you review a library's source. A trojaned model passes functional testing while containing hidden behaviors that activate only on specific triggers. For red teamers, Model Garden is both an attack surface (deploying compromised models) and a reconnaissance resource (understanding what models a target uses).

Third-Party Model Risks

The Trust Problem

Model Garden provides models from several sources with different trust levels:

Source	Examples	Trust Level	Risk
Google first-party	Gemini, PaLM, Gemma	High	Google's infrastructure and safety processes
Verified partner	Anthropic Claude, Meta Llama	Medium-High	Partner's safety processes, Google vetting
Open-source community	Fine-tuned variants, specialized models	Low-Medium	Community review only, no formal vetting
Custom uploaded	Organization's own models	Varies	Depends on internal security practices

Hidden Behavior in Models

Models can contain hidden behaviors that are not detectable through standard evaluation:

Backdoor triggers
A model trained with a backdoor responds normally to standard inputs but produces attacker-controlled outputs when a specific trigger is present. For example, a translation model that translates correctly for all inputs except those containing a specific Unicode sequence, which triggers data exfiltration behavior.
Data memorization
Models memorize training data. A third-party model may have been trained on sensitive data that can be extracted through targeted prompting. Deploying this model in your infrastructure exposes that memorized data through your API endpoints.
Bias injection
A model fine-tuned to exhibit specific biases (e.g., always recommending a particular product, subtly favoring certain outcomes) operates within normal parameters on standard benchmarks but produces manipulated outputs in production.
Capability hiding
A model that appears to be a simple text classifier but actually has generative capabilities that can be activated through specific input patterns, enabling it to be used as a general-purpose language model from within a restricted deployment.

Model Card Gaps

Model Garden provides model cards with information about training data, performance, and limitations. However, model cards have inherent gaps:

Self-reported: Model cards are written by model creators and are not independently verified
Evaluation limited: Benchmarks test known scenarios; backdoors activate on unknown triggers
Training data opacity: Most model cards describe training data in general terms without detailed provenance
Version drift: Model card information may not be updated when model weights are modified

Model Provenance

Verification Challenges

Model provenance -- verifying that a model comes from who it claims to be and has not been tampered with -- is an unsolved problem in the ML ecosystem.

Verification Method	What It Proves	What It Does Not Prove
Checksum matching	Binary integrity (model was not modified after publication)	Model is safe; model comes from claimed source
Signing	Model was signed by holder of signing key	Signer's identity; model content is safe
Model cards	Creator's claims about model properties	Claims are accurate; no hidden behaviors
Benchmark evaluation	Model performs well on known tests	No backdoors, no memorized sensitive data
Red team testing	Model resists known attack patterns	Resistance to novel attacks

Supply Chain Attack Vectors

[Model Creator] → [Publishing Platform] → [Model Garden] → [Customer Deployment]
         ↑                    ↑                   ↑                    ↑
    Training data        Platform compromise   Catalog tampering   Deployment config
    poisoning            Model substitution    Metadata fraud      Runtime modification

Each link in the chain is an attack surface:

Training data poisoning: Attacker poisons the training data used by the model creator, embedding backdoors in the resulting model
Platform compromise: Attacker compromises the publishing platform (e.g., Hugging Face) and replaces model files
Catalog manipulation: Attacker manipulates Model Garden metadata to point to different model artifacts
Deployment tampering: Attacker modifies the model during or after deployment to the customer's infrastructure

Deployment from Untrusted Sources

One-Click Deployment Risks

Model Garden's one-click deployment simplifies model deployment but can lead to insecure configurations:

Risk	Description	Mitigation Failure
Default service account	Deployment uses Compute Engine default SA	Overprivileged model endpoint
Public endpoint	Default deployment may create publicly accessible endpoint	Model accessible without VPC restrictions
No content filtering	Open-source models deploy without Google's safety filters	No guardrails on model output
Large instance types	GPU instances deployed without cost controls	Denial-of-wallet exposure
No monitoring	Model Monitoring not configured by default	Adversarial inputs undetected

Self-Hosted Model Risks

When organizations deploy open-source models from Model Garden onto their own infrastructure:

# Check deployed model details
gcloud ai models describe <model-id> --region=us-central1
 
# Check the container image used
gcloud ai endpoints describe <endpoint-id> --region=us-central1 \
  --format="json(deployedModels[].model,deployedModels[].serviceAccount)"

Self-hosted models lack the safety infrastructure that managed API models (Gemini) benefit from:

No safety training: Many open-source models have minimal safety alignment
No content filtering: No automatic content filtering on inputs or outputs
No rate limiting: No built-in protection against abuse
Full weight access: The model weights are in customer storage and can be exfiltrated
Container-level access: The serving container may be exploitable through model serving framework vulnerabilities (e.g., TensorFlow Serving, Triton)

Model Serving Framework Vulnerabilities

Open-source models are served through frameworks that have their own vulnerability surface:

Framework	Used For	Common Vulnerabilities
vLLM	LLM serving	API exposure, no default authentication
TensorFlow Serving	TF models	gRPC/REST API vulnerabilities
Triton Inference Server	Multi-framework	Model loading from untrusted paths
TGI (Text Generation Inference)	LLM serving	API exposure, SSRF through model loading

Red Team Assessment Approach

Model Source Analysis

For each model in the target environment:

Identify the source: Is it a Google model, verified partner, open-source community, or custom?
Check provenance: Can the model's origin be verified through checksums, signatures, or audit trails?
Assess deployment configuration: What service account, network configuration, and monitoring is in place?
Test for hidden behaviors: Probe the model with adversarial inputs designed to trigger backdoors
Evaluate the serving stack: Test the model serving framework for its own vulnerabilities

Model Inventory Gaps

Organizations often lack a complete inventory of deployed models:

Development models deployed for testing and never decommissioned
Models deployed by individual data scientists without central oversight
Multiple versions of the same model running simultaneously
Models deployed in non-standard regions to avoid quota limitations

GCP AI Services Overview -- Service landscape and enumeration
Vertex AI Attack Surface -- Endpoint and training exploitation
Infrastructure & Supply Chain -- General supply chain attack methodology
RAG, Data & Training Attacks -- Training data poisoning techniques

Knowledge Check

An organization deploys a fine-tuned LLM from Model Garden using one-click deployment without modifying the default configuration. Which security risk is MOST likely present?

Knowledge Check

Why can't traditional code review techniques be applied to verify that a third-party model from Model Garden is safe?

References

Vertex AI Model Garden -- Model catalog and deployment
Model Cards for ML -- Original model cards paper
Hugging Face Security -- Security practices for model hosting platforms

Model Garden Risks

Backdoor triggers

Data memorization

Bias injection

Capability hiding

Related articles

Model Garden Risks

Backdoor triggers

Data memorization

Bias injection

Capability hiding

Related articles