Model Garden Risks
Security risks of deploying models from GCP Model Garden: third-party model trust, model provenance verification, deployment from untrusted sources, and supply chain attack vectors.
Model Garden Risks
GCP's Model Garden is a curated hub for discovering and deploying foundation models from Google, open-source communities, and third-party providers. While it simplifies model deployment, it introduces supply chain risks that are distinct from traditional software dependencies. Models are opaque executables -- you cannot review their "source code" (weights) the way you review a library's source. A trojaned model passes functional testing while containing hidden behaviors that activate only on specific triggers. For red teamers, Model Garden is both an attack surface (deploying compromised models) and a reconnaissance resource (understanding what models a target uses).
Third-Party Model Risks
The Trust Problem
Model Garden provides models from several sources with different trust levels:
| Source | Examples | Trust Level | Risk |
|---|---|---|---|
| Google first-party | Gemini, PaLM, Gemma | High | Google's infrastructure and safety processes |
| Verified partner | Anthropic Claude, Meta Llama | Medium-High | Partner's safety processes, Google vetting |
| Open-source community | Fine-tuned variants, specialized models | Low-Medium | Community review only, no formal vetting |
| Custom uploaded | Organization's own models | Varies | Depends on internal security practices |
Hidden Behavior in Models
Models can contain hidden behaviors that are not detectable through standard evaluation:
Backdoor triggers
A model trained with a backdoor responds normally to standard inputs but produces attacker-controlled outputs when a specific trigger is present. For example, a translation model that translates correctly for all inputs except those containing a specific Unicode sequence, which triggers data exfiltration behavior.
Data memorization
Models memorize training data. A third-party model may have been trained on sensitive data that can be extracted through targeted prompting. Deploying this model in your infrastructure exposes that memorized data through your API endpoints.
Bias injection
A model fine-tuned to exhibit specific biases (e.g., always recommending a particular product, subtly favoring certain outcomes) operates within normal parameters on standard benchmarks but produces manipulated outputs in production.
Capability hiding
A model that appears to be a simple text classifier but actually has generative capabilities that can be activated through specific input patterns, enabling it to be used as a general-purpose language model from within a restricted deployment.
Model Card Gaps
Model Garden provides model cards with information about training data, performance, and limitations. However, model cards have inherent gaps:
- Self-reported: Model cards are written by model creators and are not independently verified
- Evaluation limited: Benchmarks test known scenarios; backdoors activate on unknown triggers
- Training data opacity: Most model cards describe training data in general terms without detailed provenance
- Version drift: Model card information may not be updated when model weights are modified
Model Provenance
Verification Challenges
Model provenance -- verifying that a model comes from who it claims to be and has not been tampered with -- is an unsolved problem in the ML ecosystem.
| Verification Method | What It Proves | What It Does Not Prove |
|---|---|---|
| Checksum matching | Binary integrity (model was not modified after publication) | Model is safe; model comes from claimed source |
| Signing | Model was signed by holder of signing key | Signer's identity; model content is safe |
| Model cards | Creator's claims about model properties | Claims are accurate; no hidden behaviors |
| Benchmark evaluation | Model performs well on known tests | No backdoors, no memorized sensitive data |
| Red team testing | Model resists known attack patterns | Resistance to novel attacks |
Supply Chain Attack Vectors
[Model Creator] → [Publishing Platform] → [Model Garden] → [Customer Deployment]
↑ ↑ ↑ ↑
Training data Platform compromise Catalog tampering Deployment config
poisoning Model substitution Metadata fraud Runtime modification
Each link in the chain is an attack surface:
- Training data poisoning: Attacker poisons the training data used by the model creator, embedding backdoors in the resulting model
- Platform compromise: Attacker compromises the publishing platform (e.g., Hugging Face) and replaces model files
- Catalog manipulation: Attacker manipulates Model Garden metadata to point to different model artifacts
- Deployment tampering: Attacker modifies the model during or after deployment to the customer's infrastructure
Deployment from Untrusted Sources
One-Click Deployment Risks
Model Garden's one-click deployment simplifies model deployment but can lead to insecure configurations:
| Risk | Description | Mitigation Failure |
|---|---|---|
| Default service account | Deployment uses Compute Engine default SA | Overprivileged model endpoint |
| Public endpoint | Default deployment may create publicly accessible endpoint | Model accessible without VPC restrictions |
| No content filtering | Open-source models deploy without Google's safety filters | No guardrails on model output |
| Large instance types | GPU instances deployed without cost controls | Denial-of-wallet exposure |
| No monitoring | Model Monitoring not configured by default | Adversarial inputs undetected |
Self-Hosted Model Risks
When organizations deploy open-source models from Model Garden onto their own infrastructure:
# Check deployed model details
gcloud ai models describe <model-id> --region=us-central1
# Check the container image used
gcloud ai endpoints describe <endpoint-id> --region=us-central1 \
--format="json(deployedModels[].model,deployedModels[].serviceAccount)"Self-hosted models lack the safety infrastructure that managed API models (Gemini) benefit from:
- No safety training: Many open-source models have minimal safety alignment
- No content filtering: No automatic content filtering on inputs or outputs
- No rate limiting: No built-in protection against abuse
- Full weight access: The model weights are in customer storage and can be exfiltrated
- Container-level access: The serving container may be exploitable through model serving framework vulnerabilities (e.g., TensorFlow Serving, Triton)
Model Serving Framework Vulnerabilities
Open-source models are served through frameworks that have their own vulnerability surface:
| Framework | Used For | Common Vulnerabilities |
|---|---|---|
| vLLM | LLM serving | API exposure, no default authentication |
| TensorFlow Serving | TF models | gRPC/REST API vulnerabilities |
| Triton Inference Server | Multi-framework | Model loading from untrusted paths |
| TGI (Text Generation Inference) | LLM serving | API exposure, SSRF through model loading |
Red Team Assessment Approach
Model Source Analysis
For each model in the target environment:
- Identify the source: Is it a Google model, verified partner, open-source community, or custom?
- Check provenance: Can the model's origin be verified through checksums, signatures, or audit trails?
- Assess deployment configuration: What service account, network configuration, and monitoring is in place?
- Test for hidden behaviors: Probe the model with adversarial inputs designed to trigger backdoors
- Evaluate the serving stack: Test the model serving framework for its own vulnerabilities
Model Inventory Gaps
Organizations often lack a complete inventory of deployed models:
- Development models deployed for testing and never decommissioned
- Models deployed by individual data scientists without central oversight
- Multiple versions of the same model running simultaneously
- Models deployed in non-standard regions to avoid quota limitations
Related Topics
- GCP AI Services Overview -- Service landscape and enumeration
- Vertex AI Attack Surface -- Endpoint and training exploitation
- Infrastructure & Supply Chain -- General supply chain attack methodology
- RAG, Data & Training Attacks -- Training data poisoning techniques
An organization deploys a fine-tuned LLM from Model Garden using one-click deployment without modifying the default configuration. Which security risk is MOST likely present?
Why can't traditional code review techniques be applied to verify that a third-party model from Model Garden is safe?
References
- Vertex AI Model Garden -- Model catalog and deployment
- Model Cards for ML -- Original model cards paper
- Hugging Face Security -- Security practices for model hosting platforms