Vector Database Access Control
Vector database access control weaknesses: API key management, tenant isolation failures, namespace security, and metadata filtering bypass techniques.
Access control in vector databases is a rapidly evolving area where product capabilities frequently lag behind deployment needs. Many vector databases were designed as search infrastructure, not as security-sensitive data stores, and their access control models reflect this origin. This page covers the most common access control weaknesses and how to exploit them during security assessments.
API Key Management
API keys are the primary authentication mechanism for most vector databases. The security of the entire vector database often depends on the security of a single API key.
Common API Key Weaknesses
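Single key per project. Some managed vector databases issue a single API key per project. Pinecone, for example, originally issued a project API key that granted full read/write access to all data, with no way to issue a read-only key, a key scoped to a specific namespace, or a key with limited query capabilities. Where a provider has since added finer-grained keys, confirm which kind of key the target deployment actually uses.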
```python
# A single full-access Pinecone API key grants access to everything.
# Uses the Pinecone Python client's `Pinecone` class (client v3 and later),
# which replaces the `pinecone.init()` style used by older clients.
from pinecone import Pinecone

pc = Pinecone(api_key="compromised-key")
index = pc.Index("production-index")

# Full read access
results = index.query(vector=[0.1] * 1536, top_k=10000)

# Full write access
index.upsert(vectors=[("malicious-id", [0.1] * 1536, {"injected": True})])

# Full delete access
index.delete(ids=["any-id"])
```
Keys in client-side code. When AI applications embed vector database queries in client-side code (browser applications, mobile apps), the API key ships to every end user, who can extract it from the application bundle or from network traffic. Even when keys are meant to be restricted, the vector database may not enforce granular permissions.
Keys in configuration files. Vector database API keys frequently appear in:
- Environment files committed to version control
- Docker Compose files
- Kubernetes ConfigMaps (instead of Secrets)
- CI/CD pipeline configurations
- Terraform state files
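A quick way to triage a checked-out repository, a built client bundle, or a Terraform state file for the leaks listed above is a pattern-based scan. The sketch below is a heuristic, not a complete secret scanner: the product names in `KEY_NAME_PATTERN`, the placeholder set, and the file suffixes are assumptions to tune per engagement, and it flags assignments by variable name rather than recognizing any vendor's key format.

```python
import re
from pathlib import Path

# Heuristic: variable names that commonly hold vector database credentials.
# The product list is an assumption; extend it for the stack in scope.
KEY_NAME_PATTERN = re.compile(
    r"(?P<name>[A-Z0-9_]*(?:PINECONE|QDRANT|WEAVIATE|MILVUS|ZILLIZ|CHROMA)"
    r"[A-Z0-9_]*(?:API_?KEY|TOKEN))"
    r"""["']?\s*[:=]\s*["']?(?P<value>[^\s"',;#]+)""",
    re.IGNORECASE,
)

# Values that are clearly placeholders rather than live secrets.
PLACEHOLDERS = {"changeme", "your-api-key", "xxx", "<redacted>"}


def scan_text(text: str, source: str = "<text>") -> list[tuple[str, int, str]]:
    """Return (source, line_number, variable_name) for each likely key assignment."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in KEY_NAME_PATTERN.finditer(line):
            value = match.group("value")
            if value.lower() in PLACEHOLDERS or value.startswith("${"):
                continue  # skip placeholders and references such as ${VAR}
            findings.append((source, lineno, match.group("name")))
    return findings


def scan_tree(root: str, suffixes=(".env", ".yml", ".yaml", ".tfstate", ".js", ".json")):
    """Walk a directory tree and scan files whose suffix (or name) looks relevant."""
    findings = []
    for path in Path(root).rglob("*"):
        if path.is_file() and (path.suffix in suffixes or path.name == ".env"):
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            findings.extend(scan_text(text, str(path)))
    return findings
```

Variable references such as `${PINECONE_API_KEY}` are skipped on purpose: they indicate the value is injected at runtime, which points the search toward the CI/CD or orchestration layer instead.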
Key Rotation Challenges
Vector database API keys are rarely rotated because:
- Many databases do not support multiple simultaneous active keys, so revoking the old key breaks every service that has not yet switched, forcing downtime
- Keys are embedded in multiple services that all need coordinated updates
- There is typically no automated rotation mechanism comparable to the managed rotation AWS Secrets Manager provides for services such as Amazon RDS
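The first point is what makes rotation painful. With two simultaneously valid keys, a new key can be issued, services can switch over one at a time, and the old key can expire afterward. The sketch below models that overlap window; `KeyRing` and its grace-period logic are hypothetical and illustrate the property an assessor should look for, not any vector database's API.

```python
import hmac
import time
from typing import Optional


class KeyRing:
    """Hypothetical key store that accepts two keys during a rotation window."""

    def __init__(self, initial_key: str, grace_seconds: float = 3600.0):
        self.current = initial_key
        self.previous: Optional[str] = None
        self.previous_expires_at = 0.0
        self.grace_seconds = grace_seconds

    def rotate(self, new_key: str, now: Optional[float] = None) -> None:
        """Make new_key current; keep the old key valid until the grace period ends."""
        now = time.time() if now is None else now
        self.previous = self.current
        self.previous_expires_at = now + self.grace_seconds
        self.current = new_key

    def is_valid(self, presented: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # compare_digest avoids timing side channels; str inputs must be ASCII.
        if hmac.compare_digest(presented, self.current):
            return True
        return (
            self.previous is not None
            and now < self.previous_expires_at
            and hmac.compare_digest(presented, self.previous)
        )


# Rotation without downtime: services move from key-1 to key-2 one at a time.
ring = KeyRing("key-1", grace_seconds=600)
ring.rotate("key-2", now=1000.0)
print(ring.is_valid("key-1", now=1300.0))  # True: old key still inside the grace window
print(ring.is_valid("key-1", now=1700.0))  # False: grace window has closed
print(ring.is_valid("key-2", now=1700.0))  # True
```

When a database offers only the single-key behavior, the equivalent of `grace_seconds` is zero, which is exactly the downtime described above.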
Tenant Isolation
Multi-tenant vector database deployments require isolation between tenants to prevent data leakage. Isolation mechanisms vary significantly across databases and deployment models.
Logical vs. Physical Isolation
| Isolation Type | Mechanism | Bypass Risk |
|---|---|---|
| Namespace | Logical partition within an index | High — single API key accesses all namespaces |
| Collection | Separate index with separate configuration | Medium — depends on access control model |
| Database | Separate database instance | Low — separate credentials required |
| Infrastructure | Separate compute and storage | Lowest — physical separation |
Most production deployments use namespace or collection-level isolation for cost efficiency. This creates weaker boundaries than separate database instances.
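The practical difference between these rows is the blast radius of one leaked credential. The toy model below makes that explicit. The three deployment dictionaries and all key names are hypothetical, and the collection case assumes per-collection keys plus an admin key, which is only one of several possible access control models.

```python
# Hypothetical deployment models: each tenant maps to the set of API keys
# that can read its data. Key names are illustrative placeholders.
NAMESPACE_ISOLATION = {
    "tenant-a": {"project-key"},
    "tenant-b": {"project-key"},
    "tenant-c": {"project-key"},
}
COLLECTION_ISOLATION = {
    # Per-collection keys exist, but an admin key still reaches every collection.
    "tenant-a": {"key-a", "admin-key"},
    "tenant-b": {"key-b", "admin-key"},
    "tenant-c": {"key-c", "admin-key"},
}
DATABASE_ISOLATION = {
    "tenant-a": {"key-a"},
    "tenant-b": {"key-b"},
    "tenant-c": {"key-c"},
}


def blast_radius(model: dict[str, set[str]], leaked_key: str) -> set[str]:
    """Tenants whose data a single leaked key can read under a given model."""
    return {tenant for tenant, keys in model.items() if leaked_key in keys}


print(sorted(blast_radius(NAMESPACE_ISOLATION, "project-key")))  # ['tenant-a', 'tenant-b', 'tenant-c']
print(sorted(blast_radius(COLLECTION_ISOLATION, "key-a")))       # ['tenant-a']
print(sorted(blast_radius(COLLECTION_ISOLATION, "admin-key")))   # ['tenant-a', 'tenant-b', 'tenant-c']
print(sorted(blast_radius(DATABASE_ISOLATION, "key-a")))         # ['tenant-a']
```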
Namespace Isolation Bypass
In Pinecone and similar databases, namespaces are logical partitions within an index. A query specifying a namespace returns only results from that namespace. However, an attacker with the API key can query any namespace:
```python
# Assumes `index` from the previous example; any vector of the index's
# dimension works as a probe query.
query_embedding = [0.1] * 1536

# Intended: application queries only its own namespace
results = index.query(
    vector=query_embedding,
    top_k=10,
    namespace="tenant-a",
)

# Attack: query another tenant's namespace with the same key
results = index.query(
    vector=query_embedding,
    top_k=10,
    namespace="tenant-b",  # Access to tenant-b's data
)

# Attack: enumerate namespaces
# Most vector databases do not restrict namespace listing
stats = index.describe_index_stats()
# Returns namespace names and vector counts
```
Cross-Tenant Query Leakage
Even with proper namespace isolation, certain operations may leak information across tenant boundaries:
- Index statistics reveal the existence and size of other tenants' data
- Similarity scores in shared indexes may reflect proximity to other tenants' vectors
- Metadata filtering may not be enforced at the storage level, allowing filter bypass
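The namespace bypass above comes down to where the authorization check sits. The in-memory model below is a simplified sketch, not any vendor's implementation: `ToyIndex` checks only that the caller holds the project key, so the `namespace` argument selects data without restricting it, and the statistics call reveals every tenant's namespace and size.

```python
import math
from collections import defaultdict


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyIndex:
    """In-memory stand-in for a shared index protected by one project-wide key."""

    def __init__(self, api_key: str):
        self._api_key = api_key
        self._namespaces = defaultdict(dict)  # namespace -> {id: (vector, metadata)}

    def _authorize(self, api_key: str) -> None:
        if api_key != self._api_key:
            raise PermissionError("invalid API key")

    def upsert(self, api_key, namespace, vec_id, vector, metadata=None):
        self._authorize(api_key)
        self._namespaces[namespace][vec_id] = (vector, metadata or {})

    def query(self, api_key, namespace, vector, top_k=10):
        self._authorize(api_key)  # no check that this caller "owns" the namespace
        scored = [
            (vec_id, _cosine(vector, stored), meta)
            for vec_id, (stored, meta) in self._namespaces.get(namespace, {}).items()
        ]
        return sorted(scored, key=lambda m: m[1], reverse=True)[:top_k]

    def describe_index_stats(self, api_key):
        self._authorize(api_key)
        return {ns: len(vectors) for ns, vectors in self._namespaces.items()}


index = ToyIndex(api_key="project-key")
index.upsert("project-key", "tenant-a", "a1", [1.0, 0.0], {"doc": "A's roadmap"})
index.upsert("project-key", "tenant-b", "b1", [0.9, 0.1], {"doc": "B's contracts"})

# Tenant A's application holds the same key, so it can read tenant B's namespace.
print(index.query("project-key", "tenant-b", [1.0, 0.0], top_k=1)[0][2])  # {'doc': "B's contracts"}
print(index.describe_index_stats("project-key"))  # {'tenant-a': 1, 'tenant-b': 1}
```

A per-tenant boundary would need the authorization step to bind the key (or the caller's identity) to a specific namespace, which is the check this model, like a single project key, lacks.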
Namespace Security
Namespaces provide logical separation but are not a security boundary in most vector databases. Understanding the limitations is essential for accurate risk assessment.
Namespace Enumeration
Most vector databases allow listing all namespaces through statistics endpoints:
```python
# Pinecone namespace enumeration
stats = index.describe_index_stats()
for namespace, info in stats.namespaces.items():
    print(f"Namespace: {namespace}, Vectors: {info.vector_count}")
```
Namespace names often reveal organizational structure, project names, or tenant identifiers. This information supports further targeted attacks.
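Enumeration output is most useful once it is ranked. The helper below is a hypothetical triage step that takes a `{namespace: vector_count}` mapping (for example, built from the `stats.namespaces` loop above) and tags names that look sensitive. The keyword list and the UUID heuristic are assumptions to adjust per target, not a standard.

```python
import re

# Hypothetical keyword list; tune it to the target organization.
INTERESTING = re.compile(r"prod|tenant|customer|client|finance|hr|legal|exec", re.IGNORECASE)
UUID_LIKE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.IGNORECASE
)


def triage_namespaces(counts: dict[str, int]) -> list[tuple[str, int, list[str]]]:
    """Rank enumerated namespaces by vector count and tag names that look sensitive."""
    ranked = []
    for name, count in counts.items():
        tags = []
        if INTERESTING.search(name):
            tags.append("keyword")
        if UUID_LIKE.search(name):
            tags.append("tenant-id?")  # UUID-shaped names often map to customer records
        ranked.append((name, count, tags))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


for name, count, tags in triage_namespaces(
    {"": 10, "acme-prod": 500, "123e4567-e89b-12d3-a456-426614174000": 40}
):
    print(repr(name), count, tags)
```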
Namespace Deletion
An attacker with write access can delete entire namespaces, causing data loss:
```python
# Delete all vectors in a namespace
index.delete(delete_all=True, namespace="target-namespace")
```
Many vector databases, or the keys deployments actually use, do not separate read, write, and delete permissions, so a key that can query can usually also delete.
Metadata Filtering Bypass
Vector databases store metadata alongside embeddings to enable filtered queries. Many RAG systems implement access control through metadata filtering — attaching access labels to embeddings and filtering queries to only return embeddings the user is authorized to see.
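To see why filter-based access control is only as strong as whoever writes the filter, consider a toy evaluator for MongoDB-style metadata filters: implicit equality plus operators such as `$in`, `$exists`, and `$or`, similar in spirit to the filter syntax several vector databases accept. It is a simplified sketch with its own semantics, not any product's exact matching rules; the point is that the store evaluates whatever filter arrives with the request.

```python
from typing import Any, Optional

_MISSING = object()  # sentinel for "field not present in metadata"


def _match_condition(value: Any, condition: Any) -> bool:
    """Evaluate one field condition: a literal (implicit $eq) or an operator dict."""
    if not isinstance(condition, dict):
        return value is not _MISSING and value == condition
    for op, operand in condition.items():
        if op == "$eq":
            ok = value is not _MISSING and value == operand
        elif op == "$ne":
            ok = value is _MISSING or value != operand
        elif op == "$in":
            ok = value is not _MISSING and value in operand
        elif op == "$nin":
            ok = value is _MISSING or value not in operand
        elif op == "$exists":
            ok = (value is not _MISSING) == bool(operand)
        else:
            raise ValueError(f"unsupported operator {op}")
        if not ok:
            return False
    return True


def matches(metadata: dict, flt: Optional[dict]) -> bool:
    """Return True if a record's metadata satisfies the filter (None means no filter)."""
    if not flt:
        return True  # an absent filter matches everything
    for key, condition in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif not _match_condition(metadata.get(key, _MISSING), condition):
            return False
    return True


records = {
    "doc-1": {"department": "engineering", "category": "design"},
    "doc-2": {"department": "executive", "category": "board"},
}


def search(flt):
    return sorted(doc_id for doc_id, meta in records.items() if matches(meta, flt))


print(search({"department": "engineering"}))  # ['doc-1']
print(search(None))  # ['doc-1', 'doc-2']
print(search({"department": "engineering", "category": {"$exists": True}}))  # ['doc-1']
print(search({"$or": [{"department": "engineering"}, {"department": "executive"}]}))  # ['doc-1', 'doc-2']
```

The last three calls line up with the bypass techniques discussed below: an injected operator on one field only widens that field, while control over the whole filter, whether by dropping it or rewriting it as an `$or`, removes the department restriction entirely.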
Client-Side Filtering
The most common and most dangerous pattern is implementing access control through client-side metadata filters:
```python
# Application code implements access control via metadata filter
def search(user, query_embedding):
    results = index.query(
        vector=query_embedding,
        top_k=10,
        filter={"department": user.department},  # "Access control"
    )
    return results
```
An attacker who can modify the query (through API manipulation, client-side code modification, or injection) can remove or change the filter:
```python
# Attack: remove the filter to access all data
results = index.query(
    vector=query_embedding,
    top_k=10,
    # No filter: returns results from all departments
)

# Attack: query a different department
results = index.query(
    vector=query_embedding,
    top_k=10,
    filter={"department": "executive"},  # Access executive data
)
```
Filter Injection
Some applications construct metadata filters from user input, creating opportunities for filter injection:
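```python
# Vulnerable: filter constructed from user input
def search(request, query_embedding):
    filters = {"department": request.user.department}
    # User-supplied additional filters
    if request.params.get("category"):
        filters["category"] = request.params["category"]
        # If params come from a parsed JSON body, an attacker can send an operator
        # object instead of a string, e.g. {"category": {"$exists": true}}, or other
        # filter operators depending on the database. This widens the category
        # condition; the department condition still applies unless the application
        # also merges user-supplied keys into `filters`, letting them overwrite it.
    results = index.query(
        vector=query_embedding,
        top_k=10,
        filter=filters,
    )
    return results
```
Metadata Enumeration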
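Even when filters are enforced, an attacker who can choose filter values can enumerate metadata values through targeted queries, because any match confirms that a value exists:
```python
import random

# Any probe vector of the index's dimension works; only whether a match exists matters
random_embedding = [random.random() for _ in range(1536)]

# Enumerate metadata values by querying with each candidate value
departments = ["engineering", "sales", "executive", "hr", "finance"]
for dept in departments:
    results = index.query(
        vector=random_embedding,
        top_k=1,
        filter={"department": dept},
    )
    if results.matches:
        print(f"Department exists: {dept}")
```
Red Team Assessment Checklist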
When assessing vector database access control:
- Obtain the API key — Check client-side code, configuration files, environment variables, CI/CD pipelines
- Test namespace isolation — Query across namespaces, enumerate namespace names
- Test metadata filter bypass — Remove filters, modify filter values, inject filter operators
- Test write access — Attempt to upsert, update, and delete vectors
- Test administrative operations — Index creation, deletion, configuration changes
- Test key scope — Determine what operations the key permits and whether different keys have different permissions
- Check for key rotation — Determine when the key was last rotated and whether rotation is automated
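The write-access and key-scope checks can be run without touching real data by probing with a single canary vector. The harness below is a sketch under stated assumptions: `probe_key_scope` and `build_canary_probes` are hypothetical helpers, the keyword arguments mirror the Pinecone calls used earlier on this page, and `ReadOnlyStub` stands in for a client holding a read-only key.

```python
from typing import Callable, Dict


def probe_key_scope(operations: Dict[str, Callable[[], object]]) -> Dict[str, str]:
    """Run each probe in order and record whether the key was allowed to perform it."""
    results = {}
    for name, operation in operations.items():
        try:
            operation()
        except Exception as exc:  # any failure is recorded, not raised
            results[name] = f"denied ({type(exc).__name__})"
        else:
            results[name] = "allowed"
    return results


def build_canary_probes(client, dimension: int, canary_id: str = "redteam-canary-0001"):
    """Probes that read, then write and delete only one clearly labeled canary vector.

    Dict order matters: the canary is upserted before the delete probe runs,
    so the delete never targets real data.
    """
    vector = [0.1] * dimension
    return {
        "query": lambda: client.query(vector=vector, top_k=1),
        "upsert": lambda: client.upsert(vectors=[(canary_id, vector, {"redteam": True})]),
        "delete": lambda: client.delete(ids=[canary_id]),
    }


class ReadOnlyStub:
    """Stand-in for a client holding a read-only key (demonstration only)."""

    def query(self, **kwargs):
        return []

    def upsert(self, **kwargs):
        raise PermissionError("write not permitted")

    def delete(self, **kwargs):
        raise PermissionError("delete not permitted")


print(probe_key_scope(build_canary_probes(ReadOnlyStub(), dimension=4)))
# {'query': 'allowed', 'upsert': 'denied (PermissionError)', 'delete': 'denied (PermissionError)'}
```

Against a live index, pass the `index` object from the earlier examples as `client`. A recorded exception is not proof of an authorization denial, since network and validation errors also raise, which is why the exception type is kept in the result.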
Related Topics
- Vector Database Security Landscape — Overview of vector database architectures
- Injection Attacks — Attacks that exploit access to write embeddings
- Data Exfiltration — Leveraging access control gaps for data extraction