Vector Database Security Landscape
Security analysis of major vector databases including Pinecone, Weaviate, Chroma, Milvus, and Qdrant: architecture, access models, and common vulnerabilities.
Vector databases are purpose-built to store, index, and query high-dimensional embedding vectors. Traditional databases have well-understood security models. Vector databases are a newer category, with evolving security features and often immature access control implementations. This page covers the security landscape of the major vector databases and identifies common weaknesses.
Major Vector Databases
Pinecone
Pinecone is a fully managed, cloud-native vector database. Its security model centers on API keys for authentication and project-level isolation for multi-tenancy.
Architecture: Serverless and pod-based deployments on AWS and GCP. Data is stored in Pinecone-managed infrastructure.
Security characteristics:
- API key authentication, with keys scoped to a project
- Namespace-based logical isolation within indexes
- No row-level access control within namespaces
- SOC 2 Type II compliance
- Data encrypted at rest and in transit
Key risk: API keys are scoped at the project level. A single compromised key can grant full read/write access to all data in the project, and a key cannot be limited to specific namespaces.
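Suppose the key itself cannot be restricted. One mitigation is to keep it inside a backend service and enforce per-caller namespace rules there. The sketch below assumes a generic index object exposing `query(namespace=..., vector=..., top_k=...)` and `upsert(namespace=..., vectors=...)`. It is not the Pinecone SDK. `InMemoryIndex`, `CallerPolicy`, and `NamespaceGuard` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


class InMemoryIndex:
    """Hypothetical stand-in for a vector index client (not a real SDK)."""

    def __init__(self):
        self.data: dict[str, dict[str, list[float]]] = {}

    def upsert(self, namespace: str, vectors: dict[str, list[float]]) -> None:
        self.data.setdefault(namespace, {}).update(vectors)

    def query(self, namespace: str, vector: list[float], top_k: int) -> list[str]:
        items = self.data.get(namespace, {})
        # Rank stored IDs by dot product with the query vector
        ranked = sorted(
            items,
            key=lambda vid: sum(a * b for a, b in zip(items[vid], vector)),
            reverse=True,
        )
        return ranked[:top_k]


@dataclass
class CallerPolicy:
    readable: set[str] = field(default_factory=set)
    writable: set[str] = field(default_factory=set)


class NamespaceGuard:
    """Checks per-caller namespace permissions before touching the index."""

    def __init__(self, index, policies: dict[str, CallerPolicy]):
        self._index = index  # the only holder of the project-wide credential
        self._policies = policies

    def _policy(self, caller: str) -> CallerPolicy:
        # Unknown callers get an empty policy, i.e. no access at all
        return self._policies.get(caller, CallerPolicy())

    def query(self, caller: str, namespace: str, vector: list[float], top_k: int = 5):
        if namespace not in self._policy(caller).readable:
            raise PermissionError(f"{caller} may not read {namespace}")
        return self._index.query(namespace=namespace, vector=vector, top_k=top_k)

    def upsert(self, caller: str, namespace: str, vectors: dict[str, list[float]]):
        if namespace not in self._policy(caller).writable:
            raise PermissionError(f"{caller} may not write {namespace}")
        self._index.upsert(namespace=namespace, vectors=vectors)
```

The guard only helps if callers never receive the raw API key. If a service can read the key from its own environment, it can bypass the guard entirely.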
Weaviate
Weaviate is an open-source vector database that can be self-hosted or used as a managed service (Weaviate Cloud Services).
Architecture: Go-based server with REST, GraphQL, and gRPC APIs. Supports various index types and can integrate with multiple embedding model providers.
Security characteristics:
- API key and OIDC authentication
- Multi-tenancy with per-tenant isolation within a class (collection)
- Authorization via admin/read-only user lists, with role-based access control (RBAC) in newer releases
- Self-hosted deployments inherit infrastructure security
Key risk: Open-source deployments often run without authentication enabled, especially in development environments that are inadvertently exposed. The default configuration is permissive.
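A quick way to catch the permissive default is to inspect the container environment before deployment. The variable names below follow Weaviate's documented environment-variable configuration. `AUTHORIZATION_RBAC_ENABLED` only applies to releases that ship RBAC. Verify all names against the version you run. `audit_weaviate_env` is a hypothetical helper, not part of any Weaviate tooling.

```python
def audit_weaviate_env(env: dict[str, str]) -> list[str]:
    """Flag permissive Weaviate auth settings in a container environment."""

    def enabled(name: str) -> bool:
        return env.get(name, "").strip().lower() == "true"

    findings = []
    if enabled("AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED"):
        findings.append("anonymous access is enabled")
    if not (enabled("AUTHENTICATION_APIKEY_ENABLED") or enabled("AUTHENTICATION_OIDC_ENABLED")):
        findings.append("no API key or OIDC authentication configured")
    if enabled("AUTHENTICATION_APIKEY_ENABLED") and not env.get("AUTHENTICATION_APIKEY_ALLOWED_KEYS"):
        findings.append("API key auth enabled but no keys listed")
    # Assumption: without an authorization module, authenticated users are not restricted
    if not (enabled("AUTHORIZATION_ADMINLIST_ENABLED") or enabled("AUTHORIZATION_RBAC_ENABLED")):
        findings.append("no authorization layer configured")
    return findings
```

For example, an environment containing only `AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true` produces three findings: anonymous access, no authentication, and no authorization.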
Chroma
Chroma is a lightweight, open-source embedding database designed for AI application development. It is popular for prototyping and smaller deployments.
Architecture: Python-based server with an in-process mode and a client-server mode. Stores data locally or in a configured backend.
Security characteristics:
- Minimal built-in authentication in the open-source version
- No native encryption at rest
- Collection-level isolation
- Designed for local/development use, frequently deployed in production without hardening
Key risk: Chroma's simplicity is its biggest security liability. Many production deployments lack authentication, encryption, and network isolation because the tool was adopted during prototyping and never hardened.
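Without native encryption at rest, filesystem permissions are often the only barrier around a Chroma data directory. This means the `path` passed to `chromadb.PersistentClient`, which holds files such as the SQLite store. The following sketch is a generic, POSIX-only check, not a Chroma feature. It lists files that group members or other local users could read or modify. `loosely_permissioned_files` is a hypothetical helper.

```python
import os
import stat


def loosely_permissioned_files(root: str) -> list[str]:
    """List files under root that are readable or writable by group or others."""
    risky_bits = stat.S_IRGRP | stat.S_IWGRP | stat.S_IROTH | stat.S_IWOTH
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_mode & risky_bits:
                flagged.append(path)
    return sorted(flagged)
```

Tightening permissions limits casual local access. It is not a substitute for volume-level encryption when the disk or its snapshots can leave the host.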
Milvus
Milvus is an open-source vector database designed for scalable production deployments. It supports distributed architectures with separate storage and compute.
Architecture: Distributed system with proxy, query, data, and index nodes. Uses etcd for metadata and MinIO/S3 for object storage.
Security characteristics:
- TLS for client-server and inter-node communication
- RBAC with user/role/privilege model
- Support for authentication via username/password
- Audit logging in enterprise version
Key risk: The distributed architecture means security must be maintained across multiple components (etcd, MinIO, the Milvus nodes themselves). A weakness in any component can compromise the entire system.
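Because each component carries its own settings, one practical check is to audit them together. The key paths below reflect `milvus.yaml` in recent 2.x releases as we understand them. `common.security.tlsMode` uses 0 for no TLS, 1 for one-way TLS, and 2 for mutual TLS. Treat these paths as assumptions to confirm for your version. The sketch takes an already-parsed dictionary, for example from PyYAML's `yaml.safe_load`, so it stays standard-library only. It does not check etcd or MinIO credentials.

```python
def audit_milvus_config(cfg: dict) -> list[str]:
    """Report Milvus, etcd, and MinIO settings left without auth or TLS."""

    def get(path: str, default=None):
        # Walk a dotted path like "etcd.ssl.enabled" through nested dicts
        node = cfg
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                return default
            node = node[key]
        return node

    findings = []
    if not get("common.security.authorizationEnabled", False):
        findings.append("Milvus: authentication/RBAC disabled")
    if get("common.security.tlsMode", 0) == 0:
        findings.append("Milvus: client TLS disabled (tlsMode 0)")
    if not get("etcd.ssl.enabled", False):
        findings.append("etcd: TLS disabled")
    if not get("minio.useSSL", False):
        findings.append("MinIO/S3: TLS disabled")
    return findings
```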
Qdrant
Qdrant is an open-source vector similarity search engine written in Rust, designed for production-grade performance.
Architecture: Single binary or distributed cluster. REST and gRPC APIs.
Security characteristics:
- API key authentication
- TLS support for API and inter-node communication
- Optional separate read-only API key
- JWT-based fine-grained access control, including per-collection permissions (newer feature)
Key risk: Like other open-source vector databases, production deployments may run with default (permissive) security configurations.
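To illustrate the JWT model, here is a sketch that mints a token limited to reading a single collection. It assumes JWT RBAC is enabled in Qdrant (`service.jwt_rbac: true`) and that tokens are HS256-signed with the instance API key. It also assumes Qdrant reads an `access` claim holding a list of `{"collection": ..., "access": "r"}` entries, based on our reading of Qdrant's documentation. Confirm the claim format for your version. `make_collection_token` is a hypothetical helper. In production, a maintained library such as PyJWT is preferable to hand-rolled encoding.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWT segments use unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_collection_token(api_key: str, collection: str, access: str = "r",
                          ttl_seconds: int = 3600, now: Optional[float] = None) -> str:
    """Build an HS256 JWT whose (assumed) access claim covers one collection."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "exp": issued + ttl_seconds,
        "access": [{"collection": collection, "access": access}],
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    sig = hmac.new(api_key.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"
```

Anyone holding the signing key can mint a token with any scope. Token issuance therefore belongs in a trusted backend, and the master API key should never be distributed to clients.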
Common Security Weaknesses
Across all vector databases, several security weaknesses are consistently observed:
1. Authentication Gaps
Many vector database deployments lack authentication entirely, especially when they started as development environments. Even when authentication is enabled, the models are often simplistic:
| Database | Default Auth | Granularity | Common Deployment State |
|---|---|---|---|
| Pinecone | API key (required) | Project-level | Authenticated but coarse |
| Weaviate | None (optional) | Class/tenant-level | Often unauthenticated |
| Chroma | None | None | Almost always unauthenticated |
| Milvus | Username/password (optional) | Database/collection-level | Mixed |
| Qdrant | API key (optional) | Collection-level | Often unauthenticated |
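The "Common Deployment State" column can be verified directly by sending a request with no credentials. Point the probe at an endpoint that lists data, such as Qdrant's `GET /collections` or Weaviate's `GET /v1/schema`. Health checks are often deliberately unauthenticated, so they prove nothing. `probe_unauthenticated` is a hypothetical helper built only on `urllib`.

```python
import urllib.error
import urllib.request


def probe_unauthenticated(url: str, timeout: float = 3.0) -> str:
    """Classify how an endpoint responds to a request carrying no credentials."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # urlopen only returns normally for 2xx (after following redirects)
            return "open" if 200 <= resp.status < 300 else f"status {resp.status}"
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            return "auth required"
        return f"status {err.code}"
    except (urllib.error.URLError, OSError):
        return "unreachable"
```

An "open" result on a data-listing endpoint means anyone who can reach the port can read the database.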
2. Network Exposure
Vector databases are frequently exposed on internal networks without firewall restrictions. Common patterns include:
- Binding to `0.0.0.0` instead of `127.0.0.1`
- Running in Docker with port forwarding enabled
- Deploying in Kubernetes without network policies
- Exposing management APIs alongside query APIs
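When reviewing configuration files or process listings, it helps to classify each listen address consistently. The sketch uses only the standard `ipaddress` module. `bind_exposure` and its category labels are hypothetical.

```python
import ipaddress


def bind_exposure(bind_host: str) -> str:
    """Classify a server's listen address by how widely it is reachable."""
    host = bind_host.strip().strip("[]")  # accept "[::1]"-style IPv6 literals
    if host in ("", "*"):
        return "all interfaces"  # common shorthands for "bind everywhere"
    if host == "localhost":
        return "loopback only"
    addr = ipaddress.ip_address(host)
    if addr.is_unspecified:  # 0.0.0.0 or ::
        return "all interfaces"
    if addr.is_loopback:
        return "loopback only"
    if addr.is_private:
        return "private network"
    return "public address"
```

A "private network" result is not safe on its own, because internal networks without firewall rules are exactly the exposure described above. With Docker, `-p 6333:6333` publishes the port on all host interfaces, while `-p 127.0.0.1:6333:6333` restricts it to loopback.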
3. Lack of Encryption at Rest
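Similarity search must compute distances over plaintext vector values. Embeddings therefore cannot simply be encrypted at the application layer without breaking search. As a result, most vector databases store embeddings unencrypted on disk unless the underlying volume or managed service provides encryption. Anyone with access to the underlying storage can then read all embeddings.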
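The following sketch shows how little an attacker needs once storage is readable. It assumes a simplified layout of contiguous little-endian `float32` arrays. Real segment and index formats add headers and metadata, but without volume encryption the float payload itself is still plaintext. Both functions are hypothetical illustrations, not any engine's file format.

```python
import struct


def write_embeddings(vectors: list[list[float]]) -> bytes:
    """Serialize vectors as contiguous little-endian float32 (simplified layout)."""
    out = bytearray()
    for vec in vectors:
        out += struct.pack(f"<{len(vec)}f", *vec)
    return bytes(out)


def read_embeddings(raw: bytes, dim: int) -> list[list[float]]:
    """Recover vectors from raw bytes knowing only the embedding dimension."""
    width = 4 * dim  # 4 bytes per float32 component
    return [list(struct.unpack(f"<{dim}f", raw[i:i + width]))
            for i in range(0, len(raw), width)]
```

The only missing parameter is the dimension. An attacker can guess it by trying the handful of sizes common embedding models produce.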
4. Insufficient Audit Logging
Most vector databases do not provide detailed audit logs of query operations. This makes it difficult to detect data exfiltration, unauthorized access, or injection attacks after they occur.
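Where the database offers no query log, the application layer can record one. This sketch writes one JSON line per query and raises a simple burst flag. Exfiltration through a vector store often looks like a single caller issuing many nearest-neighbor queries in a short window. `QueryAuditor`, its fields, and the default thresholds are hypothetical. The caller identity must come from your own authentication layer.

```python
import json
import time
from collections import defaultdict, deque
from typing import Callable, Optional


class QueryAuditor:
    """Application-side audit trail for vector queries with a burst alarm."""

    def __init__(self, sink: Callable[[str], None],
                 window_seconds: float = 60.0, max_queries: int = 100):
        self.sink = sink                  # receives one JSON line per query
        self.window = window_seconds
        self.max_queries = max_queries
        self.recent = defaultdict(deque)  # caller -> timestamps inside the window

    def record(self, caller: str, collection: str, top_k: int,
               now: Optional[float] = None) -> bool:
        """Log one query; return True if the caller exceeded the burst threshold."""
        ts = time.time() if now is None else now
        self.sink(json.dumps({"ts": ts, "caller": caller,
                              "collection": collection, "top_k": top_k}))
        q = self.recent[caller]
        q.append(ts)
        while q and q[0] <= ts - self.window:  # drop timestamps outside the window
            q.popleft()
        return len(q) > self.max_queries
```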
5. Backup and Replication Security
Backups and replicas of vector databases often receive less security attention than the primary instance. Backup files may be stored in S3 buckets with permissive access policies, and replication traffic may not be encrypted.
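A common misconfiguration is a backup bucket policy that allows any principal. The sketch scans an S3 bucket policy document for unconditioned `Allow` statements granting access to `"*"`. It is deliberately optimistic: it treats any `Condition` as restrictive, and it ignores ACLs and Block Public Access settings. `public_statements` is a hypothetical helper. The bucket name in the usage note is invented.

```python
def public_statements(policy: dict) -> list[dict]:
    """Return Allow statements granting access to any principal without conditions."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear as an object
        statements = [statements]
    flagged = []
    for st in statements:
        if st.get("Effect") != "Allow" or "Condition" in st:
            continue
        principal = st.get("Principal")
        if principal == "*":
            flagged.append(st)
        elif isinstance(principal, dict):
            aws = principal.get("AWS")
            values = aws if isinstance(aws, list) else [aws]
            if "*" in values:
                flagged.append(st)
    return flagged
```

For example, a statement with `"Principal": "*"` and `"Action": "s3:GetObject"` on `arn:aws:s3:::vdb-backups/*` would be flagged, because it makes every backup object publicly readable.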
Assessment Framework
When assessing a vector database deployment:
- Authentication — Is authentication enabled? What is the granularity? Are keys rotated?
- Network — Is the database exposed only to required clients? Are management APIs separated?
- Encryption — Is TLS enabled for transit? What about at rest?
- Access control — Can different users or services be restricted to specific collections or namespaces?
- Audit — Are queries logged? Can unauthorized access be detected?
- Backup — Are backups encrypted? Who can access them?
- Multi-tenancy — If multiple tenants share the database, is isolation enforced?
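Recording answers to these questions as structured data makes assessments repeatable and easy to compare over time. Here is a minimal sketch. The `Check` type and the report format are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Check:
    area: str
    question: str
    passed: bool
    note: str = ""


def summarize(checks: list[Check]) -> str:
    """Render failed checks first so remediation work is easy to prioritize."""
    ordered = sorted(checks, key=lambda c: c.passed)  # False sorts before True
    lines = []
    for c in ordered:
        status = "PASS" if c.passed else "FAIL"
        suffix = f" ({c.note})" if c.note else ""
        lines.append(f"[{status}] {c.area}: {c.question}{suffix}")
    failed = sum(not c.passed for c in checks)
    lines.append(f"{failed} of {len(checks)} checks failed")
    return "\n".join(lines)
```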
Related Topics
- Access Control — Detailed access control analysis
- Injection Attacks — Vector DB injection techniques
- Data Exfiltration — Exfiltration via vector databases