# Embedding Inversion

Recovering original text from embedding vectors, privacy implications of stored embeddings, model inversion attacks on vector databases, and embedding space analysis techniques.

## Overview
Vector databases are the backbone of RAG systems, storing document embeddings that enable semantic search and retrieval. These embeddings are typically treated as opaque numerical representations -- abstract points in high-dimensional space that support similarity search but do not, by themselves, reveal the original document content. This assumption is wrong. Research has demonstrated that embedding vectors can be inverted to recover significant portions of the original text, making vector databases a privacy liability that most organizations have not accounted for in their threat models.
Embedding inversion exploits the fact that embedding models are trained to preserve semantic information. A good embedding captures the meaning, topic, entities, sentiment, and key details of the original text. This richness of information, which makes embeddings useful for retrieval, also makes them vulnerable to inversion: the more semantic information the embedding preserves, the more information an attacker can extract by inverting it. Modern embedding models (such as those from the sentence-transformers family, OpenAI's text-embedding models, and Cohere's embed models) produce embeddings of sufficient quality that inversion can recover not just topics and entities but specific phrases, names, numbers, and relationships from the original text.
The privacy implications are significant. Organizations that store customer data, medical records, legal documents, financial information, or other sensitive content in vector databases may be exposing this information through the embeddings themselves, even if the original documents are access-controlled. If an attacker gains access to the vector database -- through API access, database compromise, or insider threat -- they can potentially reconstruct sensitive documents from the embeddings alone, bypassing document-level access controls entirely. This creates a novel data breach vector that most security architectures do not address.
Beyond direct inversion, embedding space analysis allows attackers to extract structural information about a corpus without recovering individual documents. Techniques such as cluster analysis, density estimation, and nearest-neighbor probing can reveal what topics are covered in the knowledge base, how many documents exist on each topic, and whether specific types of sensitive information (medical records, financial data, personal information) are present in the corpus. This reconnaissance capability enables targeted attacks against the most sensitive portions of the knowledge base.
## How It Works

### Obtain Embedding Vectors
The attacker acquires embedding vectors from the target vector database. Access paths include: direct database access (through compromised credentials, SQL injection, or insider access), API endpoints that return embeddings (some search APIs expose raw similarity scores or embedding vectors), backup files or database exports that contain vector data, and network interception of embedding vectors in transit between services. In some cases, the attacker can also generate embeddings for known text and use them as reference points for analyzing other embeddings.
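Once a dump is in hand, the first step is purely mechanical. A minimal sketch, assuming the export is a NumPy array (the file name and the synthetic data standing in for a real export are illustrative): deduplicate and L2-normalize the vectors so that cosine similarity reduces to a dot product for all downstream analysis.

```python
import numpy as np

def prepare_stolen_vectors(path):
    """Load an exported vector dump, drop exact duplicate rows,
    and L2-normalize so cosine similarity becomes a dot product."""
    vectors = np.load(path)                   # shape: (n_docs, dim)
    vectors = np.unique(vectors, axis=0)      # remove duplicate rows
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)

# Demo with synthetic data standing in for a real export
rng = np.random.default_rng(0)
fake_dump = rng.normal(size=(100, 384)).astype(np.float32)
np.save("demo_export.npy", fake_dump)

prepped = prepare_stolen_vectors("demo_export.npy")
print(prepped.shape)                                       # (100, 384)
print(bool(np.allclose(np.linalg.norm(prepped, axis=1), 1.0)))  # True
```

Normalizing up front matters because every later step (clustering, probing, inversion scoring) compares vectors by cosine similarity; with unit vectors this is a single matrix multiply.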
### Train or Obtain an Inversion Model
The attacker develops a model that maps from embedding space back to text space. Approaches include: training a sequence-to-sequence model on (embedding, text) pairs generated from a corpus similar to the target's, using the embedding model's own architecture as a starting point for inversion (if the model weights are available), employing iterative optimization to find text that produces a matching embedding, or using pre-trained inversion models that have been trained on large corpora for the specific embedding model in use.
### Invert Target Embeddings
The attacker applies the inversion model to the target embeddings to recover approximations of the original text. The quality of recovery varies: topic and domain are typically recovered with high accuracy, named entities and key terms are recovered with moderate accuracy, and exact phrasing is recovered partially. The attacker refines inversions by using context from nearby embeddings (documents in the same cluster likely share topic and terminology) and leveraging any available metadata.
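The neighbor-based refinement can be sketched with plain NumPy. This toy example (synthetic clusters; the helper name is hypothetical) shows the lookup an attacker would run to gather cluster context around a target embedding, since documents in the same cluster tend to share topic and terminology:

```python
import numpy as np

def nearest_neighbors(target, embeddings, k=5):
    """Indices of the k embeddings most cosine-similar to `target`.
    Assumes `target` and the rows of `embeddings` are L2-normalized."""
    sims = embeddings @ target        # cosine similarity via dot product
    return np.argsort(sims)[::-1][:k]

# Toy corpus: two tight clusters in a 4-d space
rng = np.random.default_rng(1)
cluster_a = rng.normal(loc=(1, 0, 0, 0), scale=0.05, size=(10, 4))
cluster_b = rng.normal(loc=(0, 0, 1, 0), scale=0.05, size=(10, 4))
corpus = np.vstack([cluster_a, cluster_b])
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

# A target from cluster A retrieves only cluster-A neighbors, whose
# (partial) inversions can then pool shared vocabulary and topic
target = corpus[0]
neighbors = nearest_neighbors(target, corpus, k=5)
print(all(i < 10 for i in neighbors))  # True: all hits are in cluster A
```

In a real attack the same lookup runs against the stolen index, and the texts recovered from the neighbors constrain the decoding of the target embedding.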
### Analyze Embedding Space Structure
Beyond individual inversions, the attacker analyzes the global structure of the embedding space to extract corpus-level intelligence. This includes clustering to identify topic groups, density analysis to estimate document counts per topic, outlier detection to find unusual or sensitive documents, and temporal analysis if embedding timestamps are available.
## Attack Examples

### Example 1: Text Recovery via Inversion Model
```python
# Training an inversion model to recover text from embeddings
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer
from transformers import GPT2LMHeadModel

# Step 1: Generate training data.
# Use a corpus similar to the target's domain.
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
training_pairs = []
for document in domain_corpus:
    embedding = embedding_model.encode(document)
    training_pairs.append((embedding, document))

# Step 2: Train the inversion model.
# Architecture: embedding vector -> transformer decoder -> text
class EmbeddingInverter(nn.Module):
    def __init__(self, embedding_dim=384):
        super().__init__()
        # Project the embedding into GPT-2's hidden size (768)
        self.projection = nn.Linear(embedding_dim, 768)
        self.decoder = GPT2LMHeadModel.from_pretrained('gpt2')

    def forward(self, embedding, target_ids=None):
        # Project embedding to decoder hidden size
        context = self.projection(embedding).unsqueeze(1)
        # Use as initial context for text generation
        outputs = self.decoder(
            inputs_embeds=context,
            labels=target_ids
        )
        return outputs

inverter = EmbeddingInverter()
# Train on the (embedding, text) pairs...

# Step 3: Invert target embeddings.
stolen_embeddings = load_stolen_vectors("target_db_export.npy")
for emb in stolen_embeddings:
    # generate() is shorthand for decoding from the projected
    # context (e.g. wrapping self.decoder.generate)
    recovered_text = inverter.generate(
        torch.tensor(emb).unsqueeze(0),
        max_length=256
    )
    print(f"Recovered: {recovered_text}")

# Typical recovery quality:
# Original:  "Patient John Smith, DOB 03/15/1985, diagnosed
#             with Type 2 diabetes, prescribed metformin 500mg"
# Recovered: "Patient [name], date of birth [date], diagnosed
#             with Type 2 diabetes, prescribed metformin [dose]"
# Topic, medical details, and medication are recovered;
# specific PII may be partially recovered depending on
# the embedding model and training corpus overlap
```

### Example 2: Iterative Optimization Inversion
```python
# Technique: Find text that produces the target embedding
# through iterative optimization (no training required)
from sentence_transformers import SentenceTransformer
from transformers import pipeline
import torch

model = SentenceTransformer('all-MiniLM-L6-v2')
target_embedding = torch.tensor(stolen_embedding)

# Use a language model to generate candidate texts
# and optimize for embedding similarity
generator = pipeline('text-generation', model='gpt2')

best_text = ""
best_similarity = -1.0

# Approach: generate diverse candidates and select the best match
seed_prompts = [
    "The document discusses",
    "This text is about",
    "The main topic is",
    "According to this document,",
    "The key information is"
]

for seed in seed_prompts:
    # Generate multiple completions per seed
    candidates = generator(
        seed,
        max_length=100,
        num_return_sequences=50,
        do_sample=True,
        temperature=1.0
    )
    for candidate in candidates:
        text = candidate['generated_text']
        emb = torch.tensor(model.encode(text))
        sim = torch.cosine_similarity(
            emb.unsqueeze(0),
            target_embedding.unsqueeze(0)
        ).item()
        if sim > best_similarity:
            best_similarity = sim
            best_text = text

print(f"Best match (similarity={best_similarity:.4f}):")
print(best_text)

# This brute-force approach recovers topic and key terms.
# More sophisticated optimization (gradient-based, beam search)
# can recover more specific details.
```

### Example 3: Embedding Space Corpus Analysis
```python
# Technique: Analyze the structure of the vector database
# to extract intelligence about the underlying corpus
import numpy as np
from sklearn.cluster import DBSCAN
from sentence_transformers import SentenceTransformer

# Load all embeddings from the vector database
all_embeddings = load_all_vectors("target_vector_db")
print(f"Total documents: {len(all_embeddings)}")

# Step 1: Cluster analysis -- identify topic groups
clustering = DBSCAN(eps=0.3, min_samples=5, metric='cosine')
labels = clustering.fit_predict(all_embeddings)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"Distinct topic clusters: {n_clusters}")

# Step 2: Probe clusters with known reference embeddings
reference_topics = {
    "medical_records": "Patient diagnosis treatment medication",
    "financial_data": "Revenue profit loss quarterly earnings",
    "personal_info": "Name address phone email social security",
    "legal_documents": "Contract agreement liability indemnification",
    "credentials": "Password API key token secret access"
}

embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
for topic, probe_text in reference_topics.items():
    probe_embedding = embedding_model.encode(probe_text)
    # Find documents near this probe (cosine_similarity_batch is
    # a helper that scores the probe against every stored vector)
    similarities = cosine_similarity_batch(
        probe_embedding, all_embeddings
    )
    nearby_count = np.sum(similarities > 0.5)
    max_similarity = np.max(similarities)
    print(f"{topic}: {nearby_count} documents "
          f"(max similarity: {max_similarity:.3f})")

# Output might reveal:
# medical_records: 12,450 documents (max similarity: 0.89)
# financial_data:   3,200 documents (max similarity: 0.92)
# personal_info:    8,700 documents (max similarity: 0.85)
# legal_documents:  5,100 documents (max similarity: 0.88)
# credentials:         47 documents (max similarity: 0.73)
#
# This tells the attacker:
# - The knowledge base contains medical records (HIPAA concern)
# - Financial data is present (compliance concern)
# - Personal information is extensive (GDPR concern)
# - Some credential-like content exists (security concern)
# All without recovering any individual document
```

### Example 4: Nearest-Neighbor Privacy Attack
```python
# Technique: Determine whether a specific person's data is in
# the vector database (membership inference for RAG)

# The attacker creates probe documents containing the
# target individual's known information
target_person = "Jane Doe, 123 Main Street, Springfield"

# Generate a probe embedding
probe = embedding_model.encode(
    f"Information about {target_person}"
)

# Query the vector database for nearest neighbors
results = vector_db.search(
    vector=probe,
    top_k=10,
    include_scores=True
)

# If the top result has very high similarity (>0.85),
# a document about this person likely exists in the database
if results[0].score > 0.85:
    print(f"High confidence: data about {target_person} "
          f"exists in the knowledge base")
    # Invert the nearest embedding to recover details
    recovered = inverter.generate(results[0].vector)
    print(f"Recovered content: {recovered}")

# This attack can be automated to check thousands of
# individuals, constituting a mass privacy probe:
flagged_individuals = []
for person in target_list:
    probe = embedding_model.encode(f"Records for {person}")
    top_score = vector_db.search(probe, top_k=1)[0].score
    if top_score > 0.85:
        flagged_individuals.append(person)

print(f"Found data for {len(flagged_individuals)} "
      f"of {len(target_list)} individuals")
```

### Example 5: Cross-Tenant Data Leakage via Shared Embedding Space
Scenario: A multi-tenant RAG platform where different organizations share the same vector database infrastructure.

Architecture:

```
Tenant A embeddings → |                  |
Tenant B embeddings → | Shared Vector DB | → Retrieval
Tenant C embeddings → |                  |
```

Intended isolation: metadata-based filtering.

```
Query from Tenant A → filter: tenant_id = "A" → only A's docs
```

Attack vectors:

1. Tenant ID spoofing:
   - If the attacker can influence the tenant_id filter, a query with tenant_id = "B" returns Tenant B's documents
   - Embeddings from other tenants become directly accessible

2. Approximate nearest neighbor leakage:
   - ANN algorithms (HNSW, IVF) build shared index structures
   - The index structure itself leaks information about nearby vectors across tenant boundaries
   - Even with correct filtering, index traversal may expose similarity scores for cross-tenant documents

3. Embedding collision analysis:
   - The attacker generates embeddings for known documents
   - Compares their positions against the global index
   - Identifies clusters that belong to other tenants
   - Infers the topic distribution of other tenants' data

4. Side-channel timing attacks:
   - Query latency varies with index structure
   - Queries near dense clusters take longer (more candidates to score)
   - The attacker maps latency patterns to learn what topics other tenants have documents about
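A toy model makes the second vector concrete. The sketch below simulates a shared brute-force index that applies the tenant filter after selecting the global top-k (a common post-filtering design); the result count returned to Tenant A then leaks how densely other tenants populate the queried topic region. All data and names are synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)

def make_docs(center, n):
    """Generate n unit vectors tightly clustered around `center`."""
    docs = rng.normal(loc=center, scale=0.05, size=(n, 4))
    return docs / np.linalg.norm(docs, axis=1, keepdims=True)

# Shared index: Tenant A has 5 docs on topic_x,
# Tenant B has 20 docs on topic_y
topic_x = np.array([1.0, 0, 0, 0])
topic_y = np.array([0.0, 0, 1, 0])
docs = np.vstack([make_docs(topic_x, 5), make_docs(topic_y, 20)])
tenant = np.array(["A"] * 5 + ["B"] * 20)

def search_post_filter(query, tenant_id, k=10):
    """Shared-index search: take the global top-k, THEN filter by
    tenant. Models engines that apply metadata filters after traversal."""
    sims = docs @ query
    top_k = np.argsort(sims)[::-1][:k]
    return [i for i in top_k if tenant[i] == tenant_id]

# Tenant A queries its own topic: the global top-10 includes all 5 A docs
hits_on_x = search_post_filter(topic_x, "A")
# Tenant A queries Tenant B's topic: B's dense cluster crowds out the
# top-10, so A gets zero results back -- the result count alone reveals
# that another tenant holds a dense cluster on that topic
hits_on_y = search_post_filter(topic_y, "A")
print(len(hits_on_x), len(hits_on_y))  # 5 0
```

Pre-filtering (restricting the candidate set to the tenant's own vectors before ranking) or, better, per-tenant indexes removes this particular signal.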
## Detection & Mitigation
| Approach | Description | Effectiveness |
|---|---|---|
| Embedding encryption | Encrypt embedding vectors at rest and decrypt only during search operations | Medium-High |
| Dimensionality reduction noise | Add calibrated noise to stored embeddings that preserves retrieval quality but degrades inversion | Medium |
| Access control on vector databases | Implement strict access control on vector database APIs and exports | High |
| Tenant isolation | Use separate vector database instances (not just metadata filtering) for different tenants | High |
| Embedding API rate limiting | Limit the number and rate of similarity searches to prevent mass probing | Medium |
| Audit logging for vector access | Log all vector database queries and flag unusual access patterns | Medium |
| Differential privacy in embeddings | Apply differential privacy guarantees during embedding generation | Medium (emerging) |
| Regular embedding rotation | Periodically re-embed documents with updated models, invalidating old inversions | Low-Medium |
| Inversion-resistant embedding models | Use or develop embedding models trained to resist inversion while maintaining retrieval quality | Medium (emerging research) |
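The noise-based mitigation in the table can be prototyped in a few lines. This sketch uses synthetic unit vectors; the sigma value is illustrative, and a real deployment would calibrate it against measured retrieval quality on the actual corpus:

```python
import numpy as np

rng = np.random.default_rng(3)

def add_defensive_noise(embeddings, sigma=0.01):
    """Add small Gaussian noise to stored embeddings and re-normalize.
    The noise level is a tunable trade-off: larger sigma degrades
    inversion more, but eventually hurts retrieval quality too."""
    noisy = embeddings + rng.normal(scale=sigma, size=embeddings.shape)
    return noisy / np.linalg.norm(noisy, axis=1, keepdims=True)

# Synthetic index: 200 unit vectors standing in for document embeddings
clean = rng.normal(size=(200, 384))
clean /= np.linalg.norm(clean, axis=1, keepdims=True)
noisy = add_defensive_noise(clean)

# Retrieval check: querying the noisy index with each original vector
# should still return that vector's own noisy copy as the top hit
top_hits = np.argmax(noisy @ clean.T, axis=0)
preserved = np.mean(top_hits == np.arange(200))
print(preserved)  # 1.0 at this noise level
```

The same harness can sweep sigma upward to find the point where top-1 agreement starts to drop, which bounds how much noise the retrieval workload tolerates.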
## Key Considerations
- Embedding inversion quality improves when the attacker has access to the same embedding model used by the target -- using a widely known embedding model makes inversion easier
- The privacy risk scales with embedding dimensionality: higher-dimensional embeddings (768, 1024, 1536 dimensions) preserve more information and are more invertible than lower-dimensional ones
- Vector database backups, exports, and replicas are all potential exfiltration targets -- any copy of the embeddings enables inversion offline without further access to the production system
- Multi-tenant vector databases with metadata-based isolation provide insufficient protection because the embedding vectors themselves leak cross-tenant information
- Differential privacy for embeddings is an active research area but has not yet produced production-ready solutions that maintain retrieval quality while providing strong privacy guarantees
- Organizations subject to GDPR, HIPAA, or other data protection regulations should evaluate whether their vector databases constitute stores of personal data and apply appropriate protections
- Red team assessments should include vector database access as an attack objective, testing whether embedding inversion can recover sensitive content from the retrieval layer
## References
- Morris et al.: "Text Embeddings Reveal (Almost) As Much As Text" (EMNLP 2023) -- foundational embedding inversion research
- Li et al.: "Sentence Embedding Leaks More Information than You Expect" (ACL 2023) -- privacy analysis of sentence embeddings
- Song and Raghunathan: "Information Leakage in Embedding Models" (ACM CCS 2020) -- formal analysis of information leakage
- Carlini et al.: "Extracting Training Data from Large Language Models" (2021) -- related extraction techniques
- Pan et al.: "Privacy Risks in Vector Databases" (2024) -- comprehensive analysis of vector database privacy threats