Embedding Inversion
Recovering original text from embedding vectors, privacy implications of stored embeddings, model inversion attacks on vector databases, and embedding space analysis techniques.
Overview
Vector databases are the backbone of RAG systems, storing document embeddings that enable semantic search and retrieval. These embeddings are typically treated as opaque numerical representations -- abstract points in high-dimensional space that support similarity search but do not, by themselves, reveal the original document content. This assumption is wrong. Research has demonstrated that embedding vectors can be inverted to recover significant portions of the original text, making vector databases a privacy liability that most organizations have not accounted for in their threat models.
Embedding inversion exploits the fact that embedding models are trained to preserve semantic information. A good embedding captures the meaning, topic, entities, sentiment, and key details of the original text. This richness of information, which makes embeddings useful for retrieval, also makes them vulnerable to inversion: the more semantic information the embedding preserves, the more an attacker can extract by inverting it. Modern embedding models (such as those from the sentence-transformers family, OpenAI's text-embedding models, and Cohere's embed models) produce embeddings of sufficient quality that inversion can recover not just topics and entities but specific phrases, names, numbers, and relationships from the original text.
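The property described above can be illustrated without any real embedding model. The sketch below substitutes a deterministic hashed bag-of-words vector for a learned embedding -- a deliberately crude stand-in (`embed`, `cosine`, and the example sentences are all invented for illustration), but it shows why vectors that preserve meaning necessarily leak it:

```python
# Toy illustration: texts that share meaning/terms land near each
# other in embedding space. A hashed bag-of-words vector stands in
# for a real embedding model (a sentence-transformers model would
# behave similarly, but requires a model download).
import hashlib
import math

DIM = 512

def embed(text: str) -> list[float]:
    """Map text to a fixed-size unit vector by hashing its words."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

original   = embed("patient diagnosed with type 2 diabetes prescribed metformin")
paraphrase = embed("metformin prescribed for a patient diagnosed with diabetes")
unrelated  = embed("quarterly revenue exceeded the annual earnings forecast")

print(f"paraphrase similarity: {cosine(original, paraphrase):.2f}")
print(f"unrelated similarity:  {cosine(original, unrelated):.2f}")
# The paraphrase scores far higher -- the vector alone reveals what
# the text is about, which is exactly the signal inversion recovers.
```

A learned embedding model leaks much more than this word-overlap toy, because it also places paraphrases with no shared vocabulary close together.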
The privacy implications are significant. Organizations that store customer data, medical records, legal documents, financial information, or other sensitive content in vector databases may be exposing this information through the embeddings themselves, even if the original documents are access-controlled. If an attacker gains access to the vector database -- through API access, database compromise, or insider threat -- they can potentially reconstruct sensitive documents from the embeddings alone, bypassing document-level access controls entirely. This creates a novel data breach vector that most security architectures do not address.
Beyond direct inversion, embedding space analysis allows attackers to extract structural information about a corpus without recovering individual documents. Techniques such as cluster analysis, density estimation, and nearest-neighbor probing can reveal what topics are covered in the knowledge base, how many documents exist on each topic, and whether specific types of sensitive information (medical records, financial data, personal information) are present in the corpus. This reconnaissance capability enables targeted attacks against the most sensitive portions of the knowledge base.
How It Works
Obtain Embedding Vectors
The attacker acquires embedding vectors from the target vector database. Access paths include: direct database access (through compromised credentials, SQL injection, or insider access), API endpoints that return embeddings (some search APIs expose raw similarity scores or embedding vectors), backup files or database exports that contain vector data, and network interception of embedding vectors in transit between services. In some cases, the attacker can also generate embeddings for known text and use them as reference points for analyzing other embeddings.
Train or Obtain an Inversion Model
The attacker develops a model that maps from embedding space back to text space. Approaches include: training a sequence-to-sequence model on (embedding, text) pairs generated from a corpus similar to the target's, using the embedding model's own architecture as a starting point for inversion (if model weights are available), employing iterative optimization to find text that produces a matching embedding, or using pre-trained inversion models that have been trained on large corpora for the specific embedding model in use.
Invert Target Embeddings
The attacker applies the inversion model to the target embeddings to recover approximations of the original text. The quality of recovery varies: topic and domain are typically recovered with high accuracy, named entities and key terms with moderate accuracy, and exact phrasing only partially. The attacker refines inversions by using context from nearby embeddings (documents in the same cluster likely share topic and terminology) and by leveraging any available metadata.
Analyze Embedding Space Structure
Beyond individual inversions, the attacker analyzes the global structure of the embedding space to extract corpus-level intelligence. This includes clustering to identify topic groups, density analysis to estimate document counts per topic, outlier detection to find unusual or sensitive documents, and temporal analysis if embedding timestamps are available.
Attack Examples
Example 1: Text Recovery via Inversion Model
# Training an inversion model to recover text from embeddings
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer
from transformers import GPT2LMHeadModel

# Step 1: Generate training data
# Use a corpus similar to the target's domain
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

training_pairs = []
for document in domain_corpus:
    embedding = embedding_model.encode(document)
    training_pairs.append((embedding, document))

# Step 2: Train inversion model
# Architecture: embedding vector -> transformer decoder -> text
class EmbeddingInverter(nn.Module):
    def __init__(self, embedding_dim=384, vocab_size=50257):
        super().__init__()
        self.projection = nn.Linear(embedding_dim, 768)
        self.decoder = GPT2LMHeadModel.from_pretrained('gpt2')

    def forward(self, embedding, target_ids=None):
        # Project the embedding to the decoder hidden size
        context = self.projection(embedding).unsqueeze(1)
        # Use as initial context for text generation
        outputs = self.decoder(
            inputs_embeds=context,
            labels=target_ids
        )
        return outputs

inverter = EmbeddingInverter()
# Train on (embedding, text) pairs...
# (a generate() method wrapping decoder sampling from the
# projected context is assumed below)

# Step 3: Invert target embeddings
stolen_embeddings = load_stolen_vectors("target_db_export.npy")
for emb in stolen_embeddings:
    recovered_text = inverter.generate(
        torch.tensor(emb).unsqueeze(0),
        max_length=256
    )
    print(f"Recovered: {recovered_text}")

# Typical recovery quality:
# Original:  "Patient John Smith, DOB 03/15/1985, diagnosed
#             with Type 2 diabetes, prescribed metformin 500mg"
# Recovered: "Patient [name], date of birth [date], diagnosed
#             with Type 2 diabetes, prescribed metformin [dose]"
# Topic, medical details, and medication are recovered;
# specific PII may be partially recovered depending on
# the embedding model and training corpus overlap

Example 2: Iterative Optimization Inversion
# Technique: Find text that produces the target embedding
# through iterative optimization (no training required)
import torch
from sentence_transformers import SentenceTransformer
from transformers import pipeline

model = SentenceTransformer('all-MiniLM-L6-v2')
target_embedding = torch.tensor(stolen_embedding)

# Use a language model to generate candidate texts
# and optimize for embedding similarity
generator = pipeline('text-generation', model='gpt2')

best_text = ""
best_similarity = -1

# Approach: generate diverse candidates and select the best match
seed_prompts = [
    "The document discusses",
    "This text is about",
    "The main topic is",
    "According to this document,",
    "The key information is"
]

for seed in seed_prompts:
    # Generate multiple completions
    candidates = generator(
        seed,
        max_length=100,
        num_return_sequences=50,
        do_sample=True,
        temperature=1.0
    )
    for candidate in candidates:
        text = candidate['generated_text']
        emb = torch.tensor(model.encode(text))
        sim = torch.cosine_similarity(
            emb.unsqueeze(0),
            target_embedding.unsqueeze(0)
        ).item()
        if sim > best_similarity:
            best_similarity = sim
            best_text = text

print(f"Best match (similarity={best_similarity:.4f}):")
print(best_text)

# This brute-force approach recovers topic and key terms.
# More sophisticated optimization (gradient-based, beam search)
# can recover more specific details.

Example 3: Embedding Space Corpus Analysis
# Technique: Analyze the structure of the vector database
# to extract intelligence about the underlying corpus
import numpy as np
from sklearn.cluster import DBSCAN
from sentence_transformers import SentenceTransformer

# Load all embeddings from the vector database
all_embeddings = load_all_vectors("target_vector_db")
print(f"Total documents: {len(all_embeddings)}")

# Step 1: Cluster analysis -- identify topic groups
clustering = DBSCAN(eps=0.3, min_samples=5, metric='cosine')
labels = clustering.fit_predict(all_embeddings)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"Distinct topic clusters: {n_clusters}")

# Step 2: Probe clusters with known reference embeddings
reference_topics = {
    "medical_records": "Patient diagnosis treatment medication",
    "financial_data": "Revenue profit loss quarterly earnings",
    "personal_info": "Name address phone email social security",
    "legal_documents": "Contract agreement liability indemnification",
    "credentials": "Password API key token secret access"
}

embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
for topic, probe_text in reference_topics.items():
    probe_embedding = embedding_model.encode(probe_text)
    # Find documents near this probe
    similarities = cosine_similarity_batch(
        probe_embedding, all_embeddings
    )
    nearby_count = np.sum(similarities > 0.5)
    max_similarity = np.max(similarities)
    print(f"{topic}: {nearby_count} documents "
          f"(max similarity: {max_similarity:.3f})")

# Output might reveal:
# medical_records: 12,450 documents (max similarity: 0.89)
# financial_data: 3,200 documents (max similarity: 0.92)
# personal_info: 8,700 documents (max similarity: 0.85)
# legal_documents: 5,100 documents (max similarity: 0.88)
# credentials: 47 documents (max similarity: 0.73)

# This tells the attacker:
# - The knowledge base contains medical records (HIPAA concern)
# - Financial data is present (compliance concern)
# - Personal information is extensive (GDPR concern)
# - Some credential-like content exists (security concern)
# All without recovering any individual document

Example 4: Nearest-Neighbor Privacy Attack
# Technique: Determine whether a specific person's data
# is in the vector database (membership inference for RAG)

# The attacker creates probe documents containing the
# target individual's known information
target_person = "Jane Doe, 123 Main Street, Springfield"

# Generate a probe embedding
probe = embedding_model.encode(
    f"Information about {target_person}"
)

# Query the vector database for nearest neighbors
results = vector_db.search(
    vector=probe,
    top_k=10,
    include_scores=True
)

# If the top result has very high similarity (>0.85),
# a document about this person likely exists in the database
if results[0].score > 0.85:
    print(f"High confidence: data about {target_person} "
          f"exists in the knowledge base")
    # Invert the nearest embedding to recover details
    recovered = inverter.generate(results[0].vector)
    print(f"Recovered content: {recovered}")

# This attack can be automated to check thousands of
# individuals, constituting a mass privacy probe:
flagged_individuals = []
for person in target_list:
    probe = embedding_model.encode(f"Records for {person}")
    top_score = vector_db.search(probe, top_k=1)[0].score
    if top_score > 0.85:
        flagged_individuals.append(person)

print(f"Found data for {len(flagged_individuals)} "
      f"of {len(target_list)} individuals")

Example 5: Cross-Tenant Data Leakage via Shared Embedding Space
Scenario: a multi-tenant RAG platform where different
organizations share the same vector database infrastructure.

Architecture:

Tenant A embeddings → |                  |
Tenant B embeddings → | Shared Vector DB | → Retrieval
Tenant C embeddings → |                  |

Intended isolation: metadata-based filtering

Query from Tenant A → filter: tenant_id = "A" → only A's docs

Attack vectors:

1. Tenant ID spoofing:
   - If the attacker can influence the tenant_id filter,
     a query with tenant_id = "B" returns Tenant B's documents
   - Embeddings from other tenants become directly accessible

2. Approximate nearest neighbor leakage:
   - ANN algorithms (HNSW, IVF) build shared index structures
   - The index structure itself leaks information about
     nearby vectors across tenant boundaries
   - Even with correct filtering, index traversal may
     expose similarity scores against cross-tenant documents

3. Embedding collision analysis:
   - The attacker generates embeddings for known documents
   - Compares their positions to the global index
   - Identifies clusters that belong to other tenants
   - Infers the topic distribution of other tenants' data

4. Side-channel timing attacks:
   - Query latency varies with index structure
   - Queries near dense clusters take longer (more candidates)
   - The attacker maps latency patterns to identify
     what topics other tenants have documents about
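The tenant-ID spoofing vector above comes down to who supplies the filter value. A minimal sketch of the flaw and its fix, with all names (`search_unsafe`, `search_safe`, the in-memory `documents` list) invented for illustration:

```python
# Hypothetical multi-tenant store: the vulnerable search trusts a
# tenant_id supplied in the request body; the safe variant derives
# the filter from the authenticated session on the server side.
documents = [
    {"tenant_id": "A", "text": "Tenant A internal roadmap"},
    {"tenant_id": "B", "text": "Tenant B salary spreadsheet"},
]

def search_unsafe(request: dict) -> list[dict]:
    # BUG: the filter value comes straight from the
    # (attacker-controlled) request payload
    return [d for d in documents if d["tenant_id"] == request["tenant_id"]]

def search_safe(request: dict, session_tenant: str) -> list[dict]:
    # Filter is bound to the authenticated tenant; any tenant_id
    # in the request body is ignored
    return [d for d in documents if d["tenant_id"] == session_tenant]

# An attacker authenticated as tenant A simply asks for tenant B
spoofed = {"tenant_id": "B", "query": "salary"}
print(search_unsafe(spoofed))      # leaks tenant B's document
print(search_safe(spoofed, "A"))   # returns only tenant A's documents
```

Real vector databases apply the same principle to metadata filters on similarity queries: the filter must be injected server-side from the session, never echoed from client input.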
Detection and Mitigation
| Approach | Description | Effectiveness |
|---|---|---|
| Embedding encryption | Encrypt embedding vectors at rest and decrypt only during search operations | Medium-High |
| Noise injection / dimensionality reduction | Add calibrated noise to stored embeddings that preserves retrieval quality but degrades inversion | Medium |
| Access control on vector databases | Implement strict access control on vector database APIs and exports | High |
| Tenant isolation | Use separate vector database instances (not just metadata filtering) for different tenants | High |
| Embedding API rate limiting | Limit the number and rate of similarity searches to prevent mass probing | Medium |
| Audit logging for vector access | Log all vector database queries and flag unusual access patterns | Medium |
| Differential privacy in embeddings | Apply differential privacy guarantees during embedding generation | Medium (emerging) |
| Regular embedding rotation | Periodically re-embed documents with updated models, invalidating old inversions | Low-Medium |
| Inversion-resistant embedding models | Use or develop embedding models trained to resist inversion while maintaining retrieval quality | Medium (emerging research) |
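The noise-injection row above trades inversion fidelity for retrieval quality, and the trade-off is easy to see in a few lines of numpy. The corpus, dimensions, and noise scale below are illustrative values, not calibrated recommendations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake corpus: 100 unit-norm "document embeddings" in 384 dims
docs = rng.normal(size=(100, 384))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

# Mitigation: store noisy versions instead of the exact vectors
noise_scale = 0.05
stored = docs + rng.normal(scale=noise_scale, size=docs.shape)
stored /= np.linalg.norm(stored, axis=1, keepdims=True)

# A query that is a slightly perturbed copy of document 42
query = docs[42] + rng.normal(scale=0.02, size=384)
query /= np.linalg.norm(query)

# Retrieval still finds the right document...
best = int(np.argmax(stored @ query))
print("retrieved:", best)  # 42

# ...but the stored vector no longer matches the true embedding,
# so an inverter works from a degraded target
print("drift from true vector:", float(np.linalg.norm(stored[42] - docs[42])))
```

On real embeddings the noise scale must be validated against retrieval benchmarks: too little noise barely hinders inversion, too much breaks recall.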
Key Considerations
- Embedding inversion quality improves when the attacker has access to the same embedding model used by the target -- using widely known embedding models makes inversion easier
- The privacy risk scales with embedding dimensionality: higher-dimensional embeddings (768, 1024, 1536 dimensions) preserve more information and are more invertible than lower-dimensional ones
- Vector database backups, exports, and replicas are all potential exfiltration targets -- any copy of the embeddings enables inversion offline, without further access to the production system
- Multi-tenant vector databases with metadata-based isolation provide insufficient protection because the embedding vectors themselves leak cross-tenant information
- Differential privacy for embeddings is an active research area but has not yet produced production-ready solutions that maintain retrieval quality while providing strong privacy guarantees
- Organizations subject to GDPR, HIPAA, or other data protection regulations should assess whether their vector databases constitute stores of personal data and apply appropriate protections
- Red team assessments should include vector database access as an attack objective, testing whether embedding inversion can recover sensitive content from the retrieval layer
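As one concrete form the audit-logging mitigation might take, the sketch below flags clients whose access pattern resembles the mass membership probe shown earlier: a burst of top-1 queries in a short window. The thresholds, log format, and function name are assumptions, not a production design:

```python
from collections import defaultdict

def flag_probing_clients(query_log, window_seconds=60,
                         max_queries=30, min_top1_ratio=0.9):
    """Flag clients that issue many queries in a short window,
    almost all of which request only the single nearest neighbor --
    the access pattern of an automated membership probe."""
    by_client = defaultdict(list)
    for client, timestamp, top_k in query_log:
        by_client[client].append((timestamp, top_k))

    flagged = []
    for client, entries in by_client.items():
        entries.sort()  # order by timestamp
        span = entries[-1][0] - entries[0][0]
        top1 = sum(1 for _, k in entries if k == 1)
        if (len(entries) > max_queries
                and span <= window_seconds
                and top1 / len(entries) >= min_top1_ratio):
            flagged.append(client)
    return flagged

# A scraper fires 100 top-1 queries in ~50 seconds; a normal user
# makes a handful of top-10 searches over the same period.
log = [("scraper", i * 0.5, 1) for i in range(100)]
log += [("analyst", t, 10) for t in (3, 19, 40, 55)]
print(flag_probing_clients(log))  # ['scraper']
```

A detector like this only helps against probing through the query API; it does nothing once an attacker has exfiltrated the vectors themselves, which is why it pairs with the access-control and encryption rows above.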
References
- Morris et al.: "Text Embeddings Reveal (Almost) As Much As Text" (EMNLP 2023) -- foundational embedding inversion research
- Li et al.: "Sentence Embedding Leaks More Information than You Expect" (ACL 2023) -- privacy analysis of sentence embeddings
- Song and Raghunathan: "Information Leakage in Embedding Models" (ACM CCS 2020) -- formal analysis of information leakage
- Carlini et al.: "Extracting Training Data from Large Language Models" (2021) -- related extraction techniques
- Pan et al.: "Privacy Risks in Vector Databases" (2024) -- comprehensive analysis of vector database privacy threats