Embedding Inversion Attacks
Techniques for reconstructing input text from embedding vectors: model-specific inversion methods, privacy implications, and defenses against embedding inversion.
Embedding inversion reverses the embedding process: given a vector, the attacker recovers the text that produced it. This is significant because embeddings are often treated as non-sensitive derived data, stored with fewer protections than the original documents. If inversion is feasible, then embeddings must be protected as carefully as the documents they represent.
Inversion Model Architecture
Embedding inversion models are typically trained as sequence-to-sequence models that take a vector as input and produce text as output.
Training an Inversion Model
The training process requires a dataset of (text, embedding) pairs:
# Step 1: Generate training data
training_pairs = []
for text in corpus:
    embedding = target_model.encode(text)
    training_pairs.append((embedding, text))

# Step 2: Train an inversion model
# Architecture: embedding vector → decoder → text tokens
import torch
import torch.nn as nn

class InversionModel(nn.Module):
    def __init__(self, embedding_dim, vocab_size, hidden_dim=512):
        super().__init__()
        self.projection = nn.Linear(embedding_dim, hidden_dim)
        self.token_embedding = nn.Embedding(vocab_size, hidden_dim)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=hidden_dim, nhead=8),
            num_layers=6
        )
        self.output = nn.Linear(hidden_dim, vocab_size)

    def forward(self, embedding, target_tokens):
        # Project the embedding into the decoder's hidden space; it acts
        # as a one-element "memory" sequence for cross-attention
        memory = self.projection(embedding).unsqueeze(0)
        # Embed the target tokens, then decode against the memory
        tgt = self.token_embedding(target_tokens)
        output = self.decoder(tgt, memory)
        return self.output(output)

What the Attacker Needs
To train an effective inversion model, the attacker needs:
- The embedding model or a close substitute. The inversion model must be trained on embeddings from the same model (or a similar one) used by the target system.
- A text corpus similar to the target domain. An inversion model trained on Wikipedia text will perform poorly on medical records. Domain-matched training data dramatically improves recovery quality.
- Sufficient compute. Training an inversion model requires GPU resources comparable to fine-tuning a small language model.
Inversion Quality and Expectations
Embedding inversion does not produce perfect reproductions of the original text. The quality of recovery varies by several factors.
What Is Typically Recovered
Research across multiple embedding models has shown:
| Recovery Level | What Is Captured | Example |
|---|---|---|
| Topic/domain | The general subject area | "medical diagnosis" from a clinical note |
| Key entities | Named entities, technical terms | "Patient Smith", "metformin", "Type 2 diabetes" |
| Core semantics | The main meaning of the text | "Patient was prescribed medication for diabetes" |
| Approximate phrasing | Similar but not identical wording | Synonyms, paraphrases of the original |
| Verbatim recovery | Exact original text | Rare; more likely for short, distinctive inputs |
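Recovery at these levels can be quantified with simple overlap metrics. The token-level F1 below is an illustrative sketch (the helper and its whitespace tokenization are hypothetical conveniences, not a standard from the inversion literature):

```python
def token_f1(original: str, recovered: str) -> float:
    """Token-overlap F1 between the original text and an inversion attempt."""
    orig_tokens = set(original.lower().split())
    rec_tokens = set(recovered.lower().split())
    overlap = len(orig_tokens & rec_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(rec_tokens)
    recall = overlap / len(orig_tokens)
    return 2 * precision * recall / (precision + recall)

# A paraphrase-level recovery scores well below 1.0 even though
# the key entities ("metformin", "diabetes") are captured:
score = token_f1(
    "patient prescribed metformin for diabetes",
    "patient given metformin for type 2 diabetes",
)
```

A verbatim recovery scores 1.0, topic-only recovery scores near 0, and entity/paraphrase recovery lands in between, which roughly mirrors the levels in the table above.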
Factors Affecting Recovery Quality
Embedding dimensionality. Higher-dimensional embeddings (1536 for text-embedding-3-small, 3072 for text-embedding-3-large) retain more information and are more invertible. Lower-dimensional embeddings (384 for all-MiniLM-L6-v2) lose more information and are harder to invert completely.
Input length. Shorter inputs are more completely recoverable because the embedding captures a larger proportion of the information. A 10-word sentence compressed into 1536 dimensions retains more per-word information than a 500-word paragraph.
Vocabulary distinctiveness. Texts with distinctive vocabulary (technical terms, proper nouns, unusual phrases) are more recoverable than generic text because the distinctive tokens leave stronger traces in the embedding.
Training data quality. An inversion model trained on in-domain data achieves significantly better recovery than one trained on generic text.
Model-Specific Techniques
OpenAI Embeddings
OpenAI's embedding models (text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large) are widely used and well-studied:
# Inversion approach for OpenAI embeddings
# The attacker cannot access model weights, so they use a trained inversion model

# Generate training data by embedding a domain-matched corpus
import openai

def generate_training_data(texts):
    pairs = []
    for text in texts:
        response = openai.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        embedding = response.data[0].embedding
        pairs.append((embedding, text))
    return pairs

# Train the inversion model on these pairs, then apply it to
# target embeddings extracted from the vector database

The black-box nature of OpenAI's models means the attacker must rely on the trained inversion model's generalization. Recovery quality is moderate: key entities and topics are typically recovered, but exact phrasing is not.
Open-Source Models
Open-source embedding models (Sentence Transformers, E5, BGE) are more vulnerable to inversion because the attacker has full model access:
# With model access, the attacker can use gradient-based inversion
# in addition to trained inversion models
import torch

def gradient_inversion(model, target_embedding, max_len=32, num_steps=1000):
    # Initialize a sequence of random token embeddings to optimize
    input_embeds = torch.randn(1, max_len, model.config.hidden_size,
                               requires_grad=True)
    optimizer = torch.optim.Adam([input_embeds], lr=0.01)
    for step in range(num_steps):
        optimizer.zero_grad()
        output = model(inputs_embeds=input_embeds)
        current_embedding = output.last_hidden_state.mean(dim=1)
        # Minimize cosine distance to the target embedding
        loss = (1 - torch.nn.functional.cosine_similarity(
            current_embedding, target_embedding
        )).mean()
        loss.backward()
        optimizer.step()
    # Map the optimized embeddings back to their nearest vocabulary tokens
    return find_nearest_tokens(model, input_embeds)

Privacy Implications
Embedding inversion has significant implications for data protection and privacy:
Regulatory Considerations
Under GDPR and similar regulations, data from which an individual can be identified constitutes personal data. If embeddings can be inverted to recover text containing personal information, then embeddings themselves may be classified as personal data, subject to:
- Data subject access requests (the right to know what data is stored)
- The right to erasure (deleting embeddings when source data is deleted)
- Data protection impact assessments for embedding pipelines
- Cross-border transfer restrictions for embedding storage
Practical Privacy Scenarios
- HR documents embedded for search. Embeddings of employee records could be inverted to recover salary information, performance reviews, or disciplinary records.
- Customer communications embedded for support. Embeddings of support tickets could reveal customer names, account details, and complaint histories.
- Medical records embedded for clinical search. Embeddings of clinical notes could expose diagnoses, medications, and patient identifiers.
Defenses and Their Limitations
Dimensionality Reduction
Reducing the dimensionality of embeddings before storage (e.g., from 1536 to 256 dimensions) reduces the information available for inversion. However, this also reduces retrieval quality.
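One way to apply this defense is a Gaussian random projection, sketched below; PCA truncation is another common choice, and some embedding APIs offer native dimension truncation. The function name and seed handling here are illustrative, not a prescribed implementation:

```python
import numpy as np

def reduce_dimension(embeddings: np.ndarray, out_dim: int, seed: int = 0) -> np.ndarray:
    """Project embeddings to a lower dimension with a random Gaussian matrix."""
    rng = np.random.default_rng(seed)
    in_dim = embeddings.shape[-1]
    # Scaling by 1/sqrt(out_dim) approximately preserves pairwise distances
    # (Johnson-Lindenstrauss style), which is what retrieval depends on
    projection = rng.standard_normal((in_dim, out_dim)) / np.sqrt(out_dim)
    return embeddings @ projection

# e.g. compress a batch of 1536-d vectors down to 256 dimensions
compressed = reduce_dimension(np.random.randn(10, 1536), 256)
```

The same seed must be reused at query time so that query vectors land in the same projected space as the stored vectors.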
Differential Privacy
Adding calibrated noise to embeddings before storage provides theoretical privacy guarantees. The challenge is that noise sufficient to prevent inversion also degrades retrieval performance.
Embedding Quantization
Quantizing embeddings to lower precision (e.g., from float32 to int8) reduces the information available for inversion while having less impact on retrieval quality than dimensionality reduction.
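A minimal sketch of symmetric per-vector int8 quantization follows; the two helpers are illustrative, and production systems often use per-dimension scales or binary quantization instead:

```python
import numpy as np

def quantize_int8(embedding: np.ndarray):
    """Symmetric int8 quantization: store an int8 vector plus one float scale."""
    scale = float(np.abs(embedding).max()) / 127.0
    quantized = np.clip(np.round(embedding / scale), -127, 127).astype(np.int8)
    return quantized, scale

def dequantize_int8(quantized: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction; error per dimension is at most scale / 2."""
    return quantized.astype(np.float32) * scale
```

The storage drops 4x (one byte per dimension instead of four), and the bounded per-dimension rounding error is the information an inversion model can no longer exploit.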
Access Control on Vectors
The most effective defense is preventing unauthorized access to embedding vectors. This means configuring the vector database to not return vector values in query results (only IDs and similarity scores) and restricting direct database access.
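Where the database cannot be configured to withhold vectors, a server-side wrapper can enforce the same policy. The sketch below assumes a hypothetical result shape (dicts with `id`, `score`, and `vector` keys); adapt the keys to your database's actual response format:

```python
def sanitize_query_results(raw_results: list[dict]) -> list[dict]:
    """Strip raw embedding vectors from query results before returning them.

    Assumes each result is a dict with at least 'id', 'score', and 'vector'
    keys (a hypothetical shape, not any specific database's response).
    Callers receive only IDs and similarity scores, never the vectors.
    """
    return [{"id": r["id"], "score": r["score"]} for r in raw_results]
```

Running this in the API layer, rather than trusting every client to request the right fields, ensures vectors never leave the service boundary even if a query is crafted maliciously.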
Related Topics
- Data Exfiltration via Vector Databases — Using inversion as part of exfiltration
- Adversarial Embeddings — The reverse direction: crafting specific embeddings
- Membership Inference — A lighter-weight privacy attack