Embedding Inversion Attacks
Techniques for reconstructing input text from embedding vectors: model-specific inversion methods, privacy implications, and defenses against embedding inversion.
Embedding inversion reverses the embedding process: given a vector, the attacker recovers the text that produced it. This is significant because embeddings are often treated as non-sensitive derived data, stored with fewer protections than the original documents. If inversion is feasible, then embeddings must be protected as carefully as the documents they represent.
Inversion Model Architecture
Embedding inversion models are typically trained as sequence-to-sequence models that take a vector as input and produce text as output.
Training an Inversion Model
The training process requires a dataset of (text, embedding) pairs:
# Step 1: Generate training data
training_pairs = []
for text in corpus:
    embedding = target_model.encode(text)
    training_pairs.append((embedding, text))

# Step 2: Train an inversion model
# Architecture: embedding vector → decoder → text tokens
import torch
import torch.nn as nn

class InversionModel(nn.Module):
    def __init__(self, embedding_dim, vocab_size, hidden_dim=512):
        super().__init__()
        self.projection = nn.Linear(embedding_dim, hidden_dim)
        self.token_embedding = nn.Embedding(vocab_size, hidden_dim)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=hidden_dim, nhead=8),
            num_layers=6
        )
        self.output = nn.Linear(hidden_dim, vocab_size)

    def forward(self, embedding, target_tokens):
        # Project the embedding into the decoder's hidden space; it acts
        # as a one-element "memory" sequence for cross-attention
        memory = self.projection(embedding).unsqueeze(0)
        # Embed the target tokens, then decode against the memory
        tgt = self.token_embedding(target_tokens)
        output = self.decoder(tgt, memory)
        return self.output(output)

What the Attacker Needs
To train an effective inversion model, the attacker needs:
- The embedding model or a close substitute. The inversion model must be trained on embeddings from the same model (or a similar one) used by the target system.
- A text corpus similar to the target domain. An inversion model trained on Wikipedia text will perform poorly on medical records. Domain-matched training data dramatically improves recovery quality.
- Sufficient compute. Training an inversion model requires GPU resources comparable to fine-tuning a small language model.
Inversion Quality and Expectations
Embedding inversion does not produce perfect reproductions of the original text. The quality of recovery varies by several factors.
What Is Typically Recovered
Research across multiple embedding models has shown:
| Recovery Level | What Is Captured | Example |
|---|---|---|
| Topic/domain | The general subject area | "medical diagnosis" from a clinical note |
| Key entities | Named entities, technical terms | "Patient Smith", "metformin", "Type 2 diabetes" |
| Core semantics | The main meaning of the text | "Patient was prescribed medication for diabetes" |
| Approximate phrasing | Similar but not identical wording | Synonyms, paraphrases of the original |
| Verbatim recovery | Exact original text | Rare; more likely for short, distinctive inputs |
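Recovery at these levels can be quantified with simple overlap metrics. The token-level F1 below is an illustrative sketch (the helper and its whitespace tokenization are hypothetical conveniences, not a standard from the inversion literature):

```python
def token_f1(original: str, recovered: str) -> float:
    """Token-overlap F1 between the original text and an inversion attempt."""
    orig_tokens = set(original.lower().split())
    rec_tokens = set(recovered.lower().split())
    overlap = len(orig_tokens & rec_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(rec_tokens)
    recall = overlap / len(orig_tokens)
    return 2 * precision * recall / (precision + recall)

# A paraphrase-level recovery scores well below 1.0 even though
# the key entities ("metformin", "diabetes") are captured:
score = token_f1(
    "patient prescribed metformin for diabetes",
    "patient given metformin for type 2 diabetes",
)
```

A verbatim recovery scores 1.0, topic-only recovery scores near 0, and entity/paraphrase recovery lands in between, which roughly mirrors the levels in the table above.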
Factors Affecting Recovery Quality
Embedding dimensionality. Higher-dimensional embeddings (1536 for text-embedding-3-small, 3072 for text-embedding-3-large) retain more information and are more invertible. Lower-dimensional embeddings (384 for all-MiniLM-L6-v2) lose more information and are harder to invert completely.
Input length. Shorter inputs are more completely recoverable because the embedding captures a larger proportion of the information. A 10-word sentence compressed into 1536 dimensions retains more per-word information than a 500-word paragraph.
Vocabulary distinctiveness. Texts with distinctive vocabulary (technical terms, proper nouns, unusual phrases) are more recoverable than generic text because the distinctive tokens leave stronger traces in the embedding.
Training data quality. An inversion model trained on in-domain data achieves significantly better recovery than one trained on generic text.
Model-Specific Techniques
OpenAI Embeddings
OpenAI's embedding models (text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large) are widely used and well-studied:
# Inversion approach for OpenAI embeddings
# The attacker cannot access model weights, so they use a trained inversion model

# Generate training data by embedding a domain-matched corpus
import openai

def generate_training_data(texts):
    pairs = []
    for text in texts:
        response = openai.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        embedding = response.data[0].embedding
        pairs.append((embedding, text))
    return pairs

# Train the inversion model on these pairs, then apply it to
# target embeddings extracted from the vector database

The black-box nature of OpenAI's models means the attacker must rely on the trained inversion model's generalization. Recovery quality is moderate: key entities and topics are typically recovered, but exact phrasing is not.
Open-Source Models
Open-source embedding models (Sentence Transformers, E5, BGE) are more vulnerable to inversion because the attacker has full model access:
# With model access, the attacker can use gradient-based inversion
# in addition to trained inversion models
import torch

def gradient_inversion(model, target_embedding, max_len=32, num_steps=1000):
    # Initialize a sequence of random token embeddings to optimize
    input_embeds = torch.randn(1, max_len, model.config.hidden_size,
                               requires_grad=True)
    optimizer = torch.optim.Adam([input_embeds], lr=0.01)
    for step in range(num_steps):
        optimizer.zero_grad()
        output = model(inputs_embeds=input_embeds)
        current_embedding = output.last_hidden_state.mean(dim=1)
        # Minimize cosine distance to the target embedding
        loss = (1 - torch.nn.functional.cosine_similarity(
            current_embedding, target_embedding
        )).mean()
        loss.backward()
        optimizer.step()
    # Map the optimized embeddings back to their nearest vocabulary tokens
    return find_nearest_tokens(model, input_embeds)

Privacy Implications
Embedding inversion has significant implications for data protection and privacy:
Regulatory Considerations
Under GDPR and similar regulations, data from which an individual can be identified constitutes personal data. If embeddings can be inverted to recover text containing personal information, then embeddings themselves may be classified as personal data, subject to:
- Data subject access requests (the right to know what data is stored)
- The right to erasure (deleting embeddings when source data is deleted)
- Data protection impact assessments for embedding pipelines
- Cross-border transfer restrictions for embedding storage
Practical Privacy Scenarios
- HR documents embedded for search. Embeddings of employee records could be inverted to recover salary information, performance reviews, or disciplinary records.
- Customer communications embedded for support. Embeddings of support tickets could reveal customer names, account details, and complaint histories.
- Medical records embedded for clinical search. Embeddings of clinical notes could expose diagnoses, medications, and patient identifiers.
Defenses and Their Limitations
Dimensionality Reduction
Reducing the dimensionality of embeddings before storage (e.g., from 1536 to 256 dimensions) reduces the information available for inversion. However, this also reduces retrieval quality.
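One way to apply this defense is a Gaussian random projection, sketched below; PCA truncation is another common choice, and some embedding APIs offer native dimension truncation. The function name and seed handling here are illustrative, not a prescribed implementation:

```python
import numpy as np

def reduce_dimension(embeddings: np.ndarray, out_dim: int, seed: int = 0) -> np.ndarray:
    """Project embeddings to a lower dimension with a random Gaussian matrix."""
    rng = np.random.default_rng(seed)
    in_dim = embeddings.shape[-1]
    # Scaling by 1/sqrt(out_dim) approximately preserves pairwise distances
    # (Johnson-Lindenstrauss style), which is what retrieval depends on
    projection = rng.standard_normal((in_dim, out_dim)) / np.sqrt(out_dim)
    return embeddings @ projection

# e.g. compress a batch of 1536-d vectors down to 256 dimensions
compressed = reduce_dimension(np.random.randn(10, 1536), 256)
```

The same seed must be reused at query time so that query vectors land in the same projected space as the stored vectors.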
Differential Privacy
Adding calibrated noise to embeddings before storage provides theoretical privacy guarantees. The challenge is that noise sufficient to prevent inversion also degrades retrieval performance.
Embedding Quantization
Quantizing embeddings to lower precision (e.g., from float32 to int8) reduces the information available for inversion while having less impact on retrieval quality than dimensionality reduction.
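A minimal sketch of symmetric per-vector int8 quantization follows; the two helpers are illustrative, and production systems often use per-dimension scales or binary quantization instead:

```python
import numpy as np

def quantize_int8(embedding: np.ndarray):
    """Symmetric int8 quantization: store an int8 vector plus one float scale."""
    scale = float(np.abs(embedding).max()) / 127.0
    quantized = np.clip(np.round(embedding / scale), -127, 127).astype(np.int8)
    return quantized, scale

def dequantize_int8(quantized: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction; error per dimension is at most scale / 2."""
    return quantized.astype(np.float32) * scale
```

The storage drops 4x (one byte per dimension instead of four), and the bounded per-dimension rounding error is the information an inversion model can no longer exploit.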
Access Control on Vectors
The most effective defense is preventing unauthorized access to embedding vectors. This means configuring the vector database to not return vector values in query results (only IDs and similarity scores) and restricting direct database access.
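Where the database cannot be configured to withhold vectors, a server-side wrapper can enforce the same policy. The sketch below assumes a hypothetical result shape (dicts with `id`, `score`, and `vector` keys); adapt the keys to your database's actual response format:

```python
def sanitize_query_results(raw_results: list[dict]) -> list[dict]:
    """Strip raw embedding vectors from query results before returning them.

    Assumes each result is a dict with at least 'id', 'score', and 'vector'
    keys (a hypothetical shape, not any specific database's response).
    Callers receive only IDs and similarity scores, never the vectors.
    """
    return [{"id": r["id"], "score": r["score"]} for r in raw_results]
```

Running this in the API layer, rather than trusting every client to request the right fields, ensures vectors never leave the service boundary even if a query is crafted maliciously.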
Related Topics
- Data Exfiltration via Vector Databases — Using inversion as part of exfiltration
- Adversarial Embeddings — The reverse direction: crafting specific embeddings
- Membership Inference — A lighter-weight privacy attack