Embedding-Level Attacks
Overview of attacks targeting embeddings directly: adversarial embedding generation, inversion attacks for text reconstruction, and membership inference via embedding analysis.
Embedding-level attacks target the vectors themselves rather than the databases that store them or the retrieval pipelines that query them. These attacks exploit the fundamental properties of embedding spaces: that embeddings encode semantic information about their source data, that embedding spaces have geometric structure that can be manipulated, and that the mapping from text to embeddings can be partially reversed.
The Embedding Attack Surface
Embeddings are often treated as opaque numerical representations that do not themselves contain sensitive information. This assumption creates a security gap: organizations may protect the original documents but treat their embeddings as non-sensitive data that can be stored, transmitted, and shared with fewer restrictions.
The reality is that embeddings are a lossy but information-rich encoding of their source data. The degree to which they leak information depends on the embedding model, the dimensionality of the vectors, and the nature of the source data.
What Embeddings Encode
A typical text embedding encodes:
- Semantic content — The meaning and topic of the text
- Structural information — The organization and format of the text
- Lexical features — Specific words and phrases, especially unusual or distinctive ones
- Domain signals — Indicators of the text's domain (medical, legal, technical)
This information is sufficient for several attack categories.
Attack Category 1: Adversarial Embeddings
Adversarial embeddings are vectors crafted to manipulate similarity search results. The attacker generates text that produces an embedding close to a target in vector space, even though the text's actual content differs from what the similarity score implies.
This attack enables:
- Retrieval poisoning — Injecting content that is retrieved in response to specific queries
- Payload delivery — Associating prompt injection payloads with embeddings that match legitimate queries
- Content displacement — Pushing legitimate content out of top-k results by inserting adversarial alternatives
The key challenge is generating text that is simultaneously close to the target in embedding space and carries the attacker's intended payload. This requires understanding or access to the embedding model to optimize the adversarial text.
Attack Category 2: Inversion Attacks
Embedding inversion is the process of recovering original input text from an embedding vector. While embeddings are designed as a one-way transformation, research has shown that inversion models can recover significant portions of the original text.
Inversion attacks have significant privacy implications:
- Document reconstruction — Recovering the content of documents stored only as embeddings
- Secret extraction — Extracting credentials, PII, or proprietary information from embeddings
- Data regulation compliance — Demonstrating that embeddings constitute personal data under GDPR and similar regulations
The feasibility of inversion depends on having access to the embedding model (or a similar one) and sufficient computational resources to train the inversion model.
Attack Category 3: Membership Inference
Membership inference via embeddings determines whether specific data was part of the embedding model's training set. This can reveal whether an organization's proprietary data was used to train a model without their consent, or confirm that a specific individual's data is present in a system.
Membership inference works by comparing how the model embeds "seen" data (data that was in the training set) versus "unseen" data. Models typically produce more confident (lower perplexity) and more structured embeddings for data they have seen during training.
When Each Attack Applies
| Attack | Requires Model Access | Requires DB Access | Primary Impact |
|---|---|---|---|
| Adversarial embeddings | Helpful but not required | Write access | Retrieval manipulation |
| Inversion | Same or similar model | Read access | Data extraction |
| Membership inference | Same model | Query access | Privacy violation |
Risk Assessment
When assessing embedding-level risks:
- What data is embedded? — The sensitivity of the source data determines the impact of inversion and inference attacks
- Is the embedding model known? — If the attacker can identify or obtain the embedding model, all attack categories become more feasible
- Are embeddings accessible? — Direct access to vector values (through API queries or database access) enables inversion; even without direct access, similarity scores leak information
- Are embeddings stored separately from access controls? — Embeddings stored without the access control labels of their source documents represent a privilege escalation path
Related Topics
- Adversarial Embeddings — Crafting embeddings to manipulate retrieval
- Inversion Attacks — Reconstructing text from embeddings
- Membership Inference — Detecting data presence via embeddings
- Foundations: Embeddings & Vector Systems — Technical foundations