Feature Store Security (LLMOps Security)
Security overview of ML feature stores (Feast, Tecton, Vertex Feature Store): architecture and trust model, attack surfaces in online and offline stores, and the security implications of centralized feature management for ML systems.
Feature stores centralize the computation, storage, and serving of ML features -- the input variables that models use for predictions. By providing a single source of truth for features across training and inference, feature stores solve the training-serving skew problem. They also create a single point of attack. A compromised feature store affects every model that consumes its features, making it a high-leverage target for adversaries seeking to manipulate model behavior at scale.
Feature Store Architecture
Core Components
A feature store consists of several interconnected components, each with its own security properties:
| Component | Purpose | Security Relevance |
|---|---|---|
| Offline store | Historical feature data for training | Training data integrity; contains full feature history |
| Online store | Low-latency feature serving for inference | Real-time prediction manipulation; directly affects live traffic |
| Feature transformation pipeline | Computes features from raw data | Transformation logic manipulation; controls what models see |
| Feature registry | Metadata about available features | Feature discovery; exposes data schema and relationships |
| Materialization pipeline | Syncs features from offline to online store | Consistency enforcement; sync compromise creates skew |
Data Flow
```
Raw data sources -> Feature transformation -> Offline store -> Materialization -> Online store
                                                   |                                   |
                                                   v                                   v
                                            Training pipeline                    Inference API
```
An attacker can intervene at any point in this flow. The impact varies:
- Transformation manipulation affects both training and inference data
- Offline store poisoning affects future model training
- Online store poisoning affects real-time predictions immediately
- Materialization manipulation creates training-serving skew that causes subtle prediction errors
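Materialization skew can be caught by periodically sampling entities and comparing what each store would serve. A minimal sketch, using plain dicts as stand-ins for a real offline store (data warehouse) and online store (key-value store); the feature name and entity IDs are illustrative:

```python
# Minimal skew check: sample entities and compare the value each store
# would serve. Dicts stand in for the warehouse-backed offline store and
# the key-value online store of a real deployment.

def find_skewed_entities(offline_store, online_store, feature, tolerance=1e-9):
    """Return entity IDs whose online value diverges from the offline value."""
    skewed = []
    for entity_id, features in offline_store.items():
        offline_value = features.get(feature)
        online_value = online_store.get(entity_id, {}).get(feature)
        if online_value is None or abs(offline_value - online_value) > tolerance:
            skewed.append(entity_id)
    return skewed

offline = {"user_1": {"txn_count_7d": 3.0}, "user_2": {"txn_count_7d": 12.0}}
online = {"user_1": {"txn_count_7d": 3.0}, "user_2": {"txn_count_7d": 95.0}}  # poisoned

print(find_skewed_entities(offline, online, "txn_count_7d"))  # ['user_2']
```

In production this comparison must respect materialization lag: a value that differs only because the latest sync has not yet run is not evidence of tampering, so the check should tolerate values written within the sync window.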
Platform Comparison
Feast (Open Source)
Feast is the most widely deployed open-source feature store.
Architecture: Feast acts as a thin serving layer over existing data infrastructure. The offline store is typically a data warehouse (BigQuery, Redshift, Snowflake) and the online store is a key-value store (Redis, DynamoDB, SQLite).
Security model:
- Authentication depends on the underlying data stores (cloud IAM, database credentials)
- No built-in access control at the feature level
- Feature definitions are code in a Git repository
- The Feast server itself has no authentication in default configuration
Key weaknesses:
- Default deployment has no authentication
- Feature-level access control is not supported
- Materialization jobs run with the permissions of the Feast service account
- Online store credentials are shared across all feature consumers
Tecton
Tecton is a managed feature platform that layers enterprise controls (access management, audit logging, network isolation) on top of the same offline/online store architecture.
Security model:
- Workspace-based access control
- Role-based permissions for feature management
- Audit logging for feature access
- Network isolation options (VPC, private endpoints)
Key weaknesses:
- Workspace-level granularity may be too coarse to enforce per-feature restrictions
- Service accounts for materialization have broad data access
- Feature transformation code executes in Tecton-managed infrastructure
- Cross-workspace feature sharing may violate data boundaries
Vertex Feature Store (Google Cloud)
Vertex Feature Store is Google Cloud's managed feature store integrated with Vertex AI.
Security model:
- Google Cloud IAM for access control
- VPC Service Controls for network isolation
- Data encryption at rest and in transit
- Audit logging via Cloud Audit Logs
Key weaknesses:
- IAM policies may be overly permissive
- Feature store access is often granted at the project level rather than per-feature
- BigQuery offline store access may not be independently controlled
- Service accounts used for feature ingestion may have broad permissions
Attack Surface Analysis
Online Store Attacks
The online store serves features for real-time inference. Manipulating online store data directly affects model predictions.
Direct value manipulation. If an attacker gains write access to the online store (Redis, DynamoDB), they can modify feature values for specific entities. The model receives manipulated features and produces attacker-influenced predictions without any change to the model itself.
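A sketch of why this works: the model never changes, only its inputs do. The dict stands in for Redis or DynamoDB, and the scoring rule is a toy illustration, not any real fraud model:

```python
# Direct value manipulation: an attacker with online-store write access
# rewrites one entity's features; the unchanged model produces an
# attacker-influenced prediction. Dict stands in for Redis/DynamoDB.

online_store = {"user_42": {"chargeback_count": 7, "account_age_days": 2}}

def fraud_score(features):
    # Toy rule: many chargebacks on a young account looks fraudulent.
    if features["chargeback_count"] > 5 and features["account_age_days"] < 30:
        return 0.9
    return 0.1

print(fraud_score(online_store["user_42"]))  # 0.9 -> flagged for review

# Attacker overwrites the feature value in the online store.
online_store["user_42"]["chargeback_count"] = 0

print(fraud_score(online_store["user_42"]))  # 0.1 -> passes review
```

Because the model artifact, the serving code, and the offline store are all untouched, this kind of attack is invisible to model-integrity checks and only surfaces through feature-level auditing.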
Cache poisoning. If the online store caches feature values, poisoning the cache causes incorrect features to be served until the cache is refreshed. The offline store may contain correct values, but the online store serves poisoned ones.
Availability attacks. Taking the online store offline forces models to either fail (if features are required) or fall back to default values (if configured). Either outcome degrades model quality and may create exploitable behavior.
Offline Store Attacks
The offline store contains historical feature data used for training. Poisoning the offline store affects future model training.
Historical data modification. Modifying historical feature values changes the training data for future models. Because feature stores maintain time-series data, an attacker can modify specific time windows to target particular training periods.
Schema manipulation. Changing feature schemas (data types, value ranges, default values) can cause training pipeline failures or subtle data corruption that is not caught by type checks.
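A hedged sketch of how a default-value change slips past type checks; the schema structure and feature name are illustrative, not any platform's registry format:

```python
# Schema manipulation: the default value is swapped for another value of
# the same type. Type checks still pass, but every entity missing the
# feature now materializes a corrupted signal.

schema = {"credit_utilization": {"dtype": float, "default": 0.0}}

def materialize(raw_row, schema):
    out = {}
    for name, spec in schema.items():
        value = raw_row.get(name, spec["default"])
        assert isinstance(value, spec["dtype"])  # passes before and after the attack
        out[name] = value
    return out

print(materialize({}, schema))  # {'credit_utilization': 0.0}

# Attacker edits only the default -- same dtype, so no check fires.
schema["credit_utilization"]["default"] = 1.0
print(materialize({}, schema))  # {'credit_utilization': 1.0}
```

Catching this requires checks on value semantics (distributions, ranges, change review on schema definitions), not just types.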
Feature Transformation Attacks
Feature transformations convert raw data into model-ready features. Compromising transformations affects all downstream consumers.
Logic modification. Changing the transformation logic (e.g., altering a normalization formula, changing a categorical encoding, or modifying a feature derivation) affects all features computed by that transformation.
Dependency injection. Feature transformations may import external libraries. Compromising these libraries compromises the transformation pipeline.
Training-Serving Skew as Attack Vector
Training-serving skew is normally a bug. An attacker can weaponize it by intentionally creating skew.
Intentional Skew Attacks
| Technique | Mechanism | Effect |
|---|---|---|
| Offline-only poisoning | Modify offline store but not online store | Model trains on poisoned data but serves with clean features; model performance degrades unpredictably |
| Online-only poisoning | Modify online store but not offline store | Model trained on clean data receives unexpected feature distributions at inference; prediction quality drops |
| Selective poisoning | Poison features for specific entities only | Targeted prediction manipulation for specific users or items |
| Temporal poisoning | Modify features for specific time windows | Future training runs that include the poisoned window produce compromised models |
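The online-only poisoning row can be sketched with a toy model: a classifier fit on clean offline features collapses when the online store serves a shifted distribution. The mean-threshold "model" and the +3 shift are illustrative stand-ins:

```python
import statistics

# Online-only poisoning: the model is fit on clean offline data, then
# served features from a poisoned online store. The toy classifier
# thresholds on the training mean.

clean_training = [1.0, 1.2, 0.9, 1.1, 5.0, 5.3, 4.8, 5.1]  # two clusters
labels =         [0,   0,   0,   0,   1,   1,   1,   1]

threshold = statistics.mean(clean_training)  # ~3.05, separates the clusters

def predict(x):
    return 1 if x > threshold else 0

# Clean serving: predictions match the labels exactly.
assert [predict(x) for x in clean_training] == labels

# Poisoned online store shifts every value by +3: all class-0 entities
# now land above the threshold and are misclassified as class 1.
poisoned_serving = [x + 3.0 for x in clean_training]
print([predict(x) for x in poisoned_serving])  # [1, 1, 1, 1, 1, 1, 1, 1]
```

Note that retraining does not fix this: the offline store is clean, so the retrained model inherits the same threshold and keeps misclassifying until the online store itself is repaired.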
Data Protection Concerns
PII in Features
Feature stores commonly contain personally identifiable information:
- User demographics used as model features
- Transaction histories
- Location data
- Behavioral profiles
- Text embeddings that can be reversed to recover original content
A compromised feature store leaks PII for every entity in the store, which may include all users of the ML system.
Cross-Team Data Exposure
Feature stores are designed for feature sharing across teams. This creates data exposure risks:
- Team A creates features from sensitive data and shares them
- Team B consumes the features without understanding the sensitivity
- The feature store becomes a side channel for accessing data that Team B should not have
Data Residency and Compliance
Feature stores that replicate data across regions for latency (online store replication) may violate data residency requirements. Features computed from EU user data replicated to a US online store may violate GDPR data transfer restrictions.
Defensive Recommendations
Access Control
- Implement feature-level access control, not just store-level
- Use separate service accounts for read and write operations
- Restrict feature discovery (registry access) to authorized teams
- Audit who accesses which features and how often
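Feature-level access control can be approximated even on platforms that lack it natively by putting a policy check in front of the serving API. A minimal sketch; the policy table, team names, and feature names are all hypothetical:

```python
# Feature-level access control: permissions are granted per
# (principal, feature) rather than per store. The policy table is
# illustrative, not any platform's API.

POLICY = {
    "fraud-team": {"txn_amount_30d", "chargeback_count"},
    "recs-team": {"clicks_7d"},
}

def authorize(principal, requested_features):
    """Raise PermissionError unless every requested feature is allowed."""
    allowed = POLICY.get(principal, set())
    denied = set(requested_features) - allowed
    if denied:
        raise PermissionError(f"{principal} lacks access to: {sorted(denied)}")
    return True

authorize("fraud-team", ["chargeback_count"])        # allowed
try:
    authorize("recs-team", ["chargeback_count"])     # denied
except PermissionError as e:
    print(e)
```

The check must sit in the serving path itself; a policy enforced only in client libraries can be bypassed by talking to the store directly.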
Data Integrity
- Hash feature values during materialization and verify at serving time
- Monitor feature distributions for unexpected shifts
- Implement data quality checks in the transformation pipeline
- Maintain immutable audit logs of all feature modifications
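The hash-and-verify recommendation can be sketched with an HMAC: materialization signs each feature row, and the serving path verifies the tag before using the values, so an online-store write made without the signing key is detected. The key handling here is illustrative; in practice the key lives in a secrets manager and is held only by the materialization and serving services:

```python
import hashlib
import hmac
import json

# Sign feature rows at materialization time, verify at serving time.
# SECRET is a placeholder; a real deployment fetches it from a secrets
# manager rather than embedding it in code.

SECRET = b"materialization-signing-key"

def sign(entity_id, features):
    """Deterministic HMAC over the entity ID and its feature values."""
    payload = json.dumps([entity_id, features], sort_keys=True).encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def verify(entity_id, features, tag):
    return hmac.compare_digest(sign(entity_id, features), tag)

# Materialization writes the row and its tag into the online store.
row = {"txn_count_7d": 3.0}
tag = sign("user_1", row)
assert verify("user_1", row, tag)

# An attacker who edits the value without the key fails verification.
row["txn_count_7d"] = 95.0
print(verify("user_1", row, tag))  # False
```

This converts silent online-store tampering into a detectable serving-time failure, at the cost of one HMAC computation per lookup and a stored tag per row.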
Network Security
- Isolate the online store from direct external access
- Use private endpoints for feature store APIs
- Restrict materialization pipeline network access
- Monitor for unusual feature access patterns
Compliance
- Classify features by sensitivity (PII, financial, health)
- Implement data masking for sensitive features in non-production environments
- Track feature lineage back to source data for compliance auditing
- Enforce data residency requirements in online store replication
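Sensitivity classification and non-production masking can be combined: features tagged as PII are replaced with a deterministic token outside production, so joins and cardinality are preserved while raw values never leave the production environment. The classification table and feature names below are hypothetical:

```python
import hashlib

# Sensitivity-based masking for non-production environments. Features
# classified as PII are replaced with a deterministic token; other
# features pass through unchanged. Classifications are illustrative.

SENSITIVITY = {"home_zip": "pii", "txn_count_7d": "internal"}

def mask_row(row, environment):
    """Return the row as-is in prod; tokenize PII features elsewhere."""
    if environment == "prod":
        return dict(row)
    masked = {}
    for name, value in row.items():
        if SENSITIVITY.get(name) == "pii":
            # Deterministic token: equal inputs map to equal tokens,
            # so joins across tables still line up.
            masked[name] = hashlib.sha256(str(value).encode()).hexdigest()[:12]
        else:
            masked[name] = value
    return masked

row = {"home_zip": "94105", "txn_count_7d": 3.0}
print(mask_row(row, "staging"))  # home_zip replaced by a token
print(mask_row(row, "prod"))     # unchanged
```

A plain hash of low-entropy values like ZIP codes is reversible by enumeration; a keyed hash or a tokenization service is the stronger choice when the masked values must resist brute force.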
Related Topics
- Feature Poisoning -- Specific feature poisoning attack techniques
- Feature Store Access Control -- Detailed access control strategies
An attacker modifies feature values in the online store (used for inference) but not the offline store (used for training). What type of attack is this, and what is the expected impact?