Feature Store Access Control
Access control strategies for feature stores: feature-level permissions, cross-team data leakage prevention, PII protection in features, service account management, and implementing least-privilege access for ML feature infrastructure.
Feature stores are designed for sharing. Their primary value proposition is that features computed by one team can be consumed by another, eliminating redundant data engineering. This sharing model creates a fundamental tension with access control: the more features are shared, the greater the risk of unauthorized data access, PII exposure, and cross-team data leakage. Getting the access control model right determines whether a feature store is a productivity tool or a compliance liability.
Access Control Dimensions
Feature stores require access control across multiple dimensions, each with different granularity needs:
Who Can Read Features
| Consumer | Legitimate Access | Risk If Overly Broad |
|---|---|---|
| Training pipelines | Read offline store for training data | Unauthorized access to sensitive feature history |
| Inference services | Read online store for predictions | Real-time access to user-level data |
| Data scientists | Explore features for model development | PII exposure in development environments |
| Analytics teams | Aggregate feature statistics | Individual-level data accessed through aggregation |
| External partners | Shared features for joint models | Cross-organization data leakage |
Who Can Write Features
| Producer | Legitimate Access | Risk If Overly Broad |
|---|---|---|
| Feature pipelines | Write computed features to stores | Unauthorized data injection or poisoning |
| Materialization jobs | Sync offline to online store | Inconsistency attacks if compromised |
| Backfill jobs | Populate historical features | Historical data modification |
| Admin operations | Schema changes, corrections | Broad data modification capability |
Who Can Define Features
| Actor | Legitimate Access | Risk If Overly Broad |
|---|---|---|
| Feature engineers | Create and modify feature definitions | Unauthorized feature creation exposing sensitive data |
| Platform admins | Manage feature store infrastructure | Full data access through admin privileges |
| ML engineers | Request new features | Indirect access to data through feature requests |
Feature-Level Permissions
The Granularity Problem
Most feature stores implement access control at the project or namespace level, not at the individual feature level. This means:
- A user with access to a project can read ALL features in that project
- Sensitive and non-sensitive features in the same project share access controls
- Moving sensitive features to a separate project fragments the feature store's value
Implementing Feature-Level Access
Classify features by sensitivity
Assign sensitivity levels to each feature based on its data source and content:
| Sensitivity | Examples | Access Policy |
|---|---|---|
| Public | Product category, day of week | Any authenticated user |
| Internal | Aggregate user counts, model scores | Team members only |
| Confidential | User demographics, transaction amounts | Specific role holders |
| Restricted | SSN-derived features, health indicators | Approved use cases only |
Map features to access groups
Create access groups that correspond to legitimate use cases rather than organizational hierarchy. A "fraud detection" access group needs transaction features and behavioral features but not demographic features.
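This use-case-scoped model can be sketched as a simple allowlist lookup. The group and feature names below are illustrative, not from any real deployment; a production policy engine would load these mappings from configuration rather than hardcode them:

```python
# Illustrative use-case access groups; names are hypothetical.
ACCESS_GROUPS = {
    "fraud-detection": {
        "txn_amount_7d_avg",
        "txn_count_24h",
        "login_velocity",
    },
    "recommendations": {
        "item_view_count_30d",
        "category_affinity",
    },
}


def can_read(group: str, feature: str) -> bool:
    """A requester may read a feature only if their use-case
    group explicitly lists it. Unknown groups get nothing."""
    return feature in ACCESS_GROUPS.get(group, set())
```

Note that the fraud-detection group gets transaction and behavioral features but no demographic features, mirroring the principle above: groups follow use cases, not org charts.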
Implement proxy-based access control
Because most feature stores lack native feature-level permissions, implement access control at the API layer:
```python
from typing import Any, Dict, List


class AccessDenied(Exception):
    """Raised when a requester lacks access to one or more features."""


class FeatureAccessProxy:
    """Proxy that enforces feature-level access control in front of
    a feature store that lacks native support."""

    def __init__(self, feature_store, policy_engine):
        self.store = feature_store
        self.policy = policy_engine

    def get_features(
        self,
        entity_id: str,
        feature_names: List[str],
        requester: str,
    ) -> Dict[str, Any]:
        # Check access for each requested feature
        allowed_features = []
        denied_features = []
        for feature in feature_names:
            if self.policy.check_access(requester, feature):
                allowed_features.append(feature)
            else:
                denied_features.append(feature)

        if denied_features:
            self.policy.log_access_denial(requester, denied_features)
            raise AccessDenied(
                f"Access denied to features: {denied_features}"
            )

        return self.store.get_features(entity_id, allowed_features)
```
Audit feature access patterns
Log every feature access with the requester identity, features accessed, entity IDs queried, and timestamp. Review these logs for anomalous patterns.
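One minimal way to capture these fields is a structured, append-only log. The sketch below assumes a JSON-lines format; the field names mirror the list above (requester, features, entity IDs, timestamp) and the audit sink itself is left abstract:

```python
import json
import time
from typing import List


def log_feature_access(
    requester: str,
    features: List[str],
    entity_ids: List[str],
    decision: str,
) -> str:
    """Build one JSON-lines audit record per access attempt, carrying
    the requester identity, features, entities, and timestamp."""
    record = {
        "ts": time.time(),
        "requester": requester,
        "features": features,
        "entity_ids": entity_ids,
        "decision": decision,  # "allow" or "deny"
    }
    # In practice this line would be appended to a tamper-evident
    # audit sink, not just returned.
    return json.dumps(record, sort_keys=True)
```

Structured records make the anomaly review mentioned above tractable: a query over the log can surface, say, a service account that suddenly reads features outside its historical set.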
Cross-Team Data Leakage
How Leakage Happens
Feature stores are designed to break down data silos, but this creates leakage vectors:
Direct feature access. Team A creates features from sensitive data. Team B discovers and uses these features for their models. Team B now has access to data they were not authorized to see, albeit in transformed form.
Feature composition. Team B creates a derived feature that combines Team A's sensitive feature with other data. The derived feature inherits the sensitivity of its inputs, but the feature store does not track this propagation.
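The inheritance rule described here can be enforced mechanically with a max-over-inputs propagation. This sketch assumes the four sensitivity levels from the classification table earlier and that the lineage system can supply each derived feature's input levels:

```python
# Ordered least to most sensitive, mirroring the classification table.
LEVELS = ["public", "internal", "confidential", "restricted"]


def derived_sensitivity(input_levels: list) -> str:
    """A derived feature is at least as sensitive as its most
    sensitive input (max-over-inputs propagation)."""
    if not input_levels:
        raise ValueError("derived feature must declare its inputs")
    return max(input_levels, key=LEVELS.index)
```

Running this at feature-registration time closes the gap the text identifies: the store, not the feature author, decides the derived feature's classification.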
Feature discovery. The feature registry allows teams to browse available features. Even feature names and descriptions can reveal sensitive information: a feature called customer_churn_risk_score reveals that the organization is tracking churn risk.
Training data reconstruction. A model trained on features from the feature store may memorize and expose feature values through prediction API probing. The feature store's access controls do not extend to the model's predictions.
Prevention Strategies
| Strategy | What It Prevents | Limitation |
|---|---|---|
| Feature namespacing | Direct cross-team access | Does not prevent authorized sharing that leaks data |
| Sensitivity tagging | Inadvertent use of sensitive features | Requires accurate classification |
| Approval workflows | Unauthorized feature consumption | Can become a bottleneck |
| Feature masking | PII exposure in non-production | Adds complexity to development workflow |
| Lineage tracking | Unknown sensitivity propagation | Requires comprehensive lineage infrastructure |
PII in Features
Where PII Appears
PII enters the feature store through multiple paths:
| Path | Examples | Risk |
|---|---|---|
| Direct features | Name, email, SSN, date of birth | Obvious PII; should be caught by classification |
| Derived features | Age calculated from DOB, zip code from address | PII-derived; sensitivity inherited from source |
| Behavioral features | Browsing history, purchase patterns, location traces | Behavioral data that identifies individuals |
| Embeddings | Text embeddings of user messages, profile embeddings | PII encoded in vector representations; extractable |
| Aggregate features | Average spend in zip code with < 5 residents | Small-group aggregates that can identify individuals |
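The small-group aggregate risk in the last row is commonly mitigated with a minimum-group-size threshold (a k-anonymity-style rule). The sketch below suppresses any aggregate over fewer than k members; k = 5 matches the example in the table but is an illustrative choice:

```python
from typing import List, Optional


def safe_aggregate(values: List[float], k: int = 5) -> Optional[float]:
    """Release a group average only when the group has at least k
    members; otherwise suppress it to avoid identifying individuals."""
    if len(values) < k:
        return None  # suppressed: group too small
    return sum(values) / len(values)
```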
Embedding PII Risk
Embeddings do not anonymize their inputs. Inversion and membership-inference attacks can recover attributes of the original text or profile from the vector, so embedding features should carry the sensitivity classification of their source data, not a lower one.
PII Protection Strategies
Feature masking. Replace PII feature values with masked versions in non-production environments. Production models that need PII features access them through a separate, audited path.
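A minimal sketch of environment-conditional masking, assuming the environment name is available to the serving layer. The mask preserves value length so downstream schemas and pipelines still run, while the content is removed outside production:

```python
def mask_value(value: str, environment: str) -> str:
    """Return the real value only in production; elsewhere return a
    fixed-character mask of the same length (minimum 4 characters)."""
    if environment == "production":
        return value
    return "*" * max(len(value), 4)
```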
Differential privacy. Add calibrated noise to features during computation. The noise preserves statistical properties for model training while preventing identification of individuals.
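For numeric features, the standard mechanism is Laplace noise with scale sensitivity/epsilon. This sketch samples the noise as the difference of two exponentials (which is Laplace-distributed); choosing the sensitivity bound and the privacy budget epsilon for a real feature is the hard part and is out of scope here:

```python
import random


def dp_noisy_value(true_value: float, sensitivity: float,
                   epsilon: float) -> float:
    """Add Laplace(0, sensitivity/epsilon) noise, the standard
    mechanism for epsilon-differential privacy on numeric values."""
    scale = sensitivity / epsilon
    # The difference of two iid Exp(1/scale) draws is Laplace(0, scale).
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_value + noise
```

Averaged over many entities, the noise cancels out, which is what preserves the statistical properties the text mentions while protecting any single individual.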
Tokenization. Replace PII values with tokens (pseudonymization). The token-to-PII mapping is stored in a separate, access-controlled system.
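A common implementation is keyed hashing: deterministic, so the same PII value always maps to the same token (joins on the token still work), but unrecoverable without the key held in the separate system. The `tok_` prefix and 16-hex-digit truncation below are illustrative choices:

```python
import hashlib
import hmac


def tokenize(pii_value: str, secret_key: bytes) -> str:
    """Deterministic pseudonymization via HMAC-SHA256. The key (and any
    token-to-PII mapping) lives in a separate, access-controlled system."""
    digest = hmac.new(secret_key, pii_value.encode("utf-8"),
                      hashlib.sha256).hexdigest()
    return f"tok_{digest[:16]}"
```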
Feature-level encryption. Encrypt sensitive feature values at rest and decrypt only in the inference path. Development and analytics access see encrypted values.
Service Account Management
The Service Account Problem
Feature stores interact with many components through service accounts:
| Service Account | Used By | Access Needed | Common Over-Permissioning |
|---|---|---|---|
| Materialization SA | Sync pipeline | Read offline, write online | Full read/write to both stores |
| Training SA | Training pipeline | Read offline store | Read access to all features including unneeded ones |
| Inference SA | Serving infrastructure | Read online store | Access to all entities, not just those being served |
| Backfill SA | Data engineering | Write offline store | Write access to all features and time ranges |
| Admin SA | Operations | Manage schemas and access | Full admin access to everything |
Least-Privilege for Service Accounts
Enumerate all service accounts
Identify every service account that interacts with the feature store. Include CI/CD pipelines, scheduled jobs, and interactive access.
Map required permissions
For each service account, determine the minimum set of permissions required for its function. A training pipeline for fraud detection does not need access to recommendation features.
Implement scoped credentials
Create separate credentials for each use case. Use short-lived tokens (OIDC) where possible instead of long-lived API keys.
Monitor for permission drift
Regularly audit service account permissions against their documented requirements. Permissions tend to accumulate over time as new use cases are added without removing old access.
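The drift audit reduces to a set difference between documented and live permissions per account. The account and permission names below are hypothetical; the point is that any grant absent from the documented baseline is flagged for review:

```python
from typing import Dict, Set


def permission_drift(
    documented: Dict[str, Set[str]],
    actual: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, per service account, permissions granted in the live
    system but absent from the documented requirements."""
    drift = {}
    for account, granted in actual.items():
        extra = granted - documented.get(account, set())
        if extra:
            drift[account] = extra
    return drift
```

Running this on a schedule turns "permissions tend to accumulate" from an observation into an alert.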
Audit and Compliance
What to Log
| Event | Details to Capture | Retention |
|---|---|---|
| Feature read | Requester, features, entity IDs, timestamp | 90 days minimum |
| Feature write | Writer, features, values, timestamp | 1 year minimum |
| Schema change | Actor, change details, before/after | Indefinite |
| Access grant/revoke | Admin, target, permissions, timestamp | Indefinite |
| Access denial | Requester, denied features, reason | 90 days minimum |
Compliance Mapping
| Regulation | Feature Store Requirement |
|---|---|
| GDPR | Right to deletion includes features; data minimization; purpose limitation |
| CCPA | Feature data inventory; access request fulfillment; opt-out support |
| HIPAA | PHI features require BAA coverage; minimum necessary access; audit trails |
| SOC 2 | Access controls documented and tested; monitoring and alerting |
References
- Feast Access Control -- Feast permission model documentation
- GDPR and ML Systems -- UK ICO guidance on data protection in AI
- NIST Privacy Framework -- Privacy risk management
Discussion Question
Team A creates a feature called user_spending_embedding that encodes user purchase history as a 768-dimensional vector. Team B discovers this feature in the feature store registry and uses it in their recommendation model. What security and compliance concerns does this raise?