How Feature Stores Actually Work

Training-serving skew is one of the biggest traps in production ML: subtle differences between how a feature is computed at training time and at serving time can crater accuracy. Feature stores are the systematic fix. This guide covers online vs offline stores, point-in-time correctness, and how Feast and Tecton actually work.

What Is a Feature

Raw data → numeric features:

raw: user had 5 clicks yesterday, logged in 30 min ago, age 28
features:
  user_clicks_24h = 5
  minutes_since_login = 30
  user_age = 28
  user_age_bucket = "26-35"  (one-hot)

Training:
  data → compute features (batch) → model trains

Serving (production):
  user request → compute the same features → model predicts
                            ↑
                  any drift here from training = skew

Common skew causes:
- Compute engine: training uses a Spark sliding window; serving uses a separate Python query → different result
- Timezones: training uses UTC; serving forgets the timezone conversion
- NULL handling: training maps NULL → 0; serving passes NULL through → exception or different value
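
One way to remove these causes without any tooling is to route both paths through a single function. The following is a minimal sketch, not a library API: the function name `compute_features` and the raw inputs (`clicks_24h`, `last_login`) are hypothetical and chosen only to mirror the features listed earlier.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def compute_features(clicks_24h: Optional[int], last_login: datetime, now: datetime) -> dict:
    """Single source of truth for feature logic, shared by training and serving."""
    if last_login.tzinfo is None or now.tzinfo is None:
        # Reject naive datetimes so a forgotten timezone conversion fails loudly.
        raise ValueError("timestamps must be timezone-aware")
    # Aware datetimes subtract correctly even when their zones differ.
    minutes = (now - last_login).total_seconds() / 60
    return {
        # NULL handling is decided once, here, instead of separately per pipeline.
        "user_clicks_24h": 0 if clicks_24h is None else int(clicks_24h),
        "minutes_since_login": int(minutes),
    }


# Training (batch) and serving (single request) call the same function.
now = datetime(2026, 3, 15, 12, 0, tzinfo=timezone.utc)
login_local = datetime(2026, 3, 15, 20, 30, tzinfo=timezone(timedelta(hours=9)))  # 11:30 UTC
batch_rows = [compute_features(c, login_local, now) for c in (5, None)]
online_row = compute_features(None, login_local, now)
```

Because the NULL default and the timezone rule live in one place, the batch row for a missing click count and the online row for the same input are identical by construction.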

The Feature Store Answer

Define feature computation in one place; guarantee training and
serving share the same computation.

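Definition (Feast-style sketch; transactions_source is assumed to be defined elsewhere):
  import pandas as pd
  from feast import Field
  from feast.on_demand_feature_view import on_demand_feature_view
  from feast.types import Int64

  @on_demand_feature_view(
    sources=[transactions_source],
    schema=[Field(name="user_clicks_24h", dtype=Int64)],
  )
  def user_clicks_24h_feat(inputs: pd.DataFrame) -> pd.DataFrame:
    # The same logic runs during training and at serving.
    # Sketch only: heavy window aggregates are usually precomputed upstream.
    clicks = inputs.groupby("user_id")["clicks"].transform("sum")
    return pd.DataFrame({"user_clicks_24h": clicks})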

→ Generate training data with this function and call the same function at serving
→ Skew from divergent feature logic is eliminated (stale or late data can still cause differences)

Online vs Offline Store

A feature store has two storage backends:

Offline Store (batch):
- Large historical data (years)
- For training-data generation
- Usually a warehouse (BigQuery / Snowflake) or lake (Parquet on S3)
- Bulk reads, low latency not needed

Online Store (low-latency):
- Only current features (latest per user)
- For real-time serving predictions
- Redis / DynamoDB / Cassandra
- < 10ms read latency required

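A batch ETL job syncs offline → online (Feast calls this step materialization):
- Runs every minute / hour / day, depending on how fresh the features must be
- Some setups stream instead (CDC + Kafka → online)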

→ The same feature lives in both stores; sync is the key.
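
The sync step can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular feature store implements it: the offline store is a plain list of rows, the online store is a dict keyed by `user_id`, and the function name `materialize` is hypothetical.

```python
from datetime import datetime, timezone

# Offline store stand-in: full history, many rows per user.
offline_rows = [
    {"user_id": 1, "event_timestamp": datetime(2026, 3, 14, tzinfo=timezone.utc), "user_clicks_24h": 3},
    {"user_id": 1, "event_timestamp": datetime(2026, 3, 15, tzinfo=timezone.utc), "user_clicks_24h": 5},
    {"user_id": 2, "event_timestamp": datetime(2026, 3, 15, tzinfo=timezone.utc), "user_clicks_24h": 9},
]


def materialize(rows: list, online: dict) -> None:
    """Copy the latest row per user into the online store (a dict keyed by user_id)."""
    for row in rows:
        current = online.get(row["user_id"])
        # Only overwrite when the incoming row is newer, so re-runs and
        # out-of-order batches cannot roll a value back.
        if current is None or row["event_timestamp"] > current["event_timestamp"]:
            online[row["user_id"]] = row


online_store: dict = {}
materialize(offline_rows, online_store)
latest = online_store[1]["user_clicks_24h"]  # serving does a single key lookup
```

The "newer wins" comparison is the design choice that matters here: it makes the job safe to re-run, which is what lets the sync interval be tuned freely.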

Point-in-Time Correctness

The biggest trap when generating training data is using future information as a feature.

Example: building a "last 7-day clicks" feature for a training row dated 2026-03-15
   - WRONG: compute with current (2026-05-27) data → also includes
            post-2026-03-15 clicks → "data leakage"
   - RIGHT: only data knowable as of 2026-03-15

→ Features need "as-of" timestamps.

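Feast's point-in-time join (sketch; assumes a Feast repo in the current directory):
  from feast import FeatureStore

  store = FeatureStore(repo_path=".")
  # entity_df: pandas DataFrame with columns user_id, event_timestamp
  training_df = store.get_historical_features(
    entity_df=entity_df,
    # Feature refs take the form "<feature_view>:<feature>"
    features=["user_clicks_24h_feat:user_clicks_24h"],
  ).to_df()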

  → Joins features valid as of each row's event_timestamp
  → Blocks future info automatically

This "time-travel join" is the feature store's defining capability.
Writing it directly in SQL is possible but verbose and easy to get wrong (dozens to hundreds of lines).
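
The core of the as-of lookup fits in a short function. The sketch below is a simplified stand-in for a single user, not Feast's implementation; the helper names `ts` and `as_of` and the sample values are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def ts(day: int) -> datetime:
    """Helper: a UTC timestamp in March 2026."""
    return datetime(2026, 3, day, tzinfo=timezone.utc)


# Feature history for one user, sorted by the time each value became known.
history = [(ts(1), 2), (ts(10), 4), (ts(20), 11)]  # (timestamp, last-7-day clicks)


def as_of(history: list, event_time: datetime):
    """Return the latest feature value with timestamp <= event_time, or None."""
    times = [t for t, _ in history]
    idx = bisect_right(times, event_time)  # number of rows at or before event_time
    return history[idx - 1][1] if idx else None


value_at_label = as_of(history, ts(15))  # uses the 03-10 row, ignores the 03-20 row
value_too_early = as_of(history, datetime(2026, 2, 1, tzinfo=timezone.utc))
```

A training row dated 2026-03-15 gets the value recorded on 03-10; the 03-20 value is never visible to it, and a row that predates all history gets no value rather than a future one.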

Feature Store Tools

Tool	Characteristics
Feast (OSS)	Most popular OSS; BigQuery/Snowflake/Redshift offline + Redis/DynamoDB online
Tecton (SaaS)	Commercial platform from the team behind Uber's Michelangelo and a major Feast contributor; strong on streaming features
Databricks Feature Store	Databricks-integrated, Delta Lake-based
Vertex AI Feature Store	GCP managed, BigQuery-integrated
SageMaker Feature Store	AWS managed, online (DynamoDB) + offline (S3)
Hopsworks	OSS + commercial, strong Spark/Flink support

Why Feature Stores Are Infrastructure

Why a library alone won't do:

1. Online store HA (24/7, low latency)
2. Offline store historical scan (TBs)
3. Sync between both stores (consistency)
4. Sharing across models / teams (feature reuse)
5. Governance — which feature is used by which model
6. Permissions — access control for sensitive features (PII)
7. Discovery — "has someone already made a similar feature?"

→ "feature-definition library + two DBs + sync jobs + UI + access
   control" = company infrastructure.

For small teams (1–2 models), a feature store is usually over-engineering.
Consider adopting one at 5+ models or 100+ features.

Streaming Features

When real-time features (like "clicks in the last minute") are needed:

Flow:
  user click event → Kafka → Flink (window aggregation) → online store

Complex example (fraud detection):
  "count of transactions on this card in the last 10 minutes"
  → updated on each transaction
  → fresh value used at serving

→ Tecton, Hopsworks etc. support streaming features as first-class.
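
The fraud-detection aggregate above can be sketched as an in-memory sliding window. This is a single-process illustration, assuming events arrive in timestamp order; a real deployment would run the equivalent logic in a stream processor such as Flink. The class name `CardTxnCounter` is hypothetical.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 600  # 10 minutes


class CardTxnCounter:
    """Per-card count of transactions in the trailing 10 minutes."""

    def __init__(self) -> None:
        self._events = defaultdict(deque)  # card_id -> event times (epoch seconds)

    def add(self, card_id: str, event_time: float) -> int:
        """Record one transaction and return the updated window count."""
        window = self._events[card_id]
        window.append(event_time)
        # Evict events that fell out of the window (assumes in-order arrival).
        while window and window[0] <= event_time - WINDOW_SECONDS:
            window.popleft()
        return len(window)


counter = CardTxnCounter()
counts = [counter.add("card-1", t) for t in (0, 100, 500, 650)]
```

The returned count is what would be written to the online store after each transaction. The in-order assumption is exactly what late events break, which is why watermarks appear in the traps below.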

Traps:
- Streaming aggregation watermarks / late events (separate guide)
- Hard to guarantee exact match between historical training and streaming
- Streaming infra cost

Common Pitfalls

Features inside the training script: the logic gets rewritten at serving → skew. From day one, use shared feature functions or a feature store.
No point-in-time joins: future leakage → great offline accuracy, broken production.
Stale data in the online store: sync interval too long → decisions use old features.
Forced "company-wide feature store": over-engineering for small teams. Consider at 5+ models / 100+ features.
Unclear feature ownership: who defines, maintains, and approves changes? Define governance.
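
For the stale-data pitfall, one possible guard is a freshness check at read time. This is a sketch under assumptions: the one-hour budget `MAX_AGE`, the function name `read_fresh`, and the fallback default are illustrative choices, not values from any particular feature store.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(hours=1)  # assumed freshness budget for this feature


def read_fresh(row: Optional[dict], now: datetime, default: int = 0) -> int:
    """Return the online value if it is fresh enough, otherwise a default."""
    if row is None or now - row["event_timestamp"] > MAX_AGE:
        # Missing or stale: fall back explicitly instead of serving an old value.
        return default
    return row["user_clicks_24h"]


now = datetime(2026, 3, 15, 12, 0, tzinfo=timezone.utc)
fresh = {"event_timestamp": now - timedelta(minutes=20), "user_clicks_24h": 5}
stale = {"event_timestamp": now - timedelta(hours=3), "user_clicks_24h": 7}
values = [read_fresh(fresh, now), read_fresh(stale, now), read_fresh(None, now)]
```

If a fallback like this is used at serving, the training pipeline needs the same rule, otherwise the default itself becomes a new source of skew.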

Wrap-up

At their core, feature stores guarantee identical feature computation at training and serving, use point-in-time joins to prevent leakage, and separate online from offline storage. Together these are the systematic fix for training-serving skew, the biggest trap in production ML.

In practice, small ML projects (1–2 models) can stick to plain functions. At 5+ models or 100+ features, consider Feast (OSS) or a cloud-managed store (Vertex / SageMaker). For streaming-heavy workloads, consider Tecton / Hopsworks.