Automated History (SCD2)

True time travel for your data with zero coding effort.

What is SCD2?

Slowly Changing Dimension Type 2 (SCD2) is a data warehousing technique used to track historical data. Unlike standard updates that overwrite existing data (SCD1), SCD2 keeps a full history of changes.

DataSurface automates this pattern, adding two system columns to your data to enable point-in-time analysis: batch_in and batch_out.

How It Works

Every record has a validity window defined by batch numbers. When a record is created, it is valid from the ingestion batch until the end of time (MAXINT). When it is updated, the old version is "closed" (its batch_out is set to the previous batch ID), and a new version is inserted with the new batch ID as its batch_in.

Example: Tracking Employee Changes

Consider a Person table with the following schema:

ColumnDescription
idUnique identifier
firstNameFirst Name
surnameLast Name
dobDate of Birth
addressCurrent Address
employerCurrent Employer

Step 1: Initial Batch (Batch 100)

We ingest a new employee, John Doe.

idfirstNamesurnameaddressemployerbatch_inbatch_out
101JohnDoe123 Main StAcme Corp100MAXINT

Status: John works at Acme Corp and lives at 123 Main St.

Step 2: Update (Batch 105)

John moves to a new apartment. We receive an update.

idfirstNamesurnameaddressemployerbatch_inbatch_out
101JohnDoe123 Main StAcme Corp100104
101JohnDoe456 Elm StAcme Corp105MAXINT

Status: The old record is closed (valid 100-104). The new record is now active (105-MAXINT).

Step 3: Update (Batch 120)

John changes jobs to Beta Inc.

idfirstNamesurnameaddressemployerbatch_inbatch_out
101JohnDoe123 Main StAcme Corp100104
101JohnDoe456 Elm StAcme Corp105119
101JohnDoe456 Elm StBeta Inc120MAXINT

Step 4: Deletion (Batch 135)

John leaves the system (record deleted in source).

idfirstNamesurnameaddressemployerbatch_inbatch_out
101JohnDoe123 Main StAcme Corp100104
101JohnDoe456 Elm StAcme Corp105119
101JohnDoe456 Elm StBeta Inc120134

Status: No active record exists (all have a batch_out). However, the history remains for forensic analysis.

Why This Matters

With SCD2, you can ask questions like:

  • "Where did John live when he worked for Acme Corp?"
  • "How many employees did we have as of Batch 110?"

DataSurface manages this complexity automatically, ensuring your analytical data is always accurate and audit-ready.