Automated History (SCD2)
True time travel for your data with zero coding effort.
What is SCD2?
Slowly Changing Dimension Type 2 (SCD2) is a data warehousing technique used to track historical data. Unlike standard updates that overwrite existing data (SCD1), SCD2 keeps a full history of changes.
DataSurface automates this pattern, adding two system columns to your data to enable point-in-time analysis: batch_in and batch_out.
How It Works
Every record has a validity window defined by batch numbers. When a record is created, it is valid from the ingestion batch until the end of time (MAXINT). When it is updated, the old version is "closed" (its batch_out is set to the previous batch ID), and a new version is inserted with the new batch ID as its batch_in.
Example: Tracking Employee Changes
Consider a Person table with the following schema:
| Column | Description |
|---|---|
| id | Unique identifier |
| firstName | First Name |
| surname | Last Name |
| dob | Date of Birth |
| address | Current Address |
| employer | Current Employer |
Step 1: Initial Batch (Batch 100)
We ingest a new employee, John Doe.
| id | firstName | surname | address | employer | batch_in | batch_out |
|---|---|---|---|---|---|---|
| 101 | John | Doe | 123 Main St | Acme Corp | 100 | MAXINT |
Status: John works at Acme Corp and lives at 123 Main St.
Step 2: Update (Batch 105)
John moves to a new apartment. We receive an update.
| id | firstName | surname | address | employer | batch_in | batch_out |
|---|---|---|---|---|---|---|
| 101 | John | Doe | 123 Main St | Acme Corp | 100 | 104 |
| 101 | John | Doe | 456 Elm St | Acme Corp | 105 | MAXINT |
Status: The old record is closed (valid 100-104). The new record is now active (105-MAXINT).
Step 3: Update (Batch 120)
John changes jobs to Beta Inc.
| id | firstName | surname | address | employer | batch_in | batch_out |
|---|---|---|---|---|---|---|
| 101 | John | Doe | 123 Main St | Acme Corp | 100 | 104 |
| 101 | John | Doe | 456 Elm St | Acme Corp | 105 | 119 |
| 101 | John | Doe | 456 Elm St | Beta Inc | 120 | MAXINT |
Step 4: Deletion (Batch 135)
John leaves the system (record deleted in source).
| id | firstName | surname | address | employer | batch_in | batch_out |
|---|---|---|---|---|---|---|
| 101 | John | Doe | 123 Main St | Acme Corp | 100 | 104 |
| 101 | John | Doe | 456 Elm St | Acme Corp | 105 | 119 |
| 101 | John | Doe | 456 Elm St | Beta Inc | 120 | 134 |
Status: No active record exists (all have a batch_out). However, the history remains for forensic analysis.
Why This Matters
With SCD2, you can ask questions like:
- "Where did John live when he worked for Acme Corp?"
- "How many employees did we have as of Batch 110?"
DataSurface manages this complexity automatically, ensuring your analytical data is always accurate and audit-ready.