How DataSurface Implements True "Shift Left" with Data Contracts — Enforcing Compatibility and Governance at the Source

December 30, 2025 • By Billy Newport

In the world of modern data engineering, the shift left philosophy has gained significant traction. It means moving responsibility for data quality, compatibility, and governance as far "upstream" as possible — to the point where data is produced — rather than waiting for downstream consumers to discover and suffer from issues.

While many discussions around shift left focus on data contracts for validation, quality checks, and semantic alignment at ingestion (often in streaming or data mesh contexts), DataSurface takes this concept even further with a uniquely robust, bilateral enforcement model. By integrating data contracts directly into pull request (PR) workflows, DataSurface ensures that both producers and consumers actively participate in maintaining trustworthy data ecosystems.

What Makes DataSurface's Approach Stand Out

DataSurface acts as a model-driven logistics layer for data, often described as the "FedEx for Data". It connects producers to consumers across clouds with strong governance and without vendor lock-in, using the best pipeline stacks currently available to meet consumers' current needs. As new pipeline stacks arrive, DataSurface automatically migrates consumers to the best available pipeline technology. In short, DataSurface handles the logistics of moving data according to consumer requirements, and it takes advantage of better technology over time to serve those consumers.

At its core, DataSurface uses data contracts not just as documentation, but as living agreements enforced through CI/CD checks. This creates a system where breaking changes are caught before merge, deprecations trigger necessary conversations, and access is explicitly controlled.

Here are the key mechanisms:

1. Backward Compatibility Enforcement — Keeping Producers Honest

When a data producer submits a PR that modifies a datastore or dataset they own:

  • The PR is automatically linted against the existing schema.
  • Any non-backward-compatible change (e.g., removing a field, changing types in a breaking way) fails the build.

This is pure shift left in action: the producer learns immediately — before merging — that their change would break downstream consumers. No more "surprise" breakages weeks later in analytics pipelines or ML models.
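To make the idea concrete, here is a minimal sketch of what such a compatibility lint could look like. It is not DataSurface's actual API: the `Column` type, the `SAFE_WIDENINGS` table, and the `find_breaking_changes` function are all hypothetical names chosen for illustration, and the rules (no removed columns, only widening type changes, no loosening of nullability) are assumptions about what "backward compatible" means for a typical tabular schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical column definition used only for this illustration."""
    name: str
    dtype: str
    nullable: bool = True


# Assumed widening rules: old type -> types it may safely become.
SAFE_WIDENINGS = {
    "int32": {"int64"},
    "float32": {"float64"},
}


def find_breaking_changes(old: list[Column], new: list[Column]) -> list[str]:
    """Return a description of every change that could break existing readers."""
    new_by_name = {col.name: col for col in new}
    problems = []
    for col in old:
        updated = new_by_name.get(col.name)
        if updated is None:
            problems.append(f"column '{col.name}' was removed")
            continue
        allowed = SAFE_WIDENINGS.get(col.dtype, set())
        if updated.dtype != col.dtype and updated.dtype not in allowed:
            problems.append(
                f"column '{col.name}' changed type {col.dtype} -> {updated.dtype}"
            )
        if updated.nullable and not col.nullable:
            problems.append(f"column '{col.name}' became nullable")
    # Columns that exist only in `new` are additions and are treated as safe.
    return problems


old_schema = [Column("id", "int32", nullable=False), Column("email", "string")]
new_schema = [Column("id", "int64", nullable=False), Column("signup_ts", "timestamp")]

for problem in find_breaking_changes(old_schema, new_schema):
    print(problem)  # prints: column 'email' was removed
```

In a CI job, a non-empty result would fail the build, which is the behavior described above. Widening `id` from `int32` to `int64` and adding `signup_ts` pass under these assumed rules, while dropping `email` does not.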

The philosophy here is simple yet powerful:

Once a producer advertises a dataset in a certain form, they must evolve it in a backward-compatible way. Consumers shouldn't have to chase producers for explanations; the system prevents the breakage upfront.

2. Deprecation as a Controlled, Conversation-Forcing Process

Data producers can mark datasets as deprecated in stages:

  • Soft deprecation: New Workspaces (consumer environments) are prevented from starting to use the dataset.
  • Hard deprecation: Existing consumers are strongly discouraged (or blocked) from continued usage.

Here's where it gets interesting:

  • Consumers can configure their Workspace policy to reject deprecated data entirely.
  • If a producer then marks a dataset as deprecated while such a Workspace still depends on it, the producer's PR fails in the producer's repo.

This forces a structured conversation:

  • Why is this being deprecated?
  • Is there a migration path?
  • Can the consumer accept the risk or timeline?

The producer cannot silently break things — the consumer's policy acts as a veto, shifting accountability both ways.
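The veto can be pictured as a simple check that runs when the producer's PR is validated. The sketch below is an assumption about how such a rule might be expressed, not DataSurface's real model: `DeprecationStatus`, `Workspace`, `allows_deprecated`, and `deprecation_violations` are hypothetical names, and the two deprecation stages are collapsed into "any deprecation triggers the consumer's policy" for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class DeprecationStatus(Enum):
    """Hypothetical stages mirroring the soft/hard split described above."""
    NOT_DEPRECATED = "not_deprecated"
    SOFT = "soft"
    HARD = "hard"


@dataclass(frozen=True)
class Workspace:
    """Hypothetical consumer Workspace with a policy on deprecated data."""
    name: str
    datasets: frozenset[str]
    allows_deprecated: bool = True


def deprecation_violations(
    dataset: str, status: DeprecationStatus, workspaces: list[Workspace]
) -> list[str]:
    """Return the Workspaces whose policy forbids the dataset's new status."""
    if status is DeprecationStatus.NOT_DEPRECATED:
        return []
    return [
        ws.name
        for ws in workspaces
        if dataset in ws.datasets and not ws.allows_deprecated
    ]


workspaces = [
    Workspace("risk_reporting", frozenset({"customers"}), allows_deprecated=False),
    Workspace("marketing_sandbox", frozenset({"customers"}), allows_deprecated=True),
]

blocked_by = deprecation_violations("customers", DeprecationStatus.SOFT, workspaces)
print(blocked_by)  # prints: ['risk_reporting']
```

A non-empty list would fail the producer's PR and name the Workspaces to talk to, which is what turns a deprecation into a conversation rather than a surprise.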

3. Producer-Controlled Access — Requiring Explicit Approval

The relationship works bidirectionally.

Producers can flag certain datasets with a policy that says:

"Any new consumer must get my explicit approval before using this data."

When a consumer tries to add such a dataset to their Workspace via PR:

  • The PR fails automatically.
  • The producer is notified and can add the specific Workspace to an approved list.
  • Only then can the consumer proceed.

This creates a lightweight approval gate that:

  • Prevents uncontrolled sprawl
  • Ensures producers know who's depending on their data
  • Forces onboarding discussions about SLAs, usage patterns, and expectations

It's essentially mutual consent for critical data dependencies.
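As a rough illustration of the approval gate, the sketch below models a dataset with an allow list that the consumer's PR is checked against. The `Dataset` class, its `requires_approval` and `approved_workspaces` fields, and the `check_access` function are hypothetical names for this example and should not be read as DataSurface's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical dataset with a producer-controlled allow list."""
    name: str
    requires_approval: bool = False
    approved_workspaces: set[str] = field(default_factory=set)


def check_access(dataset: Dataset, workspace_name: str) -> bool:
    """Return True if the Workspace may depend on the dataset."""
    if not dataset.requires_approval:
        return True
    return workspace_name in dataset.approved_workspaces


payments = Dataset("payments", requires_approval=True)

# The consumer's first PR fails because the Workspace is not yet approved.
first_attempt = check_access(payments, "fraud_models")

# The producer approves the Workspace in their own PR, then the consumer retries.
payments.approved_workspaces.add("fraud_models")
second_attempt = check_access(payments, "fraud_models")

print(first_attempt, second_attempt)  # prints: False True
```

Because the allow list lives in the producer's definition, approving a consumer is itself a reviewed change, so the record of who depends on the data stays in version control.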

4. Consumer-Driven Requirements: Retention and Latency

Consumers aren't passive — they declare their needs in the Workspace configuration:

  • Retention policies (how long data must be kept available)
  • Latency requirements (how fresh the data must be)

These directly influence:

  • The data platform's pipeline configuration
  • How frequently DataSurface pulls/refreshes data from the producer

This closes the loop: consumer expectations aren't just wishes — they become operational constraints enforced by the platform.
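One way to picture how declared requirements could turn into operational settings is shown below. This is a sketch under stated assumptions, not a description of DataSurface internals: it assumes several Workspaces share one pipeline for a dataset, that the pipeline must keep data for the longest retention requested, and that it must refresh at least as often as the tightest latency requested. `WorkspaceRequirements`, `PipelinePlan`, and `plan_pipeline` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class WorkspaceRequirements:
    """Hypothetical requirements a consumer declares in its Workspace."""
    name: str
    min_retention: timedelta  # how long data must stay available
    max_latency: timedelta    # how stale the data is allowed to be


@dataclass(frozen=True)
class PipelinePlan:
    """Hypothetical settings derived for a pipeline shared by those consumers."""
    retention: timedelta
    refresh_interval: timedelta


def plan_pipeline(requirements: list[WorkspaceRequirements]) -> PipelinePlan:
    """Derive settings that satisfy every consumer at once."""
    if not requirements:
        raise ValueError("at least one Workspace is required")
    return PipelinePlan(
        # Keep data as long as the most demanding consumer needs it.
        retention=max(r.min_retention for r in requirements),
        # Refresh as often as the most latency-sensitive consumer needs.
        refresh_interval=min(r.max_latency for r in requirements),
    )


plan = plan_pipeline([
    WorkspaceRequirements("finance_reports", timedelta(days=365), timedelta(hours=24)),
    WorkspaceRequirements("ops_dashboard", timedelta(days=30), timedelta(minutes=5)),
])
print(plan.retention.days, plan.refresh_interval.total_seconds())  # prints: 365 300.0
```

Taking the maximum retention and the minimum latency is the simplest policy that satisfies everyone; a real platform might instead split consumers with very different needs across separate pipelines.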

DataSurface also allows multiple data platforms, each with a different technology stack, to be plugged in concurrently. The pipelines that serve a given set of consumers can then be mapped to whichever of these platforms suits them.

Why This Matters in a Data Mesh / Decentralized World

Traditional data meshes rely heavily on social contracts, catalogs, and good intentions. DataSurface adds teeth to those contracts through automated enforcement in PRs.

The result is a system that:

  • Prevents breakage before it happens
  • Makes dependencies explicit and reviewable
  • Forces healthy communication between domains
  • Reduces "pipeline spaghetti" and technical debt
  • Scales governance without central teams micromanaging

In short: DataSurface doesn't just talk about shift left — it engineers it into the development workflow, creating a more reliable, self-regulating data supply chain.

If you're building a data mesh, dealing with frequent schema breaks, or trying to establish real governance in a decentralized organization, the combination of enforceable backward compatibility, staged deprecation, mutual approval gates, and consumer-specified SLAs might be exactly the missing piece.

Have you implemented something similar in your organization? How do you handle schema evolution and deprecation today? I'd love to hear your thoughts!