The Git-Backed Workflow

DataSurface treats your data ecosystem like software code. Collaboration, governance, and deployment are managed entirely through Git, ensuring a complete audit trail and robust validation before any change reaches production.

1. The Model Ecosystem

The entire data platform is defined by an Ecosystem object rooted in a primary Git repository (the liveRepo). This single source of truth defines:

  • Governance Zones: Federated areas of ownership (e.g., "Finance Zone", "European Zone").
  • Teams: Groups within zones that own data assets.
  • Infrastructure: Where data is stored and processed.
  • Policies: Rules for data retention, access, and classification.

Each Zone and Team can be linked to its own Git repository, allowing decentralized management while maintaining centralized visibility.
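To make the hierarchy concrete, here is a minimal sketch in plain Python of an ecosystem whose zones and teams each point at their own repository. The classes `Ecosystem`, `GovernanceZone`, and `Team`, their fields, and the repository URLs are hypothetical illustrations, not the DataSurface DSL itself; the sketch only shows how one root object can still answer which repository owns each part of the model.

```python
from dataclasses import dataclass, field


@dataclass
class Team:
    """A group that owns data assets (hypothetical shape)."""
    name: str
    repo: str  # repository holding this team's part of the model


@dataclass
class GovernanceZone:
    """A federated area of ownership containing teams (hypothetical shape)."""
    name: str
    repo: str  # repository holding zone-level definitions and policies
    teams: list[Team] = field(default_factory=list)


@dataclass
class Ecosystem:
    """Root object; live_repo plays the role of the primary liveRepo."""
    name: str
    live_repo: str
    zones: list[GovernanceZone] = field(default_factory=list)

    def repo_for_team(self, zone_name: str, team_name: str) -> str:
        """Find which repository owns a given team's definitions."""
        for zone in self.zones:
            if zone.name != zone_name:
                continue
            for team in zone.teams:
                if team.name == team_name:
                    return team.repo
        raise KeyError(f"{zone_name}/{team_name} is not defined in {self.name}")


eco = Ecosystem(
    name="Acme",
    live_repo="https://git.example.com/acme/ecosystem.git",
    zones=[
        GovernanceZone(
            name="Finance Zone",
            repo="https://git.example.com/acme/finance-model.git",
            teams=[Team("Sales", "https://git.example.com/acme/sales-team.git")],
        ),
    ],
)
print(eco.repo_for_team("Finance Zone", "Sales"))  # https://git.example.com/acme/sales-team.git
```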

2. Collaborative Change Flow

Teams do not log into a UI to click buttons. They write Python code.

Step A: Clone & Branch

A data engineer clones their team's repository and creates a feature branch (e.g., feature/add-sales-data).

Step B: Modify the Model

They add a new Datastore or modify a Workspace using the DataSurface Python DSL, and can run tests locally to check that the model is valid before pushing.
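A local check of this kind might look like the following sketch, which uses the standard `unittest` module. The `Column` and `Datastore` classes and the `problems` helper are hypothetical stand-ins for the real DSL objects, and the checks shown (at least one column, no duplicate column names, PII explicitly marked) are only examples of what a team might test before pushing.

```python
import unittest
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str
    is_pii: bool = False


@dataclass(frozen=True)
class Datastore:
    name: str
    columns: tuple[Column, ...]


# Hypothetical datastore added on the feature/add-sales-data branch.
SALES = Datastore(
    name="sales",
    columns=(
        Column("order_id", "INT"),
        Column("customer_email", "VARCHAR", is_pii=True),
        Column("amount", "DECIMAL"),
    ),
)


def problems(store: Datastore) -> list[str]:
    """Return simple structural problems found in a datastore definition."""
    found = []
    if not store.columns:
        found.append(f"{store.name}: no columns defined")
    names = [c.name for c in store.columns]
    for name in sorted({n for n in names if names.count(n) > 1}):
        found.append(f"{store.name}: duplicate column {name!r}")
    return found


class SalesModelTest(unittest.TestCase):
    def test_sales_datastore_is_well_formed(self):
        self.assertEqual(problems(SALES), [])

    def test_pii_column_is_flagged(self):
        # Downstream policy checks rely on PII being marked explicitly.
        email = next(c for c in SALES.columns if c.name == "customer_email")
        self.assertTrue(email.is_pii)
```

Saved as `test_sales_model.py`, this runs with `python -m unittest test_sales_model`, so a malformed definition is caught on the engineer's machine rather than in the PR.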

Step C: Pull Request (PR)

They push the branch and open a Pull Request. This triggers the Automated Linting System.

3. Automated Linting & Validation

Before a human even looks at the PR, DataSurface performs a deep semantic validation of the proposed changes:

  • Referential Integrity: Does the referenced database actually exist? Are the columns correct?
  • Policy Compliance: Is PII data being exposed to a public workspace? Does the data retention policy meet regulatory requirements?
  • Impact Analysis: Will this schema change break downstream consumers?

If the linter finds errors, the PR check fails. Merging is blocked until the model is compliant.
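The three checks can be sketched as a function that compares the model on main with the model proposed in the PR and returns a list of errors; an empty list lets the check pass. Everything here is hypothetical: the `Model`, `Workspace`, and `Column` shapes are simplified stand-ins, and the impact check treats any type change on a consumed column as breaking, which a real linter would refine. Removed columns are caught by the referential integrity check, because consumers still reference them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    data_type: str
    pii: bool = False


@dataclass(frozen=True)
class Workspace:
    public: bool
    uses: tuple[tuple[str, str], ...]  # (datastore, column) pairs it consumes


@dataclass
class Model:
    datastores: dict[str, dict[str, Column]] = field(default_factory=dict)
    workspaces: dict[str, Workspace] = field(default_factory=dict)


def lint(current: Model, proposed: Model) -> list[str]:
    """Return every problem found in `proposed`; an empty list means the check passes."""
    errors = []
    for ws_name, ws in proposed.workspaces.items():
        for store, col in ws.uses:
            columns = proposed.datastores.get(store)
            # Referential integrity: the datastore and column must exist.
            if columns is None:
                errors.append(f"{ws_name}: unknown datastore {store!r}")
                continue
            if col not in columns:
                errors.append(f"{ws_name}: unknown column {store}.{col}")
                continue
            # Policy compliance: public workspaces may not read PII columns.
            if ws.public and columns[col].pii:
                errors.append(f"{ws_name}: public workspace exposes PII column {store}.{col}")
            # Impact analysis: a type change on a consumed column may break this consumer.
            before = current.datastores.get(store, {}).get(col)
            if before is not None and before.data_type != columns[col].data_type:
                errors.append(
                    f"{ws_name}: {store}.{col} changes type "
                    f"{before.data_type} -> {columns[col].data_type}"
                )
    return errors


def check_exit_code(current: Model, proposed: Model) -> int:
    """Print findings and return a CI exit status; non-zero fails the PR check."""
    errors = lint(current, proposed)
    for err in errors:
        print(f"ERROR {err}")
    return 1 if errors else 0


current = Model(
    datastores={"sales": {"order_id": Column("INT"), "email": Column("VARCHAR", pii=True)}},
    workspaces={"dashboard": Workspace(public=True, uses=(("sales", "order_id"),))},
)
proposed = Model(
    datastores={"sales": {"order_id": Column("BIGINT"), "email": Column("VARCHAR", pii=True)}},
    workspaces={"dashboard": Workspace(public=True, uses=(("sales", "order_id"), ("sales", "email")))},
)
status = check_exit_code(current, proposed)  # prints two errors, returns 1
```

In a CI job, `raise SystemExit(check_exit_code(current, proposed))` would turn a non-empty error list into a failed PR check.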

4. Deployment via Tagging

Once merged to main, the changes are effectively "in" the ecosystem but not yet live in every environment. Deployment is controlled by Runtime Environments (RTEs) using Git tags.

Figure: Each platform "subscribes" to a tag pattern on the same Git repository (main branch). Production (version selector v*-prod) runs v2.1-prod on the AWS Prod Account with Aurora and Snowflake; UAT (version selector v*-uat) runs v2.3-uat on the AWS Dev Account with smaller database instances.

UAT Environment

Configured to track tags matching v*-uat. When a version is ready for testing, tag the repository v2.3-uat; UAT automatically picks up this version and applies the changes.

Production Environment

Configured to track tags matching v*-prod. Only after the version passes UAT do you tag it v2.3-prod. Production detects the new tag and safely rolls out the changes.

Same Model, Different Versions: Both environments use the exact same Git repository. The only difference is which tag each platform is configured to track. This means UAT can test newer changes while Production remains stable on a proven version.
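The tag-tracking rule can be sketched as a function that filters the repository's tags with a glob-style selector and picks the newest match. This assumes, for illustration, that a platform runs the highest version number among matching tags; `select_version` and `version_key` are hypothetical names, and the real RTE selection logic may differ.

```python
import fnmatch
import re
from typing import Iterable, Optional

# Numeric version between the leading "v" and the environment suffix, e.g. "2.3" in "v2.3-uat".
_VERSION = re.compile(r"^v(\d+(?:\.\d+)*)-")


def version_key(tag: str) -> Optional[tuple[int, ...]]:
    """Parse 'v2.3-uat' into (2, 3); return None for tags without a numeric version."""
    match = _VERSION.match(tag)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def select_version(tags: Iterable[str], selector: str) -> Optional[str]:
    """Return the highest-versioned tag matching a selector such as 'v*-prod'."""
    candidates = [
        t for t in tags
        if fnmatch.fnmatchcase(t, selector) and version_key(t) is not None
    ]
    return max(candidates, key=version_key, default=None)


tags = ["v2.1-prod", "v2.1-uat", "v2.2-uat", "v2.3-uat"]
print(select_version(tags, "v*-uat"))   # v2.3-uat
print(select_version(tags, "v*-prod"))  # v2.1-prod
```

Comparing versions as integer tuples matters: compared as plain strings, v2.10-uat would sort before v2.9-uat.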

5. Audit Trails & Access Control

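For highly regulated industries, knowing who changed what and when is critical. DataSurface relies on Git's built-in history to provide a tamper-evident audit log: each commit ID is a hash that covers its parent, so altering any past commit changes the IDs of every commit after it.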

  • Permissions: Access to modify specific parts of the model (e.g., the "Finance Zone") is controlled via Git repository permissions. Only authorized users can merge PRs to the finance-model repo.
  • Audit History: Every change to the data platform—from adding a column to changing a retention policy—is a Git commit. The commit log provides a permanent, searchable history of the platform's evolution.
  • Code Reviews: Changes require peer review (PR approval) before they can be merged. This enforces the "four-eyes principle" for sensitive configuration changes.
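In practice these rules are enforced by the Git host's repository permissions and branch protection, not by DataSurface code. The sketch below models them in plain Python only to make the merge conditions explicit; `PullRequest` and `merge_blockers` are hypothetical names, and the required-approval count is an assumed parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    author: str
    approvals: frozenset[str] = frozenset()
    checks_passed: bool = False  # result of the automated lint check


def merge_blockers(pr: PullRequest, merger: str, authorized: frozenset[str],
                   required_approvals: int = 1) -> list[str]:
    """Return the reasons a merge must be refused; an empty list means it may proceed."""
    reasons = []
    # Repository permissions: only authorized users may merge.
    if merger not in authorized:
        reasons.append(f"{merger} lacks merge permission on this repository")
    if not pr.checks_passed:
        reasons.append("automated lint check has not passed")
    # Four-eyes principle: the author's own approval does not count.
    independent = pr.approvals - {pr.author}
    if len(independent) < required_approvals:
        reasons.append(f"needs {required_approvals} approval(s) from someone other than {pr.author}")
    return reasons


finance_mergers = frozenset({"alice", "bob"})
pr = PullRequest(author="alice", approvals=frozenset({"alice"}), checks_passed=True)
print(merge_blockers(pr, merger="alice", authorized=finance_mergers))
```

Discarding the author's approval before counting is what makes this a four-eyes rule rather than a plain approval count.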

This approach prevents "configuration drift" and unauthorized "shadow IT" changes, because every change to the platform must arrive as a reviewed, validated commit, keeping your data platform compliant and secure.