The "Day 2" Trap: Why Your New Data Platform Won't Save You

November 22, 2025 • By Billy Newport

Many CDOs expect that signing a multi-year deal for one of the big data vendor products will lead to immediate productivity gains. But the real work—and the real costs—begin after the contract is signed.

Looking back at what I built in this space in the mid-2010s, if I were starting again today, I’d need to build the exact same architecture—just on a different foundation. Maybe Iceberg on S3 rather than Parquet on HDFS. A different job scheduler instead of YARN. A catalog on top of S3/Iceberg. But mostly, nothing has changed in 10 years.

There are faster SQL engines today that can run on top of S3/Iceberg, but your CPU costs over the next few years will depend entirely on how well that specific SQL engine matches the workloads you force onto it. Because that’s what you’ve been told, right? You only need one engine, and it happens to be whichever one you just purchased.

The Reality of "Day 2"

After you sign the contract, that’s the end of Day 1. Day 2 starts now, and this is when the real costs begin. You can easily spend tens of millions of dollars per year on labor for the next five years and still end up with a mess.

If you’re lucky and have people who have done it before, you might end up with a platform that the rest of the firm can build on to save money. But more likely, you will end up with a decentralized collection of proprietary point-to-point data pipelines—"pipeline spaghetti" built around that expensive Day 1 platform you purchased.

When you eventually move on and your successor weighs "value for money" against the market products available at that time, the decisions you make now will determine whether your firm can take advantage of those future improvements, or whether the technical debt of your current software locks you in.

The "Logistics Layer" for Data

I built DataSurface with the hindsight of running a massive platform on HDFS/Parquet for six years. I built it to package that experience into a single, off-the-shelf solution that runs on top of what you just purchased—and on top of what your successor will purchase—without the technical debt.

Think of DataSurface as the Amazon Storefront for your data. Amazon customers order products; they don’t care whether FedEx, UPS, or Amazon’s own trucks deliver the package. If Amazon switches from FedEx to their own shipping fleet to save costs or improve speed, do the customers care? Are they consulted? No. The package just arrives.

DataSurface is that logistics layer: a data nervous system for your enterprise. It allows multiple platforms—big data vendor products or traditional databases like Oracle, SQL Server, IBM Db2, and Postgres, running across multiple clouds (AWS, Azure, or on-premise)—to be used interchangeably to meet the requirements of your data consumers as those requirements evolve over time. Platforms can be swapped out or upgraded without breaking the business. Without this layer, even a simple version upgrade from a single vendor can turn into a full platform port.

With DataSurface, migration is a configuration change, not a code rewrite. It is the insurance policy that lets you safely use today’s best engines while keeping your data architecture independent, compliant, and ready for tomorrow.
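To make the "configuration change, not a code rewrite" idea concrete, here is a minimal sketch of the separation it implies. This is an illustrative toy, not DataSurface's actual API: the names `DatasetRequirement`, `Platform`, and `plan_pipelines` are all hypothetical. Consumers declare only what they need; a single platform binding decides how those needs are fulfilled, so swapping platforms touches one line of configuration rather than every pipeline.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetRequirement:
    """What a consumer needs -- hypothetical, platform-agnostic declaration."""
    name: str
    max_latency_minutes: int  # how fresh the data must be

@dataclass(frozen=True)
class Platform:
    """A concrete storage/engine combination -- also hypothetical."""
    name: str
    storage: str  # e.g. "hdfs/parquet" or "s3/iceberg"
    engine: str   # e.g. "spark" or "trino"

def plan_pipelines(requirements, platform):
    """Render each consumer requirement as a pipeline on the chosen platform.
    Consumers never reference the platform directly, so rebinding `platform`
    re-plans every pipeline without touching consumer declarations."""
    return [
        f"{r.name}: {platform.engine} over {platform.storage} "
        f"(refresh <= {r.max_latency_minutes}m via {platform.name})"
        for r in requirements
    ]

# Consumer declarations: stable across migrations.
needs = [DatasetRequirement("trades", 15), DatasetRequirement("positions", 60)]

# The only thing that changes between "Day 1" and a future migration:
day1 = Platform("vendor-a", "hdfs/parquet", "spark")
later = Platform("vendor-b", "s3/iceberg", "trino")

for line in plan_pipelines(needs, day1):
    print(line)
for line in plan_pipelines(needs, later):
    print(line)
```

The point of the sketch is the shape, not the details: the consumer-facing declarations (`needs`) never change, while the binding to a platform is a single value that can be swapped when a better engine or cheaper storage comes along.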