Reclaiming Simplicity in Data Architecture
There’s a strange irony in the way enterprise data strategy evolves. In our pursuit of cutting-edge solutions, we often trade simplicity for complexity, clarity for novelty, and proven tools for experimental stacks, only to circle back years later when the dust settles.
At some point along the way, “modern” became synonymous with “complicated.” More layers. More distributed components. More real-time orchestration. More vendor-driven terminology. We were told to chase scale, agility, and flexibility. But what many teams got instead was higher cost, longer timelines, and brittle systems that are hard to explain and costly to maintain.
When Complexity Becomes a Default
Let's acknowledge a growing tendency in data architecture conversations to assume that scale is always the goal. So we reach for tools built for massive concurrency or streaming use cases, even when our own data volume doesn’t justify them. In many environments, this leads to solutions that are elegant on paper but overengineered in practice.
I’ve seen a pipeline built to process 3,000 daily sales transactions that involves Fivetran for ingestion, landing data in S3, triggering a Databricks job via Airflow, transforming data with dbt, syncing outputs to Snowflake, and finally visualizing it in Power BI, spanning AWS, Azure, and a third-party orchestration layer along the way. A breathtaking feat of “modern” architecture. The business impact? Delays, confusion, and unnecessary technical debt. Data and IT leaders can forget that elegance isn’t just about what a system can do; it’s about how effectively and transparently it meets the needs of the people using it.
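To put that scale in perspective: 3,000 rows a day is a workload that a single scheduled script and one relational database can absorb comfortably. The sketch below is purely illustrative, assuming a daily CSV export from the source system and a Postgres warehouse; the file path, schemas, column names, and connection string are hypothetical, not taken from the system described above.

import csv
import psycopg2  # any relational database driver would do; Postgres is an assumption here

def load_daily_sales(csv_path: str, dsn: str) -> None:
    """Load one day's sales export and refresh a small reporting table."""
    # Assumes staging.sales (unique on order_id) and mart.daily_sales
    # (unique on sales_date) already exist.
    with open(csv_path, newline="") as f, psycopg2.connect(dsn) as conn:
        rows = [(r["order_id"], r["sold_at"], r["amount"]) for r in csv.DictReader(f)]
        with conn.cursor() as cur:
            # Stage the raw rows; the unique constraint keeps reruns idempotent.
            cur.executemany(
                "INSERT INTO staging.sales (order_id, sold_at, amount) "
                "VALUES (%s, %s, %s) ON CONFLICT (order_id) DO NOTHING",
                rows,
            )
            # One set-based SQL statement stands in for the entire transformation layer.
            cur.execute(
                "INSERT INTO mart.daily_sales (sales_date, total_amount) "
                "SELECT sold_at::date, SUM(amount) FROM staging.sales GROUP BY sold_at::date "
                "ON CONFLICT (sales_date) DO UPDATE SET total_amount = EXCLUDED.total_amount"
            )

if __name__ == "__main__":
    # A cron entry or any basic scheduler can run this once a day.
    load_daily_sales("sales_export.csv", "postgresql://analytics_user@warehouse/analytics")

Power BI, or any BI tool, can query mart.daily_sales directly; the whole job lives in one place and can be read top to bottom in a few minutes.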
The Case for Simplicity
Simplicity doesn’t mean basic. It means purposeful. It means choosing technologies that solve the problem without introducing fragility, cost, or maintenance burdens that outpace the value delivered.
For many use cases, particularly in finance, operations, or sales analytics, what’s needed isn’t a real-time distributed lakehouse with a semantic layer. It’s a fast, governed, relational database that works well with existing BI tools and can be maintained by a lean team. That’s not regression; it’s strategic alignment.
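“Governed” in this setting often looks unglamorous too: a documented reporting view and an explicit read-only grant for the BI tool’s service account. Here is a minimal sketch of that idea, again assuming a Postgres warehouse; the schema, view, and role names (such as bi_reader) are hypothetical.

import psycopg2  # same assumed Postgres warehouse as the sketch above

# Governance in miniature: a certified view, a comment recording ownership,
# and a read-only grant scoped to exactly what the BI tool needs.
GOVERNANCE_SQL = """
CREATE SCHEMA IF NOT EXISTS mart;

CREATE OR REPLACE VIEW mart.monthly_revenue AS
SELECT date_trunc('month', sold_at)::date AS month,
       SUM(amount)                        AS revenue
FROM staging.sales
GROUP BY 1;

COMMENT ON VIEW mart.monthly_revenue IS
  'Certified revenue figures for finance dashboards; owned by the data team.';

GRANT USAGE ON SCHEMA mart TO bi_reader;
GRANT SELECT ON mart.monthly_revenue TO bi_reader;
"""

with psycopg2.connect("postgresql://analytics_user@warehouse/analytics") as conn:
    with conn.cursor() as cur:
        cur.execute(GOVERNANCE_SQL)

A lean team can review, version, and reason about a script like this without specialist platform knowledge, which is a large part of what keeps it maintainable.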
A recurring pitfall in modern data strategy is mistaking a vendor roadmap for an enterprise roadmap. The reality is that vendors are incentivized to sell platforms, and that incentive doesn’t necessarily help your organization make the cleanest, fastest architectural decision.
Over time, vendor ecosystems start to shape architectural thinking: what gets implemented, how performance is measured, which tools are “recommended.” While many of these tools are excellent in isolation, stacking them without clear business justification can introduce more risk than reward.
As data leaders, we should resist the pull of one-size-fits-all architectures and instead build ecosystems grounded in our actual needs. That means asking: What are we trying to solve? What are the simplest tools that can solve it? And how do we stay flexible if our needs change later?
A Better Definition of ‘Modern’
Let’s redefine what modern should mean in the context of data systems:
Understandable: Your architecture should be transparent to the people operating it—not just the engineers who built it.
Performant for purpose: Not overbuilt for theoretical scale, but right-sized for actual demand.
Maintainable: Able to evolve without a full rewrite every 18 months.
Vendor-neutral: Decisions made on capability and need, not exclusivity or bundled discounts.
Outcome-aligned: Architecture that maps directly to the business problems it’s supposed to solve.
Modern isn’t about how many tools you use. It’s about how effectively your architecture supports insight, speed, and decision-making at scale.
Pragmatism over Posture
In the end, what’s considered “modern” will always change. A decade ago, it was Hadoop. Then it was real-time streaming. Now it’s lakehouses, open tables, and AI-first data stacks. The narrative keeps evolving, but the principles of good architecture remain constant.
This is not a rejection of modern tooling outright. There are absolutely cases where advanced architectures are not only appropriate, but necessary. Here are a few examples where cloud-native stacks shine:
Machine learning feature pipelines that require real-time joins across high-volume clickstream and transactional data
Multimodal data processing, such as combining structured financial data with unstructured documents or sensor feeds
Global-scale personalization or recommendation systems that demand sub-second inference and distributed retrieval
Cross-organizational data sharing and governance, where open table formats and data contracts enable interoperability and control
Elastic compute for cost optimization, where autoscaling and separation of storage/compute are non-negotiable for seasonal workloads
But for a sizable majority of reporting, forecasting, and decision-support use cases (especially in mid-sized enterprises or functional departments), relational databases, straightforward pipelines, and well-governed data marts deliver better performance, lower costs, and faster time-to-value.
Let's build with intent, and perhaps stop writing the obituary of the relational database.