Why do data people make everything complicated?

Put data in front of something and watch it explode in complexity

Daniel Schruhl
Nov 20, 2023

I do not know where it comes from, but for a long time I have seen one thing happen again and again, on different levels and in different ways. Once people do something in a data context, they completely forget how they used to do things, along with the best practices they already know.

One prominent example is product management. Once you introduce the data context into your product, people lose their cool and start doing product management in a weird way they have never done before. I have seen teams go from writing nicely defined user stories to writing technical tasks that have no connection to any business value. Or people switch from an agile way of working, with short iterations and lots of feedback loops, to a more waterfall way of working.

Another example is data architecture. I would say we have reached a point where microservices are well explored and people have understood when and how to use them. The same goes for databases: we know that one central Postgres shared by all microservices would be a bad idea.
Why the hell do we think it is a good idea to collect all data in one central data warehouse? Why do we think it's a good idea to build one huge monolithic service that ingests, stores and processes all the data? Why don’t we use our knowledge about microservices, distributed systems, domain boundaries, highly available databases and scalability to build the corresponding data architectures and systems?

If you think about it, data mesh gives you nothing more than microservices with domain boundaries in the data realm. People build (data) products on the data mesh and leverage its infrastructure capabilities to do so somewhat autonomously. We did exactly that for non-data products when we decided that microservices, DevOps, CI/CD, infrastructure as code and autonomous teams are a good idea.

I once worked at Otto, “the Amazon of Germany”. They had a huge project that started before I joined and was still somewhat underway while I was working there. The goal of that project was to move away from the monolith (based on some vendor solution plus customisations) they used for e-commerce. In the wake of that change, they also did reorgs. For example, the previously central QA/testing team was disbanded, and QA and testing capabilities moved inside each domain team. They did the same with their operations team (especially when DevOps was fresh and a thing). They moved infrastructure capabilities inside the teams and let people own and operate their services completely, with the help of a platform. Does this sound familiar to you? That is what is usually meant by data mesh.
THIS WAS IN 2016 (this article was written in 2023; 2016 was 7 years ago)!
How can data mesh still be a “controversial” and not fully understood concept?

So please, if you are working in the data space or have anything to do with data, do not stray from the best practices you are already used to. Just because there is data involved does not mean everything needs to change. I do understand that working with data comes with some caveats, but I strongly believe it does not change everything. It just needs some small adjustments…

If you made it this far into the text, I am interested to know your examples of this antipattern happening. Have you encountered a similar situation where people reinvented things for no reason just because they put “data” in front of it?
