Outlook on the Data Mesh: 4 Steps for the Paradigm Shift

The "Data Mesh" is currently one of the most hyped concepts in IT. This article discusses why companies benefit from a decentralised data architecture and how Apache Kafka helps establish this new structure.

How we design IT landscapes has changed massively with microservices. I explained the advantages in the previous post, so here's a summary: microservices make companies more agile because smaller teams can usually work more efficiently and, above all, more independently of each other. When set up correctly, there are fewer waiting times and more flow.

Crucial for this is that data is available in real time. If you've read Blog 1 and Blog 2 of the series, you'll understand the advantages of real-time data processing and microservice architectures. The Data Mesh is the next level.

The Data Mesh is a Game-Changer

But why is the Data Mesh a game-changer? First, the definition:

A Data Mesh is a concept for organising data architectures. It is based on the idea that responsibility for data is distributed across individual teams or domains. What does this mean in concrete terms?

  1. The era of centralised data processing is over.

  2. Each team is responsible for managing its own data products and services.

  3. The specialist departments make the processed information available to other teams.

When set up correctly, this improves the scalability and flexibility of the data architecture.
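The three principles above can be sketched in code. Here is a minimal, purely illustrative in-memory model (a real data mesh would use a platform such as Apache Kafka; all class and domain names are invented):

```python
# Minimal in-memory sketch of data mesh ownership: each domain team
# owns and publishes its own data products; no central team processes
# data on anyone's behalf.

class DataPlatform:
    """Shared exchange platform (stand-in for e.g. Kafka topics)."""
    def __init__(self):
        self._products = {}  # (owner, product) -> list of records

    def publish(self, owner, product, records):
        # Principle 2: the owning team manages its own data product.
        self._products[(owner, product)] = list(records)

    def consume(self, owner, product):
        # Principle 3: other teams read the published product directly.
        return list(self._products[(owner, product)])

platform = DataPlatform()

# The "checkout" domain publishes its own, already curated product ...
platform.publish("checkout", "orders", [
    {"order_id": 1, "total_eur": 49.90},
    {"order_id": 2, "total_eur": 12.50},
])

# ... and the "finance" domain consumes it without a central data team.
orders = platform.consume("checkout", "orders")
revenue = sum(o["total_eur"] for o in orders)
print(f"revenue: {revenue:.2f} EUR")  # → revenue: 62.40 EUR
```

The point of the sketch: the checkout team alone decides what its "orders" product contains, and every other team reads it directly rather than via an intermediary.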

A tangible example helps with understanding.

Imagine you post on LinkedIn, but the likes and comments your network leaves arrive not live, but with a delay of several hours. Unthinkable and annoying, isn't it?

We're now accustomed to receiving an immediate response. But that's by no means a given: the IT infrastructure must first be able to deliver it. Let's stick with the LinkedIn example, because it gets worse.

Imagine you've posted something and receive countless comments and messages. In most companies today, someone else collects and clusters this data for you. The problem: unlike you, that person doesn't know which messages are important to you and which aren't. In the worst case, the data essential to you is filtered out, or placed in the wrong context. You don't just get the data late, you also get it distorted.

Much better: every specialist department can draw the information most important to it from the data pool in real time. And, this is NEW: self-determined.

Data Swamp Problem
Figure 1. A team of Data Engineers ensures that all data from all services is collected centrally in a Data Lake. Since these teams cannot precisely evaluate the quality of the data, the Data Lake often becomes a Data Swamp — a quagmire of data.

But most organisations are still far from this. In the vast majority, it instead works like this: one team dedicates itself to data preparation, collecting and curating data to the best of its knowledge. The problem: these data teams often cannot evaluate the quality of the information nearly as precisely as the departments themselves.

With a central data team, all other departments depend on the work of a single unit. That isn't just a bottleneck, it's an economic risk. Because what happens when people without domain expertise pre-select information for the departments? Right: chaos.

How Can It Be Done Better?

With a new approach. We must reorganise data management so that specialist teams look after their own data products themselves, and later make this verified, curated data available to everyone else in refined form.

Data Mesh Architecture
Figure 2. How it works differently: a central Data Platform Team provides a data exchange platform, and the specialist departments are responsible for delivering high-quality, reliable data, and then also for using it.

This doesn't mean that central data teams will soon be out of a job. On the contrary: they provide the infrastructure via Apache Kafka and create an environment in which the departments can work efficiently with their relevant data. This is and remains a complex and important job, one the platform team can now focus on even more strongly. Because only if the data platform is easy to operate will the departments also use it in both directions: not just pulling data, but also feeding the clustered information back in.
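What such an environment can look like in practice: many platform teams establish a self-service convention for domain-owned Kafka topics. A purely illustrative sketch (the naming scheme and topic names here are invented, not an official Kafka convention):

```
# Hypothetical topic layout on the central Kafka platform:
# <domain>.<data-product>.<version>
checkout.orders.v1        # owned and written by the checkout team
logistics.shipments.v1    # owned and written by the logistics team
finance.revenue-daily.v1  # refined product fed back by the finance team
```

The platform team runs the cluster and the conventions; each domain team owns the content of its topics, including the refined products it feeds back.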

In future, the departments should analyse the information, check its quality, and publish the refined data. This data, in turn, becomes visible in real time to all participants in the Data Mesh.
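This flow of consuming raw data, checking its quality, and republishing a refined product can be sketched as a simple stream transformation. A hypothetical example (in a real setup the input and output would be Kafka topics; all field names are invented):

```python
# Hypothetical sketch: a department consumes raw events, filters out
# records that fail its quality checks, and republishes the refined
# data product for everyone else in the mesh.

raw_clicks = [  # stand-in for a raw input topic
    {"user": "a", "page": "/home", "ms_on_page": 5400},
    {"user": "b", "page": "", "ms_on_page": 120},        # broken record
    {"user": "c", "page": "/pricing", "ms_on_page": -3}, # broken record
]

def is_valid(event):
    """Domain-specific quality check only this department can define."""
    return bool(event["page"]) and event["ms_on_page"] >= 0

# The refined product: validated and enriched, ready for other teams.
refined_clicks = [
    {**e, "seconds_on_page": e["ms_on_page"] / 1000}
    for e in raw_clicks
    if is_valid(e)
]

print(refined_clicks)  # only the one valid, enriched record remains
```

The quality rule lives with the department that understands the data, which is exactly what a central data team cannot provide.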

The Paradigm Shift

This new approach amounts to a paradigm shift.

We must move away from a central team that evaluates data. Instead, we need a self-service platform in which the departments themselves are responsible for processing the information relevant to them.

  • We must no longer see providing data as a burden, but as a valuable product.

  • We don't need a monolithic top-down approach, but a domain-driven system.

  • All this only works if we transfer data not in delayed batches, but in real time.

With a Data Mesh, we create exactly this data infrastructure. Data flows without thousands of processing programs and without a data swamp. Instead, the individual teams can operate these systems with little effort and process the relevant data for themselves and everyone else. This is the idea behind the Data Mesh, and thus the basis for understanding data as internal products.

About Anatoly Zelenin
Hi, I’m Anatoly! I love to spark that twinkle in people’s eyes. As an Apache Kafka expert and book author, I’ve been bringing IT to life for over a decade—with passion instead of boredom, with real experiences instead of endless slides.
