11. December 2023

Data, News
#observability

Scalability and Responsibility with Data Mesh

Data Mesh: who needs it, what is it for, and how does it help? The model is particularly useful for companies that have large data requirements and want to use data as a strategic resource. In contrast to conventional models, the decentralized model treats data as an internal product.

Data Mesh is a methodical approach to managing and scaling data. It aims to democratize data access, improve agility, strengthen accountability and overcome the challenges associated with the scalability and complexity of large amounts of data.

Traditional approach

In a traditional data architecture, data management typically follows a centralized and monolithic approach. Here are some key concepts associated with the traditional approach, along with its challenges:

Centralized Data Ownership: A centralized data team or department (the "data warehouse team") is responsible for data management. This team is tasked with collecting, storing, processing and distributing data across the organization.
Data as a Cost Center: Data is considered a necessity and an expense, rather than a strategic asset that can be explored. For example, data is primarily used for operational purposes rather than for making analytics-driven business decisions.
Complex ETL: Complex ETL (extract, transform, load) processes are often built to serve the needs of different teams and to maintain a central data warehouse.
Limited Agility: Business units or internal departments must request data-related tasks from the central data warehouse team. Sometimes even reporting is done this way, because all the knowledge of how the data is organized and transformed sits in just one team, which creates bottlenecks in development.
Data Governance and quality: Typically, the responsibility of data governance and maintaining data quality lies with the data warehouse team. However, this can result in a knowledge dependency on a single team.
Scalability: As data volume grows exponentially, centralized architectures face increased costs and operational complexities.

In summary, a traditional approach that relies on a centralized data team to serve the entire organization can be very good at security and governance, but it can hinder agility and scalability. It can even limit the analytics that could otherwise be achieved if there were no bottlenecks in either development or processing power. It can still be a good fit for organizations that focus on security and control of data, such as governmental organizations.

Data Mesh approach

Data Mesh takes a decentralised approach to data ownership and analytics. As a result, the organization has clearly defined teams that are responsible for data. Usually, the team structure mirrors that of the business-oriented teams. For instance, the marketing domain has its own data team, the financial domain has another, and so on.

The transversal team that supports every other domain is IT, which plays a crucial role in providing the different teams with tools and infrastructure, as well as implementing data governance and quality measures.

These are some key concepts that differ from a traditional approach:

1. Decentralization: Data is distributed across domain-specific teams or business units. These teams are responsible for the data generated and consumed within their respective domains, fostering a sense of accountability and ownership.

2. Data as a product: Data is viewed as a first-class product. Data products are designed with the same rigor as software products, and each team is responsible for data quality, governance, security, documentation, and so on.

3. Reduced complexity: Each team is responsible for its own data projects and needs, which reduces complexity because responsibilities are distributed. The objective of each team is to provide the cleanest possible dataset to other teams (or to itself).

4. High Agility: Business units and domain-specific teams are now empowered to develop and analyse data on their own, eliminating the need to rely on a central data team and reducing bottlenecks.

5. Data Governance and Quality: As discussed, this is the job of the IT team, and it is the point where the traditional architecture wins. Enforcing data governance and quality is not an easy task when there are multiple systems instead of a central one. This is also where the accountability and responsibility of every domain for its data come into play.

6. Scalability: Since data is now partitioned by domain, and any given domain only holds the data it needs to perform its tasks, each domain can scale on its own without the overhead of a full data warehouse.

7. Data Availability: Data should be clear and available for every domain to use as it sees fit. There should be layers, defined by IT, that offer different ways of accessing data. For example, an API data access layer for individual data points and a tabular access layer for batch processing.
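
The two access layers from point 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataProduct` class, its method names and the sample records are hypothetical, and a real access layer would sit behind an HTTP API or a query engine rather than an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DataProduct:
    """Hypothetical domain-owned data product exposing two access layers."""

    name: str
    owner_domain: str
    key_field: str
    records: list[dict[str, Any]] = field(default_factory=list)

    def get(self, key: Any) -> Optional[dict[str, Any]]:
        """API-style layer: return a copy of one record by key, or None."""
        for record in self.records:
            if record[self.key_field] == key:
                return dict(record)
        return None

    def as_table(self, columns: list[str]) -> list[tuple]:
        """Tabular layer: return the chosen columns of all records for batch use."""
        return [tuple(record[c] for c in columns) for record in self.records]


# Sample data, invented for this sketch.
customers = DataProduct(
    name="operations.customers",
    owner_domain="operations",
    key_field="customer_id",
    records=[
        {"customer_id": 1, "email": "a@example.com", "active": True},
        {"customer_id": 2, "email": "b@example.com", "active": False},
    ],
)

single = customers.get(2)                               # individual data point
batch = customers.as_table(["customer_id", "active"])   # batch of rows
```

Both layers read from the same records, so consumers can pick whichever access pattern suits them without the owning domain maintaining two copies of the data.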

Keep in mind that simply following this domain-centric logic could leave you with a full data team in each domain, which should be avoided initially due to cost considerations. Typically, the domain teams start by "hiring" internally from the transversal (IT) team to develop and analyse their data. Eventually a domain-specific data manager should be hired and, if there is a need for it, the domain data team would then grow organically.

Data Mesh illustrated - A simple example:

This is a very simple example of how data mesh would work in a FinTech. Let’s call our company FinTechOne. FinTechOne offers financial services to their clients, and they have 6 domains inside the organization:

The IT team, as previously discussed, would be the pillar supporting every other team and would provide:

Guidelines on how to construct data models and what nomenclature to use
Guides for best-practice use of tools and models
A definition of which tools are available (for example, this organization only uses Microsoft cloud, and IT would block any development in any other cloud)
General support
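
Guidelines like these are easier to follow when they can be checked automatically. The sketch below assumes a naming convention of `<domain>.<dataset>` in snake_case and a single approved platform. Both rules, the `check_data_product` function and the platform identifiers are hypothetical examples, not something FinTechOne or any specific tool prescribes.

```python
import re

# Hypothetical nomenclature rule: "<domain>.<dataset>", both parts in snake_case.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$")

# Hypothetical approved platforms, mirroring the "Microsoft cloud only" example.
APPROVED_PLATFORMS = {"azure"}


def check_data_product(name: str, platform: str) -> list[str]:
    """Return the list of guideline violations; an empty list means compliant."""
    problems = []
    if not NAME_PATTERN.match(name):
        problems.append(f"name '{name}' does not follow <domain>.<dataset> in snake_case")
    if platform.lower() not in APPROVED_PLATFORMS:
        problems.append(f"platform '{platform}' is not approved")
    return problems


compliant = check_data_product("marketing.campaign_results", "azure")
violations = check_data_product("Marketing Campaigns", "aws")
```

A check of this kind could run in each domain's deployment pipeline, so that IT defines the rules once and every team applies them without a manual review.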

As there is no central data warehouse, each team would develop pipelines and models for its own needs. Now let's say that the marketing team needs operational data to contact its customers and, in turn, the analytics team needs the marketing team's data to analyse performance metrics.

The operational team would be responsible for providing a way (individual or batch) for the marketing team to consume the clean data it needs, and the marketing team would be responsible for storing the data it creates and making it available in the same way to every other team.
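
The hand-over between the operational, marketing and analytics teams can be sketched as follows. Everything here is an assumption made for illustration: the `Catalog` class, the product names, the cleaning rules and the sample rows are hypothetical, and a real setup would use shared storage or APIs instead of in-process functions.

```python
from typing import Callable

Rows = list[dict]


class Catalog:
    """Hypothetical registry in which each domain publishes its data products."""

    def __init__(self) -> None:
        self._providers: dict[str, Callable[[], Rows]] = {}

    def publish(self, name: str, provider: Callable[[], Rows]) -> None:
        self._providers[name] = provider

    def read(self, name: str) -> Rows:
        return self._providers[name]()


# Raw operational data, invented for this sketch.
RAW_CUSTOMERS = [
    {"customer_id": 1, "email": " Ana@Example.com "},
    {"customer_id": 2, "email": None},
    {"customer_id": 3, "email": "bo@example.com"},
]


def operations_contacts() -> Rows:
    """Operations owns the cleaning: drop rows without an email, normalise the rest."""
    return [
        {"customer_id": r["customer_id"], "email": r["email"].strip().lower()}
        for r in RAW_CUSTOMERS
        if r["email"]
    ]


catalog = Catalog()
catalog.publish("operations.customer_contacts", operations_contacts)


def marketing_campaign_log() -> Rows:
    """Marketing consumes the clean contacts and publishes the data it creates."""
    contacts = catalog.read("operations.customer_contacts")
    return [
        {"customer_id": c["customer_id"], "campaign": "welcome", "sent": True}
        for c in contacts
    ]


catalog.publish("marketing.campaign_log", marketing_campaign_log)

# Analytics reads marketing's product without touching the raw operational data.
analytics_input = catalog.read("marketing.campaign_log")
sent_count = sum(1 for row in analytics_input if row["sent"])
```

The point of the sketch is the division of responsibility: the cleaning logic lives with the domain that owns the data, and downstream teams only depend on the published product.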

Data Mesh - It's all in the mix

Note that these are only a few of the key concepts of Data Mesh and that there is no one-size-fits-all solution.

The best approach is often to take a few concepts from each methodology, since a full data mesh approach may not even be possible in certain organizations. A hybrid approach is the most common one: cherry-picking the concepts that fit well. Also keep in mind that some types of centralized data are very useful and should stay that way. Client data is one example: it should live in a CRM that is available for everyone to consume, and there should be a single source of truth so that no domain is confused.

In summary, a Data Mesh approach usually requires a big cultural shift to implement but has the potential to address the limitations of a traditional approach by distributing data responsibilities, improving agility, and democratizing data access, ultimately enabling organizations to better leverage their data as a strategic asset.

Do you want to develop and implement a suitable data strategy for your company?

Guest contribution by

Francisco Capa
