Centralized or Decentralized Analytics: From Antagonism to Symbiosis
In technology, there has long been a tug-of-war between centralized and decentralized control of platforms, approaches, and implementations. Both paradigms have validity. Centralized control ensures consistency and regulatory compliance at the corporate level. Decentralized empowerment enhances the likelihood that tech solutions will be meaningful, contextually relevant, and well-adopted at the business unit level. I have been working with SAP on this topic and wanted to share some thoughts.
In the data and analytics world today, this question is sharply in focus. The strong drive toward data-driven operations and decision making, brought on by rapid digital transformation, means that individual knowledge workers need to make analytics a part of their daily approach to work. That, in turn, drives the need for analytics solutions that work well in the context of a particular team’s domain. But it has also accelerated the deployment of analytics around the organization and heightened the need for those deployments to be coordinated and centrally visible, to assure compliance with data protection regulations.
Business units, sometimes called “domains,” have specific responsibilities (including profit and loss) and needs. Analytics isn’t merely a technological pursuit – it’s a means to the end of success and growth. Business domains understand better than anyone – including IT – the semantics of their data and the exact way it should be structured, blended, and cleansed. Domains also understand how best to apply the data for analytical use, and they need the latitude to work with their data as they please, to extract the most value from it and use it to optimize their own performance and results.
Meanwhile, IT likes to maintain control, uniformity, and consistency across systems and the way they are used in different business units. That may at first appear to be an arbitrary preference. But in many organizations, if that neat and organized environment is not maintained, IT is held accountable for it. The zeal for consistency is based on obligation, not rooted in arrogance.
Much of the time, it’s axiomatic: IT must devise and maintain an organization-wide strategy and structure for data. And while IT may not understand the minutiae of a particular domain’s data, it does and must have a working understanding of the organization’s data more broadly. Just because the business domain knows best what to do with its own data, doesn’t mean IT is unenlightened, uncaring, or willingly acting as an obstacle to the domain’s goals.
They’re Both Right
Ostensibly, these two sets of needs are in conflict. But this is an issue of give-and-take, not of good and evil. Business domains need autonomy, and, at the same time, IT needs to retain meaningful control for itself.
Even if it requires suspension of disbelief, we need to consider whether both sets of needs can be served, because both constituencies’ agendas and outlooks are valid, important and, in a very real sense, correct. Neither can trump the other. They must coexist, not just passively but cooperatively.
While the equation may seem like it’s zero-sum, the truth is that customers need a balance of centralized and decentralized approaches to gathering, modeling, transforming, publishing, and analyzing data. What may look like conflicting concerns have to be blended and harmonized.
This ambidextrous imperative has given rise to architectures and paradigms – like Data Fabric, Data Mesh, and others – that recognize the dispersed nature of data, the varied needs of business units, and the need for analytics efforts around the enterprise to be organized and coordinated.
But these solutions still have a tendency towards imbalance. Data Mesh envisions a world where central IT provides and maintains data infrastructure and is in control of data governance. Virtually everything else is delegated to the business domains, including data ingest, modeling and transformation on the creating side, and the publishing, productization, evangelism and support of data sets and data feeds on the analysis side. In the world of Data Mesh, the business domains have cross-functional responsibility, and not merely self-service autonomy.
Data Fabric, on the other hand, recognizes the physical and organizational distribution of data, but can still orient toward centralized unification at the logical layer. Aggregating different data services and virtualizing data allows the data to remain in place. But data models are still universal, and all innovation and implementation work may still be centralized.
So, we are still left with the same puzzle: how in the world can these two seemingly conflicting interests be accommodated concurrently? How can we solve this problem intellectually? And then how can we find an appropriate solution, technologically?
All Together Now?
As with many things in technology, the answer lies in cooperative integration and layers of abstraction. For example, in software development, the notion of object orientation and inheritance allows for the creation of base classes, from which new subclasses – very often defined within the scope of a domain – can be derived. The subclasses can augment the base classes, change some of their behaviors and/or override elements of their structure. And if something changes in the base class, all derived classes inherit those changes.
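The inheritance analogy above can be sketched in code. This is a minimal, hypothetical Python example (the class and field names are invented for illustration): an organization-wide base class defines a baseline structure and behavior, and a domain-level subclass augments the fields and overrides one behavior while still inheriting the rest.

```python
# Hypothetical sketch: a central "base class" model that a business domain extends.

class EnterpriseCustomerModel:
    """Organization-wide baseline: fields and rules every domain inherits."""
    required_fields = ["customer_id", "name", "country"]

    def validate(self, record: dict) -> bool:
        # Baseline rule: all required fields must be present.
        return all(f in record for f in self.required_fields)


class SalesCustomerModel(EnterpriseCustomerModel):
    """Domain subclass: augments the baseline and overrides one behavior."""
    # Augment the structure: sales adds a field on top of the corporate baseline.
    required_fields = EnterpriseCustomerModel.required_fields + ["account_owner"]

    def validate(self, record: dict) -> bool:
        # Override: reuse the baseline check, then add a domain-specific rule.
        return super().validate(record) and record.get("lifetime_revenue", 0) >= 0
```

If the baseline rule in `EnterpriseCustomerModel.validate` ever changes, `SalesCustomerModel` picks up the change automatically through `super()` — exactly the property that makes the pattern attractive for a centrally maintained data model.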
That’s instructive, and encouraging, because in many organizations IT must be able to design an organization-wide data model that serves, at minimum, as a baseline template. Domain-specific models and data sets can then be built atop the organizational one. This ensures a default level of semantic definition, including analytical hierarchies. And instead of having to start from a “blank page,” business domains have a point of departure from which they can build their own semantic layer, their own data pipelines, their own materialized data sets and/or virtualized data views.
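The baseline-plus-derivation idea can be illustrated with a small, hypothetical Python sketch (the metric names and structure are invented for illustration, not an SAP API): IT publishes a baseline semantic layer as a template, and a domain composes its own layer from it by overriding some definitions and adding others, without mutating the central template.

```python
# Hypothetical sketch: deriving a domain semantic layer from a central baseline.

# IT's organization-wide baseline: default semantic definitions and hierarchies.
BASELINE_SEMANTICS = {
    "revenue": {"unit": "EUR", "aggregation": "sum"},
    "region":  {"hierarchy": ["continent", "country", "city"]},
}

def derive_domain_layer(baseline: dict, overrides: dict, additions: dict) -> dict:
    """Compose a domain layer: start from the baseline, then override and extend."""
    layer = {k: dict(v) for k, v in baseline.items()}  # copy; don't mutate IT's template
    for name, changes in overrides.items():
        layer[name].update(changes)                    # domain-specific refinements
    layer.update(additions)                            # purely domain-owned definitions
    return layer

# The sales domain refines one baseline definition and adds one of its own.
sales_layer = derive_domain_layer(
    BASELINE_SEMANTICS,
    overrides={"revenue": {"aggregation": "avg"}},
    additions={"pipeline_stage": {"hierarchy": ["stage", "substage"]}},
)
```

The domain gets a point of departure rather than a blank page, and the central baseline remains intact for every other domain to derive from.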
From those assets, domains can develop full-on data products. And they can promote, support, maintain and enhance them, too. These data products can then be used by other domains and can serve as platforms in much the same way as the centralized data layer can. This fulfills exactly what the Data Mesh methodology prescribes and endorses.
Seemingly, then, the path toward servicing centralized and decentralized analytics is composition and derivation. It allows for domain-level autonomy without fragmented, siloed, detached solutions.
Give Me Some Space
In the SAP world, this layered reality and delegated authority is supported directly in the SAP Data Warehouse Cloud (DWC), through the provision of “Spaces.” Spaces can run in any cloud. Spaces correspond naturally to business domains, and each can provide its own interfaces and schemas to analytics tools and users. Domain-level spaces can be built atop a centralized space provided by IT, and different business domain-level spaces can share data with one another.
This means the root-level, physical data model can coexist with domain-level logical models. This works laterally as well, so that data provided in one domain can be consumed, enriched and even differently exposed, by another. Every business domain has full control of its data corpus and the ability to create its own data products. This capability is provided without disenfranchising IT from developing and maintaining an organization-wide data framework, and without precluding other domains from having their own control, even when some data overlaps multiple domains.
The Full Complement
A Space is a container for a domain’s full set of data and semantics. Within each DWC space, domain- or audience-specific tables, views, models, and data flows can be created. Full data ingest, transformation, modeling and analytics capabilities are enabled at the domain level, rather than the business being limited to a “crippled” subset of functionality. A Space implements a complete semantic model for a domain’s data, in an environment that is independent, and yet not siloed. This gives domain-level organizations (like lines of business, or even cross-LOB teams) an unrestricted canvas to express their data semantics and develop their data models. It also prevents those teams from building their data products in isolation, and ensures visibility of their work to IT. This ensures coordination between IT and the business without imposing onerous restrictions on either one.
And where domain-savvy work is occurring, organization-wide assets are never out of reach. Connectivity to, and reuse of, assets in SAP BW is provided in DWC, avoiding duplication of effort and retaining the modeling and permissions set up in that environment. Connectivity to remote tables in SAP sources (including HANA, of course) and non-SAP systems (including data lakes, databases, and SaaS applications) provides a similar assurance of reuse.
Remember, some 90% of all business transactions go through SAP applications, making BW an authoritative analytics repository – and remote tables a perfect transactional source – for operational business data. Both bridges facilitate domain-level independence without forfeiting reuse of enterprise-level assets.
The result: centralized and decentralized data layers don’t just coexist; they federate, cooperate and are composed, so they are consistent, even if distinct.
This game is not zero-sum. Domain-level autonomy and creative control can be granted, even while central IT involvement is retained. Domain-level environments, even when highly customized, can still be coordinated. Delegated IT needn’t be shadow IT, and innovation doesn’t have to be “rogue.” Data culture can be facilitated within business units while enterprise-wide compliance and organization remain in place. Prerequisites at the central level don’t preclude breakthroughs at the domain level.
Layering and composition are the key to balancing centralization requirements with domain-level ownership and independence. Establishing true data-driven operations, across the organization and business unit by business unit, is the result.
Hello Andrew Brust!
Thank you for your interesting blog about centralizing/decentralizing analytics.
In my experience, running decentralized analytics while still achieving high overall data value for a company can be a complex undertaking. You really need a very good data culture and the right organizational structure.
Decentralization is not new in many ways. And even though SAP DWC supports it from a technological perspective, I have seen this done in SAP BW many times, too.
I really recommend that anyone interested in creating more value from data go deeper into Data Mesh. Everyone should go and understand the four Data Mesh principles, not because Data Mesh is now a hot idea, but because it is really helpful to understand them before talking about a specific technology: