What is a Data Platform
Databases and their capabilities are well known, but is a database the Data Platform? The simple answer is no. A Data Platform is more than just a database. A database is an infrastructure component, used to store your data with the constraints and resilience that are the minimum requirement in the enterprise nowadays.
A Data Platform is built on top of several components, with a database at its center. It uses the database as one of the infrastructure layers beneath it, but provides many functional capabilities on top. These can substantially improve the business functionality of the many applications that ultimately end up hitting the Data Platform.
Why Data Platform and Why Now
Whether it be banking, insurance, utilities, pharmaceuticals or any other domain, data is the net asset of the IT systems in any organization. Many would argue that it is the applications that provide business functionality and give data its contextual meaning, by molding the raw data into a domain that makes it relevant. But the truth of the matter is that with a Data Platform, we push certain aspects of the applications into the Data Platform itself, making the applications lighter and far more functional. This means a blurring of lines and a fusion between applications and the Data Platform.
This idea has existed for several years, but in my opinion it was not adopted en masse because of the following factors.
Database’s unfulfilled promise to grow up as a Data Platform
Initially databases were very proprietary. They developed extensions to the query language that were not portable, and they did not build higher-order functionality that applications could leverage. They did not support all data types, data stores and database types. They did not provide granular security features, high-volume ingestion mechanisms, self-cleansing data pipelines, event sourcing, the capability to build Expert Systems (rules), machine learning capabilities or validations. These tasks then shifted to other platforms and applications, and a whole industry was created to fill this vacuum.
Databases and storage were very expensive
Earlier, databases and storage were expensive and the technology was limited, which imposed financial and resource constraints. Application servers were cheap, could be replicated, could do a lot of further data processing if required, could scale and could be replaced easily. This led to databases being a simple (read: dumb) data store and not the data platform they should have become.
Not scalable, parallel or distributed enough to handle complex business
Databases were not scalable beyond a certain limit, were not distributed, and were hard, expensive and slow to optimize, often taking years. Applications, on the other hand, could easily be load balanced and clustered, were cheaper to create and manage, and could even be thrown away if a better solution came along.
Refactoring had enormous impact.
Schema changes had a wide impact, and no refactoring was possible because applications had SQL littered all over them as strings. Read and write operations in CRUD applications happen with SQL on the same set of tables, so any change to the schema of these tables impacts the applications.
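To make the point about string-embedded SQL concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customer table, its columns and the find_name helper are hypothetical names invented for this illustration. It only shows how a column rename silently invalidates a query that lives in application code as a string, and how the failure surfaces at run time rather than at build time.

```python
import sqlite3

# In-memory database with a hypothetical "customer" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")

# The query lives in the application as a plain string, so no compiler
# or refactoring tool knows that it depends on the "name" column.
FIND_NAME_SQL = "SELECT name FROM customer WHERE id = ?"


def find_name(connection, customer_id):
    """Return the customer's name, or None if the id is unknown."""
    row = connection.execute(FIND_NAME_SQL, (customer_id,)).fetchone()
    return row[0] if row else None


name_before = find_name(conn, 1)

# Schema change: the table is rebuilt with "name" renamed to "full_name".
conn.executescript(
    """
    CREATE TABLE customer_new (id INTEGER PRIMARY KEY, full_name TEXT);
    INSERT INTO customer_new (id, full_name) SELECT id, name FROM customer;
    DROP TABLE customer;
    ALTER TABLE customer_new RENAME TO customer;
    """
)

# The unchanged application code now fails, and only when it is executed.
try:
    find_name(conn, 1)
    broke_after_rename = False
except sqlite3.OperationalError:
    broke_after_rename = True
```

Multiply this by every query string in every application that touches the table, and the cost of a single schema change becomes clear.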
Most applications were CRUD applications: all the necessary data was brought to the user interface, interlinked and contextualized in the UI, decisions and validations were made there, and the data was then fed back to the database.
In the early days database vendors used clever techniques to lock their customers in. This created mistrust among customers and developers, who chose to use a minimal feature set of the database and write most of their data-munching code inside the application itself. This is still somewhat true today.
The big question is why we are debating and discussing the Data Platform NOW. Did we suddenly discover the awesomeness of SQL, and that data is our core asset in the enterprise? What happened that people suddenly talk about data platforms and big data all the time and everywhere, and that analytics has become the de facto hype?
In my opinion two things contributed vastly:
Economics always has a big impact on technology decisions. Many companies could not previously afford the IT infrastructure that they can afford now. After all, your smartphone is now as powerful as the fastest computers of 15-20 years ago. It was said that developers would always be more expensive than the systems, and hence developer productivity is of significant importance: systems become cheaper over time and pay for themselves in a very short span of time.
With the advent of the cloud, you can rent IT infrastructure at each layer: hardware, platform and software. This makes it easy for many organizations to push some of their auxiliary applications into the cloud. (I am of the opinion that core data should be managed in a private cloud and not in the public cloud, while applications, in particular external-facing and mobile applications, should be managed in the public cloud, for now at least.)
The internet is bringing in new economies, and this is evident all around us. It is changing access mechanisms and shaping customer behaviour significantly. Slowly but surely all businesses and services will be accessible via the internet, and people at large will use the internet ubiquitously to access services. Even machines will provide and exchange their information to schedule maintenance and bring in a kind of efficiency not seen before. (In my opinion, a toaster or fridge communicating over the internet is not IoT; smart meters, oil and gas, water utilities, aviation equipment, public health systems and smart cities are some of the ideal use cases for IoT.)
The internet also brings a data deluge and the question of privacy. All companies should think deeply about this and make the anonymization and security of data a top priority.
But the question still remains as to why we need a Data Platform. We already have the capability to manage large amounts of data for our applications and analytics, and we could carry on along the same lines, scaling our systems to accommodate the new economic phenomena described above.
The question is not one of technology but of strategy. In order to stay competitive and relevant, you need a full view of the enterprise at any given moment, and the granularity of information should be such that decisions can be made swiftly and, in many cases, automated. Automation of lower-level tasks in the business is central to organizational growth in any sector, as consumption increases and brand loyalty can be lost with a single mistake. Automation will become the key to innovation and customer satisfaction, as customer expectations are changing and the workload on employees is increasing.
Data silos in organizations happened because each application works with its own database, with its own schema structure. You do not have a full, holistic view of your company, and therefore you periodically ETL the entire data set to a warehouse to be analyzed later. This periodic batch ETL misses some of the events, and with them the real-time, proactive reactions that an organization could have had to events from its most valued customer or a new customer. BI tools then munch on this data to produce reports whose relevance fades quickly with the passage of time. To learn from your data, you need not only the data but the whole situational context in which the events happened. Otherwise the data will not speak to you and will be irrelevant, as the situation in the present and the future will be very different from the past.
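A small sketch may help show how a periodic batch copy loses events. It assumes a hypothetical stream of customer status changes and an ETL job that copies only the current status of each customer at every batch boundary; the event names, the 60-minute interval and the batch_etl function are all invented for this illustration and are not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusEvent:
    minute: int      # minutes since the start of the day (hypothetical clock)
    customer: str
    status: str


def batch_etl(events, interval):
    """Return the (customer, status) pairs a warehouse sees when it copies
    only the current state every `interval` minutes."""
    ordered = sorted(events, key=lambda e: e.minute)
    if not ordered:
        return set()
    seen, state, idx = set(), {}, 0
    boundary = interval
    while True:
        # Apply every event up to this batch boundary to the live state.
        while idx < len(ordered) and ordered[idx].minute <= boundary:
            state[ordered[idx].customer] = ordered[idx].status
            idx += 1
        # The batch job copies the state as it stands right now.
        seen.update(state.items())
        if boundary >= ordered[-1].minute:
            return seen
        boundary += interval


events = [
    StatusEvent(5, "c1", "browsing"),
    StatusEvent(20, "c1", "payment_failed"),   # overwritten before the batch runs
    StatusEvent(50, "c1", "browsing"),
    StatusEvent(70, "c2", "signed_up"),
]

seen_by_warehouse = batch_etl(events, interval=60)
seen_by_stream = {(e.customer, e.status) for e in events}
missed = seen_by_stream - seen_by_warehouse
```

Here the failed payment at minute 20 never reaches the warehouse, because the customer's status had already changed again by the time the hourly batch ran. A consumer reading the events as they happen would have seen it and could have reacted.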
This all happened because we did not have the technology at our disposal. We had the various components, but IT got overwhelmed by the complexity of interlinking them, which it was unable to manage and maintain. It therefore retreated into a sulk and stopped innovating, since innovation would only bring in more complexity. Why do you think mobile, being such a hot requirement, has not taken off in the enterprise? It should have been here yesterday.
Complexity kills. Complexity sucks the life out of users, developers and IT. Complexity makes products difficult to plan, build, test and use. Complexity introduces security challenges. Complexity causes administrator frustration.
As we did not have a coherent data platform, the business and technological requirements could not be encapsulated, and hence leaked out into a myriad of application suites, integration suites, complex event processing, business intelligence and warehousing. The whole discussion and effort shifted to managing these systems rather than the business requirements.
As we were limited by the constraints of infrastructure, we could not build a single holistic view of the entire enterprise, so we split the functionality into several applications and then did the plumbing to make it all work together. This led to multiple versions of the truth, no business process management strategy, and multiple batch jobs shipping data that should have been worked on yesterday.
But now we are entering a technological phase where distributed computing, which used to be confined to major research labs and universities, is becoming a universal way of doing compute. We are entering an age of architecture where we can replay data to any point we desire: we can pause, rewind and fast-forward into the future with predictive analytics. Our applications will no longer be just dumb CRUD applications, but smart applications that have highly contextualized information at their disposal to automate decisions or assist users in making critical ones.
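The idea of replaying data to a chosen point can be sketched with a simple append-only event log. This is only an illustration under assumed names: AccountEvent, the replay function and the integer timestamps are hypothetical, and a real platform would add persistence, ordering guarantees and snapshots on top.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountEvent:
    timestamp: int   # logical time of the event (hypothetical)
    account: str
    amount: int      # positive for a deposit, negative for a withdrawal


def replay(log, until):
    """Rebuild account balances by applying every event up to and
    including the `until` timestamp, in time order."""
    balances = {}
    for event in sorted(log, key=lambda e: e.timestamp):
        if event.timestamp > until:
            break
        balances[event.account] = balances.get(event.account, 0) + event.amount
    return balances


# The log is never updated in place; new facts are only appended.
log = [
    AccountEvent(1, "acct-1", 100),
    AccountEvent(2, "acct-2", 50),
    AccountEvent(3, "acct-1", -30),
    AccountEvent(4, "acct-1", 10),
]

state_at_2 = replay(log, until=2)   # "rewind" to an earlier point in time
state_now = replay(log, until=4)    # replay the full history
```

Because the current state is derived from the log rather than stored as the only copy, any past state can be reconstructed by choosing a different cut-off, which is what makes pausing and rewinding possible.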
Organizations can choose to sit on the sidelines and wait until the change is thrust upon them and complexity overwhelms them to the point of paralysis, or they can start rethinking and building a data platform of their own, and reinvigorate themselves with automation, learning and adaptation.