As mentioned in the first article, data is the essential component powering all of the Intelligent Enterprise machinery. And, like any fuel, it needs to be of high quality to guarantee maximum performance.
Currently, it’s estimated that up to 80% of data analysts’ time is spent finding, formatting and integrating data instead of analyzing it.
That’s not surprising considering that, in most organizations, data is acquired, processed and stored in silos (departments, work groups, etc.). And, more often than not, those silos have their own tools, processes, models and rules, which may differ completely from those of other groups in the same company.
This generates tons of data duplication, inconsistencies and non-reusable data, which leads to inefficient data projects and a dark pit of hidden costs generated by overhead and redundancy.
That’s alarming when we consider that the data available to us doubles every two years. By 2020 the world will have 40 zettabytes of data available for processing, a number expected to grow to 180 zettabytes by 2025. At growth rates like these, it’s essential to automate steps of the data process, or it won’t be possible to analyze all that data in a timely manner.
DataOps aims to provide the tools and processes to allow organizations to cope with this significant increase in data.
It was heavily influenced by DevOps, where development teams sought to automate many steps of the development process, allowing them to release software faster and more reliably. DevOps also enabled collaboration between teams that, up to that point, had always worked in silos.
DataOps, in the words of Andy Palmer from TAMR, “acknowledges the interconnected nature of data engineering, data integration, data quality and data security/privacy — and aims to help an organization rapidly deliver data that accelerates analytics and enables previously impossible analytics”.
So, by the very nature of DataOps, all of these disciplines are interconnected, and collaboration between them is necessary to achieve the desired harmony.
There’s one more big piece: operations. As in DevOps, DataOps seeks to automate and monitor data-related pipelines (ETL, cleansing, anonymization, quality checks, model training, etc.). That’s the operations part of the deal.
DataOps aims to increase the speed, quality and reliability of data and the analytic processes around it by improving coordination between data science, analytics, data engineering and operations. It recognizes that there will be many data connections and pipelines to manage, and the only way to keep up with demand while meeting quality and reliability requirements is via repeatable and testable workflows.
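To make "repeatable and testable workflows" concrete, here is a minimal, tool-agnostic sketch of a pipeline step with a built-in quality gate. The function names, fields and thresholds are illustrative assumptions, not the API of any specific DataOps product:

```python
# Sketch of a repeatable, testable DataOps step (illustrative only).

def clean_records(records):
    """Drop incomplete rows and normalize the 'email' field."""
    cleaned = []
    for row in records:
        if None in row.values() or "" in row.values():
            continue  # skip incomplete rows instead of failing the whole run
        row = dict(row, email=row["email"].strip().lower())
        cleaned.append(row)
    return cleaned

def quality_check(records, min_rows=1):
    """Fail fast if the output does not meet a basic quality bar."""
    assert len(records) >= min_rows, "too few rows after cleansing"
    assert all("@" in r["email"] for r in records), "malformed email"
    return records

raw = [
    {"name": "Ada", "email": " Ada@Example.com "},
    {"name": "Bob", "email": ""},  # incomplete -> dropped
]
result = quality_check(clean_records(raw))
print(result)  # [{'name': 'Ada', 'email': 'ada@example.com'}]
```

Because each step is a pure function with an explicit check, the same workflow can be re-run on every new batch of data and unit-tested in isolation, which is the property DataOps relies on to scale.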
SAP Data Intelligence
SAP Data Intelligence is a cloud-native data sharing, pipelining, and orchestration solution that helps companies accelerate and expand the flow of data across their modern, diverse data landscapes.
It provides visibility and access to a broad range of data systems and assets; allows the easy and fast creation of powerful, organization-spanning data pipelines; and optimizes data pipeline execution speed with a “push-down” distributed processing approach at each step.
Furthermore, it contains central metadata and dataset management capabilities, with embedded data preparation and quality rules, allowing the entire organization to search for and use quality data wherever it resides, effectively helping to break down silos and reduce data duplication.
SAP Data Intelligence’s pipelines are equipped with powerful operators that can execute and handle:
- data transformation
- database and data lake connections
- data quality checks and anonymization
- stream processing
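As a hedged illustration of the anonymization category above, the sketch below pseudonymizes direct identifiers with stable hashes. The field names and hashing scheme are assumptions for illustration; in SAP Data Intelligence such steps are configured as graph operators rather than hand-written code:

```python
import hashlib

def anonymize(record, fields=("name", "email")):
    """Replace direct identifiers with stable pseudonyms (SHA-256 prefixes)."""
    out = dict(record)
    for f in fields:
        if f in out:
            # Same input always yields the same pseudonym, so joins still work.
            out[f] = hashlib.sha256(out[f].encode("utf-8")).hexdigest()[:12]
    return out

record = {"name": "Ada Lovelace", "email": "ada@example.com", "country": "UK"}
anon = anonymize(record)
print(anon["country"])                 # unchanged: UK
print(anon["name"] != record["name"])  # True: identifier replaced
```

Keeping non-identifying attributes intact while pseudonymizing the rest lets downstream analytics run without exposing personal data, the trade-off such operators are built around.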
With the recent addition of the Machine Learning Services, data science and machine learning modeling can be scaled to enterprise-wide levels, allowing collaboration between data scientists and business teams, while using pipelines for automating ML model training and publishing.
It is the all-in-one tool for enterprise-wide data discovery, orchestration and collaboration, alongside strong machine learning and data science capabilities.
SAP Data Intelligence allows seamless collaboration between all the different disciplines in the data world (engineering, security, quality and integration) while adding capabilities for scheduling, automation and monitoring of workflows and pipelines, matching all pieces of DataOps together.
The intention of this blog post is not to serve as a tutorial for SAP Data Intelligence. So, for further information on tutorials, capabilities and features, please check the amazing articles written by other colleagues from SAP here, here and here.
With this article I hoped to explain a little about the problem organizations face with the exponential growth of data, how DataOps can help them cope with it and get the most out of the data they acquire, and how SAP Data Intelligence is a tool that helps them achieve the level of orchestration and collaboration required to thrive in such a data-driven world.
See you next time!
Senior Enterprise Architect – SAP Business Transformation Services
Connect with me on LinkedIn.
Learn about how our Innovation & Advisory services can help you and your business run better. Check out the SAP Business Transformation Services page.