Exploratory Analytics: helping analysts to automatically find insight in data
On November 13th and 14th 2014, artist Sven Sachsalber spent two days searching for (and finally finding) a needle in a haystack at the Palais de Tokyo modern art museum in Paris (http://www.palaisdetokyo.com/en/events/sven-sachsalber).
When we face a new dataset we sometimes feel like we are standing in front of a haystack: will we ever find the information we hope for? How long will it take to find it? Does the dataset really contain the information we expect? Could it hide some unexpected and even more precious information?
Sven Sachsalber used his hands and eyes for two days to find the needle, just as we can use classic analytic tools to find information manually.
But today we can make use of Exploratory Analytics technologies to point us quickly and automatically to the most useful information hidden in a dataset.
What are exploratory analytics and how do they relate to classic and advanced analytics solutions?
Classic Analytics, Advanced Analytics and Exploratory Analytics
Classic analytics (aka, classic business intelligence)
For the past 30 years business intelligence has been about workflows in which the analyst takes a dataset and tries to find information in it with a trial-and-error approach. We call this Classic Analytics.
Hypotheses are formulated and then tested. Analysts run queries to retrieve data, then filter, drill, pivot, create sections, etc. until useful information surfaces. Data visualization techniques greatly help in this manual search for information.
Usually the output of the analysis is presented in reports or dashboards where analysts want to share the findings with their community. The information is not usually directly actionable but is rather used to support or influence decisions.
More recently, with the advent of big data, analysts are facing datasets that are becoming longer (more records) and wider (more columns, more attributes). Bigger datasets make it very difficult for a human being to grasp the meaning of the data in its entirety, and they also require longer processing times.
As an example, while it is possible for a human being to fully comprehend the meaning of a 10-column dataset, it is virtually impossible to do the same with a set containing thousands of columns (e.g. generated by sensor readings).
In Classic Analytics, the human being drives the analysis and the software is just used to speed up calculations and provide the requested visualizations. The limits of Classic Analytics are hence the limits of the human brain's capacity to handle the information presented to it.
With the new kind of business data, wide and long, those limits have been reached and human beings need new facilities to deal with the information.
Advanced analytics
On the other end of the spectrum of data analysis, Advanced Analytics solutions make use of mathematical algorithms and computer automation to find patterns in data and use them to support decision making in operational environments.
In Advanced Analytics, a data scientist or a skilled analyst would set up algorithms of regression, classification, clustering, association, time series analysis, outlier detection etc. to understand the internal structure of an existing dataset and extrapolate rules which can be applied to new data with a precise goal in mind.
Those rules, or models, can be embedded into applications and decision-making processes to support operational users in their daily tasks.
Advanced Analytics, by looking at existing data, answer forward-looking questions and suggest the best actions: “which customers should I target in my marketing campaign?”, “what is the likelihood that this customer will purchase this product?”, “what product should I recommend to this customer?”, “how many items should I have in stock next Tuesday?”.
Advanced Analytics thus provide the infrastructure of mathematical algorithms, computer programs and data mining concepts that can help human beings overcome their limitations when looking for information in datasets.
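As a rough illustration of learning a rule from existing data and applying it to new data, the sketch below fits a straight line by ordinary least squares and uses it for a forecast. The data, the column meanings and the function names (`fit_line`, `predict`) are hypothetical and chosen only for this example; real Advanced Analytics solutions rely on far richer algorithms and validation steps.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply the learned rule (the model) to a new observation."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: promotional discount (%) and units sold on that day.
discounts = [0, 5, 10, 15, 20]
units_sold = [100, 110, 120, 130, 140]

# Step 1: learn the rule from existing data.
model = fit_line(discounts, units_sold)

# Step 2: apply the rule to a situation not seen before (a 25% discount).
forecast = predict(model, 25)
print(f"Expected units to stock: {forecast:.0f}")
```

The point of the sketch is the separation between the two steps: the model is built once from historical data and can then be embedded wherever a forecast is needed.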
Exploratory analytics
Exploratory Analytics are the application of the methods and technologies of Advanced Analytics to the task of helping the business analyst find useful insights in a dataset, a task that is done manually in Classic Analytics. Exploratory Analytics extend classic descriptive and diagnostic analytics by automatically exposing the interesting information available in a dataset.
A detailed explanation of the concept of exploratory data analysis, focusing mainly on the use of visual representations, is provided in the NIST/SEMATECH e-Handbook of Statistical Methods (http://www.itl.nist.gov/div898/handbook/).
Long datasets take a long time to analyze, wide datasets are difficult to interpret, and analysts don’t know in advance which parts are important for a given problem and which are not (and importance is always a relative concept). Moreover, analysts might not even know whether the available datasets contain useful information for their problem. In Classic Analytics, analysts faced with the challenge of working with a large dataset (long and/or wide) might simply give up any attempt at analysis.
In Exploratory Analytics a business analyst would have a dataset automatically analyzed with data mining algorithms to find information about key influencers of measures, outliers, anomalies, points of interest, hidden structures (such as associations between values), groups of records showing similarities, bands of values having a common business meaning. Visualizations proposed by the solution would help the analyst to better grasp the meaning of the data in the dataset, its pertinence and value to the business problem.
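Two of the findings listed above, key influencers and outliers, can be sketched in a few lines. This is only an illustrative approximation under stated assumptions: it uses the absolute Pearson correlation as a crude stand-in for "influence" and the median-based modified z-score (with the commonly used 3.5 cut-off) for outliers. The column names, the data and the helper functions are hypothetical, and this is not a description of how any particular product works.

```python
from math import sqrt
from statistics import mean, median


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_influencers(columns, target):
    """Rank every non-target column by |correlation| with the target measure."""
    scores = [
        (name, pearson(values, columns[target]))
        for name, values in columns.items()
        if name != target
    ]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


def find_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median: nothing can be flagged
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Hypothetical dataset: one list per column, one position per record.
dataset = {
    "ad_spend": [1, 2, 3, 4, 5, 6],
    "temperature": [20, 18, 22, 19, 21, 20],
    "revenue": [12, 19, 31, 42, 48, 61],
}
ranked = rank_influencers(dataset, target="revenue")

order_values = [10, 12, 11, 13, 12, 95, 11, 12]
outliers = find_outliers(order_values)

for name, score in ranked:
    print(f"{name}: correlation with revenue = {score:.2f}")
print("Suspicious records:", outliers)
```

On a dataset with thousands of columns the same loop would run unattended and hand the analyst a short, ranked list to start from, which is the essence of the workflow described here.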
Figure 1 shows how Exploratory Analytics enable analysts to go beyond classic analysis by means of automated, advanced algorithms and visualization techniques.
Figure 1: Exploratory Analytics at the intersection of human-driven and algorithm-aided analysis
With an Exploratory Analytics approach, the application used to analyze the dataset automatically highlights the most important findings and suggests the best way to visualize the information. The end user receives enough information to understand the content of the dataset and to judge its business value. Starting with this initial set of pre-analyzed information, the analyst can concentrate on the important parts. Exploratory Analytics reduce the noise and provide a smaller but more insightful space of data.
Exploratory Analytics at SAP
Exploratory Analytics are at the core of various activities at SAP. One of the goals of the SAP Analytics team is to make data analysis as easy as possible for any user.
SAP Predictive Analytics is the key solution which enables Exploratory Analytics workflows for all users. In SAP Predictive Analytics it is possible, in a few steps, to analyze a dataset to find outliers, influencers, patterns in data, data quality issues, correlations between variables, etc. With the solution it is possible to automatically answer questions such as ‘does this dataset contain useful information for my business problem?’, and ‘what parts of this dataset are actually influencing the answer I am looking for?’. SAP Predictive Analytics finds the needle (or the golden ring) in the data haystack.
Moreover, by embedding the underlying technology of SAP Predictive Analytics in business analyst tools such as SAP Lumira or in vertical applications such as SAP Hybris, we also enable other users to automatically get the insight they need into their business problem.
SAP historically has a deep knowledge of the business analyst needs and dreams across many industries and lines of business and has all the technology needed to satisfy them. With the existing foundation of SAP Predictive Analytics, SAP HANA predictive libraries and the SAP cloud infrastructure, our users will experience more Exploratory Analytics workflows in their preferred applications and environments.
Imagine if Sven Sachsalber had had a tool that could find the needle in the haystack in a few seconds and, possibly, tell him that the haystack also contained a lost golden ring.
Well, that probably wouldn’t have made the headlines in newspapers all over the world, but it might have interested anyone who is looking for lost treasures.
Exploratory Analytics, on the other hand, can actually make a good headline by showing how business can be improved by better understanding data of any length and any width, and by accepting that automation and mathematical algorithms can do the initial part of the job and then let real brains work on higher-value problems.