Smart Discovery – Viable Machine Learning for Business Analysts
As a business analyst, you want to get the most out of your data. You want access to the best techniques to really understand what’s going on in your business. This empowerment is at your fingertips with Smart Discovery in SAP Analytics Cloud. Smart Discovery offers you a viable way to use automated machine learning on top of your BI data, without losing precious analysis time on data preparation. Simply decide which business question you want to ask of your data, and let Smart Discovery analyze it for you by running a machine learning algorithm. You can then explore the generated results to gain insights into your data.
What’s new with Smart Discovery in Q1 2021?
In Q1 2021, a significant update to Smart Discovery was released. It can now better answer your business question because it helps you define the context of that question more clearly. A better-defined question means that Smart Discovery can automatically prepare the data for you and create better analysis results for you to explore. To make sure you’re happy with the business question you’ve defined, Smart Discovery now offers you a preview of your question before it starts its analysis. The three main benefits for you as a business analyst are that Smart Discovery:
- Bridges the knowledge gap between machine learning and BI, allowing analysts to easily use machine learning in BI.
- Automatically leverages the data contained in existing BI models, eliminating manual data preparation.
- Maps the output cleanly to the business question, making it simple to refine and improve.
This blog describes the recent updates to Smart Discovery and explains how to use Smart Discovery to explore your data and answer real business questions.
Specify the Business Question
Smart Discovery helps you, as a business analyst, understand the process it uses to analyze your data. It helps you specify the right question and quickly understand the generated results. You can refine the question by modifying the target or entity, filtering the dataset, or excluding variables from the analysis. The Target is the measure or dimension you would like to know more about, such as Revenue or Customer Churn. The Entity is the dimension, or set of dimensions, that describes the object in the data you would like to know more about, for instance customer or product; together, these dimensions form the key that identifies each instance of that object. Smart Discovery aggregates the data to the level described by the entity.
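To make the parts of a business question concrete, here is a minimal sketch of how a target, an entity, filters and excluded variables fit together. The class and method names (`BusinessQuestion`, `apply_filters`, `candidate_variables`) are hypothetical illustrations, not an SAP Analytics Cloud API; rows are assumed to be plain Python dicts.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessQuestion:
    """Hypothetical description of a Smart Discovery question (not an SAC API)."""
    target: str                 # measure or dimension to explain, e.g. "Revenue"
    entity: list[str]           # dimension(s) forming the key, e.g. ["Customer"]
    filters: dict[str, object] = field(default_factory=dict)  # column -> required value
    excluded: set[str] = field(default_factory=set)           # variables left out

    def __post_init__(self):
        # Basic sanity checks on how the question is framed.
        if not self.entity:
            raise ValueError("the entity needs at least one dimension")
        if self.target in self.entity:
            raise ValueError("the target cannot also be part of the entity")
        if self.target in self.excluded:
            raise ValueError("the target cannot be excluded")

    def apply_filters(self, rows):
        """Keep only rows matching every filter (rows are dicts)."""
        return [r for r in rows if all(r.get(k) == v for k, v in self.filters.items())]

    def candidate_variables(self, columns):
        """Columns that may explain the target: all but target, entity and exclusions."""
        skip = {self.target, *self.entity, *self.excluded}
        return [c for c in columns if c not in skip]


q = BusinessQuestion(target="Revenue", entity=["Customer"], excluded={"Order ID"})
print(q.candidate_variables(["Customer", "Revenue", "Region", "Order ID"]))  # ['Region']
```

Refining the question in this sketch simply means building a new `BusinessQuestion` with a different target, entity, filter set or exclusion set.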
Previously, business analysts needed specialist data science knowledge to apply machine learning techniques effectively to business data. Some of the challenges they faced were:
- Selecting the correct machine learning technique for a particular problem.
- Selecting and preparing the data.
- Correctly interpreting the results.
Smart Discovery allows you, as a business analyst, to simply specify the business question. Based on this question, an appropriate predictive algorithm is selected and the BI data is automatically prepared so that the algorithm can be applied. Smart Discovery then produces results that are easy to understand. Because automatic data preparation allows machine learning to be applied directly to BI data, it is simple to refine the question, and you can always ask more than one business question. Explore your data from different angles by asking Smart Discovery to analyze the same target in relation to different entities; each entity produces different results.
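The idea of letting the question drive the choice of algorithm can be sketched as a simple rule: a numeric target calls for regression, a categorical target for classification. This is a simplified assumption for illustration only; the function name `choose_task` is hypothetical and the actual selection logic inside Smart Discovery is not described here.

```python
def choose_task(target_values):
    """Hypothetical rule: numeric target -> regression, categorical -> classification."""
    values = [v for v in target_values if v is not None]  # ignore missing values
    if not values:
        raise ValueError("the target has no values")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "regression"
    n_classes = len(set(values))
    if n_classes < 2:
        raise ValueError("a categorical target needs at least two distinct values")
    return "binary classification" if n_classes == 2 else "multiclass classification"


print(choose_task([1.5, 2, 3]))           # regression
print(choose_task(["Yes", "No", "Yes"]))  # binary classification
```

A churn flag such as Customer Churn would fall into the binary case, while Revenue would be treated as a regression target.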
Confirm the Business Question
Smart Discovery analyzes the data and generates content that gives you insight into how the underlying variables influence a target, in relation to an entity, within a dataset. In this example, Smart Discovery automatically prepares the data and builds a predictive model to predict Gross Margin for each Customer Name. From this predictive model, it extracts and generates content that helps the analyst understand Gross Margin.
A key issue when applying machine learning to BI data is that the data is not naturally structured in a way that allows ML to be applied. As a result, the output of ML may not match the user’s expectations and can be misleading. When configuring Smart Discovery, you specify the question by selecting both the Target and the Entity. The Entity defines the object in the data you wish to explore and is made up of one or more dimensions; essentially, it forms the key of the generated dataset. By specifying both, the data can be prepared to match the question, ensuring that the generated output is safe and easily understood.
In this example, you specify the target as Gross Margin and the entity as Customer Name. The other dimensions in the data may play an important role in explaining the target and must be represented in the flattened dataset. Measures are aggregated to the entity level according to their aggregation type.
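Aggregating measures to the entity level can be sketched as a group-by over the entity key, applying each measure’s own aggregation type. This is a minimal illustration, assuming rows are dicts and that the aggregation types are the hypothetical labels in `AGGREGATORS`; in SAP Analytics Cloud the aggregation type is defined on each measure in the model. The sample values are made up.

```python
from collections import defaultdict

# Hypothetical aggregation types; a real BI model defines one per measure.
AGGREGATORS = {
    "SUM": sum,
    "MIN": min,
    "MAX": max,
    "AVG": lambda xs: sum(xs) / len(xs),
    "COUNT": len,
}


def aggregate_measures(rows, entity, measures):
    """Aggregate each measure to one value per entity key.

    rows: list of dicts, one per fact row
    entity: list of dimension names forming the key
    measures: dict of measure name -> aggregation type
    """
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        key = tuple(row[d] for d in entity)  # composite key for multi-dimension entities
        for m in measures:
            groups[key][m].append(row[m])
    return {
        key: {m: AGGREGATORS[agg](values[m]) for m, agg in measures.items()}
        for key, values in groups.items()
    }


rows = [
    {"Customer Name": "Acme", "Gross Margin": 100, "Units": 10},
    {"Customer Name": "Acme", "Gross Margin": 50, "Units": 20},
    {"Customer Name": "Birch", "Gross Margin": 70, "Units": 5},
]
print(aggregate_measures(rows, ["Customer Name"], {"Gross Margin": "SUM", "Units": "AVG"}))
# {('Acme',): {'Gross Margin': 150, 'Units': 15.0}, ('Birch',): {'Gross Margin': 70, 'Units': 5.0}}
```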
How a dimension is represented in the dataset depends on its relationship to the entity, as follows:
- If a dimension has a single value per entity, it is included in the dataset as is, with its original name. In this case the relationship is many-to-one (m:1).
- If a dimension has a unique value for each value of the key, it is not included. In this case the relationship is one-to-one (1:1).
- If a dimension has multiple values per entity, a count of its distinct values is included with a “Number of” prefix. In this case the relationship can be one-to-many (1:m) or many-to-many (m:m).
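The three rules above can be sketched as a single function that inspects how each dimension’s values are distributed across entity keys. This is an illustrative reimplementation of the described rules, not Smart Discovery’s code; the function name `flatten_dimension` and the sample data are hypothetical, and rows are assumed to be dicts.

```python
from collections import defaultdict


def flatten_dimension(rows, entity, dim):
    """Return (relationship, columns) for one dimension.

    columns maps a column name to {entity key: value}; it is empty
    when the dimension is dropped from the flattened dataset.
    """
    values_per_entity = defaultdict(set)   # entity key -> distinct dimension values
    entities_per_value = defaultdict(set)  # dimension value -> distinct entity keys
    for row in rows:
        key = tuple(row[d] for d in entity)
        values_per_entity[key].add(row[dim])
        entities_per_value[row[dim]].add(key)

    if any(len(v) > 1 for v in values_per_entity.values()):
        # Several values per entity: keep a distinct count with a "Number of" prefix.
        shared = any(len(k) > 1 for k in entities_per_value.values())
        counts = {k: len(v) for k, v in values_per_entity.items()}
        return ("m:m" if shared else "1:m"), {f"Number of {dim}": counts}
    if all(len(k) == 1 for k in entities_per_value.values()):
        # One value per entity and one entity per value: acts like another key, dropped.
        return "1:1", {}
    # One value per entity, shared by several entities: kept as is.
    return "m:1", {dim: {k: next(iter(v)) for k, v in values_per_entity.items()}}


rows = [
    {"Customer Name": "Acme", "Country": "DE", "Customer ID": "C1", "Product": "P1"},
    {"Customer Name": "Acme", "Country": "DE", "Customer ID": "C1", "Product": "P2"},
    {"Customer Name": "Birch", "Country": "DE", "Customer ID": "C2", "Product": "P1"},
]
for dim in ("Country", "Customer ID", "Product"):
    print(dim, flatten_dimension(rows, ["Customer Name"], dim))
```

Here Country is kept as is (m:1), Customer ID is dropped (1:1), and Product becomes “Number of Product” (m:m, because P1 is bought by both customers).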
Smart Discovery automatically prepares a dataset that contains one row of data for each instance of the entity. For instance, if the entity is Customer ID, the dataset contains one row for each unique customer ID. Identifying the entity allows automated machine learning to provide much more focused analysis.
Automatically Generated Story
Smart Discovery automatically prepares the data for the business question, analyzes the data and generates content for you. The process automatically builds a predictive model to predict the target. The insights provided on the Key Influencers, Unexpected Values and Simulation pages are based on this model.
It is important to note that the analysis is performed on a snapshot of the data taken when Smart Discovery is run, and it is not updated automatically when the data changes. By contrast, the generated story content itself is dynamic and changes based on the underlying data.
The Overview page provides visualizations to summarize the results for your target dimension or measure in relation to your entity.
The Key Influencers page is generated from the predictive model. It lists, ranked from highest to lowest, up to 10 dimensions and measures that significantly impact the target. For each influencer, visualizations show the average target value and the distribution of the target for each value of a dimension, or for each bin of a measure.
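The per-influencer summaries can be sketched as two small aggregations: the average target for each value of a dimension, and the average target for each bin of a measure. This sketch assumes equal-width binning, which is a hypothetical choice for illustration; the binning and ranking that Smart Discovery actually uses are not described here, and the sample records are made up.

```python
from collections import defaultdict
from statistics import mean


def average_target_by_value(records, influencer, target):
    """Average target for each value of a dimension influencer."""
    groups = defaultdict(list)
    for r in records:
        groups[r[influencer]].append(r[target])
    return {value: mean(ts) for value, ts in groups.items()}


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width when all values are equal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def average_target_by_bin(records, influencer, target, n_bins=4):
    """Average target for each bin of a measure influencer, ordered by bin."""
    bins = equal_width_bins([r[influencer] for r in records], n_bins)
    groups = defaultdict(list)
    for b, r in zip(bins, records):
        groups[b].append(r[target])
    return dict(sorted((b, mean(ts)) for b, ts in groups.items()))


records = [
    {"Region": "North", "Units": 10, "Gross Margin": 100},
    {"Region": "North", "Units": 20, "Gross Margin": 300},
    {"Region": "South", "Units": 40, "Gross Margin": 50},
]
print(average_target_by_value(records, "Region", "Gross Margin"))          # {'North': 200, 'South': 50}
print(average_target_by_bin(records, "Units", "Gross Margin", n_bins=2))  # {0: 200, 1: 50}
```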
In this example, the prepared dataset contains one record for every customer name. This record contains data at the customer level, such as the aggregated Gross Margin for that customer and any dimension values that are unique for that customer.
The Unexpected Values page lists records where the value predicted by the predictive model is very different from the actual value in the data. These records are significant because the predicted values are based on the patterns generally found in the data, so they are exceptions to the general rule. In this example, the Gross Margin of these customers differs from the value predicted from the behavior of the other customers. These customers may interest the analyst because they may reveal special cases that require investigation or point to issues with the quality of the underlying data.
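One common way to flag such exceptions is to compare each residual (actual minus predicted) with the spread of all residuals. The sketch below assumes a hypothetical rule of flagging residuals larger than `threshold` population standard deviations; the threshold and the rule are illustrative assumptions, not the criterion Smart Discovery documents, and the sample values are made up.

```python
from statistics import pstdev


def unexpected_values(actual, predicted, threshold=2.0):
    """Return indices of records whose residual exceeds `threshold` standard
    deviations of all residuals (hypothetical rule)."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    spread = pstdev(residuals)
    if spread == 0:
        return []  # every prediction misses by the same amount: nothing stands out
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]


actual = [100, 102, 98, 101, 99, 100, 101, 99, 100, 150]
predicted = [100] * 10
print(unexpected_values(actual, predicted))  # [9]
```

A record flagged this way is only a candidate for investigation: it may be a genuine special case or a data quality problem.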
On the Simulation page, the influencers are listed with an indication of the relative impact that the selected values have on the expected value. In this example, a customer with these properties has an expected Gross Margin of 2,278,419.
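A what-if simulation of this kind can be sketched under the simplifying assumption of an additive model: the expected value is a baseline plus one contribution per selected influencer value, and the impacts are ranked by magnitude. The model structure, the function name `simulate` and all numbers below are hypothetical; the model Smart Discovery builds is not necessarily additive.

```python
def simulate(baseline, contributions, scenario):
    """Expected target under a hypothetical additive model.

    baseline: expected target before any influencer value is selected
    contributions: {influencer: {value: effect on the target}}
    scenario: {influencer: selected value}
    Returns the expected value and the per-influencer impacts, largest first.
    """
    impacts = {name: contributions[name][value] for name, value in scenario.items()}
    expected = baseline + sum(impacts.values())
    ranked = sorted(impacts.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return expected, ranked


contributions = {  # made-up effects for illustration only
    "Region": {"North": 200, "South": -50},
    "Segment": {"Retail": -300, "Wholesale": 400},
}
print(simulate(1000, contributions, {"Region": "North", "Segment": "Retail"}))
# (900, [('Segment', -300), ('Region', 200)])
```

Changing a selected value in `scenario` and rerunning mirrors how an analyst explores different customer profiles.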
With Smart Discovery, business analysts can use automated machine learning to quickly understand their BI data directly in SAP Analytics Cloud, without needing data science or machine learning expertise. By simply specifying your business question, you benefit from insights generated by automated machine learning. Eliminating data preparation can be a game changer: being able to quickly run an analysis, understand the results, modify the settings and run further analyses lets you iteratively build a better understanding of your business data. This simple process supports better decision making and story building while generating useful content.
Want to experience and run Smart Discovery for yourself? Take the leap and start your journey towards making data-driven decisions with confidence by signing up for a 90-day free trial today. Or, if you would like to request enhancements to Smart Discovery, please enter your requests in SAP Customer Influence.