Richard Mooney

How Can the Business Leverage Hadoop with SAP Predictive Analytics?

One of the key challenges facing enterprises today is making sense of the explosion in data generated by employees, partners and customers. With the trend towards the Internet of Things (IoT), this problem is getting worse. Every new product, service, device and system is becoming computerized and generating data.

Executives understand that data is valuable and can be used to help their organizations run more efficiently but they struggle to work out how to achieve this:

  1. The data becomes very big very quickly. Storing it for long periods on traditional database platforms is very expensive, and businesses have turned to lower-cost alternatives such as Hadoop.
  2. A large percentage of the data captured has very low information density and does not have the same profile as traditional enterprise data. Finding insight through manual analysis, where a business analyst compiles reports and dashboards to measure and understand KPIs, is not viable. The way to extract value from this mountain of data is to use Machine Learning and Data Mining techniques.

So how do we use advanced analytic techniques to extract insight from huge volumes of data stored on Hadoop?

Hadoop offers incredibly powerful data mining capabilities, from languages like Python and Scala to frameworks such as Spark ML and SparkR, but these technologies all require very skilled practitioners. They need to understand the problem domain, the data science techniques to solve it and the programming languages to implement it in. The custom-coded solutions then need to be integrated into operational systems such as ERP, CRM and Finance, and maintained by IT departments. This is a real challenge for IT departments struggling to balance delivering innovations for the business against containing operational costs.

This is where Predictive Analytics from SAP can help. We have an end-to-end solution that turns data mining on Hadoop into a predictable, repeatable and easy-to-manage process.

So how does this work in practice?  Let’s take the example of Alan who works as a Data Analyst in the IT department of a Fortune 500 company. 

  1. Alan’s employer has created a Data Lake on Hadoop.  They have created a replica of their enterprise data and mashed it up with IoT data generated from their latest mobile application.
  2. The business wants to know how geolocation information from the mobile app can be used to identify potential high-value customers early.
  3. Alan builds an Analytical Dataset (ADS) using Predictive Analytics Data Manager. Alan uses his domain knowledge to enrich the customer record with derived attributes and make the data more predictive. This ADS will be reused in the future to answer other questions about the customer.
  4. Alan uses Predictive Analytics Modeler to automatically identify high-value customers. Once he has built the model, he creates a report directly from the tool which shows his executive management the ROI they can expect from the solution.
  5. Finally, Alan uses Model Manager to automate the deployment of the model into a production environment. Model Manager will automatically monitor and maintain the model to ensure accuracy.
  6. Because the model is embedded directly in the CRM application, every time a new customer is added the system automatically checks whether they are likely to become high value.
  7. New high-value customers are given a differentiated experience and there is a clear improvement in marketing effectiveness.
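
To make the idea of an Analytical Dataset in step 3 more concrete, here is a minimal sketch in plain Python of what enriching a customer record with derived geolocation attributes can look like. It is not how Data Manager works internally, and Alan would not have to write it. The field names (`customer_id`, `lat`, `lon`, `hour`), the store location and the three derived attributes are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import asin, cos, radians, sin, sqrt

# Hypothetical store location used for the "near store" attribute (illustrative only).
STORE = (40.7580, -73.9855)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def build_ads(customers, events, near_km=1.0):
    """Return one row per customer: the base record plus derived attributes."""
    # Group the raw geolocation events by customer.
    by_customer = defaultdict(list)
    for e in events:
        by_customer[e["customer_id"]].append(e)

    rows = []
    for c in customers:
        evs = by_customer.get(c["customer_id"], [])
        near = [e for e in evs if haversine_km(e["lat"], e["lon"], *STORE) <= near_km]
        row = dict(c)  # copy so the source record is left untouched
        row["n_events"] = len(evs)
        row["n_near_store"] = len(near)
        # Share of events at 18:00 or later; 0.0 for customers with no events.
        row["share_evening"] = sum(1 for e in evs if e["hour"] >= 18) / len(evs) if evs else 0.0
        rows.append(row)
    return rows
```

Each row keeps the original customer fields and adds attributes aggregated from the event data, which is the shape a modeling tool expects: one record per customer, with the raw events summarized into columns.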

By the end of the process, Alan has successfully prepared the data, built a model and deployed it into production without writing a single line of code. The whole project was completed faster than expected. The other good news is that as Alan develops Data Science skills, he can use Predictive Analytics Expert Mode to tackle more complex problems using traditional Data Science languages and frameworks.

SAP will be in New York for Strata + Hadoop World from Sep 29 to Oct 1. Stop by and we’ll show you how you can take advantage of SAP Predictive Analytics on your Hadoop data.

      Marin Videnov

      Sorry to say, but I see no real explanation of "how" you leverage an unstructured Hadoop data lake with SAP PA. What connectors are used? How do you create your ADS directly from such unstructured data?

      Richard Mooney
      Blog Post Author


      Thanks for the comment. For this technique to work on unstructured data, the data first needs to be pre-processed to impose some structure on it. If it is semi-structured data such as an event log, a weblog or a text document, there are tools available with Predictive Analytics to help you achieve this. Look at Sequence Coding, Text Coding and Event Logging manuals at

      For very unstructured data you would need to use data science techniques such as those supported by PA Expert Mode or an open source tool like R in order to populate the ADS.

      Once this structure is created, the ADS works exactly as described in the article.
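
As a rough illustration of what "imposing structure" on a semi-structured event log can mean, the following Python sketch parses raw log lines into records and pivots them into one row of event counts per customer. It does not represent the Predictive Analytics tooling mentioned above. The log line format, the field names and the helper functions `parse_log` and `event_counts` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical log line format: "<ISO timestamp> customer=<id> event=<name>"
LINE = re.compile(r"^(?P<ts>\S+)\s+customer=(?P<customer_id>\d+)\s+event=(?P<event>\w+)$")


def parse_log(lines):
    """Turn raw log lines into dict records, skipping lines that do not match."""
    records = []
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            records.append(
                {"ts": m["ts"], "customer_id": int(m["customer_id"]), "event": m["event"]}
            )
    return records


def event_counts(records):
    """Pivot records into one row per customer with a count per event type."""
    counts = {}
    for r in records:
        counts.setdefault(r["customer_id"], Counter())[r["event"]] += 1
    return {cid: dict(c) for cid, c in counts.items()}
```

The per-customer counts produced this way can then be joined to the customer record as additional columns of the ADS.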