This blog aims to answer a few questions about one of the new features, ‘Modeling Support for Vora’, in SAP BusinessObjects Predictive Analytics, as announced here.
What does this integration offer?
This integration allows data and business analysts to access SAP HANA Vora and run predictive models on it using SAP BusinessObjects Predictive Analytics.
SAP HANA Vora is an in-memory query engine that runs on Apache Spark to analyse Big Data stored in Hadoop. It provides OLAP-style capabilities on Hadoop and also enables massive scale-out scenarios for HANA.
SAP BusinessObjects Predictive Analytics provides ready-to-use tools for data mining, building predictive models and industrializing those models on a variety of platforms. It already integrates with Hadoop and enables predictive training natively on Spark through the Native Spark Modeling feature.
With SAP HANA Vora being SAP’s answer to ‘Analytics on Business & Big Data’ on Hadoop, this integration extends those capabilities to machine learning on Hadoop. It allows end users of both solutions to build and run advanced analytics use cases on Hadoop.
When was this integration released?
The integration is part of the SAP BusinessObjects Predictive Analytics 3.1 release. It connects to SAP HANA Vora version 1.2.
Why is distributed computing so important for these two products?
Hadoop’s adoption is increasing as a low-cost distributed data storage platform, and at the same time the Hadoop ecosystem continues to build tools and packages that cater to the dynamic scalability needs of any business. The main challenge for enterprises is therefore to provide higher efficiency and better performance for the applications that run on top of these platforms.
For SAP HANA Vora, distributed computing initially takes the form of drill-down and hierarchy support on Spark. For predictive analysis, it is applied to the training (learning) phase, which would otherwise be extremely time-consuming given the volume and nature of the data analysed on Hadoop. Many more capabilities in SAP solutions would benefit from distributed computing to process massive amounts of data in parallel, and solutions of this type can be expected in the future.
How is it different from running SparkML on SAP HANA Vora?
For data scientists and application developers who prefer to work in a SparkML or SparkR environment, SAP HANA Vora makes it simpler to enrich candidate datasets in Hadoop through enhancements to Spark’s Data Source API.
Predictive Analytics, on the other hand, provides a workbench and automates the end-to-end processing of a predictive scenario, from data manipulation through training to scheduling apply and retrain tasks. Its Automated Analytics mode suits analysts who do not want to write code or choose one algorithm over another, and who would rather rely on the tool’s machine learning capabilities to address their business problems. Automated Analytics can now connect to SAP HANA Vora as a data source for a range of its automated algorithms; Classification/Regression in particular will run on the same Spark instance that SAP HANA Vora is based on.
Can you give me some business use cases that are possible using these two solutions?
There is no restriction on the type of use case or industry dataset that you can consider for your predictive needs on Vora. However, because the application is mainly intended to provide mashed-up data between Hadoop and HANA, the more interesting scenarios are those where you have huge amounts of historic or unstructured data in Hadoop and enterprise data in HANA. Some examples:
- Predictive maintenance with sensor data and logs on Hadoop, using machine learning to understand the key factors contributing to failures, downtime and repairs
- Customer churn analysis to understand the behaviors and influences that could cause a telecom company to lose customers in the coming months
- Up-selling and cross-selling of products based on customer history, demographic and social information
- Predicting hacking attempts and intrusions based on website logs
- Understanding patterns in chronic diseases, represented with a huge number of columns, based on patient history in SAP HANA Vora
Because SAP BusinessObjects Predictive Analytics has extensive support for the SAP HANA platform, it is also possible to TRAIN a model on data in SAP HANA Vora and then APPLY it in HANA with a few simple clicks. For example, you may have 10 years of archived payment history in Hadoop, managed and analysed via SAP HANA Vora, while your current and future data sits in HANA; you can then use the same trained model to predict payment dates for unsettled invoices in HANA.
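The train-in-one-place, apply-in-another idea can be sketched in a few lines of plain Python. This is only a conceptual illustration under stated assumptions: the function names `train_model` and `apply_model`, the single predictor, and the sample values are all hypothetical, and the simple least-squares fit stands in for whatever Automated Analytics actually does internally. The point it shows is that the trained model is a small, portable artifact that can be moved to where the new data lives.

```python
# Conceptual sketch only: a tiny one-variable least-squares model stands in
# for a model trained by Automated Analytics. All names are hypothetical.

def train_model(history):
    """Fit y = intercept + slope * x on (x, y) pairs from the archive."""
    n = len(history)
    mean_x = sum(x for x, _ in history) / n
    mean_y = sum(y for _, y in history) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in history)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in history)
    slope = sxy / sxx
    return {"intercept": mean_y - slope * mean_x, "slope": slope}


def apply_model(model, new_values):
    """Score new rows with the parameters learned elsewhere."""
    return [model["intercept"] + model["slope"] * x for x in new_values]


# "TRAIN" on archived rows (imagine these coming from Hadoop via Vora):
# each pair is (invoice amount in thousands, days until payment).
archive = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
model = train_model(archive)

# "APPLY" to open invoices (imagine these living in HANA).
predicted_days = apply_model(model, [10.0])
```

Only the model parameters travel between the two systems in this sketch; the historical data never has to leave the platform it was trained on.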
What’s the future roadmap?
In future releases of SAP BusinessObjects Predictive Analytics, we would like to first complete the remaining features for SAP HANA Vora in Automated Analytics, and then extend support to the data science community for working with SparkML on SAP HANA Vora using ‘Expert Analytics’.
What’s the best way to run these solutions?
Today, the Data Manager and APPLY parts of Predictive Analytics are not fully supported for Vora, because the generated SparkSQL needs special handling. As an alternative, users can combine SAP HANA Vora and SAP Predictive Analytics as illustrated here for the various steps of a predictive scenario:
In this case, the Vora Modeler tool is used to model and prepare the dataset for training in Automated Analytics, and the SparkSQL generated by Automated Analytics is used to calculate scores in SAP HANA Vora.
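To give a feel for what scoring with generated SQL means, here is a minimal Python sketch that turns a linear model into a SQL scoring statement. It is an assumption-laden illustration, not the SQL that Automated Analytics actually produces (which also encodes variables and is considerably more elaborate); the function `build_scoring_sql` and the table and column names are hypothetical.

```python
# Illustrative only: builds a plain SQL SELECT that computes a linear score.
# The real SparkSQL generated by Automated Analytics is more elaborate;
# table, column and function names here are hypothetical.

def build_scoring_sql(table, key_column, intercept, coefficients, score_alias="score"):
    """Return a SELECT computing intercept + sum(weight * column) per row."""
    terms = [repr(float(intercept))]
    for column, weight in coefficients.items():
        terms.append(f"({float(weight)!r} * {column})")
    expression = " + ".join(terms)
    return f"SELECT {key_column}, {expression} AS {score_alias} FROM {table}"


sql = build_scoring_sql(
    table="invoices",
    key_column="invoice_id",
    intercept=1.5,
    coefficients={"amount": 0.25, "days_open": -2.0},
)
print(sql)
```

Because the score is expressed as SQL, it can be executed by the query engine where the data already resides, rather than exporting rows to a separate scoring tool.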
Are there any restrictions in the current release?
There are some restrictions related to the SAP HANA Vora data source: for example, the built-in data management and In-Database APPLY features of the Predictive Analytics tool are not supported, as described above. Refer to the release note https://launchpad.support.sap.com/#/notes/2391541 for more details.
Is SAP HANA Vora included in the same price package as SAP BusinessObjects Predictive Analytics?
No. They are priced and sold as separate packages.
Which Hadoop distributions and Spark versions are supported?
SAP BusinessObjects Predictive Analytics supports several Hadoop distributions; refer to the PAM for more details. For SAP HANA Vora 1.2, Hortonworks and Cloudera clusters with Spark version 1.5.2 are supported. For Native Spark Modeling, Predictive Analytics needs to be installed on a UNIX operating system only; in this client-server installation, the client can be on Windows with the server on UNIX.