Machine Learning in SAP HANA #ASUG Webcast Summary
This was an ASUG webcast last month provided by the BITI Dev/Tech Special Interest Group. What is machine learning, and what is SAP HANA Machine Learning?
Figure 1: Source: SAP
Computers learn from data without being explicitly programmed
Today, you have to recode the application
In ML, the algorithm encompasses decision making and prediction – decouple the logic from the algorithm
Increase the flexibility of applications
Computers learn from data
Input data, train model, test that model is valid
As new data comes in, retrain the model to take advantage of it
Model becomes more robust
Three phases – input, machine learning, output
Input could be text or images; cleanse the data, then subset it to train and build the model
Embed model in applications
Figure 2: Source: SAP
Why now? Explosion of data coming from a variety of sources; companies collect data for competitive advantage
Look at past transactions to see what people have been buying
Increase in processing power (such as SAP HANA, PAL)
Building and deploying applications has become easier due to a large set of algorithms for numerical data, non-numerical data, and deep learning
Integrate with applications
Model management capabilities
Figure 3: Source: SAP
Figure 3 shows Machine Learning in HANA – end to end process
Start with data ingestion – HANA can load streaming data
Next, data exploration – how does the data look? Is any data missing? Tools are available to help
Feature engineering – transform input
Once the data is ready, split it into 2 groups – 1 for training, the other for testing and validating the ML algorithm
Store model in HANA
Models need to be deployed in a variety of different ways – SQL or as a service
Scoring or prediction – real time, execute models quickly
Model management – ensure you can retrain models on an event-driven or periodic basis
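The split, train, validate, and score steps above can be sketched in a few lines of plain Python. This is only an illustration of the workflow, not HANA or PAL code: the toy threshold "model", the function names, and the data are all assumptions made for the example.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train_threshold_model(train):
    """Toy training step: the midpoint between the two class means."""
    pos = [x for x, label in train if label == 1]
    neg = [x for x, label in train if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    """Scoring step: classify a single value against the stored threshold."""
    return 1 if x > threshold else 0

def accuracy(threshold, rows):
    """Validation step: share of rows the model labels correctly."""
    hits = sum(1 for x, label in rows if predict(threshold, x) == label)
    return hits / len(rows)

# Hypothetical data: (feature value, label)
data = [(float(v), 0) for v in range(0, 10)] + [(float(v), 1) for v in range(20, 30)]
train, test = train_test_split(data)
model = train_threshold_model(train)   # the "stored model" is just one number here
test_accuracy = accuracy(model, test)
```

Retraining on an event-driven or periodic basis would amount to calling `train_threshold_model` again on fresh data and replacing the stored value.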
Figure 4: Source: SAP
HANA comprehensive learning capabilities
Integrate with 3rd party libraries (R)
High performance in scoring
Ready for developers using HANA Express
Figure 5: Source: SAP
Figure 5 shows the scenarios addressed by PAL
Which applicants are suitable for a credit card – a classification scenario using credit score, income level, and geography; use a decision tree to predict which ones will default
Build the model on historic data, then apply it to new applicants to predict whether they will default and decide whether to extend credit
Regression model for predicting house prices; build the model, then periodically trigger retraining as needed to ensure its predictions stay accurate
Group customers into logical entities for marketing – clustering with k-means; you may then want to target each group with marketing programs
Analyze customer transaction data using sequential pattern mining – for example, what else did customers who bought milk buy?
Over 90 algorithms in PAL (shown on the right of Figure 5)
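To make the clustering scenario above concrete, here is a minimal k-means sketch in plain Python. It is not PAL's implementation; the customer features, the data, and the choice to seed centroids with the first k points are assumptions made for illustration.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, (x, y) in enumerate(points):
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            assignments[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignments

# Hypothetical customers: (annual spend, visits per month)
customers = [(100.0, 1.0), (900.0, 9.0), (120.0, 2.0),
             (950.0, 8.0), (110.0, 1.5), (920.0, 10.0)]
centroids, segments = kmeans(customers, k=2)
```

Each value in `segments` is the group a customer falls into, which is the "logical entity" a marketing program could then target. In practice the features would usually be scaled first, since here the spend column dominates the distance.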
Figure 6: Source: SAP
Algorithms are typically used by data scientists, allowing fine-grained control
Each release of SAP HANA continues to offer new algorithms
Figure 7: Source: SAP
APL provides a higher level of abstraction
Can be used by business analysts
Library embedded in SAP HANA
The user doesn’t have to select input or classification algorithm
Takes set of inputs, derives them, forecasting for accuracy
Complementary to PAL
Could use both PAL/APL
Figure 8: Source: SAP
Figure 8 shows how you can classify documents: input them into HANA, do text mining to build a term-document matrix, then, given a new document, run the classification algorithm
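The document classification flow can be sketched as follows. The webcast did not say which classification algorithm HANA applies, so this example swaps in a simple one: term counts per document, summed into one profile per class, and cosine similarity to pick the closest class. The documents, labels, and function names are hypothetical.

```python
import math
from collections import Counter

def term_vector(text):
    """Count lowercase whitespace-separated terms in one document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def train(labelled_docs):
    """Sum the term counts of all documents sharing a label."""
    profiles = {}
    for text, label in labelled_docs:
        profiles.setdefault(label, Counter()).update(term_vector(text))
    return profiles

def classify(profiles, text):
    """Return the label whose term profile is most similar to the new document."""
    vector = term_vector(text)
    return max(profiles, key=lambda label: cosine(vector, profiles[label]))

# Hypothetical training documents
docs = [
    ("invoice payment overdue amount", "finance"),
    ("payment received invoice closed", "finance"),
    ("server outage network down", "it"),
    ("network latency server restart", "it"),
]
profiles = train(docs)
label = classify(profiles, "overdue invoice reminder")
```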
Figure 9: Source: SAP
Figure 9 shows business forecasting: predicting future inventory levels, future sales and consumption
HANA can store and process time series
Provides algorithms such as exponential smoothing
The Business Function Library contains algorithms for business processing
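As a reminder of what exponential smoothing does, here is a minimal sketch of the simple (single) variant in plain Python. It is not the HANA implementation; the sales figures and the smoothing factor are made up for the example.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed series s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # the first observation seeds the smoothed series
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical monthly sales
sales = [100.0, 120.0, 110.0, 130.0]
smoothed = exponential_smoothing(sales, alpha=0.5)
forecast = smoothed[-1]  # one-step-ahead forecast for the next month
```

A higher `alpha` makes the forecast react faster to recent values; a lower one smooths out more of the noise.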
Figure 10: Source: SAP
There is a growing need to deal with event streaming; machines have sensors, and you want to analyze the information as it comes out
High likelihood of failure? Take corrective action
Smart data streaming is a component in SAP HANA
Streaming engine – define continuous queries
SAP HANA supports incremental machine learning as data comes in, and prediction on event streams, such as scoring decision trees in real time
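The idea of learning incrementally while scoring each event as it arrives can be illustrated with a small plain-Python sketch. It does not use smart data streaming or a decision tree; instead it swaps in a much simpler model, a running mean and variance (Welford's online algorithm), and flags sensor readings that deviate strongly from it. The class name, thresholds, and readings are assumptions for the example.

```python
import math

class RunningStats:
    """Welford's online algorithm: update mean and variance one event at a time."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

def process_stream(readings, z_limit=3.0, warmup=5):
    """Score each reading against the model so far, then learn from it."""
    stats, alerts = RunningStats(), []
    for i, x in enumerate(readings):
        std = stats.std()
        if stats.n >= warmup and std > 0 and abs(x - stats.mean) / std > z_limit:
            alerts.append(i)  # likely fault: flag it and do not learn from it
        else:
            stats.update(x)   # normal reading: fold it into the model
    return alerts, stats

# Hypothetical temperature readings from one machine sensor
readings = [70.0, 71.0, 69.0, 70.5, 69.5, 70.0, 95.0, 70.2]
alerts, stats = process_stream(readings)
```

The positions in `alerts` are the points where corrective action might be triggered; the model itself never needs the full history, only the three running values.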
Figure 11: Source: SAP
R is a popular 3rd party language, offering a variety of different packages
A stored procedure can contain R code
The R server runs on a separate node and processes the code
Results are returned to the stored procedure
Figure 12: Source: SAP
High performance – quickly take model to production to be competitive
Iterations – need to be done quickly
Once the model is built, execute it quickly
Use models for predicting in real time
HANA approach is three-fold
Push processing logic to the database
Training algorithms take advantage of multiple cores
Focus on multi-node architecture for parallelization
Figure 13: Source: SAP
Can use Predictive Analytics
Data scientists can use the expert mode and store models in Predictive Factory, where models can be retrained
Business analysts use automated predictive modeler
Figure 14: Source: SAP
On the HANA side you have PAL and R
HANA studio supports
You can write SQLScript to invoke PAL for training and scoring
Figure 15: Source: SAP
Figure 15 talks about embedding the predictive models in applications
Figure 16: Source: SAP
The webcast ended with SAP saying you can start developing with this today using HANA Express
Simply putting up a blog and a TechEd session won't help. If SAP wants to empower us, the real people who would work with these technologies, then put it on openSAP with proper SAP dev tools
Hi - I don't work for SAP - and many tools are available now at the Cloud Appliance Library - cal.sap.com for you to try yourself
Hi Tammy,
Is there any recording of this talk available?
BR, Damir
I'm sorry for the delay - I updated the blog for the link - you will need to register to view it. It is after Figure 16.