SAP HANA External Machine Learning: Take 2
You might have already seen my earlier blog introducing the SAP HANA External Machine Learning (EML) library, which discusses how you can access TensorFlow models from HANA.
Well, in the rush to produce hands-on video tutorials in time for RTC (release to customer), perhaps the content wasn’t as elegant as it could be.
The scenario involved tweaking of the standard TensorFlow MNIST model to change the default signature and a HANA table with 784 columns – Ouch!
Partly this was due to me using an internal pre-release – some things changed between that and RTC – but also very much due to my lack of knowledge. Interestingly, your feedback also made it clear that not everyone has a TensorFlow Serving environment up and running – so perhaps we could provide some pointers in that area too?
A lot of water has passed under the bridge since then and having had time for additional research we’ve been able to significantly revise the content. Yesterday we published 6 all-new video tutorials that show how to get going with the EML and TensorFlow and all in the context of SAP HANA, express edition.
The scenario now uses the standard MNIST model without modification, has a single BLOB column in HANA to store the image of a handwritten digit, and uses Database Explorer (via Web IDE for SAP HANA) rather than Eclipse as the development environment (though you could still use Eclipse if you prefer).
There’s a video showing how to install and configure TensorFlow with TensorFlow Serving from scratch in Google Cloud Platform – and given that SAP HANA, express edition is also available there you can take full advantage of the free trial!
Very much inspired by Frank Gottfried’s excellent blog here, there’s also a video showing how to get started with the now fully supported Python driver delivered with SAP HANA Client Tools in SPS 02. This enables you to load test data directly into HANA using a Python script. You don’t need to do this to follow along with the tutorials, however, as a few rows of MNIST test data have also been published to our GitHub repository and can easily be imported via Database Explorer.
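To make the single-BLOB-column idea concrete, here is a minimal sketch of loading MNIST-style images through a Python DB-API connection. It is not the script from the videos. The table name MNIST_IMAGES, the column names ID and IMAGE, and the packing of the 784 pixel values as little-endian 32-bit floats are all assumptions made for illustration, and sqlite3 stands in for HANA so that the sketch runs anywhere. With the SAP HANA Python driver you would pass its connection object to insert_images instead, after checking that the byte layout matches what your model expects.

```python
import sqlite3
import struct

PIXELS = 28 * 28  # 784 values per MNIST image


def image_to_blob(pixels):
    """Pack 784 pixel values into one bytes object for a BLOB column."""
    if len(pixels) != PIXELS:
        raise ValueError("expected %d pixel values, got %d" % (PIXELS, len(pixels)))
    # Assumption: little-endian 32-bit floats; adjust to your model's input format.
    return struct.pack("<%df" % PIXELS, *pixels)


def insert_images(conn, images):
    """Insert (id, pixels) pairs via any DB-API connection that uses '?' placeholders."""
    cur = conn.cursor()
    cur.executemany(
        # Hypothetical table and column names.
        "INSERT INTO MNIST_IMAGES (ID, IMAGE) VALUES (?, ?)",
        [(image_id, image_to_blob(pixels)) for image_id, pixels in images],
    )
    conn.commit()
    cur.close()


if __name__ == "__main__":
    # Stand-in database so the sketch is runnable without a HANA system.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MNIST_IMAGES (ID INTEGER PRIMARY KEY, IMAGE BLOB)")
    insert_images(conn, [(1, [0.0] * PIXELS), (2, [1.0] * PIXELS)])
    for image_id, blob in conn.execute("SELECT ID, IMAGE FROM MNIST_IMAGES"):
        print(image_id, len(blob))  # each blob is 784 * 4 = 3136 bytes
```

One row per image with a single binary column replaces the 784-column table from the earlier version of the scenario.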
It’s the same playlist as before: SAP HANA Academy – EML Playlist
There are now 7 main videos covering the following topics:
- Introduction
- Getting started
- Install TensorFlow and TensorFlow Serving
- Train and serve a model in TensorFlow
- Create Remote Source and register model
- Real-time scoring
- Python 2 HANA
UPDATE: 11Oct17 – HANA 2.0 SPS02 revision 21 introduces significant changes to the way output tensors are handled. If you’re using revision 21 please refer to the following tutorial for updates to the real-time scoring example with MNIST: Real-time scoring (revision 21)
The previous videos are still there at the foot of the playlist – just in case you’re addicted to Eclipse:
- Getting started
- Build, train and serve a model in TensorFlow
- Create Remote Source and register model
- Make predictions
You’ll also find the above tutorials posted to the What’s New for Predictive HANA 2.0 SPS 02 playlist, where you can also get the lowdown on enhancements to the Predictive Analysis Library (PAL) featuring the all-new web-based Application Function Modeler, as well as improvements to R Integration.
If you’re interested to learn about what’s new with HANA 2.0 SPS 02 in general check out the following blog.
Should you be attending SAP TechEd please drop by our SAP HANA Academy event on the Monday. A full day of lecture/demo and hands-on sessions where we’ll be covering the EML / TensorFlow as well as other hot HANA topics – and unlike YouTube you can interrupt us to ask questions! Attendance is free and we’ll help you get started with SAP HANA, express edition on the Google Cloud Platform using the free trial – so you can keep your work afterwards.
May your tensors flow with HANA!
–
The SAP HANA Academy provides free online video tutorials for the developers, consultants, partners and customers of SAP HANA.
Topics range from practical how-to instructions on administration, data loading and modeling, and integration with other SAP solutions, to more conceptual projects to help build out new solutions using mobile applications or predictive analysis.
For the full library, see SAP HANA Academy Library – by the SAP HANA Academy
For the full list of blogs, see Blog Posts – by the SAP HANA Academy
- Subscribe to our YouTube channel for updates
- Join us on LinkedIn: linkedin.com/in/saphanaacademy
- Follow us on Twitter: @saphanaacademy
- Google+: plus.google.com/+saphanaacademy
- Facebook: facebook.com/saphanaacademy
UPDATE: 11Oct17 - HANA 2.0 SPS02 revision 21 introduces significant changes to the way output tensors are handled. If you're using revision 21 please refer to the following tutorial for updates to the real-time scoring example with MNIST: Real-time scoring (revision 21)
Thanks, Philip, for these useful links and information.
I appreciate it.
Hi! Do you know if it is possible to deploy the entire SAP HANA EML + TensorFlow Serving setup within the SAP Cloud (without touching other clouds like Google, AWS, Azure, etc.)?
E.g. SAP Cloud containing a HANA DB + a VM instance with TensorFlow running on it, all within the SAP Cloud (Neo environment).
Hi Johnny,
Not that I'm aware of; however, you can use the SAP Leonardo Machine Learning Foundation, which is built on SAP Cloud Platform: https://cloudplatform.sap.com/capabilities/machine-learning.html.
Thanks,
Philip
Hello Philip,
I'm a little confused about SAP and machine learning. SAP Leonardo does not use the EML, so when is the EML used and when is Leonardo Machine Learning Foundation used? What is the difference? I thought that the SAP Leonardo Machine Learning Foundation works with the EML.
Hi Samira,
The EML and Leonardo ML Foundation are two separate yet complementary approaches to ML.
EML is for when you've already built your own ML model(s) in TensorFlow and want to reach out to them for inference/scoring directly from HANA and do that via SQL.
Leonardo ML Foundation is a comprehensive library of cloud-based services comprising ready-built ML models, where you can also retrain models and upload your own models. Access is via HTTP web services (APIs), not SQL, and there is no requirement for or dependency on HANA. This approach is well suited to when you want to access ready-made ML services in the cloud and/or don't have the ML knowledge to build TensorFlow models from scratch.
The EML is a HANA component and not part of Leonardo ML Foundation and vice versa. They're two separate and complementary approaches to ML.
Philip
Hello, Philip,
we are building a tailored text classifier on the SAP platform, but would like to avoid using TensorFlow, at least for the moment.
So, could a custom ML model also be served in the same way as TF models are?
If I understand correctly, the model is served via some kind of HTTP REST API, where certain "alignments" need to be made for it to work seamlessly, which to me doesn't sound quite generic and customizable.
Thanks in advance!
Regards,
Nemanja
Hi Nemanja,
SAP HANA provides a number of ways you can make use of ML/predictive; integration with TF is just one of them.
Just to avoid any confusion, TensorFlow Serving "serves" TF models that have already been trained and is a technology specific to TensorFlow. HANA acts as a "client" of the "remote" TF server, which performs the inference/scoring.
For custom ML beyond TensorFlow, maybe consider using the R integration for SAP HANA (link to playlist) where you can basically do anything you want on the R side by embedding R script into a HANA stored procedure. Again HANA acts as the "client" and connects to the "remote" Rserve which actually does the processing.
Hope that helps?
Philip
Hi, Philip,
first of all -- thanks for the lightning-fast reply. 🙂
Yes, I am aware of R integration (which is a bit more "embedded", as I can see), but we are looking to reuse our existing Python (scikit-learn) code or maybe export the model to PMML and then use "openscoring" to serve the model via a REST API.
Since TF is also running on a VM within SAP Cloud, and "serves" its predictions via an HTTP API, to me it seems like HANA+EML is not actually "married" to TensorFlow and "doesn't care" what's on the other side of the line -- as long as it responds to its requests and delivers the response in a specific format that HANA requires.
Did I understand things correctly?
Many thanks in advance!
Nemanja
Hello Nemanja,
the current EML implementation does *not* use a generic REST API, but rather the somewhat higher performing gRPC mechanism and the TensorFlow Serving protocol layered on top of that. Thus the current version is indeed married to TensorFlow.
If you were absolutely bent on integrating your current code that way, you could write your own Python-based TensorFlow Serving server and point the EML at that. A "conforming" server would only need to implement the GetModelMetadata and Predict calls of the full TensorFlow Serving protocol and can actually be implemented in a moderate amount of Python, even if it is no match for the full-blown C++ based server.
Hello Philip,
Sorry, I was on a different assignment so did not have time to look into this earlier. I am able to follow you all the way as shown in the video. However, I get stuck while running the Python script:
python mnist_softmax.py
Traceback (most recent call last):
  File "mnist_softmax.py", line 28, in <module>
    from tensorflow.examples.tutorials.mnist import input_data
ImportError: No module named tensorflow.examples.tutorials.mnist
I tried to check but am not sure whether the module is available or not. Can you please suggest something?
Best Regards,
Naveen
Hi Naveen,
The mnist_softmax.py script is from the standard TF tutorial: https://www.tensorflow.org/get_started/mnist/beginners
I wonder if your TensorFlow machine is behind a firewall and that is why the script is unable to download the MNIST demo data?
I suggest you follow the tutorial (https://www.tensorflow.org/get_started/mnist/beginners) by pasting the code step by step rather than running the entire script in one go - this may help you debug what's going wrong.
Thanks,
Philip
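For anyone hitting the same ImportError, a quick way to narrow it down is to ask Python whether the module named in the traceback can be located at all, before running the whole script. The following is only a sketch, assuming Python 3.4 or later (the traceback above looks like Python 2, where you would simply try the import in an interactive session instead). The helper name module_available is made up for this example.

```python
import importlib.util


def module_available(name):
    """Return True if the import system can locate the module called `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. 'tensorflow' itself) is missing.
        return False


if __name__ == "__main__":
    for name in ("tensorflow", "tensorflow.examples.tutorials.mnist"):
        status = "found" if module_available(name) else "missing"
        print(name, "->", status)
```

If the first name is found but the second is missing, TensorFlow is installed but that particular install does not include the tutorials package. If both are missing, TensorFlow is not visible to the Python interpreter you are running.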
Hello Philip,
I am using HXE on GCP. Can I get SAP HANA client tools too to try the last part of the playlist - Python to HANA?
Regards,
Sumit
Hi Sumit,
Yes of course - using the HXE download manager.
The download process is shown in the video starting at 2 minutes and the install process at 5 minutes.
To download and install the HXE download manager itself register here: https://www.sap.com/cmp/ft/crm-xu16-dat-hddedft/index.html
Regards,
Philip
Hi Philip,
I did not install HXE locally; the Google Cloud Launcher service installed it for me as a VM on GCP Compute Engine. Will the link you provided also work in my case? I thought it was for the whole HXE package.
Regards,
Sumit
Hi Sumit,
Indeed, you don't need to download the server components - just use the download manager to download the client tools only.
Regards,
Philip
Got it. Thanks for your help, Philip.
I shall try that and update here.
Regards,
Sumit
Hi Philip,
I am stuck at "Create Remote Source and Register Model".
When I try to verify MODEL IS UP AND RUNNING ON REMOTE SOURCE, I get back an error:
"Remote source TensorFlowModelServer server is alive but refused connection".
I tried using external IP instead of internal IP. I still get the same message.
Any ideas?
Regards,
Rahul
Hi Rahul,
This error means that it's able to see the TF server machine (ip address/hostname) but not able to connect to the model. Most likely issues are either that the port is blocked (do you have a firewall? is there a proxy server in-between?) or that the model is not up and running on that server and port.
Hope this helps,
Philip
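To separate the two possible causes mentioned above, it can help to test plain TCP reachability of the model server port from the machine that hosts HANA. The sketch below uses only the Python standard library. The host and port values are placeholders, not values from the tutorial. A successful connection only shows that the port is reachable; it does not prove that the expected model is being served there.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder values: replace with your TensorFlow Serving host and port.
    host, port = "127.0.0.1", 8500
    print("%s:%d reachable: %s" % (host, port, port_is_open(host, port)))
```

If this returns False from the HANA host but True from another machine, a firewall or proxy between HANA and the model server is the likely culprit.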
Hi Philip,
I think I could isolate the issue. I created a new VM instance on Google Cloud Platform and tried accessing the model server from the new VM.
I could not connect, whether I used the internal or the external IP. How do I open the port to accept connections from external IPs?
Regards,
Rahul
Hi Rahul,
I’m not sure exactly where your HANA machine is hosted and where the TF serving machine is hosted.
Assuming TF serving is in GCP and HANA outside GCP, in GCP go to VPC Network > Firewall rules & then create a rule with tcp:<yourTFPortNr> and IP range of the external ip address of your HANA server (or 0.0.0.0/0 to be open to all) and ensure this firewall is associated with your TF serving instance.
More doc can be found here: https://cloud.google.com/vpc/docs/firewalls
Philip
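As a small illustration of what the IP range in such a firewall rule means, the sketch below checks whether a client address falls inside a CIDR range using Python's standard ipaddress module. The addresses are documentation examples, not real servers, and the helper name rule_allows is made up for this example. It only demonstrates the range semantics and does not talk to GCP.

```python
import ipaddress


def rule_allows(source_range, client_ip):
    """Return True if client_ip lies inside the CIDR source_range of a firewall rule."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(source_range)


if __name__ == "__main__":
    hana_ip = "203.0.113.10"  # hypothetical external IP of the HANA server
    print(rule_allows("0.0.0.0/0", hana_ip))        # True: open to all
    print(rule_allows("203.0.113.10/32", hana_ip))  # True: exactly this host
    print(rule_allows("198.51.100.0/24", hana_ip))  # False: a different network
```

A /32 range restricted to the HANA server's external address is the tighter choice; 0.0.0.0/0 is convenient for testing but exposes the port to everyone.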
Hi Philip,
I could connect to TensorFlow model server running on one VM from a client on another VM in different region using both internal and external IP.
But, I am still not able to connect to the server from HANA. I still get
“Remote source TensorFlowModelServer server is alive but refused connection”.
Is there some configuration in HANA that needs to be done to be able to connect to external servers?
I am trying this from HANA Studio. I have followed steps as per your video. I have confirmed that EML is installed and all the functions are available.
Regards,
Rahul
Hi Rahul,
Where is the HANA server hosted? Is it perhaps behind a corporate firewall that is blocking outbound access to the TF serving port?
Sounds obvious, but perhaps also double-check the model name specified in your HANA SQL Script? It could be that HANA can connect to the TF server port but the requested model is not being served?
Regards,
Philip
Hi Philip,
You are right. The HANA server is hosted behind the SAP corporate firewall. Any idea how I could still connect to GCP?
Or is it strictly not allowed?
Regards,
Rahul
Hi Rahul,
My understanding is that this isn't possible right now; however, the ability to specify a proxy server for use with EML remote sources could be added in a future HANA release.
Regards,
Philip
Hi Philip,
I installed EML on HANA express edition, but when I execute the select statements to check the EML installation on the System Database with the SYSTEM user, I do not get anything. I followed your HANA Academy video playlists.
Any advice on this?
Thanks,
Shivam
Hi Shivam,
Thanks for your question and watching the videos.
I presume you mean that the following statements return 0 rows?
In that case it really sounds like EML has not been installed - for HANA Express you typically need to download the eml.tgz and run an optional configuration step to install the EML. Full details are in the HANA Express Getting Started PDF documents.
Hope this helps,
Philip
Hi Philip,
Thanks for your reply. I am done with the above part and installed EML successfully, but I have a question about TensorFlow installation on HANA express edition.
1 - Can I install TensorFlow on HANA express edition (on a virtual machine I have set up, not using GCP)?
2 - If I do not install TensorFlow on HANA express edition, and instead have TensorFlow set up in Anaconda and use the Python HANA database connectivity to fetch data from the HANA database and then perform machine learning in Anaconda, could this also be an option for TensorFlow and HANA integration for machine learning?
Thanks,
Shivam
Hi Shivam,
If you want to use the EML then it's HANA which connects to the remote TensorFlow server (which basically can be anywhere).
Yes you can install TensorFlow on the HANA Express machine for development purposes - there's a video showing how to do that in the EML playlist on YouTube.
The whole concept of the EML is that you have an app running in HANA which wants to make calls out to the TensorFlow server to do real-time scoring and then get the results back in a HANA table.
If you simply want to pull data from a HANA database using Python (or any other supported mechanism) you can do that no problem - in this case you don't need the EML.
Thanks,
Philip
Hi Philip ,
Thanks a lot for all the clarifications.
Thanks,
Shivam
Hi Philip,
Great videos on TensorFlow installation and serving on Ubuntu Xenial, but on the last command, running the TensorFlow model server, I got the error below:
tensorflow_model_server: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.22' not found (required by tensorflow_model_server)
I checked on the web and found the solution in the thread below. I thought I'd share it so that anyone else who encounters the same error can follow it.
https://github.com/tensorflow/serving/issues/819
Thanks,
Shivam
Hi Philip,
Any advice on the error below?
Could not execute 'CALL "SCORE_DIGITS" ("PARAMS", "V_DATA", ?)' in 126 ms 34 µs .
SAP DBTech JDBC: [423]: AFL error: search table error: _SYS_AFL.EML:PREDICT: [423] (range 3) AFL error exception: No remote source matching model
Thanks,
Shivam
Hi,
I am trying to set up HANA EML / TensorFlow, where I have deployed HANA express on one VM server and TensorFlow (Ubuntu) on another VM within the same local network.
I followed all the steps of the EML guide, and also followed the steps to ensure the TensorFlow integration:
https://www.youtube.com/watch?v=R4AV1zPgyKg&list=PLkzo92owKnVwrZto5m1pl3JNajP94wHju
https://www.sap.com/developer/tutorials/mlb-hxe-setup-tensorflow.html
I also downgraded the TensorFlow version to 1.5.0, as 1.8.0 is not supported.
Everything is working on the TensorFlow side, the model is served and working.
On HANA though, the SDA connection keeps giving me an error:
SAP DBTech JDBC: [403]: internal error: Connection failed for remote source TensorFlowModelServer. Reason: not supported
I tried installing the XS Workbench as well, to check if it was related to my HANA studio. It seems the GRPC destination isn't working properly in SPS03, as it doesn't update any parameters, when selecting the adapter.
Any idea on what the issue could be?
Br,
Mads
Hello,
this issue is an unfortunate side effect of the decision to collect all different types of remote sources under the same UI even though there are substantial differences between remote sources.
I assume that you have defined a remote source of type "grpc" and have tried to "open" it under the
"<system-name>" -> Provisioning -> Remote Sources -> "<my-remote-source>"
tab of HDBSTUDIO. For "ODBC" type remote sources this would then list the remote database artifacts defined on that ODBC (or JDBC) source. For the non-database-like remote sources (such as RSERVE or GRPC) no such artifacts exist and there actually is no method to enumerate remote artifacts in either R-Servers or TensorFlow Modelservers.
The error message "JDBC/403" correctly states this fact as "not supported", but with a (technically correct) yet utterly unhelpful and misleading message that no ODBC connection could be established. Which of course is true (it isn't an ODBC connection).
So, please ignore that message. To actually define and check connectivity of gRPC resources (which right now are restricted to TensorFlow Serving Modelservers, not general purpose gRPC targets), please follow the methods and instructions outlined in
"Validating Model and Landscape Definitions"
of the EML documentation at
https://help.sap.com/viewer/ab6b04eb12d3452aa904d5823416a065/2.0.02/en-US/89920175aa844b3989f8719431b9947e.html
i.e. the CHECKDESTINATION call.
Sorry for the confusion
Regards,
Burkhard Neidecker-Lutz, EML Development
PS: As for the TensorFlow versions: unless a particular model uses operators that only exist in a version 1.y, with y larger than the x of the 1.x version of TensorFlow in the TMS, there is no need to upgrade the TensorFlow server. But that's anyhow completely unrelated to the misleading error message you are seeing.
Hi Burkhard,
Thanks for the detailed response. You're right. The connection works when using the CHECKDESTINATION call and the rest of the tutorial.
But yes, I just have to ignore the message. 🙂
Br,
Mads