Technology Blogs by SAP
abdel_dadouche
Active Contributor
In case you are catching the train while it is running, here is the link to the introduction blog of the Machine Learning in a Box series, which allows you to follow the series from the start. At the end of this introduction blog you will find the links to each element of the series.




Before we get started, a quick recap from last time


Last time, we looked at how to use Jupyter notebooks to run kernels in Python and R and even SQL. I'll be honest, Jupyter is now my new "go to" tool.

I even contributed to the SQLAlchemy for SAP HANA GitHub repository recently to add the support of HDB User Store which will allow you to connect without providing your user credentials in one of the cells.

I hope you all managed to try this out, and some of you have probably already decided to switch from the "good old" Eclipse IDE and its SAP HANA Tools plugin to run your SQL.

Let's now look at this new post.

I know that I promised to dive into the TensorFlow integration over and over.

All is now available and ready!!! I have fixed the little technical difficulties on my NUC and my Virtual Machines.

I also upgraded to HXE SPS03, which I highly recommend you do as well.

For example, SPS03 brings INT64 support to SAP HANA EML (otherwise you will need to adjust your model's signature and cast your tensors to float).

So, let's get started!




Welcome to week 8 of Machine Learning in a Box!


SAP HANA External Machine Learning Library & TensorFlow Integration






You have probably all heard about Google TensorFlow and how it can help you solve many Machine Learning problems, especially when they require the use of neural networks or deep learning.

It is true that there are many applications where deep learning and neural networks have reached a level of accuracy that now surpasses human capabilities. Also, thanks to hardware evolution and cloud resources, you can now complete the tasks in a fraction of the time that was required a few years back.

I won't pretend that I can explain in this single blog all the details about the benefits of deep learning and neural networks, or even TensorFlow, as there is plenty of content out there from people with much more experience and credentials than I have.

So today, my goal will be to help you get started with the SAP HANA External Machine Learning and TensorFlow integration.



Using TensorFlow implies some understanding of the TensorFlow programming concepts but also some Python coding skills. And I really encourage you to have a look at the TensorFlow Get started and Tutorials page.




About TensorFlow & TensorFlow Serving (a.k.a ModelServer)


TensorFlow™ is an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices.

Originally developed by researchers and engineers from the Google Brain team within Google’s AI organization, it comes with strong support for machine learning and deep learning and the flexible numerical computation core is used across many other scientific domains.

For more details, you can check the TensorFlow web site.

TensorFlow™ Serving is a flexible, high-performance serving system for machine learning models, designed for production environments.

TensorFlow Serving makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs.

TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data.

For more details, you can check the TensorFlow Serving web site.




About SAP HANA External Machine Learning (EML)


The integration of Google TensorFlow within SAP HANA is based on the SAP HANA Application Function Library (AFL), meaning that you now have the ability to interact with your TensorFlow models from within SQLScript executed in SAP HANA.

Using Google’s gRPC remote procedure call, SAP HANA will access models exported in the SavedModel format from the TensorFlow Serving system.

Here is a quick diagram explaining the interactions:

[Diagram: SAP HANA EML]

For more details, you can check the SAP HANA External Machine Learning Library (EML) documentation.




Installing SAP HANA EML, TensorFlow & TensorFlow Serving


The SAP HANA External Machine Learning (EML) library is part of the SAP HANA, express edition downloadable package, so there is no trick here around its installation. You can simply follow the installation guide.

Regarding TensorFlow & TensorFlow Serving, you can install directly on your SAP HANA, express edition server or any other machine.

But here is the "trick"!

If you decide to install TensorFlow Serving on a SUSE Linux Enterprise or Red Hat Enterprise Linux system (which are the officially supported platforms for SAP HANA, express edition), you will need to compile it from source.

If you decide to install it on the SAP HANA, express edition downloadable virtual machines, then you are also using SUSE Linux Enterprise, so you will need to compile it too.

And if you want to install it on Debian/Ubuntu distributions, the installation is pretty straightforward; it is as simple as something like:
apt-get install tensorflow-model-server

For detailed step-by-step setup instructions, I produced the following tutorial that guides you through the whole process:

This tutorial covers Debian/Ubuntu distributions, SUSE Linux Enterprise and Red Hat Enterprise Linux.

Feel free to use the “provide feedback” link in the tutorial to let me know what you think about it.




The TensorFlow SavedModel format explained


When exposing your TensorFlow models in TensorFlow Serving for SAP HANA consumption, you need to save them using the SavedModel format as documented in the SAP HANA EML documentation.

You need to pay particular attention to the model signature definition, especially the shapes used for the input and output elements.

The following 2 examples will highlight some of the common situations you will need to address when using content available on the TensorFlow site.

Image Retraining


For the Image Retraining scenario, here is the default signature definition that will be generated when using the provided retrain script:
The given SavedModel SignatureDef contains the following input(s):
  inputs['image'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 299, 299, 3)
      name: Placeholder:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['prediction'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 5)
      name: final_result:0
Method name is: tensorflow/serving/predict

You can see that there is an input tensor (image) and an output tensor (prediction).

The input tensor has the shape (-1, 299, 299, 3), which is a rank 4 shape: a 4-dimensional input where the leading -1 stands for a variable batch size.

In other words, you can represent the input as a vector where each entry is a 299 by 299 table (representing the 299 by 299 pixels of the input image) and each table cell is a vector of 3 float elements (representing the 3 RGB color channels).

You can check the following link for more details about Tensor dtype and shape.

However, SAP HANA EML only supports input tensors of rank two at most, which matches a table or matrix form.
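To make the rank restriction concrete, here is a minimal sketch in plain Python (no TensorFlow required). The shapes are copied from the signatures in this post; the helper names `rank`, `fits_eml` and `values_per_row` are mine and purely illustrative, they are not part of TensorFlow or of SAP HANA EML.

```python
# Shapes as reported in the SignatureDef; -1 marks the variable batch dimension.
IMAGE_INPUT = (-1, 299, 299, 3)   # default retrain signature
RAW_JPG_INPUT = (-1,)             # raw image blob passed as a string
SCORES_OUTPUT = (-1, 5)           # top 5 scores

MAX_EML_RANK = 2  # "rank two at most", as stated above


def rank(shape):
    """The rank of a tensor is its number of dimensions."""
    return len(shape)


def fits_eml(shape):
    """True when the shape can be laid out as a table (rank 1 or 2)."""
    return rank(shape) <= MAX_EML_RANK


def values_per_row(shape):
    """Number of scalar values in one batch entry (batch dimension skipped)."""
    count = 1
    for dim in shape[1:]:
        count *= dim
    return count


for name, shape in [("image", IMAGE_INPUT),
                    ("inputs", RAW_JPG_INPUT),
                    ("scores", SCORES_OUTPUT)]:
    print(name, rank(shape), fits_eml(shape), values_per_row(shape))
```

The rank 4 image input carries 299 x 299 x 3 values per image and cannot be laid out as a single table row in a sensible way, whereas the raw blob input is just one value per row.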

Therefore, my tutorial explains how to add a series of steps that process a raw image blob, represented as a string, and transform it into the expected shape.

The following signature will then be generated:
The given SavedModel SignatureDef contains the following input(s):
  inputs['inputs'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: RawJPGInput:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['classes'] tensor_info:
      dtype: DT_STRING
      shape: (-1, 5)
      name: index_to_string_Lookup:0
  outputs['scores'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 5)
      name: TopKV2:0
Method name is: tensorflow/serving/predict

With this signature, to process a SAP HANA EML call for this model, you will provide:

  • one input table/view with one column that represents the image raw blob

  • one output table with ten columns (5 string columns for the classes and 5 float columns for the scores)
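As a rough illustration of how the output tensors translate into a single result table, here is a small sketch in plain Python. The dtypes and shapes come from the signature above. The column naming scheme (tensor name in upper case plus an index) and the alphabetical ordering of the output tensors are assumptions made for this example, and `output_columns` is a hypothetical helper, not an EML function.

```python
# Output tensors copied from the signature above: name -> (dtype, shape).
FLOWERS_OUTPUTS = {
    "classes": ("DT_STRING", (-1, 5)),
    "scores": ("DT_FLOAT", (-1, 5)),
}


def output_columns(outputs):
    """Flatten the output tensors into one list of (column name, dtype)."""
    columns = []
    for name in sorted(outputs):  # assumed ordering: alphabetical
        dtype, shape = outputs[name]
        # A rank 2 tensor contributes shape[1] columns, a rank 1 tensor one.
        width = shape[1] if len(shape) > 1 else 1
        for index in range(width):
            suffix = str(index) if width > 1 else ""
            columns.append((name.upper() + suffix, dtype))
    return columns


for column, dtype in output_columns(FLOWERS_OUTPUTS):
    print(column, dtype)
```

For the signature above this yields ten columns, five for the classes followed by five for the scores.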


Iris classification problem


With the Iris classification problem, the original script doesn't actually include a save function.

In most content available online, you will be advised to save your models using the "parse example" API and define the serving input receiver like this:
def serving_input_receiver_fn():
    # Build the input receiver from the feature columns ("parse example" API)
    feature_spec = tf.feature_column.make_parse_example_spec(feature_columns)
    return tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)()

In the end, the input signature will look like this:
  inputs['inputs'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: input_example_tensor:0

The Iris model actually uses four input floats to represent the Petal and Sepal width and length.

Where did they go if now it expects only one string?

 

Here is the saved model graph for the Iris model with the "parse example" API:



When using the "parse example" API, it is assumed that you will be using tf.train.Example Protobuf objects as inputs.

A tf.train.Example Protobuf object contains a Features object, which contains a map of Feature objects, each of which is a list of floats (FloatList), bytes (BytesList) or integers (Int64List).

The tf.train.Example Protobuf objects are not supported by SAP HANA EML.

Instead, you will need to save the model using the raw tensors and produce a model graph that looks like this instead:



You can achieve that with a piece of code like this:
# TensorFlow 1.x API (tf.placeholder)
import tensorflow as tf

# Define the input receiver spec
feature_spec = {
    'PetalLength': tf.placeholder(dtype=tf.float32, shape=[None,1], name='PetalLength'),
    'PetalWidth' : tf.placeholder(dtype=tf.float32, shape=[None,1], name='PetalWidth'),
    'SepalLength': tf.placeholder(dtype=tf.float32, shape=[None,1], name='SepalLength'),
    'SepalWidth' : tf.placeholder(dtype=tf.float32, shape=[None,1], name='SepalWidth'),
}
# Define the input receiver for the raw tensors
def serving_input_receiver_fn():
    return tf.estimator.export.build_raw_serving_input_receiver_fn(feature_spec)()

Which will end up generating the following signature:
The given SavedModel SignatureDef contains the following input(s):
  inputs['PetalLength'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1)
      name: PetalLength:0
  inputs['PetalWidth'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1)
      name: PetalWidth:0
  inputs['SepalLength'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1)
      name: SepalLength:0
  inputs['SepalWidth'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1)
      name: SepalWidth:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['predicted_class_id'] tensor_info:
      dtype: DT_INT64
      shape: (-1, 1)
      name: dnn/head/predictions/ExpandDims:0
  outputs['probabilities'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 3)
      name: dnn/head/predictions/probabilities:0
Method name is: tensorflow/serving/predict

With this signature, to process a SAP HANA EML call for this model, you will provide:

  • four distinct input tables/views with one column each, representing the Petal and Sepal width and length (they are passed in the alphabetical order of the tensor names, which is also the signature order)

  • one output table with four columns: one integer column for the predicted class id and three float columns for the class probabilities


As you can notice, each input is using a distinct table/view, whereas all the outputs are stored into a single table.
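The ordering rule can be sketched in a few lines of plain Python. This is only an illustration: `wrapper_parameters` is a hypothetical helper, and the `TT_<MODEL>_...` type names simply mirror the naming used in the Iris SQL example of this post. The sketch sorts the input tensor names alphabetically and places them between the parameter table and the single result table.

```python
# Input tensor names, deliberately listed out of order.
IRIS_INPUTS = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]


def wrapper_parameters(model, input_names):
    """Return (position, table type name, direction) for each wrapper parameter."""
    prefix = "TT_" + model.upper()
    # Position 1 is always the parameter table (Model, RemoteSource, ...).
    params = [(1, prefix + "_PARAMS", "in")]
    # One input table type per tensor, in alphabetical order of tensor names.
    for name in sorted(input_names):
        params.append((len(params) + 1, prefix + "_FEATURES_" + name.upper(), "in"))
    # All outputs share a single result table, which comes last.
    params.append((len(params) + 1, prefix + "_RESULTS", "out"))
    return params


for position, type_name, direction in wrapper_parameters("iris", IRIS_INPUTS):
    print(position, type_name, direction)
```

The printed list has six entries: the parameter table, the four feature tables starting with PetalLength and ending with SepalWidth, and the result table.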

Note: the DT_INT64 type is not supported with SPS02 or prior; you will need to cast it to a DT_FLOAT type.
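If you wonder whether casting a class id to a float loses information, the following sketch (plain Python, standard library only) round-trips integers through a 4-byte IEEE 754 float, which is what a DT_FLOAT value holds. The helper `to_float32` is mine and only serves the illustration. Small integers such as the three Iris class ids survive the cast unchanged; exactness is only lost above 2^24.

```python
import struct


def to_float32(value):
    """Round-trip a number through a 4-byte IEEE 754 float."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


# The Iris class ids are tiny integers and are represented exactly.
for class_id in (0, 1, 2):
    print(class_id, to_float32(class_id) == class_id)

# Integers are exact up to 2**24; the next one is rounded.
LIMIT = 2 ** 24
print(to_float32(LIMIT) == LIMIT)          # True
print(to_float32(LIMIT + 1) == LIMIT + 1)  # False
```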




Serve a TensorFlow model in SAP HANA, express edition


Now, let's deploy a TensorFlow model and consume it in SAP HANA, express edition!

Actually, it's not going to be one, but two models that I have content available for: one that uses the well-known Iris Flowers dataset and one that processes Flowers images.

The idea with the second model was to demonstrate how a blob representing an image stored in SAP HANA could be processed through the EML library.

And here are the tutorial links:

You can of course use a different set of images to retrain your ImageNet model.

Executing an EML call is no different from any other AFL call. Here is the example for Iris:
SET SCHEMA EML_DATA;

-- Define table types for iris
CREATE TYPE TT_IRIS_PARAMS AS TABLE ("Parameter" VARCHAR(100), "Value" VARCHAR(100));
CREATE TYPE TT_IRIS_FEATURES_SEPALLENGTH AS TABLE (SEPALLENGTH FLOAT);
CREATE TYPE TT_IRIS_FEATURES_SEPALWIDTH AS TABLE (SEPALWIDTH FLOAT);
CREATE TYPE TT_IRIS_FEATURES_PETALLENGTH AS TABLE (PETALLENGTH FLOAT);
CREATE TYPE TT_IRIS_FEATURES_PETALWIDTH AS TABLE (PETALWIDTH FLOAT);
CREATE TYPE TT_IRIS_RESULTS AS TABLE (
-- when SPS02 or prior, make PREDICTED_CLASS_ID type FLOAT instead of INTEGER
PREDICTED_CLASS_ID INTEGER,
PROBABILITIES0 FLOAT, PROBABILITIES1 FLOAT, PROBABILITIES2 FLOAT
);
-- Create description table for procedure creation
CREATE COLUMN TABLE IRIS_PROC_PARAM_TABLE (
POSITION INTEGER,
SCHEMA_NAME NVARCHAR(256),
TYPE_NAME NVARCHAR(256),
PARAMETER_TYPE VARCHAR(7)
);
-- Create the result table
CREATE TABLE IRIS_RESULTS LIKE TT_IRIS_RESULTS;

-- Drop the wrapper procedure
CALL SYS.AFLLANG_WRAPPER_PROCEDURE_DROP(CURRENT_SCHEMA, 'IRIS');

-- Populate the wrapper procedure parameter table
INSERT INTO IRIS_PROC_PARAM_TABLE VALUES (1, CURRENT_SCHEMA, 'TT_IRIS_PARAMS' , 'in');
INSERT INTO IRIS_PROC_PARAM_TABLE VALUES (2, CURRENT_SCHEMA, 'TT_IRIS_FEATURES_PETALLENGTH' , 'in');
INSERT INTO IRIS_PROC_PARAM_TABLE VALUES (3, CURRENT_SCHEMA, 'TT_IRIS_FEATURES_PETALWIDTH' , 'in');
INSERT INTO IRIS_PROC_PARAM_TABLE VALUES (4, CURRENT_SCHEMA, 'TT_IRIS_FEATURES_SEPALLENGTH' , 'in');
INSERT INTO IRIS_PROC_PARAM_TABLE VALUES (5, CURRENT_SCHEMA, 'TT_IRIS_FEATURES_SEPALWIDTH' , 'in');
INSERT INTO IRIS_PROC_PARAM_TABLE VALUES (6, CURRENT_SCHEMA, 'TT_IRIS_RESULTS' , 'out');

-- Create the wrapper procedure
CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE('EML', 'PREDICT', CURRENT_SCHEMA, 'IRIS', IRIS_PROC_PARAM_TABLE);

-- Create and populate the parameter table
CREATE TABLE IRIS_PARAMS LIKE TT_IRIS_PARAMS;
INSERT INTO IRIS_PARAMS VALUES ('Model', 'iris');
INSERT INTO IRIS_PARAMS VALUES ('RemoteSource', 'TensorFlow');
INSERT INTO IRIS_PARAMS VALUES ('Deadline', '10000');

-- Create the input views
CREATE VIEW IRIS_FEATURES_SEPALLENGTH AS SELECT SEPALLENGTH FROM TF_DATA.IRIS_DATA ORDER BY ID;
CREATE VIEW IRIS_FEATURES_SEPALWIDTH AS SELECT SEPALWIDTH FROM TF_DATA.IRIS_DATA ORDER BY ID;
CREATE VIEW IRIS_FEATURES_PETALLENGTH AS SELECT PETALLENGTH FROM TF_DATA.IRIS_DATA ORDER BY ID;
CREATE VIEW IRIS_FEATURES_PETALWIDTH AS SELECT PETALWIDTH FROM TF_DATA.IRIS_DATA ORDER BY ID;

-- Call the TensorFlow model
CALL IRIS (IRIS_PARAMS, IRIS_FEATURES_PETALLENGTH, IRIS_FEATURES_PETALWIDTH, IRIS_FEATURES_SEPALLENGTH, IRIS_FEATURES_SEPALWIDTH, IRIS_RESULTS) WITH OVERVIEW;
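Once the call has completed, each row of IRIS_RESULTS holds the predicted class id followed by the three probabilities, in the column order of TT_IRIS_RESULTS. Here is a minimal sketch in plain Python of how such a row could be interpreted on the client side. The species label order is an assumption (it is the order commonly used with the Iris dataset, so check it against your own training labels), the sample row contains made-up values, and `describe` is a hypothetical helper.

```python
# Assumed mapping from class id to label; verify against your training data.
SPECIES = ["Setosa", "Versicolor", "Virginica"]


def describe(row):
    """Interpret one IRIS_RESULTS row: (class id, prob0, prob1, prob2)."""
    class_id = int(row[0])  # may arrive as a float on SPS02 or prior
    probabilities = list(row[1:])
    # Index of the highest probability, which should match the class id.
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return {
        "species": SPECIES[class_id],
        "confidence": probabilities[class_id],
        "consistent": best == class_id,
    }


# Made-up row for illustration only, not an actual model output.
sample_row = (1.0, 0.02, 0.95, 0.03)
print(describe(sample_row))
```

The `int()` conversion keeps the sketch usable whether PREDICTED_CLASS_ID was declared as INTEGER or as FLOAT.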

Feel free to use the “provide feedback” link in the tutorial to let me know what you think about it.




Conclusion

Adding TensorFlow to our stack opens a huge set of capabilities, including the ability to process images, videos and other types of "unstructured" content like text.

You can also use it for, let's say, more classic models like with the Iris dataset.

Again (and sorry for repeating myself), what you really need to pay attention to is the signature input and output elements, where SAP HANA EML enforces some restrictions on the shape dimensions of both the input and the output.

A huge thank you to burkhard.neidecker-lutz & christoph.morgen from the SAP HANA engineering team for their support when producing this content.



(Remember sharing && giving feedback is caring!)


UPDATE: Here are the links to all the Machine Learning in a Box weekly blogs: