Embedding HANA machine learning models into SAP Data Warehouse Cloud views
In this blog post we describe a simple approach on how data scientists can expose already trained HANA machine learning (ML) models to SAP Data Warehouse Cloud (SAP DWC). Implementing the proposed method allows business analysts to apply ML models on-demand according to their needs (i.e. for generation of reports, dashboards, or other BI activities) in a “no-code” manner, using the data modeling capabilities of SAP DWC (i.e. the data builder component).
To demonstrate the idea, we train a simple time series forecasting model and expose it for consumption following the proposed approach. The developed model predicts the future number of notifications originating from a shop floor device, based on a dataset containing 2 years of historical data. The model creation is not reviewed in detail here, so if you want to learn more about how to create ML models, make sure to check the blog post of Andreas Forster and also the post of Yannick Schaper.
To start with, we assume that the data needed for machine learning has already been shared in a space in the SAP DWC tenant, which we will call MLSPACE. In addition, a database user, MLUSER, has been created. This user has read access to the MLSPACE as well as read/write access to the open SQL schema MLSPACE#MLUSER.
As a next step, we will review the process from accessing the data, via data modeling up to model creation and exposure for consumption. The figure below shows the main steps (in yellow) in the order they are described in the following sections:
1. Accessing the data
The data scientist can connect to the underlying HANA system using Jupyter Lab and Python with the hana_ml package and access the open SQL schema of the user:
hana_port = 443 hana_user = 'MLSPACE#MLUSER' hana_password = [YOUR_PASSWORD] hana_address = [YOUR_DWC_HOST] # Instantiate connection object conn = dataframe.ConnectionContext(address = hana_address, port = 443, user = hana_user, password = hana_password) # Control connection conn.connection.isconnected()
Also the input data located in the read-only schema can be previewed using the following code snippet:
df_remote = conn.table('AssetForAnalytics', schema = 'MLSPACE') df_remote.sort_values('DateTime').head(10).collect()
2. Creating the ML model
This step usually involves several iterations to identify the optimal data as well as the features needed for modeling. Furthermore, the model selection and tuning require a substantial amount of work to complete. Since the focus of this blog post is on model sharing, rather than on model creation, these steps are skipped.
So, the data scientist decides to go with Exponential Smoothing from the HANA Predictive Analysis Library (PAL) for the time series forecasting. As an example of a simple data preparation an ID column is generated. Finally the model is created and the fit_predict() method is called to predict the amount of device notifications from the shop floor for the next 20 time intervals.
df_remote = conn.table('AssetForAnalytics', schema = 'MLSPACE') df_remote_id = df_remote.sort('DateTime').add_id() #... from hana_ml.algorithms.pal.tsa.exponential_smoothing import AutoExponentialSmoothing autoexpsmooth = AutoExponentialSmoothing(model_selection=True, forecast_model_name='TESM', seasonal_period=365, forecast_num=20) df_res = autoexpsmooth.fit_predict(df_remote_id, key='ID')
3. Deploying ML models in a view
To be able to trigger and apply an ML model from SAP DWC the model needs to be embedded into a view. The process to achieve this involves the following two steps:
3.1 Creating a table function
The first step is to create a table function, which wraps the call of the apply function of the PAL ML model (here Exponential Smoothing, i.e. _SYS_AFL.PAL_AUTO_EXPSMOOTH). One can write this call in SQL from scratch or use a handy functionality of the hana_ml package, which returns the SQL execute statement of the model object. The returned string contains already most of the SQL script required for the table function definition.
DO BEGIN DECLARE param_name VARCHAR(5000) ARRAY; DECLARE int_value INTEGER ARRAY; DECLARE double_value DOUBLE ARRAY; DECLARE string_value VARCHAR(5000) ARRAY; param_name := N'MODELSELECTION'; int_value := 1; ... param_name := N'FORECAST_MODEL_NAME'; ... string_value := N'TESM'; param_name := N'FORECAST_NUM'; int_value := 20; ... param_name := N'CYCLE'; int_value := 365; ... param_name := N'TRAINING_RATIO'; double_value := 1.0; ... params = UNNEST(:param_name, :int_value, :double_value, :string_value); in_0 = SELECT "ID", "Notifications" FROM (SELECT "ID", "Notifications", "DateTime" FROM (SELECT "ID", "AssetID", "Notifications", "DateTime" FROM (SELECT CAST(ROW_NUMBER() OVER() AS INTEGER) + 0 AS "ID", * FROM (SELECT * FROM (SELECT * FROM "MLSPACE"."AssetForAnalytics") AS "DT_24" ORDER BY "DateTime" ASC)) AS "DT_26") AS "DT_27") AS "DT_28"; CALL _SYS_AFL.PAL_AUTO_EXPSMOOTH(:in_0, :params, out_0, out_1); CREATE LOCAL TEMPORARY COLUMN TABLE "#TEMPNAME1" AS (SELECT * FROM :out_0); CREATE LOCAL TEMPORARY COLUMN TABLE "#TEMPNAME2" AS (SELECT * FROM :out_1); END
As mentioned, the generated string can be used as it is, except for the last two lines, which create temporary tables to hold the results and statistical properties of the ML model. Those lines need to be substituted with a statement to return the results directly. Finally the table function definition header is added. The following SQL code shows the changes required:
-- Add the function definition CREATE FUNCTION "MLSPACE#MLUSER"."APPLY_TIMESERIES_FUNC"() RETURNS TABLE (TIMESTAMP BIGINT, VALUE FLOAT) LANGUAGE SQLSCRIPT AS BEGIN DECLARE param_name VARCHAR(5000) ARRAY; -- See previous SQL box for details CALL _SYS_AFL.PAL_AUTO_EXPSMOOTH(:in_0, :params, out_0, out_1); -- Add the return statement RETURN select TIMESTAMP, VALUE from :out_0; END
3.2 Creating a view out of the table function
The second step is to create a view, which queries the table function. Through the view the table function is visible in the SAP DWC space. The view can be created with the following SQL code:
CREATE view "MLSPACE#MLUSER"."APPLY_TIMESERIES_VIEW" as ( select * from "MLSPACE#MLUSER"."APPLY_TIMESERIES_FUNC"() )
The data scientist can use here directly the following helper function in Python to perform the two steps, thus avoiding the SQL scripting part. It is important to mention that this helper function will only execute without any errors for this type of ML model. Of course the logic of the function can be adapted to support also more complex models (e.g. when they require access to model repository).
def deployExpSmoothToDWCView(sql_cmd, func_name, view_name, schema_name, hana_address, hana_user, hana_password, append=False): sql_interm = sql_cmd.split('\n') del sql_interm[-4:] del sql_interm sql_interm.append('RETURN select TIMESTAMP, VALUE from :out_0; \n END') sql_string = '\n'.join(sql_interm) command = 'CREATE' if append: command= 'ALTER' sql_view = ''' %s view "%s"."%s" as ( select * from "%s"."%s"() ) '''%(command, schema_name, view_name, schema_name, func_name) sql_t_func = '''%s FUNCTION "%s"."%s"() RETURNS TABLE (TIMESTAMP BIGINT, VALUE FLOAT) LANGUAGE SQLSCRIPT AS %s '''%(command, schema_name, func_name, sql_string) from hdbcli import dbapi conndb = dbapi.connect( address=hana_address, port=443, user=hana_user, password=hana_password ) cursor = conndb.cursor() stat_func = cursor.execute(sql_t_func) stat_view = cursor.execute(sql_view) return stat_func, stat_view
The function needs to be called directly after the execution of the fit_predict() method, otherwise the returned sql execute statement will be an empty string. An example on how to use the function is shown here:
df_res = autoexpsmooth.fit_predict(df_remote_id, key='ID') sql_cmd = autoexpsmooth.execute_statement deployExpSmoothToDWCView(sql_cmd, 'APPLY_TIMESERIES_FUNC', 'APPLY_TIMESERIES_VIEW', 'MLSPACE#MLUSER', hana_address, hana_user, hana_password, True)
4. Accessing the ML model from SAP DWC
If the previous steps were successful, the data scientist has managed to expose the model for other users in SAP DWC. Switching back to SAP DWC, we see that the view APPLY_TIMESERIES_VIEW can now be imported in the MLSPACE from the remote sources. Once added to the SAP DWC space, the view can be used for any further data modeling tasks in the data builder component.
This blog post demonstrates an approach on how ML models, developed in the underlying HANA database, can be exposed and applied from the SAP Data Warehouse Cloud layer in an ad-hoc manner via a table function and a view. This is a lightweight scenario, which can be useful for example to allow business analysts to incorporate ML models on-demand directly into their data models and analysis in a self-service, no-code manner.
It is important to mention that there are cases, where this approach is not suitable. For example, when the model needs to be made available for use in a database application. In this case the blog post of Frank Gottfried shows the best practices approach. Also the post of Wei Han presents a different approach, where the model is applied in the database directly on the input data and the results of this application are saved in a table, which is imported into the SAP DWC space.
I want to thank Dimitrios KostakisKostas for providing his data engineer’s view during the discussions related to this blog post!