Skip to Content
Author's profile photo Andreas Forster

SAP HANA and R hands-on: From freestyle to deployment

Learn how to script R in your preferred editor with SAP HANA data and how to deploy the code directly in SAP HANA.

If you are a Data Scientist, there is a good chance that you enjoy working with R. You can bring in your creativity, choose from thousands of R packages and script some very agile data analysis and predictions.

You probably know that SAP HANA has multiple options to create predictive insight. I am a big fan of creating predictive models automatically and efficiently with SAP Predictive Analytics. But if the needed arises and you have the time and skill to script a specific task yourself, it can be fun to open up your favorite R editor.

Working in your preferred R editor is what I call “freestyle”. Do what you do. And once you have found the code that solves the business challenge you are addressing, you will feel a nice sense of accomplishment. But the project has not yet been completed. How do you integrate your R code into your existing SAP HANA landscape, under the governance of IT? How to deploy freestyle? Well, read on and try it out!

Scenario

I will introduce the concept of working with R and SAP HANA along a time series forecasting scenario. We will use the monthly history of vehicle registration numbers from 20+ European countries to predict how many vehicles will be registered in future in each country. Think of it as a demand forecast or revenue forecast.

In reality, I believe such a use case ought to be implemented using the automated concept of SAP Predictive Analytics, which provides a comprehensive framework to operationalise such forecasting. The R code we are using here, will only cover a fraction of what the automated framework delivers, but let’s stick with it for the sake of learning the concept of scripting R with SAP HANA.

Disclaimer: Just bear in mind some small print. I am hoping this blog will help you get familiar with the R integration in SAP HANA, but the code or any of the blog’s content is not supported by SAP. Should you want to use part of the code, please carry out your own tests to ensure the syntax is working for your use case. Also, please have look at the “Things to consider” section towards the end of this blog. And finally, please verify with your Account Executive at SAP any license implications such an R integration might have.

Prerequisites

R Server connected to SAP HANA

You must have a SAP HANA system with a connected R Server to follow the whole example. Adding such an R Server to SAP HANA is described in the SAP HANA R Integration Guide. The supported R versions are listed in SAP Note 2185029. You can also work with the free SAP HANA, express edition as described in this tutorial.

The SAP HANA system might or might not have the Automated Predictive Library (APL) or the Predictive Analysis Library (PAL) installed.

 

Local R installation

I assume you are already comfortable scripting with R. Don’t despair if you are new to R. This collection of online learning resources maintained by RStudio is one of many ways to get started.

For the R scripting itself we should use an R development environment to be efficient. I personally like using RStudio, but there are many alternatives. We could develop directly on the R Server, for instance by running RStudio Server on that system. However, for this tutorial I am using a local installation of RStudio Desktop on my own laptop to develop the code. This code will be deployed later to SAP HANA and run on the connected R Server.

Specifically, the laptop must have:

  • An installation of R, in the same version of the R Server that is connected to SAP HANA. The versions should be identical to ensure that the code that was developed will run well on the SAP HANA’s R Server.
  • The R development environment, ie RStudio Desktop.
  • The SAP HANA client to be able to connect from R to SAP HANA. Should you not have the client installed yet, one place to download it is SAP Development Tools.
  • A SAP HANA development environment to deploy the R syntax. I am using the Eclipse-based SAP HANA Modeler.

 

Historic data in SAP HANA

For our hands-on example you need to load the historic data of vehicle registrations into SAP HANA. I used the straight-forward import from SAP HANA Studio: “File” → “Import” → “SAP HANA Content” →  “Data from Local File”. Just make sure to change the DataType of the MONTH column to DATE before uploading, as shown in the screenshot.

The example below assumes that the table is called VEHICLEREGISTRATIONS and exists in the schema ML.

 

A big thanks goes to the European Automobile Manufacturers Association for collecting and providing these registration figures.

 

R freestyle on SAP HANA

Now that the data is in SAP HANA we can start scripting with R. We begin by loading a few rows of the data into R. For our time series forecast we just need the history of a single country to get going. Later on, we will see how the logic is applied to all countries.

Open up the R development environment on your laptop!

 

Execute the following code to load the history of Switzerland into a data frame. Just adjust the connection parameters according to your environment. In my example I am using the JDBC driver. If you are wondering which port to use for the JDBC connection, this blog by Ian Henry and this tutorial by Daniel Wroblewski are a great help.

You may also need to adjust the SELECT statement if your schema or table name are different.

library("RJDBC")
jdbc_driver <- JDBC(driverClass = "com.sap.db.jdbc.Driver",  
                    classPath = "C:/Program Files/SAP/hdbclient/ngdbc.jar")

jdbc_connection <- dbConnect(jdbc_driver,
                             "jdbc:sap://SERVER:PORT", 
                             "YOURUSER", 
                             "ANDYOURPASSWORD")
data <- dbGetQuery(jdbc_connection, "SELECT COUNTRY, MONTH, REGISTRATIONS FROM ML.VEHICLEREGISTRATIONS WHERE VEHICLEREGISTRATIONS.Country = 'Switzerland' ORDER BY MONTH ASC")

 

The historic registration numbers are now loaded in a data frame. Feel free to use R for some exploratory data analysis on these numbers.

We continue by using this history to create the forecast. The code below is commented, so I will not explain all details here. But overall, a number of important parameters are set first, such as the forecast horizon (12 months, see dates_to_forecast).

The script then uses the forecast package from Rob Hyndman to try out different forecasting methods and tests their accuracy on a hold out sample. The approach with the lowest MAPE (mean absolute percentage error) is then used to create the final forecast of the truly unknown future. This forecast is displayed in a plot.

You may wonder why the R code is printing a status update, specifying which time series (meaning which country) it is currently working on. Strictly speaking this is not required, but there is a good reason for it anyway that will be revealed later…

# Load required libraries ---------------------------------------------------------
library(lubridate)
library(forecast)
library(MLmetrics)
library(gplots) 

# Specify any parameters, adjust to the use case ----------------------------------
col_name_date     <- "MONTH"   
col_name_measure  <- "REGISTRATIONS"
col_name_segment  <- "COUNTRY"
date_format       <- "%Y-%m-%d"
dates_to_forecast <- 12 
confidence_level  <- 0.95
forecast_methods  <- c('arima', 'ets', 'rwdrift', 'naive') 
frequency         <- 12 # This value must not be changed

# Retrieve the individual columns from the data frame -----------------------------
col_date    <- as.character(data[, col_name_date])
col_measure <- data[, col_name_measure]
col_segment <- data[1, col_name_segment]

# Print status update -------------------------------------------------------------
print(paste("Now starting with: ", col_segment, sep = ""))

# Ensure data is sorted on the date in ascending order  ---------------------------
data <- data [order(data[, col_name_date]), ] 

# Convert time series into ts object (required by forecast function) --------------
start_date  <- as.Date(data[1, col_name_date], date_format)
ts_historic <- ts(data[, col_name_measure], 
                  start = c(year(start_date), month(start_date)),
                  frequency = frequency)

# Keep a hold out sample of forecast length to test forecast accuracy -------------
ts_short      <- head(ts_historic, n = length(ts_historic)-dates_to_forecast)
ts_hold_out   <- tail(ts_historic, n = dates_to_forecast)

# Assess all forecasting methods on their performance on the hold out sample ------
ts_short_mapes <- rep(NA, length(forecast_methods))
for (ii in 1:length(forecast_methods)) {
    stlf_forecast <- stlf(ts_short, 
                        method = forecast_methods[ii], 
                        h = dates_to_forecast, 
                        level = confidence_level)
    ts_short_mapes[ii] <- MAPE(as.numeric(ts_hold_out), stlf_forecast$mean)
}

# Select the best performing method to carry out the final forecast ---------------
forecast_best_mape <- min(ts_short_mapes)
ts_forecast <- stlf(ts_historic, 
                      method = forecast_methods[which(ts_short_mapes == forecast_best_mape)], 
                      h = dates_to_forecast, 
                      level = confidence_level)
plot(ts_forecast)

 

We have a forecast!

In our example we want to write the forecast into a table in SAP HANA. Hence, we create a data frame with the structure and content we would like to store the data in SAP HANA. The following script combines the history with the forecasted values and enriches the dataset with additional context, ie the bounds of the forecast interval, some information on the model type, etc.

# Dates, name of time series (segment) and date type (Actual or Forecast) ---------
dates_all    <- as.character(seq(from = start_date, by = "month",  length.out = length(ts_historic)+dates_to_forecast))
col_segments <- rep(col_segment, length(dates_all))
model_descr  <- rep(paste(ts_forecast$method, "- MAPE:",  round(forecast_best_mape, 3)), length(dates_all))
date_types   <- as.character(c(rep("Actual", length(ts_historic)), rep("Forecast", dates_to_forecast)))

# Actual and historic measures ----------------------------------------------------
forecast_mean   <- rep(NA, dates_to_forecast)
forecast_mean   <- ts_forecast$mean
forecast_upper  <- ts_forecast$upper
forecast_lower  <- ts_forecast$lower
dates_all_mean  <- as.numeric(c(as.numeric(ts_historic), as.numeric(forecast_mean)))
dates_all_lower <- as.numeric(c(rep(NA, length(ts_historic)), as.numeric(forecast_lower)))
dates_all_upper <- as.numeric(c(rep(NA, length(ts_historic)), as.numeric(forecast_upper)))

# Return the combined data --------------------------------------------------------
result <- data.frame(SEGMENT = col_segments,  
                     MONTH = dates_all, 
                     MEASURETYPE = date_types, 
                     MEASURE = dates_all_mean, 
                     MEASURELOWER = dates_all_lower, 
                     MEASUREUPPER = dates_all_upper, 
                     MODEL = model_descr)
result

 

The output for Switzerland:

 

You may want to change the filter in the SELECT statement and re-run the code to produce the forecasts for different countries.

Eventually we need to deploy this code in a more governed context. SAP HANA and R should produce the forecasts for all countries without requiring the laptop. Let’s see how to deploy this syntax in SAP HANA and how to apply it dynamically for each country.

 

R deployment on SAP HANA

Now open your modelling environment for SAP HANA. I am using the Eclipse-based SAP HANA Modeler. Go into the SQL console and specify the schema into which you want to write the predictions for the various countries. Then create the table itself, in the structure of the data frame created by R. I called it VEHICLEREGISTRATIONS_FORECAST.

SET SCHEMA ML;

--- Create table for the output
DROP TABLE "VEHICLEREGISTRATIONS_FORECAST";
CREATE COLUMN TABLE "VEHICLEREGISTRATIONS_FORECAST"(
SEGMENT NVARCHAR(14),
MONTH DATE,
MEASURETYPE NVARCHAR(20),
MEASURE DECIMAL,
MEASURELOWER DECIMAL,
MEASUREUPPER DECIMAL,
MODEL NVARCHAR(100));

 

Now create an RLANG procedure with the R code we wrote earlier in RStudio. The JDBC connection does not need to be specified since the R code is now maintained by SAP HANA, which will also provide the data to the procedure.

The procedure’s parameters specify that:

  • data in the structure of the historic VEHICLEREGISTRATIONS table will be passed into it (input data)
  • and that the procedure returns data in in the structure of the VEHICLEREGISTRATIONS_FORECAST table (output data)

When the RLANG procedure is called, SAP HANA will pass the historic data and the R code to the R Server. The “CREATE PROCEDURE” statement below specifies in my example that the input data will be available to the R code as data frame named “data” (see the “IN data” element). The statement also specifies that the RLANG procedure will return the content of a data frame called “result” (see the “OUT result” element”. Note that you can use different variable names in in your own projects. You don’t have to use “data” and “result”.

Since the procedure will return the data, the line that was plotting the chart is removed. Apart from this, it is exactly the code we created earlier in RStudio. Since the code is using a few additional libraries you have to make sure that these libraries are also installed on the R Server.

--- Procedure for a single monthly time series
DROP PROCEDURE SINGLE_FORECAST_MONTHLY_R;
CREATE PROCEDURE SINGLE_FORECAST_MONTHLY_R(IN data "ML"."VEHICLEREGISTRATIONS", OUT result "VEHICLEREGISTRATIONS_FORECAST")
LANGUAGE RLANG
AS
BEGIN

# Load required libraries ---------------------------------------------------------
library(lubridate)
library(forecast)
library(MLmetrics)

# Specify any parameters, adjust to the use case ----------------------------------
col_name_date     <- "MONTH"   
col_name_measure  <- "REGISTRATIONS"
col_name_segment  <- "COUNTRY"
date_format       <- "%Y-%m-%d"
dates_to_forecast <- 12 
confidence_level  <- 0.95
forecast_methods  <- c('arima', 'ets', 'rwdrift', 'naive') 
frequency         <- 12 # This value must not be changed

# Retrieve the individual columns from the data frame -----------------------------
col_date    <- as.character(data[, col_name_date])
col_measure <- data[, col_name_measure]
col_segment <- data[1, col_name_segment]

# Print status update -------------------------------------------------------------
print(paste("Now starting with: ", col_segment, sep = ""))

# Ensure data is sorted on the date in ascending order  ---------------------------
data <- data [order(data[, col_name_date]), ] 

# Convert time series into ts object (required by forecast function) --------------
start_date  <- as.Date(data[1, col_name_date], date_format)
ts_historic <- ts(data[, col_name_measure], 
                  start = c(year(start_date), month(start_date)),
                  frequency = frequency)

# Keep a hold out sample of forecast length to test forecast accuracy -------------
ts_short      <- head(ts_historic, n = length(ts_historic)-dates_to_forecast)
ts_hold_out   <- tail(ts_historic, n = dates_to_forecast)

# Assess all forecasting methods on their performance on the hold out sample ------
ts_short_mapes <- rep(NA, length(forecast_methods))
for (ii in 1:length(forecast_methods)) {
  stlf_forecast <- stlf(ts_short, 
                        method = forecast_methods[ii], 
                        h = dates_to_forecast, 
                        level = confidence_level)
  ts_short_mapes[ii] <- MAPE(as.numeric(ts_hold_out), stlf_forecast$mean)
}

# Select the best performing method to carry out the final forecast ---------------
forecast_best_mape <- min(ts_short_mapes)
ts_forecast <- stlf(ts_historic, 
                    method = forecast_methods[which(ts_short_mapes == forecast_best_mape)], 
                    h = dates_to_forecast, 
                    level = confidence_level)

# Dates, name of time series (segment) and date type (Actual or Forecast) ---------
dates_all    <- as.character(seq(from = start_date, by = "month",  length.out = length(ts_historic)+dates_to_forecast))
col_segments <- rep(col_segment, length(dates_all))
model_descr  <- rep(paste(ts_forecast$method, "- MAPE:",  round(forecast_best_mape, 3)), length(dates_all))
date_types   <- as.character(c(rep("Actual", length(ts_historic)), rep("Forecast", dates_to_forecast)))

# Actual and historic measures ----------------------------------------------------
forecast_mean   <- rep(NA, dates_to_forecast)
forecast_mean   <- ts_forecast$mean
forecast_upper  <- ts_forecast$upper
forecast_lower  <- ts_forecast$lower
dates_all_mean  <- as.numeric(c(as.numeric(ts_historic), as.numeric(forecast_mean)))
dates_all_lower <- as.numeric(c(rep(NA, length(ts_historic)), as.numeric(forecast_lower)))
dates_all_upper <- as.numeric(c(rep(NA, length(ts_historic)), as.numeric(forecast_upper)))

# Return the combined data --------------------------------------------------------
result <- data.frame(SEGMENT = col_segments,  
                     MONTH = dates_all, 
                     MEASURETYPE = date_types, 
                     MEASURE = dates_all_mean, 
                     MEASURELOWER = dates_all_lower, 
                     MEASUREUPPER = dates_all_upper, 
                     MODEL = model_descr)
                     
END;

 

The procedure is now available and can be called. Remember that the R code was written for an individual country. This allows us now to pass the data of each country individually into the procedure. Let’s test this with a single country first, before using the concept for all countries.

I found it easiest to split the historic data by creating a temporary table that contains only one specific country. This table is then used as input parameter when calling the RLANG procedure. Since a “?” is used as output parameter, the results are not yet written into a table, they are just displayed in the SAP HANA Modeler.

--- Forecast a single country
DROP TABLE #"tmpTable" ;
CREATE LOCAL TEMPORARY TABLE #"tmpTable" 
	(COUNTRY NVARCHAR(14),
	MONTH DATE,
	REGISTRATIONS INTEGER);
INSERT INTO  #"tmpTable" SELECT COUNTRY, MONTH, REGISTRATIONS FROM ML.VEHICLEREGISTRATIONS WHERE VEHICLEREGISTRATIONS.COUNTRY = 'United Kingdom';
CALL SINGLE_FORECAST_MONTHLY_R (#"tmpTable",?);

 

You should now see the forecast in SAP HANA Studio! SAP HANA has passed the historic data of the United Kingdom and the R code to the R Server. The server did its forecast and returned the output into SAP HANA Modeler. The local installation of R on your own laptop was not used anymore at this point!

 

You don’t have to execute the following statement. It just shows how such a single forecast could be written into our target table. Compare this to the earlier procedure call and you will notice that the “?” was replaced with the output table name and that “WITH OVERVIEW” was added to the end of the statement.

CALL SINGLE_FORECAST_MONTHLY_R (#"tmpTable","VEHICLEREGISTRATIONS_FORECAST") WITH OVERVIEW;
SELECT * FROM "ML"."VEHICLEREGISTRATIONS_FORECAST";

 

By now we can forecast a single country. So, let’s build on this logic to dynamically create a forecast for each country in the historic data.

We just need to create an SQL procedure that calls the RLANG procedure for each individual country. The “WITH OVERVIEW” option we saw above is not supported for nested calls, hence we will implement the logic slightly differently. Instead of writing the forecast directly into the output table, the forecast is first passed into a temporary table. From that temporary table the data is then inserted into the final target table.

The purpose of the iterative approach, passing the data of each country individually, is to make the task digestible for R. If you have very large datasets, you may not want to pass it all at once.

--- Procedure to forecast multiple monthly time series
DROP PROCEDURE MULTIPLE_FORECASTS_MONTHLY_R;
CREATE PROCEDURE MULTIPLE_FORECASTS_MONTHLY_R
LANGUAGE SQLSCRIPT AS

CURSOR cursor_segment (SEGMENT NVARCHAR(256))
FOR SELECT DISTINCT COUNTRY FROM "VEHICLEREGISTRATIONS" ORDER BY COUNTRY ASC;

BEGIN
FOR cursor_row AS cursor_segment('input_value_1') DO
	single_country = SELECT COUNTRY, MONTH, REGISTRATIONS FROM ML.VEHICLEREGISTRATIONS WHERE VEHICLEREGISTRATIONS.COUNTRY = cursor_row."COUNTRY";
	CALL SINGLE_FORECAST_MONTHLY_R (:single_country, tmp_single_forecast) ;
	INSERT INTO "VEHICLEREGISTRATIONS_FORECAST" SELECT * FROM :tmp_single_forecast;
END FOR;
END;

 

We are almost done, just call the procedure to trigger the forecasts.

--- Forecast all monthly time series 
CALL MULTIPLE_FORECASTS_MONTHLY_R();
SELECT * FROM "ML"."VEHICLEREGISTRATIONS_FORECAST";

 

This call will take longer to complete now, as forecasts for 29 different countries are created. And if you want to see what is currently happening in R, you can see the R output in the index server’s trace file. In the screenshot below we see that R just started forecasting the time series for Greece. Now you know why the code we are using is printing a status update.

Being able to see the R output here just requires a small change in configuration as explained in the SAP HANA R Integration Guide.

 

The target table should now include the forecasts for every country that was found in the historic data. The call to this procedure can now be scheduled to create new forecasts when needed.

The code that was produced freestyle in RStudio is now deployed in SAP HANA!

 

Scheduling

The new procedure can now be scheduled to regularly carry out the forecasts. In our example you may want to create a new forecast every month, which takes the latest history into account. There are various options for setting up such a schedule. I will use a command, which can be executed by an external scheduler.

We use the hdbsql application, which is installed with the SAP HANA client, to execute an SQL file that contains all necessary syntax. The external scheduler can then regularly run the hdbsql application, which executes the statements from within the SQL file.

The SQL file can be very simple as we only need to delete the old forecasts and call the function that creates and writes new forecasts in the target table. Save the following Syntax in a file called VehicleForecast.sql.

SET SCHEMA ML;

--- Clear the target table
TRUNCATE TABLE "ML"."VEHICLEREGISTRATIONS_FORECAST";

--- Forecast all monthly time series 
CALL MULTIPLE_FORECASTS_MONTHLY_R();

 

The SQL syntax in that file can be executed in SAP HANA with the hdbsql application. This, of course, requires a logon to SAP HANA. I would not recommend it, but the user name and password could be passed as parameters in clear text to hdbsql.

hdbsql -n SERVER:PORT -u YOURUSER -p ANDYOURPASSWORD -d TENANT -I "C:\VehicleForecast.sql" -m

 

For security reasons we want to avoid this of course and use the hdbuserstore as secure storage for the user name and password instead. Open a command prompt and go to C:\Program Files\SAP\hdbclient. You can save the logon credentials as follows:

hdbuserstore -i SET myuserkey “SERVER:PORT” YOURUSER

 

You can choose under which key you want to store the user name and password. I chose “myuserkey”. Even at this point you don’t have to enter the password in clear text. You will be prompted for it instead.

 

Now the hdbsql application can retrieve the logon credentials from the user store.

hdbsql -U myuserkey -d TENANT -I "C:\VehicleForecast.sql" -m

 

Execute this command and a forecast will be created. Use the scheduler of your choice to regularly update the forecasts.

 

Things to consider

If you have read the blog this far, hopefully it has become clear how R can be integrated with SAP HANA. If you have even implemented the logic yourself, hopefully you managed to execute the code without too much trouble. Well done!

Just to point out, that even though the forecasts have been produced, chances are that more efforts will be needed to use this approach in a common business environment. A few thoughts that come to mind are:

  • The code expects at least 3 years of history to produce the forecasts. It takes the most recent 12 months as hold-out sample, leaving two years to identify seasonality. If you have a shorter history, you might get an error.
  • The code is expecting a complete history without missing data. If there is data missing within the time series, the output will be inaccurate. Similarly, if you have missing data at the end of the time series, the forecast will be out of sync.
  • The code does not allow for additional predictor variables that might help improve the accuracy.
  • The code is tweaked towards monthly forecasting. Different intervals can be handled by R of course, but the code needs to be adjusted.
  • You have to take care of the scheduling process.
  • You don’t have any visualisations showing the predictions.

These points go back to what I mentioned at the beginning of the blog, that I am a big fan of creating predictive models automatically and efficiently with SAP Predictive Analytics. The automated component of SAP Predictive Analytics provides such functionality out of the box.

I also mentioned that it is fun scripting in R. I also enjoy this myself, that’s why I wrote this blog after all! I just want to point out that different projects and different tasks might require different approaches. Automated Predictive Analytics is one of them. R is another.

Happy predicting!

 

More Information

Assigned Tags

      13 Comments
      You must be Logged on to comment or reply to a post.
      Author's profile photo Zhiyi Tang
      Zhiyi Tang

      Hi Andreas,

      thanks very much for this excellent blog. I luckily have done some experiments by these two approaches. I once used local r to connect one system, download the data, do the calculation on PC, then upload the result to another system. But this approach was quite time-consuming and I had to wait dozens of  minutes for the completion of data transferring. So I prefer the second approach.

      Even with the second approach I still encountered some issue - I did some Monte-Carlo simulations on Rserve with the data fetched from HANA database. When the resampling numbers are small (less than 10,000), Using R even has better performance than using PAL. But when the number goes up, the performance of R drops rapidly. In my experiment when this number is over 100,000, R couldn't return any result because the connection time's out. I've tried several (parallelization) methods to overcome this issue, but unfortunately since I'm a C-user and working on an SAP system, my requests of installing further R-packages and opening more Rservers weren't approved. Anyway this issue made me curious about the data communication between HANA and R (with my limited knowledge in IT I could only imagine it was caused by the data communication between HANA and R). In particular I understand R calculation is also in-memory,  why the data communication between them becomes the bottleneck of the process - I even can run these Monte-Carlo on my PC with much more resampling paths.

      I would be interested if you've encountered such issue and what's your suggestion if I want to dig deeper into this topic?

      Thanks in advance!

      Author's profile photo Andreas Forster
      Andreas Forster
      Blog Post Author

       

      Hello Zhiyi,
      Thank you for the feedback!

      This blog on R parallelization is very useful
      https://blogs.sap.com/2016/04/03/parallelization-options-with-the-sap-hana-and-r-integration/

      Also, the new SQLScript Plan Profiler might point you in the right direction.
      https://blogs.sap.com/2018/04/18/whats-new-for-r-integration-in-hana-2.0-sps03/

      Many Greetings
      Andreas

      Author's profile photo Zhiyi Tang
      Zhiyi Tang

      Andreas, Thank you so much for the useful links! Best regards, Zhiyi

      Author's profile photo Lars Breddemann
      Lars Breddemann

      Really nice post, Andreas! Thanks for writing this up.

      One thing that bugs me, though, is that you present the code part for the database login like this:

      dbc_connection <- dbConnect(jdbc_driver,
                                   "jdbc:sap://SERVER:PORT", 
                                   "YOURUSER", 
                                   "ANDYOURPASSWORD")

      While this does not result in an error message it is the worst option to use; even though it has been shown in all those other blog posts and the documentation as well.

      We're in 2018 and putting login information into the R script code has to be considered a bug.

      Instead, why not use the HANA secure store?

      dbc_connection <- dbConnect(jdbc_driver, "jdbc:sap://?key=myuserkey")

      works just as well and makes use of the hdbuserstore key "myuserkey".

      This approach also simplifies deployment of scripts to other users or production servers.

      Also, if the login data have to be changed, this only needs to be done once.

      The "key" feature is also documented (https://help.sap.com/viewer/0eec0d68141541d1b07893a39944924e/2.0.03/en-US/109397c2206a4ab2a5386d494f4cf75e.html) and available with HANA 2 JDBC drivers (which can be used to access HANA 1 systems as well).

       

      Author's profile photo Andreas Forster
      Andreas Forster
      Blog Post Author

      Hi Lars,

      Yes, absolutely. I am using the secure user store myself. I took the shortcut in the blog to focus on the scripting. If anyone wants to leverage this, you need to add the SAP HANA user name and password to your secure user store.

      This can be done with the hdbuserstore, which is installed with the SAP HANA client. You may find it on your laptop for example in C:\Program Files\SAP\hdbclient

      hdbuserstore -i SET myuserkey "SERVER:PORT" YOURUSER

      You will be prompted to enter the user's password.

      Now you can logon to SAP HANA in your local R environment as described by Lars:

      library("RJDBC")
      jdbc_driver <- JDBC(driverClass = "com.sap.db.jdbc.Driver",
      classPath = "C:/Program Files/SAP/hdbclient/ngdbc.jar")
      jdbc_connection <- dbConnect(jdbc_driver, "jdbc:sap://?key=myuserkey")

      Many Greetings
      Andreas

      Author's profile photo Guoquan Xing
      Guoquan Xing

      Hi Andreas,

      Is there any plan for python integration to SAP HANA?

      Regards,

      Guoquan

      Author's profile photo Andreas Forster
      Andreas Forster
      Blog Post Author

      Hello Guoquan,
      Yes, absolutely. Today the SAP HANA client already contains a Python client.
      https://blogs.sap.com/2017/07/26/sap-hana-2.0-sps02-new-feature-updated-python-driver/

      There are also plans to release a Python client API
      For more details please see the "SAP HANA Road Map" on https://www.sap.com/products/roadmaps.html

      Author's profile photo Singh Avinash Pratap
      Singh Avinash Pratap

      Hi Andreas,

      Thank you so much for detailed description. I tried the same codes to learn. However, While plotting the graph. I am getting error figure margins too large.

       

      Can you help me on this issue?

       

      Thanks ,

      Avinash Singh

      Author's profile photo Andreas Forster
      Andreas Forster
      Blog Post Author

      Hello Avinash,
      I have seen this message when the plot window in R was too small. If you are using RStudio, just try increasing the height and width of the "Plots" frame on the bottom right.

      If that doesnt help, please try
      dev.off()
      followed by the plots command
      plot(ts_forecast)

      Many Greetings
      Andreas

       

      Author's profile photo Singh Avinash Pratap
      Singh Avinash Pratap

      Thank you so much Andreas!

       

      It worked for me.

       

       

      Author's profile photo David Bertsche
      David Bertsche

      Hi Andreas,

      Can an R Server be set up to work with HANA on Cloud or only with HANA on Premise? This is not mentioned in the Integration Guide.

      Thanks,

      David

      Author's profile photo Andreas Forster
      Andreas Forster
      Blog Post Author

      Hello David, If the HANA system is "just" hosted, ie a full system on AWS for example, then I believe you can attach the R server just like HANA was on-premise. With HANA as a Service it might be different. Please ping me directly if you would like to know for sure and we can discuss the exact requirements.
      There is also another option now to deploy R syntax in the cloud. SAP Data Hub and our ML engines are being merged into SAP Data Intelligence. It's a cloud solution, in beta at the moment and I already had the chance to deploy the code from this blog in it. The implementation is very similar to this hands-on guide
      https://blogs.sap.com/2018/12/10/sap-data-hub-and-r-time-series-forecasting/
      Many Greetings, Andreas

      Author's profile photo David Bertsche
      David Bertsche

      Hello Andreas -- Thanks for your quick reply, would be great to connect with you directly for a detailed conversation -- I looked for you in the SAP chat/ direct message tool, but I didn't find you. What is the best way to connect? -- Best regards, David