SAP Data Intelligence Cloud: How to use the R Kernel in a Notebook?

You might already know that you can build complex Machine Learning scenarios as part of your Data Value Journey with SAP Data Intelligence. If not, take a look at the blog posts by Andreas Forster and Ingo Peter on how to create your first ML scenario in SAP Data Intelligence using Python and R:

SAP Data Intelligence: Create your first ML Scenario

SAP Data Intelligence: Create your first ML Scenario with R

In this blog post you will learn how to start an R kernel inside a Notebook in SAP Data Intelligence Cloud, so that you can perform data exploration and free-style data science, and test R code before using it in the R operator available in the pipeline modeler.

Although an R kernel is not yet officially available in the SAP Data Intelligence Cloud notebook (as of version 2010.29.9), this blog post will guide you through installing one with “conda” from the Python kernel, and will also present a way to load data sets stored in the DI Data Lake using the R library “reticulate”.

Launch a Notebook inside your ML Scenario

Launch a Notebook session inside your ML Scenario:

Select the Python 3 kernel:

 

Install R Kernel

Use the “conda” command below to install the R kernel.

Note: the -y option makes the install silent; without it, conda prompts you to enter “y”/“yes” before the install begins.

!conda install -y -c r r-irkernel

All required and dependent packages are installed, and conda prints a final summary when it finishes.
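Once the install has finished, one way to confirm that the new kernel is registered is to ask Jupyter for its kernel list. The sketch below is an illustration rather than part of the original steps: it assumes the jupyter command is on the PATH and that IRkernel registers under its default name "ir". The helper names and the sample JSON (including its paths) are made up for the example.

```python
import json
import subprocess


def kernel_names(kernelspec_json):
    """Return the sorted kernel names found in `jupyter kernelspec list --json` output."""
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    return sorted(specs)


def list_installed_kernels():
    """Ask Jupyter for its registered kernels (assumes the `jupyter` CLI is on PATH)."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return kernel_names(result.stdout)


# Illustrative output shape only; the paths are hypothetical.
sample = json.dumps({
    "kernelspecs": {
        "python3": {"resource_dir": "/opt/conda/share/jupyter/kernels/python3"},
        "ir": {"resource_dir": "/opt/conda/share/jupyter/kernels/ir"},
    }
})
print("ir" in kernel_names(sample))
```

In the Python 3 notebook, calling list_installed_kernels() after the conda install should then include the R kernel's name if the registration worked.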

 

Install Required R Packages

Before you begin using the newly installed R kernel, remember that the base R install won’t have all the packages you need. For example, the R code snippet below requires dplyr, sqldf, RODBC and others:

library("sqldf")
library("dplyr")
library("cluster")
library("tidyverse")
library("reshape2")
library("excel.link")
library("RODBC")
library("randomForest")
library("car")

Let’s first learn how to install such R packages.

Execute the command below in the Python 3 kernel notebook session (the same one in which we installed the R kernel) to install “dplyr”. This is equivalent to running install.packages("dplyr") on the R kernel:

conda install r-dplyr

Follow the same approach for the other required packages:

conda install r-reshape2
conda install r-rodbc
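The package names above follow the usual conda convention for R packages: the prefix "r-" followed by the lowercase R package name. As a small sketch of that convention (the helper names are hypothetical, and not every R package is necessarily published on a conda channel), the following builds a single non-interactive install command for several packages:

```python
def conda_r_package(r_name):
    """Map an R package name to the conda convention: 'r-' + lowercase name."""
    return "r-" + r_name.strip().lower()


def conda_install_command(r_names, channel="r"):
    """Build one non-interactive conda install command for several R packages."""
    packages = [conda_r_package(name) for name in r_names]
    return ["conda", "install", "-y", "-c", channel] + packages


cmd = conda_install_command(["dplyr", "reshape2", "RODBC"])
print(" ".join(cmd))
```

The printed line can be pasted into a Python 3 notebook cell with a leading "!", in the same way as the R kernel install command.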

Launch a New Notebook with R Kernel

Select the newly installed R kernel:

Or use the Launcher to create a new notebook with the R kernel:

Let’s follow the IRIS R example to try out the R kernel:

Load the Data:

library(datasets)
data(iris)
summary(iris)
names(iris) <- toupper(names(iris))
library(dplyr)


A basic Filter:

setosa <- filter(iris, SPECIES == "setosa")
head(setosa)

sepalLength5 <- filter(iris, SPECIES == "setosa", SEPAL.LENGTH > 5)
tail(sepalLength5) 

Loading data from a local CSV (a CSV uploaded in the JupyterLab session):

SFO landings data reference: https://data.sfgov.org/Transportation/Air-Traffic-Landings-Statistics/fpux-q53t

Loading Data from SAP DI Data Lake

We will use the R library “reticulate” (library(reticulate)) to access artifacts stored in the DI Data Lake through a Python script. Again, this is a workaround that lets us access the shared artifacts.

First install the R reticulate library from the Python 3 kernel with conda, as below:

conda install r-reticulate

Create a new text file for the Python script (saved as load_data.py, the name passed to source_python in the R commands further down) and insert the code below, which imports the Python packages sapdi and pandas and defines a function to read a file from the DI Data Lake:

import pandas as pd
import sapdi

def read_FILE_DL():
    # Look up the workspace and the data collection that hold the CSV
    ws = sapdi.get_workspace(name='Air_Traffic_SFO')
    dc = ws.get_datacollection(name='LANDINGS')
    # Stream the file from the data lake into a pandas DataFrame
    with dc.open('Air_Traffic_Landings_Statistics.csv').get_reader() as reader:
        landings = pd.read_csv(reader)
    return landings



Now open the Notebook with the R kernel, and restart the R kernel before running the commands below.

Use the commands below to load the data:

library(reticulate)
source_python("load_data.py")
landings <- read_FILE_DL()
landings

Using the “MARATHON” dataset example

Refer to Ingo’s blog post, and for the dataset use the blog post: SAP Data Intelligence: Create your first ML Scenario

df_train <- read.csv(file="RunningTimes.txt", header=TRUE, sep=";")
head(df_train)

ID	HALFMARATHON_MINUTES	MARATHON_MINUTES
1	73	149
2	74	154
3	78	158
4	73	165
5	74	172
6	84	173

str(df_train)

'data.frame':	117 obs. of  3 variables:
 $ ID                  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ HALFMARATHON_MINUTES: int  73 74 78 73 74 84 85 86 89 88 ...
 $ MARATHON_MINUTES    : int  149 154 158 165 172 173 176 177 177 177 ...


lm.fit<-lm(MARATHON_MINUTES~HALFMARATHON_MINUTES,data=df_train)
str(lm.fit)
List of 12
 $ coefficients : Named num [1:2] -6.01 2.25
  ..- attr(*, "names")= chr [1:2] "(Intercept)" "HALFMARATHON_MINUTES"
 $ residuals    : Named num [1:117] -9.18 -6.43 -11.43 6.82 11.57 ...
  ..- attr(*, "names")= chr [1:117] "1" "2" "3" "4" ...
 $ effects      : Named num [1:117] -2361.82 294.44 -9.94 8.49 13.21 ...
  ..- attr(*, "names")= chr [1:117] "(Intercept)" "HALFMARATHON_MINUTES" "" "" ...
 $ rank         : int 2
 $ fitted.values: Named num [1:117] 158 160 169 158 160 ...
  ..- attr(*, "names")= chr [1:117] "1" "2" "3" "4" ...
 $ assign       : int [1:2] 0 1
 $ qr           :List of 5
  ..$ qr   : num [1:117, 1:2] -10.8167 0.0925 0.0925 0.0925 0.0925 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:117] "1" "2" "3" "4" ...
  .. .. ..$ : chr [1:2] "(Intercept)" "HALFMARATHON_MINUTES"
  .. ..- attr(*, "assign")= int [1:2] 0 1
  ..$ qraux: num [1:2] 1.09 1.18
  ..$ pivot: int [1:2] 1 2
  ..$ tol  : num 1e-07
  ..$ rank : int 2.

///
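For readers who want to see what the coefficients in the lm output represent, here is a minimal sketch of the closed-form least-squares fit for a single predictor, written in plain Python. It uses only the six rows printed by head(df_train) above, so its result (an intercept of about 76.13 and a slope of about 1.13) differs from the -6.01 and 2.25 that lm reports for all 117 observations. The function name is made up for the example.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# The six rows shown by head(df_train)
half_marathon = [73, 74, 78, 73, 74, 84]
marathon = [149, 154, 158, 165, 172, 173]

intercept, slope = fit_line(half_marathon, marathon)
print(round(intercept, 2), round(slope, 2))
```

A prediction for a new runner is then intercept + slope * half_marathon_minutes, which is what the fitted model applies to new data.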

 

SUMMARY

Projects often need to quickly test R code snippets before using them in data pipelines, so I hope the workaround above for getting a working R kernel in a Notebook within SAP Data Intelligence will be helpful.

I would also reiterate that the solution above is a workaround for testing R code snippets in a notebook in SAP DI. An R SDK for SAP DI is not yet available (unlike the Python SDK for SAP DI), so, among other limitations, artifacts stored in the SAP DI data lake cannot be referenced directly from the R kernel the way they can be with the “sapdi” Python library in SAP Data Intelligence.

Please follow the tag SAP Data Intelligence to get notified on latest updates, or visit the SAP Community page for SAP Data Intelligence to keep learning more: https://community.sap.com/topics/data-intelligence

5 Comments
  • Hello Vinay, that's really good to know, but after executing the steps I always get the following error, even after restarting the kernel or creating a new notebook from scratch. Any idea?

     

    The kernel for 057596fe-f23e-44bc-aac1-185a427834bd/notebooks/10_Loading_Datasets.ipynb appears to have died. It will restart automatically.

     

    Thanks

    • Hi,

      Thanks for your comment. You may try restarting the Jupyter Lab application from the "System Management" application.

       

      Thanks!

      • Hi, thanks for the answer. I have tried this several times, but after restarting Jupyter Lab the R kernel disappears and I have to run the steps to install the R kernel again. The output of conda info is as follows:

             active environment : None
               user config file : /home/labuser/.condarc
         populated config files : /opt/conda/.condarc
                  conda version : 4.7.10
            conda-build version : not installed
                 python version : 3.7.3.final.0
               virtual packages : 
               base environment : /opt/conda  (writable)
                   channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                                  https://conda.anaconda.org/conda-forge/noarch
                                  https://repo.anaconda.com/pkgs/main/linux-64
                                  https://repo.anaconda.com/pkgs/main/noarch
                                  https://repo.anaconda.com/pkgs/r/linux-64
                                  https://repo.anaconda.com/pkgs/r/noarch
                  package cache : /opt/conda/pkgs
                                  /home/labuser/.conda/pkgs
               envs directories : /opt/conda/envs
                                  /home/labuser/.conda/envs
                       platform : linux-64
                     user-agent : conda/4.7.10 requests/2.23.0 CPython/3.7.3 Linux/5.4.0-5-cloud-amd64 sles/15 glibc/2.26
                        UID:GID : 1000:100
                     netrc file : None
                   offline mode : False
        
        
        Best Regards,