From R and Python into the SAP World
Everything started with a harmless phone call before Christmas back in 2018. I had no idea what this one-hour call would set in motion. Back then I was studying Quantitative Finance at the University of Kiel and was occupied with lectures and exams on data mining and statistical learning. Sarah Detzler and I talked about my favorite R libraries, use cases and my goals in life. It was not an ordinary job interview, and Sarah motivated me from the beginning by sharing her experiences as a data scientist at SAP. One particular use case stuck with us: Benford’s law, a statistical distribution of leading digits, and the idea of combining it with different machine learning algorithms. A couple of months later I was on my way from northern Germany to Heidelberg to write my master’s thesis in cooperation with SAP. In this blog post I want to share my experiences coming from the open-source world and how it fits together with the SAP world. To my surprise, I was able to reuse many of the R scripts I had created during my studies and incorporate them into different SAP solutions.
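For readers who have not met Benford’s law before, here is a minimal sketch in Python of what it says: the leading digit d of many real-world numbers appears with probability log10(1 + 1/d). The function names and the simple deviation score below are my own illustrative assumptions, not the method used in the thesis. A score like this is just one conceivable input for a machine learning model.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant digit of a non-zero number."""
    # Scientific notation puts the first significant digit in front.
    return int(f"{abs(x):.15e}"[0])


def observed_shares(values):
    """Observed share of each leading digit 1..9, ignoring zeros."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def mean_abs_deviation(values):
    """Hypothetical score: average gap between observed and expected shares."""
    expected = benford_expected()
    observed = observed_shares(values)
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9


# Powers of two are a classic example that follows Benford's law closely,
# while three-digit numbers taken in sequence have uniform leading digits.
powers_of_two = [2 ** k for k in range(1, 201)]
three_digit_numbers = list(range(100, 1000))

print(round(benford_expected()[1], 3))  # share of leading 1s: log10(2)
print(mean_abs_deviation(powers_of_two))
print(mean_abs_deviation(three_digit_numbers))
```

The second printed score should be clearly smaller than the third, which is the kind of signal that makes the law interesting for spotting unusual data.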
But let’s start from the beginning. The first time you come to the office in Walldorf can be quite overwhelming. In my first weeks I learned a lot about the different SAP tools and was able to try them hands-on. In addition, I accompanied my new colleagues to their customer meetings, and Sarah quickly pushed me out of my comfort zone. One of the first events I attended was an internal hackathon in which three teams competed to solve a use case with SAC Smart Predict. I had many questions about the algorithms and how the tool actually worked. Its main focus is on enabling business users to integrate augmented analytics into their decision making and to visualize the results. I quickly realized that this goes far beyond the standard plot function I was used to in R or Python. But even in such an SAP tool I was able to bring in my open-source world through the R Visualization capabilities, which led to my own first hands-on tutorial, see this link.
At the university I focused on the statistics and the modeling behind machine learning algorithms, but in practice the work does not end with an R or Python script. In reality, important aspects such as data quality and the deployment of the models must be addressed. Hence, I started setting up the R Integration with SAP HANA, which is not only an in-memory database but also comes with different machine learning capabilities. Through a free SAP HANA Express I had my own sandbox system, which was great for playing around and trying out new things. Honestly, setting up the R Integration was quite tedious, and it brought me a lot closer to our customers and their pain points. But in the end the effort was worth it, since the R Integration let me stay in my familiar environment, RStudio, while using the computing power of SAP HANA. Before, I was used to training my ML algorithms only locally on my laptop and, for example, waking up in the morning hoping to see my results. Hence, I quickly saw the benefits of using SAP HANA and its native, predefined libraries like the Predictive Analysis Library (PAL). If you want to try it on your own, you can get a free HANA Express via this link. Additionally, the following hands-on tutorials and information really helped me to get started:
- Machine Learning with SAP HANA – from R
- Machine Learning with SAP HANA & R – Evaluate the Business Value
- Machine Learning with SAP HANA – with R API. Part 2.
Furthermore, I attended many workshops, meetings and innovative formats with our customers and partners, especially in the context of SAP Data Intelligence and different machine learning use cases. These trips were always closely related to my master’s thesis, and they let me gain my first experience as a data scientist at SAP.
BI Innovation Days in Frankfurt #GiveDataPurpose
Since we often face diverse IT landscapes, we might not even know where the relevant data for our use case lies or where to incorporate our machine learning model so that it creates real value for the business. Through SAP Data Intelligence we can discover and connect multiple data types regardless of where the data resides, and orchestrate and execute modular data pipelines across distributed infrastructures. A pipeline consists of several operators and represents a workflow in which we can incorporate our machine learning models. This is done, for example, by building an R Docker container and configuring an R operator. I was quite proud when my first pipeline ran successfully. The following hands-on tutorials and the openSAP course helped me a lot to get started with SAP Data Intelligence:
- Freedom of Data with SAP Data Hub
- SAP Data Hub and R: Time series forecasting
- Data Intelligence integration with SAP Analytics Cloud
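To make the idea of a pipeline built from operators a bit more tangible, here is a small conceptual sketch in plain Python. It is not the SAP Data Intelligence API: the operator names, the `build_pipeline` helper and the threshold "model" are hypothetical and only illustrate how data flows from one operator to the next.

```python
from functools import reduce


def build_pipeline(operators):
    """Chain operators so that each one receives the previous one's output."""
    def run(data):
        return reduce(lambda current, operator: operator(current), operators, data)
    return run


def drop_missing(rows):
    """Data quality operator: remove missing values."""
    return [r for r in rows if r is not None]


def scale_to_max(rows):
    """Preparation operator: scale values relative to the largest one."""
    largest = max(rows)
    return [r / largest for r in rows]


def score(rows):
    """Stand-in for a machine learning model: a simple threshold rule."""
    return [1 if r > 0.5 else 0 for r in rows]


pipeline = build_pipeline([drop_missing, scale_to_max, score])
print(pipeline([4, None, 10, 6]))
```

In a real pipeline each of these steps would be its own operator, and the scoring step could be an R operator running inside the Docker container mentioned above.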
Over the last year I gained so much experience, and I am very thankful for my team here at SAP. They have really become like a second family to me, and it is a privilege to work together with some of my closest friends. Of course, there are many people who guided and influenced me in the last year. I especially want to thank Christian Tietz, Stojan Maleschlijski, Jan Fetzer, Sarah Detzler and my manager Christian Scheidel for their support and motivation. In addition, I want to thank Prof. Demetrescu from the University of Kiel for his guidance and support during my master’s thesis.