
SAP Data Intelligence: The difference between “local” Jupyter Notebook dev and “SAP Data Intelligence”

The purpose of this blog post is to explain how to use Jupyter Notebook on SAP Data Intelligence and the differences between "local" Jupyter Notebook development and Jupyter Notebook on SAP Data Intelligence.

The major differences are almost entirely the parts that depend on the local environment, such as file paths (think of a path passed to a command like pd.read_csv, which most data scientists have used) and libraries. So I am convinced that data scientists who already use Jupyter Notebook can move smoothly to Jupyter Notebook on SAP Data Intelligence :)

Here is one machine learning use case: an image similarity scoring page for your online shop. Once we create an ML model with the image similarity scoring API on SAP Data Intelligence, the model can detect which product a user has uploaded.

I will now explain how to use Jupyter Notebook on SAP Data Intelligence.

■How to install libraries
Begin by installing the scikit-learn library, which is very popular for Machine Learning in Python on tabular data such as ours.

Importing it first produces an error because the necessary libraries are not installed yet. But we can install them with the usual pip install command, as below.

import numpy as np
import pandas as pd
from sklearn import feature_extraction, linear_model, model_selection, preprocessing
# ModuleNotFoundError: No module named 'sklearn'

!pip install scikit-learn


After installing scikit-learn, the import succeeds, and the Jupyter Notebook behaves just as it would in a local environment.
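To confirm that the installed library works in the notebook, here is a minimal sketch of a scikit-learn workflow on tabular data, using the modules imported above (the dataset and column names are made up for illustration):

```python
import pandas as pd
from sklearn import linear_model, model_selection, preprocessing

# Tiny illustrative tabular dataset (hypothetical values)
df = pd.DataFrame({
    "height": [150, 160, 170, 180, 165, 175, 155, 185],
    "weight": [50, 55, 65, 80, 60, 75, 52, 90],
    "label":  [0, 0, 1, 1, 0, 1, 0, 1],
})

X = df[["height", "weight"]].values
y = df["label"].values

# Split, scale the features, and fit a simple logistic regression
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.25, random_state=0)
scaler = preprocessing.StandardScaler().fit(X_train)
clf = linear_model.LogisticRegression().fit(scaler.transform(X_train), y_train)

score = clf.score(scaler.transform(X_test), y_test)
print(score)
```

If this cell runs without a ModuleNotFoundError, the installation succeeded.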

■Upload .csv from local laptop
We can use the pre-defined connection for the DI Data Lake to upload .csv.
In the Metadata Explorer menu, click the "shared" folder >> View preparations >> Upload file (upper right on the page).
After that, the data configuration page appears.

The three .csv files were uploaded.

The path to the folder can be seen in Metadata Explorer.

In this case, the path to the folder is shown in the code. The code snippet below accesses the files and creates a combined data frame from them.

!pip install hdfs

from hdfs import InsecureClient
client = InsecureClient('http://datalake:50070')


import pandas as pd

# List the uploaded .csv files in the shared folder
fnames = client.list('/shared/MY_CSV_FILES/')

data = pd.DataFrame()
for f in fnames:
    with client.read('/shared/MY_CSV_FILES/' + f, encoding='utf-8') as reader:
        data_file = pd.read_csv(reader)
        data = pd.concat([data_file, data])
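The same read-and-concatenate pattern can be tried without a Data Lake connection. Here is a minimal sketch where in-memory strings stand in for the uploaded .csv files and io.StringIO plays the role of the reader returned by client.read (file names and contents are hypothetical):

```python
import io
import pandas as pd

# In-memory stand-ins for the uploaded .csv files (hypothetical contents)
csv_files = {
    "a.csv": "id,value\n1,10\n2,20\n",
    "b.csv": "id,value\n3,30\n4,40\n",
}

data = pd.DataFrame()
for name, text in csv_files.items():
    # io.StringIO behaves like the file-like reader from client.read
    data_file = pd.read_csv(io.StringIO(text))
    data = pd.concat([data_file, data])

print(len(data))  # 4 rows combined
```

The only part that changes on SAP Data Intelligence is where the reader comes from; the pandas logic is identical.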


The rest of the development in the Jupyter Notebook is as simple as writing the required code.

For data scientists, you've now seen that analysis and development on SAP Data Intelligence is no different from the local environment.
In addition, we no longer need to waste time rebuilding the development environment after a laptop OS update.

※If you're a Mac user
You probably had to rebuild your Anaconda environment when you updated to macOS Catalina, because the upgrade relocates folders and the environment then needs fixing.

Thank you for reading this blog post.

  • Hello Tetsuya,

    Nice article.

    How can we access the training server for Data Intelligence? I am a functional consultant and a machine learning trainee, and Data Intelligence is an amalgamation of both. I also completed the openSAP course on DI.

    Is there any way to access a trial version of SAP Data Intelligence?

    Any assistance is much appreciated.




    • Hi Shantanu,

      The enterprise trial is for customers; SAP employees should go for the demos in the demo store, or for CAL images when they need to do something custom.
  • Beyond those basic technical differences, an important one is connection management!

    Data scientists can access all systems registered for that DI tenant without knowing the passwords. In a local Jupyter environment, passwords are often known and appear in clear text.

  • Hello Tetsuya,

    It's a very good article!

    In my case I want to import one file only each time, without using the loop For.

    I tried to use the file path with pd.read_csv("FILE_PATH") but it doesn't work. Can you please help?

    Thank you.