Train and publish machine learning models on SAP L...

former_member601964 · ‎09-25-2019

Welcome to the series of blog posts on training machine learning models on SAP Leonardo ML Foundation. In this series, I will start by showing how to submit jobs on ML Foundation manually and then move on to building an API that can train machine learning models via jobs and use those models for live prediction.

Blog Post Content: Manual Submission of ML Foundation Job for training and evaluating a model.

GitHub Repository: ML-Foundation-TYOM-Manual

Prerequisites:

SAP Cloud Foundry Account (trial account will work with this).

SAP CP Cloud Foundry CLI.

SAP Leonardo ML Foundation CLI plugin.

Basic knowledge of implementing machine learning algorithms using Scikit-learn (Python).

Once everything mentioned above is set up, we can go ahead with the tutorial. I also suggest cloning the GitHub repository mentioned above and using the configuration and code files there while going through this post. Let's get started.

Setting Up ML-Foundation Instance

In this step we are going to create an instance of the ML Foundation service available on Cloud Foundry and set up the environment for the rest of the tutorial. We will be using the trial account.

Creating Instance

First, we need to make sure that we are entitled to create instances of ML Foundation. For that, open your trial account and go to the default sub account. Here, click on "Entitlements" in the left panel and check whether the service "SAP Leonardo Machine Learning Foundation Beta Services Trial" is available. If it's there, we are good to go. As of now, the number of service instances that can be created cannot be configured manually, but if that option is available in your account, make sure it is set to at least 1.

Now, go to the default "dev" space in the "trial" sub account, click on "Service Marketplace" in the left panel and open the service "SAP Leonardo Machine Learning Foundation". Go to "Instances", where we will create a new one.

Click on "New Instance" and go through the process to create it. It's not required to bind the service to any application yet. At the end, provide a name and click on "Finish".

Now, as we haven't bound the service to any application, we have to create a service key manually to access the service. Open the instance (named "mlf" in this example) and go to "Service Keys". Click on "Create Service Key", provide a name and save it. We can see that a new service key is created and that it holds all the parameters we need, such as the different service URLs provided by ML Foundation and the authentication parameters. With this we are done setting up the environment from the Cloud Foundry dashboard perspective.
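Since a service key is a JSON document, it can be inspected with a few lines of Python once it is copied from the dashboard into a local file. The following is only a minimal sketch: the field names in the sample are made-up placeholders, not the actual layout of an ML Foundation service key. It walks the JSON and collects every string value that looks like a URL, which is a quick way to see which service endpoints a key exposes.

```python
import json


def collect_urls(node, path=""):
    """Recursively collect (path, value) pairs for string values that look like URLs."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = path + "." + key if path else key
            found.extend(collect_urls(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(collect_urls(value, "{}[{}]".format(path, index)))
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        found.append((path, node))
    return found


def load_service_key(file_path):
    """Load a service key that was saved from the dashboard as a JSON file."""
    with open(file_path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Hypothetical sample; a real key has different field names and values.
    sample = {
        "example_auth_url": "https://auth.example.com",
        "example_client_id": "placeholder",
        "example_services": {"example_job_url": "https://jobs.example.com"},
    }
    for path, url in collect_urls(sample):
        print(path, "->", url)
```

With a real key saved locally, `collect_urls(load_service_key("service_key.json"))` would list its endpoints; the file name here is also just a placeholder.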

Setting Up SAPML

Now that we have the instance ready, let's start setting up the environment using the "sapml" plugin. You can follow the guide mentioned at the top to download and install the plugin. Then we need to run the following command to set up the remote file system.

cf sapml fs init

After running it, we can see in the console that a "Minio" file system has been assigned and it has returned the file system URL and authentication details like client id and client secret. You can check the file system remotely using those credentials.

Submitting Model Training Job

Now, let's get started with submitting the job to train and save the models. As mentioned before, please use the repository to get the required code and configuration files.

Understanding the files

In the repository, there is one folder called "code". This folder contains the Python file "model_training.py", where all the logic to train and save the model is written. The dataset on which the model will be trained is also in the same folder, in CSV format. Then there is the "requirements.txt" file, which lists everything that "model_training.py" depends on. Outside the "code" folder, we have the YAML configuration file for the job, which contains the following.

job:
  name: "model-training"
  execution:
    image: "ml-foundation/sklearn:0.19.1-py3"
    command: "pip3 install -r requirements.txt && python3 model_training.py"
    completionTime: "10000"
    resourcePlanId: "basic"

As we can see, the file contains the name of the submitted job, which is "model-training" ("_" is not allowed in the job name as of now). Then we have the "execution" parameter, which holds the runtime environment information for the job, such as which Scikit-learn image to use and which files should be run and in which sequence, and finally "resourcePlanId", which specifies the machine configuration. For a trial account, only the "basic" plan is provided. The available plans are listed on this internal GitHub page.
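Because a job name containing "_" is rejected, it can help to catch that before submitting. The sketch below is one possible way to do it in Python: it checks the name and then renders a configuration with the same keys as the file shown above. The function names are my own, the check covers only the underscore rule mentioned in this post (the full naming rules are not listed here), and the rendering assumes that none of the values contain double quotes.

```python
def check_job_name(name):
    """Reject empty names and names containing "_" (the only rule stated in this post)."""
    if not name:
        raise ValueError("job name must not be empty")
    if "_" in name:
        raise ValueError('job name must not contain "_": ' + name)
    return name


def render_job_config(name, image, command, completion_time, resource_plan_id):
    """Render the job configuration as YAML text with the keys used in this post."""
    check_job_name(name)
    lines = [
        "job:",
        '  name: "{}"'.format(name),
        "  execution:",
        '    image: "{}"'.format(image),
        '    command: "{}"'.format(command),
        '    completionTime: "{}"'.format(completion_time),
        '    resourcePlanId: "{}"'.format(resource_plan_id),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_job_config(
        "model-training",
        "ml-foundation/sklearn:0.19.1-py3",
        "pip3 install -r requirements.txt && python3 model_training.py",
        10000,
        "basic",
    ))
```

Calling `check_job_name("model_training")` raises a `ValueError`, so the mistake shows up locally instead of at submission time.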

Submitting Job

The "model_training.py" file contains code that will train a model on the dataset and save it in Minio. To perform additional tasks you can modify the file. To submit the job to ML Foundation we can run the following command in the repository folder.

cf sapml job submit code -f configuration.yaml
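For orientation, a training script of the kind submitted here could look roughly like the sketch below. This is not the code from the repository: the choice of a decision tree, the assumption that the CSV has a header row with numeric feature columns and the label in the last column, and the function names are all mine. It also only writes "model.pkl" to a local path; how the job's output ends up in Minio is handled by the platform and is not shown.

```python
import csv
import pickle

from sklearn.tree import DecisionTreeClassifier


def load_dataset(csv_path):
    """Read a CSV with a header row; assumes numeric features and the label in the last column."""
    features, labels = [], []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            features.append([float(value) for value in row[:-1]])
            labels.append(row[-1])
    return features, labels


def train_and_save(csv_path, model_path):
    """Fit a classifier on the CSV data and pickle it to model_path."""
    features, labels = load_dataset(csv_path)
    model = DecisionTreeClassifier(random_state=0)
    model.fit(features, labels)
    with open(model_path, "wb") as handle:
        pickle.dump(model, handle)
    return model
```

It would be called as `train_and_save("dataset.csv", "model.pkl")`, where "dataset.csv" is a placeholder for the actual dataset file name in the "code" folder.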

Once done, we can check the job status using the sapml CLI tool by running the following command.

cf sapml job status

Most likely the job will be in "PENDING" status, as it takes at least 5 minutes on a trial account for the "basic" plan resources to be allocated to the job. Once we see the job in "RUNNING" state, we can check the job-related logs using the following command.

cf sapml job logs <JOB_ID>

To follow the logs continuously, we can append the "-f" flag at the end as follows.

cf sapml job logs <JOB_ID> -f

Now, we will be able to see all the execution logs from the model training job.
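Since the job can sit in "PENDING" for several minutes, the status check can also be repeated from a small script instead of by hand. The following is a sketch under a few assumptions: it assumes that the text printed by "cf sapml job status" contains the state name (for example "RUNNING") and that only the job we care about is listed; the exact output format is not covered in this post. The function names and the default interval are my own choices.

```python
import subprocess
import time


def run_status_command():
    """Run `cf sapml job status` and return everything it prints as text."""
    result = subprocess.run(
        ["cf", "sapml", "job", "status"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


def wait_for_running(get_status=run_status_command, interval=30,
                     max_checks=40, sleep=time.sleep):
    """Poll until the status text mentions RUNNING.

    Returns the last status output, or None if RUNNING never showed up
    within max_checks attempts.
    """
    for _ in range(max_checks):
        output = get_status()
        if "RUNNING" in output:  # assumption: the state name appears verbatim
            return output
        sleep(interval)
    return None
```

Passing the status function and the sleep function as parameters keeps the loop easy to try out without a real job. Note that a very short job might finish between two checks, in which case this loop would never observe "RUNNING" and would return None.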

Navigating through Minio

When we initialized the file system earlier, we received the URL and the credentials to log in. So, let's do that. Once we are in Minio, we will be able to see a folder called "jobs" within the bucket "data".

Now, open the jobs folder and we will see another folder with the name "model_training-<JOB_ID>". Open the folder and you will be able to see all the files that we submitted as well as files that are dumped by the job like the "model.pkl" file.
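The same navigation can be done from Python instead of the browser. The sketch below assumes that the third-party "minio" package is installed and that the URL and credentials printed by "cf sapml fs init" can be used as the endpoint, access key and secret key of that client; I have not verified this mapping, so treat it as an assumption. The helper names are my own. The bucket "data" and the "jobs" folder are the ones described above.

```python
def find_job_files(object_names, job_id, file_name="model.pkl"):
    """Pick the object keys under jobs/ that belong to job_id and are named file_name."""
    return [
        name for name in object_names
        if name.startswith("jobs/")
        and job_id in name
        and name.endswith("/" + file_name)
    ]


def list_job_objects(endpoint, access_key, secret_key, bucket="data", prefix="jobs/"):
    """List all object keys below the jobs/ folder of the bucket.

    endpoint is the host (and optional port) without the https:// scheme.
    """
    from minio import Minio  # third-party package, imported only when needed

    client = Minio(endpoint, access_key=access_key, secret_key=secret_key, secure=True)
    return [obj.object_name
            for obj in client.list_objects(bucket, prefix=prefix, recursive=True)]


def download_object(endpoint, access_key, secret_key, object_name, target_path,
                    bucket="data"):
    """Download one object, for example the model.pkl of a finished job."""
    from minio import Minio

    client = Minio(endpoint, access_key=access_key, secret_key=secret_key, secure=True)
    client.fget_object(bucket, object_name, target_path)
```

A typical flow would be to call `list_job_objects(...)`, filter the result with `find_job_files(names, "<JOB_ID>")`, and pass the matching key to `download_object(...)`.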

Now, we can download the model file if required, or keep it hosted there. This is how we run manual jobs on SAP Leonardo ML Foundation. However, as we saw, a lot of things had to be done manually, most importantly the submission of the job. ML Foundation also provides APIs with which we can achieve the same thing in a more efficient and productive way. In subsequent tutorials we are going to see how to set up a Python Flask API to submit model training jobs to ML Foundation and also use the created model for runtime predictions.

Next in Series: Train and publish machine learning models on SAP Leonardo ML Foundation (TYOM) – Part 2

References

SAP CP Cloud Foundry CLI.

SAP Leonardo sapml CLI.

SAPML resource plan Id list.

GitHub Repository.