Train and publish machine learning models on SAP L...

former_member601964 · 09-26-2019

Welcome to the series of blog posts on training machine learning models on SAP Leonardo ML Foundation. In this series, I start by showing how to submit jobs on ML Foundation manually and then move on to building an API that trains machine learning models via jobs and uses those models for live prediction.

Previous Blog Post: Train and publish machine learning models on SAP Leonardo ML Foundation (TYOM) – Part 1

Blog Post Content: Building an API to automate the process of model training job submissions and using the trained models for run time predictions.

GitHub Repository: ML-Foundation-TYOM-POC

Prerequisites:

SAP Cloud Foundry account (a trial account works for this).

SAP CP Cloud Foundry CLI.

SAP Leonardo ML Foundation CLI plugin.

Basic knowledge of implementing machine learning algorithms using Scikit-learn (Python).

Knowledge of Python Flask services.

Postman for sending requests.

Once all the prerequisites are met and the required CLI and plugins are installed, we can get started with building the API. This blog post does not cover setting up the environment, initializing the system, and so on, so I recommend going through the previous blog post mentioned above before starting this one.

Setting Up the Application

Once you have cloned the GitHub repository, we can set up the application. If your ML Foundation service instance is named "mlf", you can deploy the application straight away with the command "cf push". If the instance name is different, change it in the manifest file of the application first.

This is enough to get the application running on Cloud Foundry. The datasets required to train the models are already included in the repository.

After deployment, run the command "cf env tyompoc" to check the environment of the application. Under "VCAP_SERVICES" we should see the "mlf" service with all the information we saw while creating service keys. Because this is available in the application environment, we can fetch all the required information from the environment itself.

Understanding the Flask Application

If you have not done so already, clone the GitHub repository to get the application used in this tutorial. The application covers the full stack from model training to model inference. So, let's get started by visiting the different components.

Model Training File

This file is very similar to the one we used for manual job submission last time. The only change is that we also save the "StandardScaler" object that scales the columns before training, because the run-time input has to be transformed with the same scale parameters before predicting the result.
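The idea can be sketched as follows. This is a minimal illustration, assuming the training script uses pickle and writes files named "model.pkl" and "scaler.pkl" (the names the /predict endpoint reads later). The toy data and the LogisticRegression classifier are placeholders and not the dataset or model from the repository.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the (age, salary) columns; not the repository's dataset.
X = np.array([[22, 20000], [25, 32000], [47, 90000],
              [52, 110000], [30, 40000], [58, 130000]], dtype=float)
y = np.array([0, 0, 1, 1, 0, 1])

# Fit the scaler on the training data and train on the scaled columns.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
model = LogisticRegression().fit(X_scaled, y)

# Persist BOTH objects; the scaler holds the mean/std learned during training.
out_dir = tempfile.mkdtemp()
for name, obj in (("model.pkl", model), ("scaler.pkl", scaler)):
    with open(os.path.join(out_dir, name), "wb") as fh:
        pickle.dump(obj, fh)

# At prediction time, load both and scale the raw input before predicting.
with open(os.path.join(out_dir, "scaler.pkl"), "rb") as fh:
    restored_scaler = pickle.load(fh)
with open(os.path.join(out_dir, "model.pkl"), "rb") as fh:
    restored_model = pickle.load(fh)

sample = restored_scaler.transform([[65, 20000]])
prediction = int(restored_model.predict(sample)[0])
```

Without the saved scaler, the raw age and salary values would be on a completely different scale from what the model saw during training.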

Helper Files

Under the folder "utils/", there are 4 helper files containing functions that are used by the endpoint functions in the controller file.

JWT Helper

This file contains just one function, which takes the authentication URL, client id and client secret as parameters and returns the header containing the bearer token required for accessing any ML Foundation service.

import requests


def get_ml_request_header(xsuaa_base_url, client_id, client_secret):
    # Request a token using the client-credentials grant with basic authentication
    response = requests.get(url=xsuaa_base_url + "/oauth/token",
                            params={"grant_type": "client_credentials"},
                            auth=(client_id, client_secret))
    if response.status_code == 200:
        access_token = response.json()["access_token"]
        return {"Authorization": "Bearer {}".format(access_token), "Accept": "application/json"}
    else:
        # Note: on failure the caller receives this dict in place of the request header
        return {"message": "Something went wrong"}

VCAP Helper

This file also contains just one function, which returns the values of the environment variables required in this use case. The function reads the "VCAP_SERVICES" variable and extracts ML Foundation details such as the authentication URL, client id, client secret and job submission URL.

import json
import os

# MLF_NAME, CREDENTIALS, CLIENT_ID etc. are string constants defined elsewhere in the application


def get_mlf_env_variables():
    # VCAP_SERVICES holds a JSON string; fall back to an empty object when it is not set
    vcap_services = json.loads(os.getenv("VCAP_SERVICES", "{}"))
    if MLF_NAME in vcap_services:
        credentials = vcap_services.get(MLF_NAME)[0].get(CREDENTIALS)
        client_id = credentials.get(CLIENT_ID)
        client_secret = credentials.get(CLIENT_SECRET)
        authentication_url = credentials.get(AUTHENTICATION_URL)
        job_url = credentials.get(SERVICE_URLS).get(JOB_SUBMISSION_URL)
        return client_id, client_secret, authentication_url, job_url, API_VERSION
    else:
        return None, None, None, None, None

Minio Helper

As we know by now, Minio is the native file storage used by ML Foundation, so we need to work with it as well. I have defined functions for putting a file in the job-specific folder, downloading a file from that folder, and so on. The upload function is used before training to upload datasets, code files, requirements files, etc. to the job-specific folder on Minio. Similarly, the download function fetches the model from Minio for run-time prediction.
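The helper code itself is not shown in this post, so the following is only a rough sketch of what such upload and download functions could look like. The function names are hypothetical and differ from the ones in the repository. It assumes "client" is a "minio.Minio" instance (or any object offering the same "fput_object" and "get_object" methods) and that the downloaded files are pickled Python objects.

```python
import pickle


def upload_job_file(client, bucket, local_path, remote_path):
    """Upload one local file to <bucket>/<remote_path> (hypothetical helper)."""
    # Minio.fput_object(bucket_name, object_name, file_path)
    client.fput_object(bucket, remote_path, local_path)


def load_pickled_object(client, bucket, remote_path):
    """Return the unpickled object stored at remote_path, or None if it cannot be fetched."""
    try:
        response = client.get_object(bucket, remote_path)
    except Exception:
        # For example, the object does not exist yet because training is still running.
        return None
    try:
        return pickle.loads(response.read())
    finally:
        # The Minio client expects the response to be closed and released.
        response.close()
        response.release_conn()
```

Returning None for a missing object lets the caller decide how to respond, which matches how the /predict endpoint checks whether the model is available.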

Job Helper

This helper file again contains just one function for now, which submits the job to ML Foundation. It takes the authentication details, job submission URL, API version, job id and job description as parameters and returns the job status details after submission.

import requests


def submit_job(job_submission_url, api_version, authentication_url, client_id, client_secret, job_id, job_description):
    url = "{}/api/{}/jobs/{}".format(job_submission_url, api_version, job_id)
    # Build the Authorization header with the JWT helper
    headers = get_ml_request_header(authentication_url, client_id, client_secret)
    response = requests.post(url=url, data=job_description, headers=headers)
    if response.status_code in (200, 201, 202):
        logger.info("Training has been invoked")
        return response.json()
    else:
        logger.error("Couldn't submit job with job-id {}".format(job_id))
        return {}

Controller

This file contains all the endpoints for training models as well as predicting the required class. As of now, there are just two endpoints for the two mentioned tasks.

Inside /train

This endpoint triggers a function that reads the environment variables of the application, fetches the Minio credentials, uploads the required files to Minio and finally triggers the job. At the end of model training, the job stores the model file in the same Minio folder, from where we can access it.

@app.route('/train', methods=['GET'])
def train_model():
    global current_job_id
    logger.info("Inside /train")
    client_id, client_secret, authentication_url, job_url, API_VERSION = get_mlf_env_variables()
    end_point, access_key, secret_key = get_minio_credentials(job_url, API_VERSION, authentication_url, client_id, client_secret)
    job_id = uuid.uuid4()
    job_name = "Training"
    remote_data_dir = "jobs/{}-{}".format(job_name, str(job_id))

    # Upload the dataset to Minio
    upload_file_to_minio("datasets/SocialMediaAdv.csv", "{}/SocialMediaAdv.csv".format(remote_data_dir), end_point, access_key, secret_key, csv=True)

    # Upload job code files to Minio
    upload_file_to_minio("utils/code/model_training.py", "{}/model_training.py".format(remote_data_dir), end_point, access_key, secret_key, csv=False)
    upload_file_to_minio("utils/code/requirements.txt", "{}/requirements.txt".format(remote_data_dir), end_point, access_key, secret_key, csv=False)

    # Define the job configuration
    job_config = {
        'job': {
            'name': job_name,
            'env': [
                {"name": "job_name", "value": job_name},
                {"name": "job_id", "value": str(job_id)}
            ],
            'execution': {
                'command': 'pip3 install -r requirements.txt && python3 model_training.py',
                'completionTime': 10000,
                'image': 'ml-foundation/sklearn:0.19.1-py3',
                'resourcePlanId': 'basic',
                'retries': 0
            }
        }
    }
    # Serialize the job configuration to YAML for submission
    job_description = yaml.dump(job_config)
    job_status = submit_job(job_url, API_VERSION, authentication_url, client_id, client_secret, job_id, job_description)

    logger.info(job_status)
    # submit_job returns {} on failure, so read the status defensively
    status = job_status.get("status")
    if status in ["RUNNING", "PENDING"]:
        logger.info("Job submitted with id {} and status {}".format(str(job_id), status))
        current_job_id = job_id
        return app.response_class(
            response=json.dumps({"message": "Job submitted with id {} and status {}".format(str(job_id), status)}),
            status=200,
            mimetype='application/json'
        )
    else:
        return app.response_class(
            response=json.dumps({"message": "Job couldn't be submitted"}),
            status=500,
            mimetype='application/json'
        )

The resource plan id used here is "basic", because ML Foundation on a trial account only supports the basic plan. The first time we run a job on ML Foundation, a folder called "jobs" is created in Minio under the bucket called "data". For every job, another folder named "<JOB_NAME>-<JOB_ID>" is created inside it, and all files belonging to that job are stored there. So, when model training finishes, the model is saved in this folder to be used later on.
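Because both endpoints have to agree on this naming convention, it can help to keep it in one place. The following is a small sketch of that idea. The function names "job_folder" and "job_file" are hypothetical and are not part of the repository.

```python
import uuid


def job_folder(job_name, job_id):
    """Folder for one job inside the "data" bucket: jobs/<JOB_NAME>-<JOB_ID>."""
    return "jobs/{}-{}".format(job_name, job_id)


def job_file(job_name, job_id, filename):
    """Path of a single file (dataset, code, model, ...) inside the job folder."""
    return "{}/{}".format(job_folder(job_name, job_id), filename)


# Example: the paths /train uploads to and /predict later reads from.
job_id = uuid.uuid4()
dataset_path = job_file("Training", job_id, "SocialMediaAdv.csv")
model_path = job_file("Training", job_id, "model.pkl")
```

Here "dataset_path" and "model_path" share the same "jobs/Training-<uuid>" prefix, so a model trained by one job is always looked up next to that job's input files.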

Inside /predict

This endpoint requires a POST call with "age" and "salary" values in the body. It then fetches the model from Minio and applies it to the new run-time data to predict the class. That completes the flow.

@app.route('/predict', methods=['POST'])
def predict():
    global current_job_id
    global class_def
    logger.info("Inside /predict")
    client_id, client_secret, authentication_url, job_url, API_VERSION = get_mlf_env_variables()
    end_point, access_key, secret_key = get_minio_credentials(job_url, API_VERSION, authentication_url, client_id, client_secret)
    # Treat a missing or non-JSON body as an empty payload instead of failing on None
    payload = request.get_json(silent=True) or {}
    if "age" not in payload or "salary" not in payload:
        return app.response_class(
            response=json.dumps({"message": "Insufficient number of parameters sent"}),
            status=400,
            mimetype="application/json"
        )
    age = payload.get("age")
    salary = payload.get("salary")
    # The model and scaler live in the folder of the most recently submitted training job
    model_path = "jobs/Training-{}/model.pkl".format(str(current_job_id))
    scaler_path = "jobs/Training-{}/scaler.pkl".format(str(current_job_id))
    model = get_file_from_minio(model_path, end_point, access_key, secret_key)
    sc = get_file_from_minio(scaler_path, end_point, access_key, secret_key)
    if model is None or sc is None:
        return app.response_class(
            response=json.dumps({"message": "Cannot provide prediction now for model {}".format(model_path)}),
            status=503,
            mimetype='application/json'
        )
    # Scale the input with the training-time scaler before predicting
    prediction = model.predict(sc.transform([[age, salary]]))[0]
    return app.response_class(
        response=json.dumps({"prediction": class_def[prediction]}),
        status=200,
        mimetype="application/json"
    )

Application Workflow

Now that we have understood the application's components, let's walk through the entire workflow of the application.

The user starts by training the model. This process first reads the environment variables, from which it gets the MLF service and authentication details. Using these, the process fetches the Minio credentials by calling the training storage API ("<JOB_URL>/api/v2/storage"). These credentials are then used to create a job folder in Minio, in the "data" bucket within the "jobs" folder. As mentioned before, the folder name should have the form "<JOB_NAME>-<JOB_ID>", so we create a job id using uuid. After this, the process uploads all the job files to that folder and finally submits the job.

As you can see in the job execution configuration, we have specified the following command.

pip3 install -r requirements.txt && python3 model_training.py

This means that the job will first install all the requirements specified in "requirements.txt" and then run the "model_training.py" file which trains and stores the model.

After this is done, the user can call the predict endpoint. This endpoint expects a POST call with a body as follows.

{
    "age": 65,
    "salary": 20000
}
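For readers who prefer a script over Postman, the same request can be built in Python. This is a minimal sketch using only the standard library. The function name "build_predict_request" and the base URL are placeholders, so substitute the route of your deployed application. The "Content-Type: application/json" header is set because the endpoint reads the body as JSON.

```python
import json
import urllib.request


def build_predict_request(base_url, age, salary):
    """Build (but do not send) the POST request for the /predict endpoint."""
    body = json.dumps({"age": age, "salary": salary}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host; replace it with the route of your own application.
req = build_predict_request("https://example.invalid", 65, 20000)

# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```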

The model is trained on 2 parameters, "age" and "salary", which is why prediction also requires these 2 parameters. The prediction function reads these parameters and then, similar to the train function, fetches the Minio credentials and loads the model and the StandardScaler object from the same job folder. It first applies the scaling transformation to the "age" and "salary" fields and then predicts the class.

This is the entire flow of the application. Here we use ML Foundation to train our custom-coded models and store those models in Minio to be used later for run-time prediction. There are also other ways to deploy models, such as using the model repository, but this approach works well when the model is heavily customized to your needs. If you liked the series, feel free to provide feedback.
