Deploying large ML models to the SAP BTP, Cloud Fo...

JBrandt · 02-08-2023

You are interested in Natural Language Processing and want to see an example of how to use large multilingual language models for text classification?

You are interested in deploying Machine Learning models to the cloud and want to learn about an option to directly run model inference in the SAP BTP, Cloud Foundry Environment?

Well, in both cases I can provide you with some answers in this blog post 🙂.

Introduction

I work as a Data Scientist at sovanta AG in Heidelberg, Germany, where we develop cloud-native applications for our customers that run on the SAP BTP, including machine learning models that automate specific processes or analyze the available data.

In this specific application, text reports in around 30 different languages have to be classified automatically as relevant or not relevant for further analysis. Labeled training data was available mostly in German and English and to a very small extent in around 10 additional languages.

The target application is running on the SAP BTP, Cloud Foundry Environment of the customer as a Python-based Flask REST API (see e.g. this blog post on how to set something like this up), where the final prediction pipeline is available for inference.

As we didn’t have access to one of the SAP-provided AI platforms such as SAP Data Intelligence or SAP AI Core, we planned to run the prediction service directly in the Python backend in the Cloud Foundry environment and to supply it with a model that was trained offline.

So much for the introduction to the project I was working on 🙂. Let’s next discuss which ML model we chose for this task and why. Then I will tell you about the complications that arose with this setup and how we overcame them.

I don’t want to claim that the approach taken is the “golden path” in any form 🙂 but surely you can utilize some parts or concepts described later in your next project.

Multi language model

In this project we face the problem that we have texts in lots of different languages, but not enough training data in each of them to train one model per language. Instead, we need one large model that covers all languages.

It is interesting to note that a “simple” NLU classification pipeline consisting of a TF-IDF vectorizer followed by a linear SVC, trained on either German or English texts (where we had lots of training data), works quite well for our task. We achieved F1 scores of around 90% in the respective language, but such a pipeline is of course limited to the language it was trained on.
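To make the baseline concrete, here is a minimal sketch of such a TF-IDF plus linear SVC pipeline. It assumes scikit-learn, and the four German sample texts and the two label names are made up for illustration; the original project of course used its own labeled reports and settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny made-up sample, only here to show the mechanics.
texts = [
    "Pumpe undicht, Öl tritt am Gehäuse aus",
    "Pumpe ausgefallen nach Druckverlust",
    "Einladung zum Sommerfest der Abteilung",
    "Speiseplan der Kantine für nächste Woche",
]
labels = ["relevant", "relevant", "irrelevant", "irrelevant"]

# TF-IDF turns each text into a sparse vector of word weights,
# the linear SVC then learns a separating hyperplane on those vectors.
baseline = make_pipeline(TfidfVectorizer(), LinearSVC(random_state=0))
baseline.fit(texts, labels)

prediction = baseline.predict(["Pumpe verliert Druck und ist undicht"])
```

The vocabulary is learned from the training texts only, which is exactly why a pipeline like this cannot transfer to a language it has never seen.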

Introducing:

Pre-trained multi-language models based on transformer neural network architectures

You can find further information about these models e.g. here and here.
And as a very brief summary:

Transformer-based language models were introduced to the NLP community in 2017 and have become quite popular and powerful (ChatGPT being one of the latest examples). They are special neural networks that are pre-trained on a massive amount of text data in different languages, so that the models have “learned” all of these languages and you can use them for tasks such as Masked Language Modeling or Next Sentence Prediction. You can also fine-tune the models on your own data for a specific use case, such as text classification in our case.

We are using a model called `distilbert-base-multilingual-cased` in its TensorFlow implementation provided by the Hugging Face platform. This specific model is a smaller, distilled version of the BERT transformer model. It is pre-trained on a concatenation of Wikipedia texts in 104 different languages, including all the languages we needed to cover.

Training

In order to use the model for our specific task of text classification, it is necessary to fine-tune it with our data. This means feeding it our labeled training data in all available languages. The multilingual nature of the model then allows it to classify texts in languages that are not present in our training data as well.
The training was executed locally on two GPUs and still took about 2 hours. (This is a large, beefy ML model, and I want to deter you from the idea of performing the training directly in a Cloud Foundry runtime 🙂.) You can find lots of tutorials describing the fine-tuning process e.g. here.

The fine-tuned model has very satisfying performance 🙂. For German and English, the languages most abundant in our training data, it achieves F1 scores as high as the simple baseline, and for the other available training languages the F1 scores start at 70%.
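Reporting one F1 score per language only takes a few lines. The following is a minimal sketch in plain Python; the record layout (language, true label, predicted label) and the choice of the F1 score of the "relevant" class are assumptions for illustration, as the post does not state how the scores were computed.

```python
from collections import defaultdict


def f1_by_language(records):
    """Compute the F1 score of the positive class separately per language.

    `records` is an iterable of (language, truth, predicted) tuples with
    boolean labels, where True means "relevant".
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for language, truth, predicted in records:
        c = counts[language]  # also registers languages with only true negatives
        if truth and predicted:
            c["tp"] += 1
        elif predicted:
            c["fp"] += 1
        elif truth:
            c["fn"] += 1
    scores = {}
    for language, c in counts.items():
        denominator = 2 * c["tp"] + c["fp"] + c["fn"]
        # F1 = 2*TP / (2*TP + FP + FN); reported as 0.0 if there are no positives
        scores[language] = 2 * c["tp"] / denominator if denominator else 0.0
    return scores


# Made-up evaluation records: (language, truth, predicted)
records = [
    ("de", True, True), ("de", True, True), ("de", False, True), ("de", False, False),
    ("fr", True, True), ("fr", True, False), ("fr", False, False),
]
scores = f1_by_language(records)
```

Splitting the evaluation this way makes it visible when a language with little training data lags behind, which a single overall score would hide.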

General deployment idea and problems

OK, let’s talk about the next step. We have the model trained and we see that it is performing quite well, but how can we deploy it to the SAP BTP, Cloud Foundry environment?

First, we have to save the prediction pipeline with the trained model and ideally package it into something like a zip file. The resulting file has a size of around 500 MB.
Second, we add an endpoint to the already existing Python Flask API that can be called to make predictions based on provided input data. We have to extend the requirements of our Python app to include TensorFlow, since the model implementation depends on it.
Third, we rebuild our multi-target application and push it to the Cloud Foundry runtime on the BTP.
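The prediction endpoint from the second step could look roughly like the sketch below. The route `/predict`, the JSON key `texts` and the placeholder function `run_model` are hypothetical names chosen for this example, not the ones used in the project.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def run_model(texts):
    """Hypothetical placeholder for the real prediction pipeline."""
    return [{"text": text, "relevant": False} for text in texts]


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body such as {"texts": ["first report", "second report"]}
    payload = request.get_json(silent=True) or {}
    texts = payload.get("texts")
    if not isinstance(texts, list) or not all(isinstance(t, str) for t in texts):
        return jsonify(error="expected a list of strings under 'texts'"), 400
    return jsonify(predictions=run_model(texts))
```

Accepting a list of texts rather than a single text lets the caller send several reports per request, which pairs well with batched inference.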

So far quite straightforward, right? The deployment should pose no problems, correct? I mean, we are still well under the app size limit of 1.5 GB for Cloud Foundry apps…
Well, not really because that’s where we run into some major errors.

The deployment process of a Python app to Cloud Foundry works something like this:

1. Upload the application package to a staging server.

2. Create a so-called droplet from it: the Python buildpack is downloaded and installs Python as well as all necessary dependencies listed in the requirements.

3. After successful installation, compress this droplet and upload it to another server.

4. From there, the droplet is downloaded to the Cloud Foundry space and the app is started.

If you want more information about the process, you can find it here.

Any guesses on where our problems come up?

Step 2

We run out of disk space during the installation of the Python dependencies 🙂. TensorFlow (and likewise PyTorch) is a very large Python package, in fact too large to install in the droplet. (We also tried to install it while our 500 MB model was not present in the app, but even this fails. There is just not enough space to download and install the package on the staging server.)

Well, this means we have to drop our model’s dependency on TensorFlow.

Step 3

If we don’t install TensorFlow and therefore make it to the end of the installation phase, we fail while compressing the droplet because it is too large. You probably guessed it: that is because our 500 MB model is present.

So, we can’t deploy the model directly with the app, but have to bypass the bottleneck that is the deployment process and make the model available in a different way.

Coming up with solutions

Let’s talk about the first issue: Our model depends on TensorFlow, so if we want to make predictions with it, we need TensorFlow installed, but that package is too large. So we need a way to save and run our model independently of the framework that we used during training.

Introducing:

ONNX - Open Neural Network Exchange

ONNX does exactly what we need. It provides an open format for representing deep learning models. It makes it possible to train a model in one framework and run it in another, allowing for more flexibility and easier deployment to a variety of platforms. The accompanying ONNX Runtime can also optimize models for different hardware, such as CPUs or GPUs.

You can find more information about ONNX here, converting TensorFlow to ONNX here and the runtime here.

In our case, that means at the end of the training we convert the TensorFlow model to ONNX in our pipeline, making the model independent of TensorFlow. We have to adjust the prediction pipeline to use the ONNX runtime and of course gain a dependency on that Python package, but we lose the TensorFlow dependency for the deployment.

You can find a really good tutorial for these steps on Hugging Face, here.
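To give an idea of what the adjusted prediction pipeline can look like, here is a minimal sketch of inference with the ONNX runtime. It rests on several assumptions that are not taken from the project: the tokenizer is loaded with the `transformers` package (which works without TensorFlow installed), the first model output contains the classification logits, and the function names `classify` and `load_pipeline` as well as the default model path are made up for this example.

```python
import math


def softmax(logits):
    """Turn one row of raw model outputs into probabilities."""
    highest = max(logits)
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def classify(texts, tokenizer, session):
    """Run a batch of texts through an ONNX classification model.

    `tokenizer` is expected to behave like a Hugging Face tokenizer and
    `session` like an onnxruntime.InferenceSession.
    """
    encoded = tokenizer(texts, padding=True, truncation=True, return_tensors="np")
    feed = {}
    for model_input in session.get_inputs():
        # Models converted from TensorFlow often declare int32 inputs, while
        # the tokenizer returns int64, so cast to what the model declares.
        dtype = "int32" if model_input.type == "tensor(int32)" else "int64"
        feed[model_input.name] = encoded[model_input.name].astype(dtype)
    logits = session.run(None, feed)[0]  # assumed shape: (batch, num_labels)
    return [softmax([float(value) for value in row]) for row in logits]


def load_pipeline(model_path="onnx/model.onnx"):
    """Create the real tokenizer and session (imports are kept local)."""
    import onnxruntime
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-multilingual-cased")
    session = onnxruntime.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    return tokenizer, session
```

In the app, `load_pipeline()` would be called once at startup and the returned objects passed to `classify` for every request, so the model is only read from disk a single time.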

And the second issue? The model converted to ONNX is still large, something like 300 MB. This will cause the same problems as before. Well, we just need a way to store our model object somewhere else during the deployment. On the running Cloud Foundry instance of the app we have up to 4 GB of disk space available, plenty of space for our model.

Introducing:

SAP BTP Object Store

The object store does exactly what the name suggests. It stores large objects of unstructured data. It is directly available in the BTP as a service that you can subscribe to (more information here). With the service plan s3-standard, the underlying technology is simply an AWS S3 bucket, and the Python library boto3 gives us a way to connect to the bucket from our application. We just have to upload the model after the training is completed and then implement a function in our backend that downloads the latest model from the object store, saves it to the running app instance and loads it into memory from there.

You can create the object store service alongside your Python application in the MTA descriptor and bind them together:

ID: demo
_schema-version: "3.2"
version: 0.0.1

modules:
  - name: demo-python-app
    type: python
    path: python-app
    parameters:
      disk-quota: 4G
      memory: 4G
      buildpack: https://github.com/cloudfoundry/python-buildpack.git
    requires:
      - name: demo-object-store
    provides:
      - name: python-app-binding
        properties:
          url: ${default-url}

resources:
  - name: demo-object-store
    type: org.cloudfoundry.managed-service
    parameters:
      service: objectstore
      service-plan: s3-standard

Then, you can connect to the S3 bucket with the boto3 library using the credentials that the bound object store instance provides to the app:

from cfenv import AppEnv
import boto3

# Read the credentials of the bound object store instance from VCAP_SERVICES
cf_env = AppEnv()
credentials = cf_env.get_service(name="demo-object-store").credentials

s3 = boto3.resource(
    service_name="s3",
    region_name=credentials.get("region"),
    aws_access_key_id=credentials.get("access_key_id"),
    aws_secret_access_key=credentials.get("secret_access_key"),
)

Upload a file with potential metadata like this:

# Upload model to Object Store
# you can provide additional metadata if needed
s3.meta.client.upload_file(
    Filename="path_to_model_file",
    Bucket=credentials.get("bucket"),
    Key="name_for_uploaded_model",
    ExtraArgs={
        "Metadata": {
            "keys": "provide any additional information",
        }
    },
)

Or download the model from the Object Store to the running app instance on the SAP BTP, Cloud Foundry Environment like this:

# Download model from Object Store
# save it to disk on the running app instance
import os

os.makedirs("onnx", exist_ok=True)  # download_file needs the target folder to exist
model_object = s3.Object(bucket_name=credentials.get("bucket"), key="name_for_uploaded_model")
model_object.download_file("onnx/model.onnx")
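In the backend it is convenient to wrap the download in a small helper that only fetches the model when it is not on disk yet. The sketch below is one possible way to do this; the function name `ensure_local_model` is hypothetical, and `model_object` is assumed to behave like the boto3 `s3.Object` from the download snippet.

```python
import os


def ensure_local_model(model_object, target_path="onnx/model.onnx"):
    """Download the model once per app instance and return its local path.

    `model_object` only needs to offer `download_file(path)`, as the boto3
    `s3.Object` does.
    """
    if os.path.exists(target_path):
        return target_path  # already fetched since this instance started
    directory = os.path.dirname(target_path)
    if directory:
        os.makedirs(directory, exist_ok=True)  # the target folder must exist
    model_object.download_file(target_path)
    return target_path
```

Because the disk of a Cloud Foundry app instance is ephemeral, every freshly started instance downloads the model again on first use, so a restart is enough to pick up a newly uploaded model under the same key.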

And that’s it! By converting the model to ONNX format and uploading it to an external object store we were able to overcome the problems we faced during the deployment and managed to deploy a large ML model directly to the SAP BTP, Cloud Foundry Environment 🙂.

Quick word on computing performance

Computing predictions with a large ML model is computationally expensive, so you will have to adjust the memory quota on the BTP for your application. A single instance of an app on Cloud Foundry can have a maximum of 8 GB of memory assigned to it. By tuning the batch size for predictions with the model it is possible to stay well below that limit, but you will probably need at least 2 GB of memory.
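Tuning the batch size boils down to never handing the model more texts at once than fit comfortably into memory. A minimal sketch in plain Python follows; the helper names and the default batch size of 16 are assumptions for illustration, and `predict_fn` stands for whatever function runs the model on a list of texts.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_in_batches(texts, predict_fn, batch_size=16):
    """Call `predict_fn` on small chunks so that peak memory stays bounded."""
    results = []
    for chunk in batched(texts, batch_size):
        results.extend(predict_fn(chunk))
    return results
```

A smaller batch size lowers the peak memory per call at the cost of more calls, so the right value is best found by watching the memory usage of the running app.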

Summary

In this blog post I have presented one way you can deploy large machine learning models directly to the SAP BTP, Cloud Foundry Environment. This is not a trivial task and you have to be aware of some limitations, as the Cloud Foundry runtime is not really designed to host a prediction service for a large, computationally expensive machine learning model.

Nevertheless, it is possible with the approach presented here and even if you will not use it yourself, I hope you at least learned something or found it interesting or entertaining to read this blog post.

Best regards,
Jonathan