Metaflow helps data scientists and developers create scalable machine learning services and bring them to production faster. For a good overview of how to build and productize ML services with Metaflow, see: https://docs.metaflow.org/introduction/why-metaflow.

The Metaflow Python library for SAP AI Core extends Metaflow's capabilities to run ML training pipelines as Argo Workflows.

In this blog post you will learn how to leverage the Metaflow-Argo plugin to generate training templates for SAP AI Core:

  • You can experiment with local data right in your Python environment and adjust your ML code iteratively.

  • Once ready for running the training pipeline on Kubernetes, generate the Argo template directly from your ML code.

  • Even the iterations for fine-tuning the pipeline for SAP AI Core are supported by the Metaflow-Argo plugin, which makes productization easier and more fun!


Contents



  1. Prerequisites

  2. Install and configure Metaflow

  3. Define the Training pipeline

  4. Run the Training pipeline in SAP AI Core


Prerequisites


You should be familiar with the following components and technologies used by SAP AI Core:

  1. SAP Business Technology Platform (SAP BTP) account

  2. Docker

  3. Kubernetes (K8s)

  4. Argo Workflows

  5. AI-API


You have provisioned SAP AI Core in your SAP BTP account and configured the following:

  1. Object Storage

  2. Docker registry

  3. Git repo


Install and configure Metaflow



  1. Install the sap-ai-core-metaflow package from the PyPI repository:
    pip install sap-ai-core-metaflow


  2. Configure Metaflow
    To run the same code locally and on the K8s cluster, Metaflow stores code packages in the S3 bucket.
    Create the following config file on your computer, ~/.metaflowconfig/config.json, and fill in the bucket name registered for your SAP AI Core account:
    {
    "METAFLOW_DATASTORE_SYSROOT_S3": "s3://<my-bucket>/metaflow",
    "METAFLOW_DATATOOLS_SYSROOT_S3": "s3://<my-bucket>/metaflow/data",
    "METAFLOW_DEFAULT_DATASTORE": "s3"
    }


  3. Configure credentials for the S3 bucket
    Create the file ~/.aws/credentials and add the content below (a short sanity check for this setup is sketched after this list):
    [default]
    aws_access_key_id = <key>
    aws_secret_access_key = <secret>
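
If you want to verify that the two files above play together, a few lines of Python are enough. This is only a sketch, not part of the official setup; it assumes the config and credentials files created above and the boto3 library:

# sanity_check.py - verifies the Metaflow S3 datastore config and the AWS credentials (sketch).
import json
import pathlib

import boto3  # uses ~/.aws/credentials automatically

config = json.loads((pathlib.Path.home() / '.metaflowconfig' / 'config.json').read_text())
datastore = config['METAFLOW_DATASTORE_SYSROOT_S3']           # e.g. s3://<my-bucket>/metaflow
bucket = datastore.replace('s3://', '').split('/')[0]

boto3.client('s3').list_objects_v2(Bucket=bucket, MaxKeys=1)  # fails if credentials or bucket are wrong
print(f'S3 access to bucket {bucket} works; Metaflow datastore: {datastore}')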



Define the Training pipeline


Here is an example of a multi-step pipeline, where you can add your own training code to the individual steps (see the @step decorators):

  • Each step will be run

    • locally in a CPU thread or

    • on K8s in a separate pod.


  • You can assign a separate docker image to each step (see the @kubernetes decorator).


from metaflow import FlowSpec, step, kubernetes


class TrainFlow(FlowSpec):

    # @kubernetes(image="docker.io/<docker-repo>/<metaflow-data:tag>")
    @step
    def start(self):
        print('Load data / Preprocessing')
        self.next(self.train_model)

    # @kubernetes(image="docker.io/<docker-repo>/<metaflow-train:tag>")
    @step
    def train_model(self):
        print('Train a model')
        self.next(self.end)

    # @kubernetes(image="docker.io/<docker-repo>/<metaflow-evaluate:tag>")
    @step
    def end(self):
        print('Evaluation / Metrics')


if __name__ == '__main__':
    TrainFlow()

Save this Metaflow script in a file named "trainflow.py".

Develop and test your flow locally before running it on SAP AI Core's K8s cluster:
python trainflow.py run
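
During local experimentation it is often handy to expose hyperparameters on the command line. Here is a small sketch using Metaflow's standard Parameter feature; the flow and parameter names are made up for illustration and are not part of trainflow.py:

from metaflow import FlowSpec, Parameter, step

class ParamDemoFlow(FlowSpec):

    # example hyperparameter, settable per run from the command line
    alpha = Parameter('alpha', help='Learning rate', default=0.01)

    @step
    def start(self):
        print(f'Training with alpha={self.alpha}')
        self.next(self.end)

    @step
    def end(self):
        print('Done')

if __name__ == '__main__':
    ParamDemoFlow()

A local run then accepts the value as an option, e.g. python paramdemoflow.py run --alpha 0.05.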

Learn more about the Metaflow features in these great tutorials.

Run the Training pipeline in SAP AI Core


The engine for running training pipelines in SAP AI Core is Argo Workflows.

The Metaflow-Argo plugin relieves the user of having to master the Argo workflow syntax and write such a workflow in YAML/JSON. The next sections describe how the workflow template can be generated from the above Python ML code.

Docker Images


The docker images for each step of the training pipeline have to be built and pushed to the docker registry prior to starting the workflow. However, Metaflow makes the data scientist's life a lot easier:

  • When creating the Argo workflow, behind the scenes Metaflow copies your Metaflow script (trainflow.py) to S3.

  • When the Argo workflow is started in K8s, a piece of code added to the template copies your Metaflow script from S3 to the docker container. Thus it runs the same ML code as your local version!


Here is an example Dockerfile, which installs the Metaflow-Argo plugin and the awscli for copying the code packages that allow your Metaflow script to run in the container:
# Generates a docker image to run Metaflow flows in SAP AI Core

FROM python:3.8-slim

# Install metaflow library (updates your latest metaflow script during runtime)
RUN pip install sap-ai-core-metaflow awscli && \
pip install <additional libraries>

# SAP AI Core executes containers only in root-less mode!
RUN mkdir -m 777 /home/user

WORKDIR /home/user
ENV HOME=/home/user

In the "pip install" section you can add additional libraries required for your ML code.

Build the docker images using the above Dockerfile and push them to your docker registry:
docker build -t <docker-registry>/<metaflow-xyz:tag> -f Dockerfile .

docker push <docker-registry>/<metaflow-xyz:tag>

 

Generate the Training Template


The training template can be generated using command-line options. To list all options supported by the Argo plugin, use the --help option:
python trainflow.py argo --help

Argo Workflows accepts both YAML and JSON templates. (The plugin always creates JSON.)
With the option --only-json the resulting template is printed out and can be saved to a file.

When finished with local runs, uncomment the @kubernetes decorators for each step. The full command, including all parameters required for generating the template, is:
python trainflow.py --with=kubernetes:secrets=default-object-store-secret argo create --image-pull-secret=<AI Core docker secret> --label='{"scenarios.ai.sap.com/id":"<ML-Scenario>","ai.sap.com/version":"1.0.0"}' --annotation='{"scenarios.ai.sap.com/name":"<ML-Scenario-Name>","executables.ai.sap.com/name":"trainflow"}' --only-json > trainflow.json

The SAP AI Core parameters are explained below:

Labels and Annotations


An SAP AI Core training template is a standard Argo workflow template + a set of labels and annotations processed by SAP AI Core.

These labels / annotations are mandatory:

  • labels: scenarios.ai.sap.com/id, ai.sap.com/version

  • annotations: scenarios.ai.sap.com/name, executables.ai.sap.com/name

Note: the class name that you use in the ML code (e.g. "TrainFlow") is used as the executable ID after being converted to lowercase (see template tag: metadata→name). Use exactly the same name for the annotation executables.ai.sap.com/name.
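
To double-check the generated template before pushing it to Git, a few lines of Python help. This is a sketch only; it assumes that the labels and annotations end up in the template's top-level metadata section, which is where the options above place them:

# check_template.py - prints the mandatory SAP AI Core labels/annotations of the generated template (sketch).
import json

with open('trainflow.json') as f:
    metadata = json.load(f)['metadata']

print('executable id:', metadata['name'])
print('labels       :', metadata.get('labels', {}))
print('annotations  :', metadata.get('annotations', {}))

# per the note above, the executable name annotation should equal the lowercased class name
assert metadata.get('annotations', {}).get('executables.ai.sap.com/name') == metadata['name']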

SAP AI Core Secrets


Pulling the docker images and accessing the object storage (S3) requires credentials. These are referenced as secrets in the SAP AI Core template:

  • image-pull-secret

  • default-object-store-secret


Preparing for Production


After the experimentation phase, when the ML code for training the model is finalized, it is not necessary to download the ML code into the running container. For production purposes the code can be included in the docker image with the following procedure:

  1. Generate the Training Template and search the JSON file for the string "cp s3://". There you can find the address of the code package in the S3 bucket.

  2. Download the code package from S3 to your local computer using the AWS CLI and name the file "job.tar" (a boto3 alternative is sketched after the Dockerfile below).

  3. Add the code package to the docker image using this Dockerfile:


# Dockerfile for embedding ML code into Production docker image for SAP AI Core

FROM python:3.8-slim

RUN mkdir -p -m 777 /user/home
RUN mkdir -p /user/home/.metaflow

RUN chmod -R 777 /user/home
RUN chown -R 65534:65534 /user/home

COPY job.tar /user/home/job.tar
RUN cd /user/home && tar xf job.tar

WORKDIR /user/home
ENV HOME=/user/home
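
For step 2 of the procedure above, the download can also be scripted. Here is a sketch that uses boto3 instead of the AWS CLI; bucket and key are placeholders taken from the "cp s3://..." line in the generated template:

# download_code_package.py - fetches the Metaflow code package and saves it as job.tar (sketch).
import boto3

bucket = '<my-bucket>'             # bucket part of the "cp s3://..." address
key = '<path/to/code-package>'     # key part of the "cp s3://..." address

boto3.client('s3').download_file(bucket, key, 'job.tar')
print('Saved code package as job.tar')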

Finally, the --embedded option instructs Metaflow to use the ML code inside the docker image rather than downloading it from S3:
python trainflow.py ... argo create ... --embedded --only-json > trainflow.json

 

Execute the Workflow in SAP AI Core


Push the resulting training template trainflow.json into the registered git repository. The template can then be executed via AI-API.
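
As a rough illustration, here is what starting an execution via the AI-API could look like with plain Python. This is a sketch based on the public AI API v2 specification; the URL, OAuth token, resource group and configuration ID are placeholders from your SAP AI Core setup, and the configuration linking scenario and executable must already exist:

# start_execution.py - creates an execution for an existing configuration via the AI-API (sketch).
import requests

AI_API_URL = 'https://<ai-api-url>'           # AI API endpoint of your SAP AI Core instance
headers = {
    'Authorization': 'Bearer <token>',        # OAuth token for your instance
    'AI-Resource-Group': '<resource-group>',
}

response = requests.post(
    f'{AI_API_URL}/v2/lm/executions',
    headers=headers,
    json={'configurationId': '<configuration-id>'},
)
print(response.status_code, response.json())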

For SAP AI Core tutorials check out SAP Tutorials for Developers.

Summary


This blog post demonstrates how a data scientist can easily run the same ML code both locally and on a K8s cluster. This keeps the effort low when moving the code from experimentation to production on SAP AI Core. Debugging in a production environment is impaired by many restrictions, therefore it is important to iron out possible errors before pushing the code to production.

I want to thank elham and Roman Kindruk (co-developers of the Metaflow-Argo plugin) for their support in writing this post.

Please add your thoughts on this post in the Comments section below.