AI on mobile: Powering your field force - Part 1 o...

Vriddhi · 05-20-2022

ML on the web browser with PWA, Kubernetes & SAP HANA

When machine learning leaves the cloud mothership for where the action really is, you need many vehicles of technology to make the journey possible. Where does SAP figure in this grand adventure of implementing ML in the web browser? Read on to find out. This is a collaborative blog series brought to you by Aditi and Anoop (backend developers), Naman and Deepak (frontend developers) and Sumin and yours truly (data scientists).

The advice on how fast a response needs to be for a good user experience has stayed the same for over five decades now (Miller, 1968). A system response of 100 milliseconds is considered the limit for the user to perceive the system as reacting instantaneously. In the context of real-time inference for machine learning, the system should be able to retrieve data, perform inference and respond with results within that time. For example, a classifier inferencing on structured data in the cloud could possibly respond in 40 milliseconds. A classifier inferencing on images in the cloud, again with CPU, could possibly respond in 120 milliseconds. These are ballpark references. Depending on your model and deployment environment, you may see better or worse prediction latency. Latency could improve if you can use GPUs, but with CPUs you can see we are toeing the line with user experience.
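The ballpark figures above can be checked against the 100 millisecond limit with a few lines of code. This is a minimal sketch: the names `INSTANT_LIMIT_MS`, `headroomMs` and `feelsInstant` are hypothetical, and the latencies are the illustrative numbers quoted in the text, not measurements.

```typescript
// Limit for a response to be perceived as instantaneous, as cited in the text.
const INSTANT_LIMIT_MS = 100;

interface LatencyScenario {
  label: string;
  latencyMs: number; // ballpark cloud latency quoted above, not a measurement
}

const latencyScenarios: LatencyScenario[] = [
  { label: "structured-data classifier", latencyMs: 40 },
  { label: "image classifier", latencyMs: 120 },
];

// Positive result: time left in the budget. Negative result: overshoot.
function headroomMs(latencyMs: number, limitMs: number = INSTANT_LIMIT_MS): number {
  return limitMs - latencyMs;
}

function feelsInstant(latencyMs: number, limitMs: number = INSTANT_LIMIT_MS): boolean {
  return headroomMs(latencyMs, limitMs) >= 0;
}

latencyScenarios.forEach((s) => {
  console.log(`${s.label}: ${s.latencyMs} ms, instant: ${feelsInstant(s.latencyMs)}`);
});
```

With these numbers the structured-data case keeps 60 milliseconds of headroom, while the image case overshoots the limit by 20 milliseconds.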

Nonetheless, if the use case demands lower latency than the cloud affords, you can put the model closer to where the data is, which is on the device that receives the data first.

Where is our model running?

Our model runs on the device, but to understand that a tad better we need to clarify the difference between Edge AI and Endpoint AI. Edge AI involves computing on almost data-center-level hardware in the gateway between the endpoint (the device) and the cloud (the data center). This is hardware whose performance (in terms of compute and power consumption) is higher than that of the endpoint but lower than that of the cloud. So, when we talk of edge computing, we essentially talk of hardware installed locally: say, a few servers and a little storage. At the endpoint, on the other hand, we are down to the basics of what the device itself offers. In this solution, we are putting our model into the web browser, so it uses the compute resources of the device on which the browser runs. Ergo, our scenario of deploying the model in the web browser is Endpoint AI. Given that a browser on mobile is low on compute, we “compress” our ML model to a size that still carries the information necessary to provide the response without compromising on the quality of the response.

Figure 1: The deployment options
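This part does not say which technique is used to “compress” the model. One common option is quantization, where 32-bit floating point weights are stored as 8-bit integers plus a scale factor. The sketch below illustrates that idea only, as an assumption; the names `quantize` and `dequantize` are hypothetical.

```typescript
// Symmetric linear quantization: float32 weights -> int8 values plus one scale.
// Assumed technique for illustration; not necessarily what this series uses.
interface QuantizedTensor {
  values: Int8Array;
  scale: number; // multiply an int8 value by this to approximate the original weight
}

function quantize(weights: Float32Array): QuantizedTensor {
  // Largest absolute weight decides how the int8 range [-127, 127] is spread.
  const maxAbs = weights.reduce((m, w) => Math.max(m, Math.abs(w)), 0);
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const values = Int8Array.from(weights, (w) => Math.round(w / scale));
  return { values, scale };
}

function dequantize(q: QuantizedTensor): Float32Array {
  return Float32Array.from(q.values, (v) => v * q.scale);
}

const sampleWeights = new Float32Array([-1, 0, 0.25, 1]);
const quantized = quantize(sampleWeights);
console.log(sampleWeights.byteLength, quantized.values.byteLength); // 16 4
```

The stored weights shrink to a quarter of their size, at the cost of a rounding error of at most about half the scale per weight. Whether that error is acceptable has to be validated against the model's accuracy.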

How does the ML model run in the browser?

To place the model physically in the browser we need to convert it to a format the browser can understand. Enter WebAssembly. Simply put, WebAssembly is a “compilation target”. Essentially, developers can bring their own code, say C, C++ or Rust, and compile it to WebAssembly bytecode (a sort of universal binary) that executes in the browser at near-native speed. This means you don’t need proprietary runtimes to run workloads, ML or otherwise. You can run them on the resources of the end user’s device, through the web browser. As much of the euphoria around it suggests, WebAssembly is seen as being as industry changing as Docker was back in 2013. Interest in the technology has risen 43% on Stack Overflow since the start of 2021. Whether on e-commerce websites (like Shopify) or streaming services (like Amazon Prime, Disney+), WebAssembly has likely touched you at some point.
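To make the idea of a compilation target concrete, here is a minimal sketch that loads a tiny hand-assembled WebAssembly binary exporting a single `add` function and calls it from TypeScript. The bytes and the `add` export are illustrative assumptions and have nothing to do with the model binary built later in this series; a real binary would come out of a compiler toolchain.

```typescript
// A complete WebAssembly module, written out byte by byte for illustration.
// It exports one function: add(i32, i32) -> i32.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // code: local.get 0, local.get 1, i32.add
]);

// Synchronous compile and instantiate; acceptable here because the module is tiny.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule, {});

// Exports are untyped at this boundary, so the signature is asserted by hand.
const add = wasmInstance.exports["add"] as (a: number, b: number) => number;

console.log(add(2, 3)); // 5
```

For anything larger than a toy module, browsers favour the asynchronous `WebAssembly.instantiate` or `WebAssembly.instantiateStreaming` calls so that compilation does not block the page.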

How does the user consume this on the front end?

Front-end apps have typically been web apps or mobile apps, but progressive web apps (PWAs) have crept onto the scene in recent years. PWAs offer the best of both worlds. While mobile apps are built specifically for a mobile OS, web apps are built to run in any browser. A PWA looks and feels like a mobile app (in that you can install it on your Home Screen and receive push notifications), but functionally it is a web app. PWAs are easier and faster to develop, easier to deploy and maintain and, most importantly for this scenario, they work in the web browser both online and offline. So, our ML model is accessible even when the internet is down.

What is the business case?

Conceptually this one is simple. Our user walks up to the meter (a device that records electricity or water consumption) and takes an image of it. The actual number reading is then extracted from the image. A little bit of math later, we have the bill amount the user should be charged, which is sent to the software that processes utilities information.
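The “little bit of math” is not spelled out in this part. A plausible minimal version is consumption multiplied by a tariff, plus an optional fixed charge, as sketched below. The `MeterReadings` shape, the function names and the rate values are all hypothetical; the actual billing rules belong to the utilities software.

```typescript
// Hypothetical shapes and tariff values, for illustration only.
interface MeterReadings {
  previous: number; // last billed reading, in consumption units (for example kWh)
  current: number;  // reading extracted from the meter image
}

function consumedUnits(readings: MeterReadings): number {
  // A meter that rolled over or was misread would need separate handling.
  if (readings.current < readings.previous) {
    throw new RangeError("Current reading is lower than the previous reading");
  }
  return readings.current - readings.previous;
}

function billAmount(readings: MeterReadings, ratePerUnit: number, fixedCharge: number = 0): number {
  const amount = consumedUnits(readings) * ratePerUnit + fixedCharge;
  return Math.round(amount * 100) / 100; // round to two decimal places
}

const exampleBill = billAmount({ previous: 1200, current: 1350 }, 0.25, 5);
console.log(exampleBill); // 42.5
```

Rejecting a reading lower than the previous one is a cheap sanity check on the extracted number before anything is sent on for billing.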

Where does SAP feature amidst all this?

The picture below shows how the elements described so far fit into the solution architecture. How each individual component works and how it is implemented follows in the subsequent parts of this blog series.

Figure 2: Solution Architecture

What to expect in the upcoming parts of this blog?

This post sets the context of the use case. In the subsequent posts we will describe how we built this out, so you have the information you need to customise it for your own business needs.

In Part 2, anoop.mittal describes the solution backend, particularly how the service was created with CAP and Spring Boot.

In Part 3, lsm1401 describes the intuition and steps behind the creation of the ML model that will be referenced by the backend at run time.

In Part 4, aditi.arora16 describes how she created the WebAssembly binary and how she loaded the model in the browser for inference.

In Part 5, namagupta describes how he built the front end that stitches all the working parts together, particularly how the PWA was built with Angular.

If you have any questions or would like to discuss this in more detail, please write in to any one of us.

References:

Miller, R. B. (1968). Response time in man-computer conversational transactions. AFIPS Fall Joint Computer Conference, Vol. 33, 267-277.

Progressive Web Apps: https://web.dev/progressive-web-apps/

WebAssembly: https://webassembly.org/

WebAssembly on Wikipedia: https://en.wikipedia.org/wiki/WebAssembly

Cloud AI, Edge AI, Endpoint AI – What’s the difference: https://www.arm.com/blogs/blueprint/cloud-edge-endpoint-ai

Credits:

aditi.arora16 for reviewing this blog & providing valuable inputs.

gunteralbrecht for his omniscience with all things tech.

namagupta, Deepak Chethan, Roopa Prabhu, Anoop Mittal for stitching the tech together.