Get creative using SAP Business Technology Platform, Kyma Runtime! Part 3

will_conlon · ‎10-04-2022

If you read Part 1 of this blog series, you'll have seen how I built a simple frontend user interface that lets the user select a file and trigger an upload using the Flask Python package, storing the file in a container in an SAP BTP, Kyma runtime pod on SAP Business Technology Platform. In Part 2, I upgraded the frontend interface by leveraging the low-code/no-code solution SAP Build Apps.

In Part 3, I'll share an example of what this now makes possible: performing OCR (Optical Character Recognition) on the uploaded file and bringing the extracted information to the SAP Build Apps frontend, which can deliver significant business value. This involves extending the code from Part 1 and Part 2 with a new Python file that performs the OCR on the uploaded file. I won't give a detailed rundown of running this locally, but simply put: get the Docker container running locally and test it before deploying to SAP BTP, Kyma runtime.

Overview

I've created a completely fictitious form and companies for the purposes of this example, as seen in Figure 1. The motivation for this use case is organizations that find themselves with huge amounts of unorganized documents containing important information that, if extracted, can be used within an ERP context, with especially high-value uses in analytics, compliance and business processes.

Figure 1. OCR and annotation of mock form to extract specific data

The Python code in app.py from Part 1 gets a slight uplift: it imports some additional dependencies and triggers the OCR function when a POST request containing a PDF file is received.

import json
import docExtraction
import os
from flask import Flask, request
from werkzeug.utils import secure_filename

UPLOAD_FOLDER = 'uploadFolder/'
ALLOWED_EXTENSIONS = {'txt', 'pdf', 'png', 'jpg', 'jpeg', 'gif'}

app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER


def allowed_file(filename):
    return '.' in filename and \
           filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS


@app.route('/upload/', methods=['GET', 'POST'])
def upload_file():
    if request.method == 'POST':
        # log the uploaded parts, then check if the post request has the file part
        print('File type is ' + str(request.files), flush=True)

        if 'file' not in request.files:
            print('No file part', flush=True)

            # message for appgyver alert
            return 'No file part'

        file = request.files['file']
        # If the user does not select a file, the browser submits an
        # empty file without a filename.
        if file.filename == '':
            print('No selected file', flush=True)

            # message for appgyver alert
            return 'No selected file'

        if file and allowed_file(file.filename):

            # create a secure filename
            filename = secure_filename(file.filename)

            # save file to the upload folder
            filepath = os.path.join(app.config['UPLOAD_FOLDER'], filename)
            file.save(filepath)
            print('PDF saved to directory: ' + str(UPLOAD_FOLDER), flush=True)

            # call OCR function in docExtraction.py
            rois = docExtraction.process(filepath)
            print('OCR complete and saved to directory: ' + str(UPLOAD_FOLDER), flush=True)

            # json response
            return json.dumps(rois, sort_keys=True, indent=4)

        # message for appgyver alert
        print('The file name was not allowed.', flush=True)
        return 'The file name was not allowed.'

    # GET request: return a basic HTML upload form for testing
    return '''
    <!doctype html>
    <title>Upload new File</title>
    <h1>Upload File (Test View) </h1>
    <form method=post enctype=multipart/form-data>
      <input type=file name=file>
      <input type=submit value=Upload>
    </form>
    <p>This is a test page to test a file upload without using frontend</p>
    '''


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

Note that I've imported an OCR module written in Python, docExtraction.py, which contains the logic to process this document.
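Since docExtraction.process hands the saved file to pdf2image, which expects a PDF, it may be worth narrowing the check before OCR is triggered. The sketch below is one possible approach and not part of the original app.py: the helper name looks_like_pdf is hypothetical, and it assumes that checking the extension plus the leading %PDF- bytes is sufficient for a demo like this one.

```python
import io


def looks_like_pdf(filename, stream):
    """Return True if the name ends in .pdf and the content starts with the PDF magic bytes."""
    if not filename.lower().endswith('.pdf'):
        return False
    start = stream.tell()
    header = stream.read(5)
    stream.seek(start)  # rewind so the file can still be saved afterwards
    return header == b'%PDF-'


# In-memory stand-ins for uploaded files
print(looks_like_pdf('form.pdf', io.BytesIO(b'%PDF-1.7 sample')))
print(looks_like_pdf('form.png', io.BytesIO(b'\x89PNG')))
```

In upload_file, this could be called as looks_like_pdf(file.filename, file.stream) before file.save(filepath), returning an error message to the frontend when it fails.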

Optical Character Recognition

I'd like to note that there are lots of superb ways of performing OCR; of particular note is SAP's Document Information Extraction, which is part of the AI Business Services portfolio. The example that follows could be used where there is a specific reason to do so. The most common example I've seen is where there is an industry or document standard and different organizations/departments create different forms that adhere to that standard BUT DO NOT have the same document structure, ESPECIALLY where documents are manually scanned, causing rotation and degradation of quality!

There are 3 main packages used in my program to run the solution on top of what I've built in previous parts of the blog series. These are:

OpenCV - A cross-platform, open source, highly optimized library with a focus on real-time computer vision applications. It is free for commercial use and is released under the BSD 3-Clause License. I use the opencv-python wrapper package for OpenCV Python bindings under the MIT License. Note - I'd likely consider the headless option for production usage, which strips away dependencies that wouldn't be needed.

pdf2image - A Python module under the MIT License that (quite simply) converts a PDF to a PIL Image object so I can process it with OpenCV above. It literally turns something complex into a single line of code.

Tesseract - An open source text recognition (OCR) engine available under the Apache 2.0 license. I use the pytesseract Python wrapper to read the text from the image I extract from pdf2image above. Again, this turns something complex into a single line of code, even though I then have to filter through the text later to find what I want.

So now I'll put these components together in a Python file called docExtraction.py as follows:

import pdf2image
import cv2
import numpy
import datetime
import pytesseract


def process(filepath):

    # set up regions of interest: [name, x, y, w, h, extracted text (blank until getOCR)]
    rois = [['CompanyName', 1450, 2380, 2250, 88, ''],
            ['ACNARBN', 1450, 2475, 2250, 88, ''],
            ['Address1', 1450, 2570, 2250, 88, ''],
            ['TownCity', 1450, 2750, 1250, 88, ''],
            ['State', 1450, 2848, 650, 84, ''],
            ['Postcode', 2815, 2848, 600, 84, ''],
            ['Country', 1450, 2942, 2250, 85, ''],
            ['Phone', 1450, 3046, 650, 85, ''],
            ['Email', 1450, 3155, 2250, 88, ''],
            ['BlockList', 1650, 1945, 2050, 88, ''],
            ['Date', 1650, 5120, 500, 90, '']]

    # convert every page in the pdf to an OpenCV (BGR) image
    images = [cv2.cvtColor(numpy.asarray(page), cv2.COLOR_RGB2BGR)
              for page in pdf2image.convert_from_path(filepath, dpi=500)]

    # if more than 1 page in pdf, then add a loop e.g. for i in range(len(images)):
    # since my example is a single page I'll only look at page 1, i.e. images[0]
    images[0] = draw_border(images[0])

    # extract text in regions of interest and add to our ROIs
    images[0], rois = getOCR(images[0], rois)

    # return only the name and extracted text of each region of interest
    return [[roi[0], roi[5]] for roi in rois]


def getOCR(image, rois):

    for i in range(len(rois)):

        # set local variables for region of interest rectangle
        x, y, w, h = rois[i][1], rois[i][2], rois[i][3], rois[i][4]

        # create new local image with just region of interest
        image_roi = image[y:y+h, x:x+w]

        # convert colour region of interest to grayscale
        gray = cv2.cvtColor(image_roi, cv2.COLOR_BGR2GRAY)

        # get the text from region of interest
        rois[i][5] = pytesseract.image_to_string(gray)

        # draw region of interest on original image
        cv2.rectangle(image, (x, y), (x + w, y + h), (241, 196, 15), 2)

    # save the annotated image once all regions have been drawn
    cv2.imwrite("uploadFolder/Output.png", image)

    return image, rois


def draw_border(image):
    hImg, wImg, _ = image.shape

    # draw four lines, each 100 px in from an edge of the page
    cv2.line(image, (0, 100), (int(wImg), 100), (34, 126, 230), 5)
    cv2.line(image, (int(wImg) - 100, 0), (int(wImg) - 100, int(hImg)), (34, 126, 230), 5)
    cv2.line(image, (int(wImg), int(hImg) - 100), (0, int(hImg) - 100), (34, 126, 230), 5)
    cv2.line(image, (100, int(hImg)), (100, 0), (34, 126, 230), 5)

    # datetime object containing current date and time
    now = datetime.datetime.now()
    # dd/mm/YYYY H:M:S
    dt_string = now.strftime("%d/%m/%Y %H:%M:%S")

    # stamp the processing time above the top border line
    cv2.putText(image, "Processed - " + dt_string, (140, 80), cv2.FONT_HERSHEY_PLAIN, 4, (34, 126, 230), 4)

    return image

Once a file is uploaded, the process function is called with the path to the file (now in the SAP BTP, Kyma runtime pod), and I declare my regions of interest (ROIs). In this case they are hard-coded; this is where you could plug in your own logic for determining the ROIs in your own documents. NOTE - It can make sense for many use cases to run OCR first to find key values, horizontal lines, boxes, date formats etc. and use these elements to define where the ROIs should be!
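The idea of deriving an ROI from recognized text can be sketched roughly as follows. This is only an illustration under assumptions: it expects word boxes in the dictionary layout returned by pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT) (keys such as 'text', 'left', 'top', 'width', 'height'), the function name roi_right_of_label and the gap and padding values are hypothetical, and the sample data is made up rather than taken from the mock form.

```python
def roi_right_of_label(ocr_data, label, roi_width, gap=20, pad=5):
    """Find the first word equal to `label` and return [x, y, w, h] for the area to its right."""
    for i, word in enumerate(ocr_data['text']):
        if word.strip().lower() == label.lower():
            x = ocr_data['left'][i] + ocr_data['width'][i] + gap
            y = max(ocr_data['top'][i] - pad, 0)
            h = ocr_data['height'][i] + 2 * pad
            return [x, y, roi_width, h]
    return None  # label not found on this page


# Made-up word boxes standing in for real OCR output
sample = {
    'text':   ['', 'Postcode', 'Email'],
    'left':   [0, 2400, 1000],
    'top':    [0, 2850, 3160],
    'width':  [0, 380, 250],
    'height': [0, 70, 70],
}

print(roi_right_of_label(sample, 'Postcode', roi_width=600))
```

The returned rectangle could then replace the hard-coded x, y, w, h of the matching entry in the rois list, which may help when forms follow the same standard but differ in layout.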

Each page of the PDF is then converted to an image and added to a list. The first page is given a border and timestamp (via the draw_border function using OpenCV) and sent to my getOCR function, so that ONLY the regions of interest undergo text recognition using the pytesseract image_to_string() function. The extracted text is then written into the ROI list, and the relevant information is returned to app.py and ultimately sent as the JSON response to the POST request.
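One consequence of hard-coded pixel coordinates is that they only line up with pages rendered at the same resolution, here dpi=500 in convert_from_path. If the DPI were changed (for example to reduce memory use), the ROIs would need to be rescaled by the same ratio. A minimal sketch, assuming the same [name, x, y, w, h, text] layout used in docExtraction.py; scale_rois is a hypothetical helper, not part of the original code:

```python
def scale_rois(rois, from_dpi, to_dpi):
    """Return a copy of the ROI list with x, y, w, h rescaled from one render DPI to another."""
    factor = to_dpi / from_dpi
    return [[name, round(x * factor), round(y * factor),
             round(w * factor), round(h * factor), text]
            for name, x, y, w, h, text in rois]


rois_500 = [['State', 1450, 2848, 650, 84, ''],
            ['Phone', 1450, 3046, 650, 85, '']]

print(scale_rois(rois_500, from_dpi=500, to_dpi=250))
```

Lower resolutions make processing lighter but can reduce recognition quality, so any change would need testing against real scans.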

Running this locally via the upload URL triggering the basic HTML page (instead of AppGyver) as seen in Part 1 should give a JSON response similar to Figure 2. NOTE - If you want to configure this locally, don't forget to add the necessary environment variables!

Figure 2. Extracted OCR information from uploaded file returned in JSON format.
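The text returned by pytesseract.image_to_string often carries trailing whitespace such as newline or form-feed characters, so it can help to tidy the values before they are serialized. The sketch below is an assumption about how that could look, using made-up sample values; the helper name to_clean_dict is hypothetical, and returning a dictionary rather than the list of pairs would change the JSON shape the frontend receives.

```python
import json


def to_clean_dict(values):
    """Turn [[name, raw_text], ...] into {name: text} with surrounding whitespace removed."""
    return {name: raw_text.strip() for name, raw_text in values}


# Made-up values in the shape returned by docExtraction.process
values = [['CompanyName', 'Example Pty Ltd\n\x0c'],
          ['Postcode', ' 2000\n\x0c']]

print(json.dumps(to_clean_dict(values), sort_keys=True, indent=4))
```

With this shape, a client would read values by field name instead of by position in the list, which is less brittle if the ROI order ever changes.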

Containerize Application and Deploy

Now I can containerize this and run it on SAP BTP, Kyma runtime. However, my container is going to need some additional commands and requirements to support these new OCR features. The following Dockerfile works well, though I'm admittedly not an expert here, so it's likely the container could be thinner and more secure:

FROM ubuntu:18.04

ENV DEBIAN_FRONTEND=noninteractive

WORKDIR /program

RUN apt-get update \
  && apt-get -y install tesseract-ocr \
  && apt-get install -y python3 python3-distutils python3-pip \
  && cd /usr/local/bin \
  && ln -s /usr/bin/python3 python \
  && pip3 --no-cache-dir install --upgrade pip \
  && rm -rf /var/lib/apt/lists/*

RUN apt update \
  && apt-get install ffmpeg libsm6 libxext6 poppler-utils -y
RUN pip3 install pytesseract
RUN pip3 install opencv-python
RUN pip3 install pillow

COPY . .
RUN pip3 install -r requirements.txt

EXPOSE 5000

CMD ["python3", "./app.py"]

The requirements.txt needs the additional components as well:

Flask~=2.0.3
Werkzeug~=2.0.3
pdf2image~=1.16.0
opencv-python~=4.6.0.66
numpy~=1.19.5
pytesseract~=0.3.8

For deploying this to SAP BTP, Kyma runtime, I follow the same steps as in Part 1 but with some minor updates to the YAML file, mostly giving additional storage and memory now that we've got a heavier container with OpenCV, Tesseract and pdf2image.

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: documentprocessing
spec:
  selector:
    matchLabels:
      app: documentprocessing
  replicas: 1
  template:
    metadata:
      labels:
        app: documentprocessing
    spec:
      containers:
      - env:
        - name: PORT
          value: "5000"
        image: /documentprocessing  # replace  with your Docker Hub account name
        name: documentprocessing
        ports:
        - containerPort: 5000
        resources:
          limits:
            ephemeral-storage: 2048M
            memory: 2048M
          requests:
            cpu: 100m
            ephemeral-storage: 2048M
            memory: 2048M
---
apiVersion: v1
kind: Service
metadata:
  name: documentprocessing-service
  labels:
    app: documentprocessing
spec:
  ports:
  - name: http
    port: 5000
  selector:
    app: documentprocessing
---
apiVersion: gateway.kyma-project.io/v1alpha1
kind: APIRule
metadata:
  name: documentprocessing-api
  labels:
    app: documentprocessing
spec:
  gateway: kyma-gateway.kyma-system.svc.cluster.local
  rules:
  - accessStrategies:
    - handler: allow
    methods:
    - GET
    - POST
    path: /.*
  service:
    host: documentprocessing-subd-node..kyma.ondemand.com # replace  with the values of your account
    name: documentprocessing-service
    port: 5000

This provides a deployment, a service and an API rule for the application. If you're looking to deploy something similar, remember to add your own host and cluster details to avoid issues. This can be tested in the same way as shown in Part 1, using the base URL (without SAP Build Apps) to reach the HTML test page returned by Flask.
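The Deployment sets a PORT environment variable, while app.py as shown starts Flask on a hard-coded port 5000. One way to connect the two is to read the variable at startup. A minimal sketch, assuming the default should stay at 5000 when PORT is unset; get_port is a hypothetical helper, not part of the original code:

```python
import os


def get_port(default=5000):
    """Read the listening port from the PORT environment variable, falling back to a default."""
    return int(os.environ.get('PORT', default))


# Possible use in app.py (assumption):
#     app.run(host='0.0.0.0', port=get_port())
```

If the port is ever changed this way, the containerPort and the Service and APIRule ports in the YAML would need to change with it.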

Updates to the SAP Build Apps frontend

So now that I've got the backend working, I'd like to improve the frontend. The upload functionality shown in Part 2 allows me to upload the document, but there is far more business value to be unlocked by bringing the extracted OCR data back to the frontend application. The process is quite straightforward: when a document is uploaded, it triggers docExtraction.py, which populates the region of interest list (ROIs) and returns the result as JSON. This JSON is parsed and populated into text fields in the SAP Build Apps application.

The low-code JavaScript flow object from Part 2 needs to be extended to use response.json(), which returns a promise that resolves with the result of parsing the body text as JSON. Since I know the expected JSON structure, I can easily return each element from my custom JavaScript to an SAP Build Apps page variable (called 'List'), as seen below in Figure 3.

Figure 3. SAP Build Apps flow for upload and page variable structure

NOTE - Everything below the component flow in Figure 3 is from the variable view and displayed in this way to simply demonstrate the variable structure.

The JS Upload File code is updated as follows:

//  Goal is to take the output of the 'Pick Files' flow function, submit it to the Flask route using multipart/form-data encoding, and populate the Page Variable with the parsed JSON response.

//  Declare 2 inputs.
//  - First is the endpoint URL where we want to upload our file. We've hard coded this value.
//  - Second is the file with its 6 object properties from the output of 'Pick Files'
let { url, file } = inputs

//  Fetch the file selected in 'Pick Files' from its path. Note - Only allowed single file upload so hard-coded to first i.e. file[0]
let path = await fetch(file[0].path)
console.log('Path:' + path)

//  Transform the fetched file into a blob
let blob = await path.blob()

//  Declare the form we'll submit with the payload
const formData = new FormData()

//  Append upload details into the formData
formData.append('file', blob, file[0].name)

try {

  // POST the formData and parse the text/html (utf-8) response into JSON format
  const response = await fetch(url, { method: 'POST', body: formData })
  const parsed = await response.json();

  //  Each entry of the response is a [name, value] pair, in the same order as the ROI list
  return [0, {
    CompanyName: parsed[0][1],
    ACNARBN: parsed[1][1],
    Address1: parsed[2][1],
    TownCity: parsed[3][1],
    State: parsed[4][1],
    Postcode: parsed[5][1],
    Country: parsed[6][1],
    Phone: parsed[7][1],
    Email: parsed[8][1],
    BlockList: parsed[9][1],
    Date: parsed[10][1]
    } ]

} catch  {

  return [0, { result: "File Error" } ]
}

The last step here is binding the output of the JS Upload File flow to the page variable. As seen in an earlier step, the page variable 'List' is an object with text components for each member of the dataset and needs to match the output of JS Upload File as seen in Figure 4, ensuring that the page variable is populated as "Object with properties". Once the page variable is populated, the data extracts can be used as Content to fill text components.

Figure 4. Binding configuration between JS Upload File and Page Variable

Testing upload, data extraction and display in frontend

Now I can try uploading a file and get a response from the backend to populate my SAP Build Apps table in the frontend. I've used the web app frontend for SAP Build Apps on an Android device for the example seen in Figure 5.

Figure 5. End-to-end test of file upload with OCR and response in frontend

This gives me quite a lot of flexibility to solve real world business problems by using OCR on files to extract key value pairs. Note that this is an extremely simple example which can now be built upon using OpenCV and Tesseract in Python.

I'd certainly recommend looking into SAP's Document Information Extraction solution in the first instance for use cases such as these since it has a very (very!) fast time to value for many requirements.

With data now flowing to and from SAP Build Apps and SAP BTP, Kyma runtime, a next step could be sending the PDF document & annotated OCR image showing ROI's to persistent storage and posting the extracted data to a database for downstream processing or analytics.
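As a rough illustration of the database step, the sketch below writes one set of extracted values to a table. It uses Python's built-in sqlite3 with an in-memory database purely as a stand-in; the function name save_extraction, the table name, the columns and the sample values are all hypothetical, and a real deployment would target whatever persistent database is available to the Kyma workload.

```python
import sqlite3


def save_extraction(conn, filename, values):
    """Store [[field, text], ...] pairs for one document, one row per field."""
    conn.execute('CREATE TABLE IF NOT EXISTS extraction '
                 '(filename TEXT, field TEXT, value TEXT)')
    conn.executemany('INSERT INTO extraction VALUES (?, ?, ?)',
                     [(filename, field, text) for field, text in values])
    conn.commit()


# In-memory database and made-up values, for demonstration only
conn = sqlite3.connect(':memory:')
save_extraction(conn, 'form.pdf', [['CompanyName', 'Example Pty Ltd'],
                                   ['Postcode', '2000']])
print(conn.execute('SELECT field, value FROM extraction ORDER BY field').fetchall())
```

Storing one row per field keeps the table independent of which ROIs a given form defines.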

If you have found this helpful, or have some feedback to share on this topic, please leave a comment and follow my profile for future posts.

For more information please see:

SAP Business Technology Platform Topic Page
Ask questions about SAP Business Technology Platform
Read other SAP Business Technology Platform articles and follow blog posts

There are other links embedded in the Blog which may be of use.
