Yuliya Reich

SAP Data Intelligence in collaboration with Document Information Extraction Service


Introduction

Document Information Extraction is an SAP BTP service that helps you process large volumes of business documents.

The purpose of this blog post is to demonstrate how you can combine it with SAP Data Intelligence. The use case is simple: we want to use Data Intelligence to upload an invoice to Document Information Extraction and submit it for processing (for example, to extract the document number and the currency code). You can find the list of header fields and line items that can be extracted here.
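The fields to extract are sent to the service as an `options` JSON string together with the file. As a minimal sketch, the hypothetical helper `build_options` below builds that string for any set of header fields and, optionally, line item columns. The `extraction`, `headerFields`, `clientId` and `documentType` keys match the operator code in this post; the `lineItemFields` key is my assumption and should be checked against the API reference.

```python
import json


def build_options(header_fields, line_item_fields=(), client_id="default",
                  document_type="invoice") -> str:
    """Serialize the "options" form field for a document upload.

    Assumption: line item columns go under "lineItemFields"; this key is
    not used in the operator of this post.
    """
    extraction = {"headerFields": list(header_fields)}
    if line_item_fields:
        extraction["lineItemFields"] = list(line_item_fields)
    return json.dumps({
        "extraction": extraction,
        "clientId": client_id,
        "documentType": document_type,
    })


if __name__ == "__main__":
    # the same options the operator in this post sends
    print(build_options(["documentNumber", "currencyCode"]))
```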

Preparation

To follow the next steps, you need a running instance of both services. The tutorial “Set up Account for Document Information Extraction and Go to Application” describes how to activate a trial BTP account with the service, so I won’t repeat those steps here.

You can download the sample invoices used in this post from the following tutorial page.

Workflow

Upload a document to Document Information Extraction directly

I uploaded one invoice by clicking the “+” button and selected the header fields and line item columns to extract:

Upload a document


Select a document


Select header fields for extraction


Select line item columns for extraction


Overview


Result


The document is ready

Upload a document to Document Information Extraction with Data Intelligence

Create a connection

To connect the two services, I’m going to use the Document Information Extraction APIs published on SAP API Business Hub.

First, I created an OPENAPI connection in Data Intelligence (DI):

OPENAPI connection in Data Intelligence

The client credentials can be found in the service key on BTP:


Credentials for connection, BTP
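The operator reads `host`, `oauth2TokenEndpoint`, `oauth2ClientId` and `oauth2ClientSecret` from the connection. As a sketch of how these values could be taken from the service key, the hypothetical helper below maps one to the other. It assumes the key has a top-level `url` and a `uaa` object with `clientid`, `clientsecret` and `url`; please verify this layout against your own key.

```python
import json
from urllib.parse import urlparse


def connection_properties_from_service_key(service_key_json: str) -> dict:
    """Map a service key to the properties the operator reads.

    Assumption: the key contains "url" and a "uaa" object with
    "clientid", "clientsecret" and "url".
    """
    key = json.loads(service_key_json)
    uaa = key["uaa"]
    return {
        # the operator prepends "https://" itself, so keep only the host name
        "host": urlparse(key["url"]).netloc,
        # the operator appends "/oauth/token" to this value
        "oauth2TokenEndpoint": uaa["url"].rstrip("/"),
        "oauth2ClientId": uaa["clientid"],
        "oauth2ClientSecret": uaa["clientsecret"],
    }


if __name__ == "__main__":
    # hypothetical placeholder values, not real credentials
    sample_key = json.dumps({
        "url": "https://dox.example.com",
        "uaa": {
            "clientid": "my-client-id",
            "clientsecret": "my-client-secret",
            "url": "https://my-subaccount.authentication.example.com",
        },
    })
    print(connection_properties_from_service_key(sample_key))
```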

Create a pipeline

Upload a document to DI

For this tutorial, I uploaded a PDF file to “Files” in DI.


Upload a document into DI
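The operator reads the uploaded file from `/vrep/sample-invoice-2.pdf` (the `/vrep` path is taken from the operator code, not from documentation). To upload other files without hard-coding the file name and MIME type, a small context manager can build the `files` argument for `requests.post()` and close the file once the upload is done. `file_part` is a hypothetical helper name.

```python
import contextlib
import mimetypes
import os


@contextlib.contextmanager
def file_part(path: str):
    """Yield the `files` argument for requests.post() for one local file.

    The form field is named "file", as in the operator. The MIME type is
    guessed from the file extension, with a generic binary fallback.
    The file is closed when the with-block ends.
    """
    mime_type, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        yield {"file": (os.path.basename(path), f, mime_type or "application/octet-stream")}
```

In the operator, it could be used as `with file_part('/vrep/sample-invoice-2.pdf') as files: r = requests.post(url, headers=headers, data=payload, files=files)`.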


Create a Custom Python Operator

In this post, I’m using Gen1 operators. The operator uploads the invoice and requests the extraction of the document number and the currency code.

Python code for the operator:

import json
import requests

# properties of the OPENAPI connection configured in DI
restConn = api.config.connection['connectionProperties']
base_url = "https://" + restConn['host']
token_url = restConn['oauth2TokenEndpoint'] + '/oauth/token?grant_type=client_credentials'
url = base_url + '/document-information-extraction/v1/document/jobs'

# get an OAuth2 access token (client credentials grant)
api.send("debug", "--- get token ---")
r = requests.get(token_url, auth=(restConn['oauth2ClientId'], restConn['oauth2ClientSecret']))
api.send("debug", str(r.status_code))
r.raise_for_status()
token = r.json()['access_token']
# do not write the token itself to the log
api.send("debug", "Token received")

headers = {
    'Authorization': 'Bearer ' + token,
    'accept': 'application/json'
}

# list the document jobs that already exist for the client "default"
r = requests.get(url, headers=headers, params={'clientId': 'default'})
api.send("debug", "--- GET ---")
api.send("debug", str(r.status_code))
api.send("debug", r.text)

# post the document together with the extraction options
options = {
    "extraction": {"headerFields": ["documentNumber", "currencyCode"]},
    "clientId": "default",
    "documentType": "invoice"
}
payload = {"options": json.dumps(options)}
# the with-block closes the file after the upload
with open('/vrep/sample-invoice-2.pdf', 'rb') as f:
    file = {'file': ('sample-invoice-2.pdf', f, 'application/pdf')}
    r = requests.post(url, headers=headers, data=payload, files=file)

api.send("debug", '--- POST ---')
api.send("debug", str(r.status_code))
api.send("debug", r.text)
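As far as I know, the POST only creates an extraction job that the service processes asynchronously, so the extracted values are not yet part of its response. A minimal polling sketch follows; it assumes the job JSON carries a `status` field with the values `PENDING`, `RUNNING`, `DONE` and `FAILED`, and `wait_for_job` is a hypothetical helper. The HTTP call is passed in as a function so that the loop itself can be tested without the service.

```python
import time
from typing import Callable


def wait_for_job(fetch_job: Callable[[str], dict], job_id: str,
                 interval: float = 5.0, timeout: float = 300.0) -> dict:
    """Poll a document job until it is finished.

    fetch_job(job_id) must return the parsed job JSON. Assumed statuses:
    "PENDING", "RUNNING", "DONE", "FAILED".
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        status = job.get("status")
        if status == "DONE":
            return job
        if status == "FAILED":
            raise RuntimeError(f"job {job_id} failed: {job}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status} after {timeout} s")
        time.sleep(interval)


if __name__ == "__main__":
    # hypothetical stand-in for the real HTTP call
    responses = iter([{"status": "PENDING"}, {"status": "DONE"}])
    print(wait_for_job(lambda _id: next(responses), "demo-job", interval=0.1))
```

In the operator, `fetch_job` could wrap `requests.get(url + '/' + job_id, headers=headers).json()`, with the job id taken from `r.json()['id']` of the POST response (the `id` field name is also an assumption).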


Graph


Output in Terminal


Let’s check Document Information Extraction. One more document should now appear in the list:


Document Information Extraction
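The same check could also be made from the operator, by sending a second GET request to the jobs endpoint after the POST. A minimal sketch, assuming the list response holds the jobs under `results` and each job has `status` and `fileName` fields; `summarize_jobs` and `has_document` are hypothetical helpers.

```python
from collections import Counter


def summarize_jobs(response_json: dict) -> dict:
    """Count the document jobs per status.

    Assumption: the jobs are listed under "results", each with "status".
    """
    jobs = response_json.get("results", [])
    return dict(Counter(job.get("status", "UNKNOWN") for job in jobs))


def has_document(response_json: dict, file_name: str) -> bool:
    """Return True if a job for the given file name is in the list.

    Assumption: each job carries the uploaded file name in "fileName".
    """
    return any(job.get("fileName") == file_name
               for job in response_json.get("results", []))
```

For example, `api.send("debug", str(has_document(r.json(), 'sample-invoice-2.pdf')))` would log whether the uploaded invoice is listed.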

Conclusion

As you can see, uploading documents to Document Information Extraction can easily be automated with another SAP BTP service, Data Intelligence. Another use case could be extracting document details with DI and loading them into a database.
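To sketch the database use case, the following writes the extracted header fields of a finished job into a table. It uses Python’s built-in `sqlite3` purely as a stand-in for whatever database DI would actually write to, and it assumes that the job JSON contains `extraction.headerFields` as a list of objects with `name` and `value`. The function `store_header_fields` and the table `header_fields` are hypothetical.

```python
import sqlite3


def store_header_fields(conn: sqlite3.Connection, job: dict) -> int:
    """Write the extracted header fields of one finished job into a table.

    Assumed job layout (not confirmed in this post):
    {"id": ..., "extraction": {"headerFields": [{"name": ..., "value": ...}]}}
    Returns the number of fields written.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS header_fields ("
        "job_id TEXT, name TEXT, value TEXT, PRIMARY KEY (job_id, name))"
    )
    rows = []
    for field in job.get("extraction", {}).get("headerFields", []):
        value = field.get("value")
        rows.append((job["id"], field["name"], None if value is None else str(value)))
    # the connection as context manager commits on success and rolls back on error;
    # INSERT OR REPLACE keeps one row per (job, field) if a job is stored twice
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO header_fields (job_id, name, value) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # hypothetical job result, shaped as assumed above
    job = {"id": "demo-job", "extraction": {"headerFields": [
        {"name": "documentNumber", "value": "INV-0001"},
        {"name": "currencyCode", "value": "EUR"},
    ]}}
    store_header_fields(conn, job)
    print(conn.execute("SELECT * FROM header_fields").fetchall())
```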

Please be aware that this post only reflects my personal idea of how these services can work together.

Helpful links

SAP Discovery Center BTP Services

SAP Discovery Center Data Intelligence

SAP Data Intelligence Community

SAP Document Information Extraction Roadmap


      5 Comments
      Tim Huse

      Hey Yuliya,

      thank you, once again a great blog post from you 🙂

      I haven't worked with Document Information Extraction Service yet, but this is an interesting showcase.


      Best wishes

      Tim

      Yuliya Reich
      Blog Post Author

      Hi Tim,

      you can test this service with SAP BTP Trial account 😉

      Regards,

      Yuliya

      Paul PINARD

      There's also this dedicated community page including a lot of learning materials on how to use Document Information Extraction.

      Cheers

      Paul

      Yuliya Reich
      Blog Post Author

      Thanks, Paul 🙂

      Manasij Biswas

      Thank you, once again a great blog post from you.

      Can you please post the configuration of the custom operator?

      Can you please provide some input regarding "Another use case could be document details extraction with DI into a database."?