Extract Purchase Order details using ML & OCR capabilities of Document Information Extraction service of SAP Business Technology Platform
In this blog post I would like to share my experience of using the Document Information Extraction service. This service is one of the AI Business Services available on SAP Business Technology Platform. It enables extraction of relevant data from business documents such as invoices, purchase orders and payment advices. It leverages pre-trained machine learning models to identify the relevant fields in the header and line items of a business document. This service has been around for a while, and I will point you to some existing resources where you can learn more about it and how to get started.
To get started, I would recommend these community blog posts from my colleague Joni Liu
When developing your automation scenarios, you would use the REST API to interact with this AI Business Service. The UI application within the Document Information Extraction service is provided to make that interaction easy: you can test your business documents for accuracy and manage schemas.
There are also a few published tutorials to help you get started. The second tutorial also covers how you can enrich the data extracted by this service.
Use Machine Learning to Process Business Documents
Use Machine Learning to Extract Information from Business Documents and Enrich Data
This blog post focuses on how to deal with business documents that have custom fields you need to accommodate. Last week, a new feature was released that lets you create templates based on sample document files and on schemas. End users can select these templates to extract information from similar business documents.
The Document Information Extraction service offers a couple of schemas by default. As you can see below, there are schemas for Purchase Order (PO), Invoices etc. A schema represents the data fields for a document type. In order to add a custom field, I have made a copy of the standard Purchase Order (PO) schema. I have removed some of the standard data fields in the header as they are not required for my scenario.
In my sample PO document, I have a column in the line items for the brand. Hence, I have added a field called brand of data type string. I have saved and activated this schema.
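To make the schema change concrete, here is a minimal sketch of the same idea expressed as plain Python data: copy a schema, drop a header field, and add a `brand` line item field of type string. The schema layout, the schema names and every field name other than `brand` are hypothetical stand-ins chosen for illustration; they are not the service's actual schema definition.

```python
from copy import deepcopy

# Hypothetical, simplified stand-in for a standard PO schema.
# The real schema has more fields and its field names may differ.
STANDARD_PO_SCHEMA = {
    "name": "Standard_PO_Schema",
    "documentType": "purchaseOrder",
    "headerFields": [
        {"name": "documentNumber", "type": "string"},
        {"name": "documentDate", "type": "date"},
        {"name": "netAmount", "type": "number"},
        {"name": "currencyCode", "type": "string"},
    ],
    "lineItemFields": [
        {"name": "description", "type": "string"},
        {"name": "quantity", "type": "number"},
        {"name": "unitPrice", "type": "number"},
    ],
}


def customize_schema(base, new_name, drop_header_fields, extra_line_item_fields):
    """Return a modified copy of `base`; the original schema is left untouched."""
    schema = deepcopy(base)
    schema["name"] = new_name
    # Keep only the header fields that are still needed.
    schema["headerFields"] = [
        f for f in schema["headerFields"] if f["name"] not in drop_header_fields
    ]
    # Append the custom line item fields, rejecting duplicates.
    existing = {f["name"] for f in schema["lineItemFields"]}
    for field in extra_line_item_fields:
        if field["name"] in existing:
            raise ValueError(f"duplicate line item field: {field['name']}")
        schema["lineItemFields"].append(field)
        existing.add(field["name"])
    return schema


custom_po_schema = customize_schema(
    STANDARD_PO_SCHEMA,
    new_name="Custom_PO_Schema",
    drop_header_fields={"currencyCode"},
    extra_line_item_fields=[{"name": "brand", "type": "string"}],
)
```

Working on a copy mirrors what I did in the UI: the standard schema stays as delivered, and only the copy carries the custom field.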
The next step is to create a template based on the sample PO document and the newly created schema.
Once the system processes the template, you can edit it by selecting it.
Click on the Annotate button to begin annotating each of the required data fields.
Hover your mouse over each of the value fields and select the corresponding header/line item field as shown below.
For this example, I have annotated the header and 2 line items. Save and activate the template.
Now we are ready to test this template with PO documents of the same format. In the document menu, use the “+” icon to add a document for processing. Notice that you have an option to auto-detect the template based on the document provided. This is a cool feature, especially when you have many templates: the service is smart enough to pick the right template for the document. I have selected the custom schema and uploaded my PO document.
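If you drive the same upload through the REST API instead of the UI, the choices made in the upload dialog would typically travel as a small JSON options payload next to the file. The sketch below only builds that payload. The key names (`clientId`, `documentType`, `schemaId`, `templateId`) and the `"detect"` value for template auto-detection reflect my understanding of the API and should be treated as assumptions; please verify them against the API reference of the service. The helper name `build_job_options` and the IDs in the example are made up.

```python
import json


def build_job_options(client_id, document_type, schema_id=None, template_id="detect"):
    """Build the JSON string for the options part of a document upload.

    Key names and the "detect" default are assumptions; check the API reference.
    """
    if not client_id:
        raise ValueError("client_id must not be empty")
    options = {"clientId": client_id, "documentType": document_type}
    if schema_id is not None:
        # ID of the custom schema that contains the extra `brand` field.
        options["schemaId"] = schema_id
    if template_id is not None:
        # Either a concrete template ID or the auto-detect marker.
        options["templateId"] = template_id
    return json.dumps(options)


# Example with placeholder IDs: custom schema plus template auto-detection.
options_json = build_job_options(
    client_id="default",
    document_type="purchaseOrder",
    schema_id="my-custom-po-schema-id",
)
```

The resulting string would be sent together with the PO file in a multipart request; the HTTP call itself is left out here because it depends on your service instance URL and OAuth token.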
Once this is processed, I can preview the result. Clicking on the “Extraction Results” button opens up the header and line item fields on the right-hand side.
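In an automation scenario there is no UI to tell you when processing has finished, so the client usually polls until the document is ready. Here is a minimal sketch of such a polling loop. The `fetch_job` callable is a hypothetical placeholder for whatever function performs the actual GET request, and the status values `"DONE"` and `"FAILED"` are assumptions about the response that you should check against the API reference.

```python
import time


def wait_for_job(fetch_job, job_id, interval_seconds=5.0, max_attempts=24, sleep=time.sleep):
    """Poll `fetch_job(job_id)` until the job is done, failed, or attempts run out.

    `fetch_job` must return a dict with a "status" key (assumed response shape).
    """
    for attempt in range(max_attempts):
        job = fetch_job(job_id)
        status = job.get("status")
        if status == "DONE":
            return job
        if status == "FAILED":
            raise RuntimeError(f"job {job_id} failed")
        # Any other status is treated as "still processing".
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} not finished after {max_attempts} attempts")
```

Passing `fetch_job` and `sleep` in as parameters keeps the loop independent of any HTTP library and makes it easy to try out with a fake response sequence.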
Against each data field, you can see the confidence range. In the example below, everything is in green, highlighting the fact that the service was able to accurately predict those fields.
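When you consume the result programmatically, the same confidence information can be used to decide which fields need a manual check. The following sketch assumes a result shaped like `{"extraction": {"headerFields": [...], "lineItems": [[...], ...]}}` where each field carries `name`, `value` and a numeric `confidence`; that shape is my assumption about the API response, the sample data is invented, and the 0.8 threshold is arbitrary rather than the value the UI uses for its colours.

```python
def low_confidence_fields(result, threshold=0.8):
    """Return (location, name, value, confidence) for fields below `threshold`."""
    flagged = []
    extraction = result.get("extraction", {})
    for field in extraction.get("headerFields", []):
        confidence = field.get("confidence", 0.0)
        if confidence < threshold:
            flagged.append(("header", field["name"], field.get("value"), confidence))
    # Each line item is assumed to be a list of field dicts.
    for index, line_item in enumerate(extraction.get("lineItems", []), start=1):
        for field in line_item:
            confidence = field.get("confidence", 0.0)
            if confidence < threshold:
                flagged.append((f"line item {index}", field["name"], field.get("value"), confidence))
    return flagged


# Invented sample result, for illustration only.
sample_result = {
    "extraction": {
        "headerFields": [
            {"name": "documentNumber", "value": "4500001234", "confidence": 0.97},
            {"name": "documentDate", "value": "2021-03-01", "confidence": 0.62},
        ],
        "lineItems": [
            [
                {"name": "description", "value": "Notebook", "confidence": 0.91},
                {"name": "brand", "value": "Acme", "confidence": 0.55},
            ]
        ],
    }
}

needs_review = low_confidence_fields(sample_result)
```

With this sample, `needs_review` contains the `documentDate` header field and the `brand` field of the first line item, which could then be routed to a person for confirmation.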
I hope this helped you understand how you can use the new template feature to extract data from similar business documents and also support custom data fields.
There is also out-of-the-box content that you can use to quickly get started with automating data extraction from business documents. Feel free to explore it too:
- Using Intelligent RPA with Document Information Extraction service
- Using Cloud Integration with Document Information Extraction service