Document Extraction with SAP Intelligent RPA – Using Pre-Trained AI Model
SAP Intelligent Robotic Process Automation provides convenient and smart solutions for processing large volumes of business documents that have content in headers and tables. In this post we walk through an example use case, modelled on a business case, that extracts information from such documents. After you provide a document and specify its type, the extraction activity returns the results as Header Fields and Line Items.
This is the second blog in the “Document Extraction with SAP Intelligent RPA” series. The goal of the series is to empower the community with step-by-step guides showcasing the Document Extraction capabilities within SAP Intelligent RPA. Its predecessor, on “Text Operations to Ease Data Capture”, can be found here. Further information on integration touchpoints can be viewed in this blog.
Prerequisites
- SAP Intelligent Robotic Process Automation platform (trial or full version)
- Installation and configuration, as described in the Help Portal
- Basic knowledge of projects and automations. Tutorials can be found under: Tutorials
- Create a project in the Cloud Studio
- Add the following dependencies to the project:
  - Document Information Extraction SDK
  - PDF SDK
Please note: The Core SDK and Excel SDK are added to the project automatically when an automation is created.
An organization receives numerous invoices, purchase orders, and payment advices from multiple vendors, and these documents pass through different departments. Examples include travel expense documents and equipment invoices from vendors.
Let us take the example of an invoice from an IT vendor that has provided equipment for a new hire in the organization. An organization often has multiple vendors for procuring this equipment, and each invoice contains information such as Invoice Number, Sender Name, Item description, and Net Amount. You can use the extracted information to, for example, automatically process payables, invoices, or payment notes while making sure that invoices and payables match.
Although invoices from different vendors may be structurally similar, entering their data manually costs an organization many man-hours. Automating data extraction from business documents can be challenging too, for instance when a new vendor is added or an existing vendor is discontinued.
To simplify this use case, we will use the new “Extract Data (Pre-trained Model)” activity along with some existing activities.
Proposed Sequence of Execution
- Create an Automation
- Drag and drop the Extract Data (Pre-trained Model) activity. This activity accepts a machine-readable or scanned document in PDF or image format.
This activity requires two inputs: the type of document (Invoice, Payment Advice, or Purchase Order) and the path to the document to extract from.
Note: There are two additional optional fields that are beyond the scope of this use case. They will be covered in future blogs in this series.
- Add a Log Message activity to view the output of the extraction activity. The output is “extractedData”, which contains various Header Fields and Line Item Fields.
- A custom message can be logged by clicking on the icon marked in red (see screenshot below).
Under “Variables” we can see the output “extractedData” with its Header Fields and Line Item Fields. The available fields depend on the type of document selected in the previous step.
- We can put the following message in the “Log Message” activity to view the extracted result.
"Invoice Number: " + Step1.extractedData.headerFields.documentNumber.value
- Test the automation to view the extraction result in the Test Console.
This result can be verified against the information provided in the sample document above.
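To make the log expression above more concrete, here is a minimal sketch in plain JavaScript that can be run outside the Cloud Studio (for example in Node.js). It assumes the step output has the shape implied by the expression, namely `extractedData.headerFields.documentNumber.value`. The `step1` object, the sample value `INV-0001`, and the helper `buildInvoiceMessage` are hypothetical and exist only for illustration; they are not part of the SDK.

```javascript
// Hypothetical stand-in for the output of the extraction step.
// Only the path headerFields.documentNumber.value is taken from the post;
// the sample value is made up.
const step1 = {
  extractedData: {
    headerFields: {
      documentNumber: { value: "INV-0001" },
    },
  },
};

// Build the log message, falling back to a placeholder when the
// field is missing (for example, if the model did not detect it).
function buildInvoiceMessage(extractedData) {
  const header = (extractedData && extractedData.headerFields) || {};
  const field = header.documentNumber;
  const value = field && field.value != null ? field.value : "<not found>";
  return "Invoice Number: " + value;
}

console.log(buildInvoiceMessage(step1.extractedData));
console.log(buildInvoiceMessage({ headerFields: {} }));
```

The fallback is a design choice for this sketch: a missing field then produces a readable message in the log rather than an error partway through the automation.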
After going through this blog post, you should be acquainted with the new Extract Data (Pre-trained Model) activity and its usage. You should also have an appreciation for how convenient the activity makes information extraction from common business documents.
In the Proposed Sequence of Execution section we logged the invoice number from the invoice. Similarly, we could extract other fields such as Sender Name, Item description, and Net Amount. These fields could then be transferred to a data source like MS Excel and stored in a shared location, like OneDrive. Alternatively, we could extend the automation to process the invoice using the extracted fields.
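As a rough illustration of that extension, the sketch below flattens an extraction result into one row per line item and serializes the rows as CSV, which Excel can open. It is plain JavaScript meant to run outside the Cloud Studio and does not use the Excel SDK. Everything about the data shape beyond `headerFields.documentNumber.value` is an assumption: the names `senderName`, `netAmount`, `lineItems`, and `description`, the sample values, and the helpers `valueOf`, `toRows`, and `toCsv` are all hypothetical.

```javascript
// Hypothetical extraction result. Field names other than documentNumber
// are assumptions, and all values are made up for illustration.
const sampleData = {
  headerFields: {
    documentNumber: { value: "INV-0001" },
    senderName: { value: "Example IT Vendor" },
  },
  lineItems: [
    { description: { value: "Laptop" }, netAmount: { value: 1000 } },
    { description: { value: "Docking station, USB-C" }, netAmount: { value: 200.5 } },
  ],
};

// Return the field's value, or an empty string when the field is missing.
function valueOf(field) {
  return field && field.value != null ? field.value : "";
}

// Produce one row per line item, repeating the header fields on each row.
function toRows(data) {
  const header = data.headerFields || {};
  const items = data.lineItems || [];
  return items.map((item) => [
    valueOf(header.documentNumber),
    valueOf(header.senderName),
    valueOf(item.description),
    valueOf(item.netAmount),
  ]);
}

// Serialize rows as CSV, quoting cells that contain commas, quotes, or newlines.
function toCsv(rows) {
  const escapeCell = (cell) => {
    const text = String(cell);
    return /[",\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
  };
  return rows.map((row) => row.map(escapeCell).join(",")).join("\n");
}

const columns = ["Invoice Number", "Sender Name", "Item Description", "Net Amount"];
console.log(toCsv([columns, ...toRows(sampleData)]));
```

Repeating the header fields on every row keeps each line self-describing, which tends to be convenient for filtering in a spreadsheet. In a real automation, the rows would more likely be written through the Excel SDK activities than through CSV.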
Thanks for reading and feel free to leave a comment with questions or feedback 🙂
Find more information on SAP Intelligent RPA:
Explore: Product Information