Author: sandeep Pantangi

Document Information Extraction Activities in SAP IRPA

Introduction:

This post explains how to read data from scanned or digital documents using the following activities, which SAP provides as part of the irpa_pdf SDK library (version 1.15.83).

  1. Extract Data Without Template: The inputs are the document type and the path of the PDF document. The output is the data extracted using the standard schema of that document type.
  2. Extract Data With Template: The inputs are the document template and the document path. The output is the data extracted using either the standard schema or a custom schema.

 

Prerequisites to understand before using the above activities:

  1. Currently SAP supports only 3 document types: Invoice, Purchase Order, and Payment Advice.
  2. For each document type, SAP provides a standard schema that cannot be edited. For example, the schema for the document type Invoice is SAP_invoice_schema. (A schema is the list of fields (header, item) used to identify the required information in the corresponding document, such as invoice number, total, subtotal, and tax.)
  3. By copying the standard schema, we can add or delete fields as required and then activate the resulting custom schema.
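
To make the idea of copying a standard schema more concrete, here is a minimal Python sketch. It is not SAP's API or the real schema format: the dictionary layout, the field identifiers, the item fields, the custom schema name and the helper copy_schema are all assumptions made for illustration. Only the schema name SAP_invoice_schema, the three document types and the header fields (invoice number, total, subtotal, tax) come from the points above.

```python
from copy import deepcopy

# Document types listed above as supported.
SUPPORTED_DOCUMENT_TYPES = ("Invoice", "Purchase Order", "Payment Advice")

# Hypothetical, heavily simplified stand-in for the standard invoice schema.
# The header fields are the ones named in this post; the item fields and the
# identifier spellings are illustrative assumptions only.
STANDARD_INVOICE_SCHEMA = {
    "name": "SAP_invoice_schema",
    "document_type": "Invoice",
    "header_fields": ["invoiceNumber", "total", "subtotal", "tax"],
    "item_fields": ["description", "quantity"],
}


def copy_schema(standard, new_name):
    """Return an independent copy of a standard schema that can be edited."""
    if standard["document_type"] not in SUPPORTED_DOCUMENT_TYPES:
        raise ValueError(f"Unsupported document type: {standard['document_type']}")
    custom = deepcopy(standard)  # the standard schema itself stays untouched
    custom["name"] = new_name
    return custom


custom = copy_schema(STANDARD_INVOICE_SCHEMA, "my_invoice_schema")
custom["header_fields"].append("purchaseOrderNumber")  # add a field
custom["header_fields"].remove("tax")                  # delete a field
```

The deep copy mirrors the behaviour described above: the standard schema is never modified, and all changes are made on the copy.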

 

Steps to design an automation with Extract Data With Template:

Use this approach when the document layout is complex and the AI and ML models cannot determine the schema fields on their own. By annotating sample invoices while creating the template (a maximum of 5 sample invoices can be uploaded for annotation), we teach the service where each field is located. The next time an invoice from the same vendor arrives, the data is extracted using this template, which can bring the extraction accuracy for that vendor's layout close to 100 percent.

  1. How to create a template?
  • After creating the automation project, select the artifact Create Template.

  • Provide a meaningful name and description for the template, choose the document type as per your requirement, and select either the standard or a custom schema (here I am using the standard schema). Then provide the document path and click Create.
  • After this, open the document in the Document Information Extraction editor and annotate fields such as invoice number, PO number, total, and subtotal. Then save and activate the template so that it can be consumed in the automation.
  • In the automation, pass the template name (here vendor1) and the path of an invoice from the same vendor with different data.
  • The bot can now interpret invoices of this layout and retrieve the required data, which is printed in the console.
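
The template steps above can be pictured as a small lifecycle: create, add samples, annotate, activate, then use in an automation. The Python sketch below is only a mental model under stated assumptions, not the SAP SDK. The class Template, its methods, the sample path and the annotated value are hypothetical, and the rule that a template needs at least one annotated sample before activation is my assumption. The limit of 5 samples and the template name vendor1 come from this post.

```python
MAX_SAMPLE_DOCUMENTS = 5  # limit mentioned in this post


class Template:
    """Hypothetical model of a template's lifecycle; not an SAP API."""

    def __init__(self, name, document_type, schema_name):
        self.name = name
        self.document_type = document_type
        self.schema_name = schema_name
        self.sample_paths = []
        self.annotations = {}  # field name -> value marked in a sample
        self.active = False

    def add_sample(self, path):
        if len(self.sample_paths) >= MAX_SAMPLE_DOCUMENTS:
            raise ValueError("At most 5 sample documents can be annotated")
        self.sample_paths.append(path)

    def annotate(self, field, value):
        self.annotations[field] = value

    def activate(self):
        # Assumption: activation only makes sense once something is annotated.
        if not self.sample_paths or not self.annotations:
            raise ValueError("Annotate at least one sample before activating")
        self.active = True


def can_use_in_automation(template):
    """Only an activated template can be consumed by the automation."""
    return template.active


template = Template("vendor1", "Invoice", "SAP_invoice_schema")
template.add_sample("vendor1_sample.pdf")       # hypothetical path
template.annotate("invoiceNumber", "INV-001")   # hypothetical value
template.activate()
print(can_use_in_automation(template))
```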

 

Conclusion: For invoices where the activity Extract Data Without Template does not return the required field information, use the activity Extract Data With Template by following the steps above.

Thanks for reading. Please share your comments and questions.
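
The decision described in the conclusion can be sketched as a simple check in Python. This is only an illustration: the function names, the field identifiers and the shape of the extraction result (a plain dictionary) are assumptions, not the output format of the SAP activities.

```python
def missing_fields(extracted, required):
    """Return the required fields that are absent or empty in the result."""
    return [field for field in required if not extracted.get(field)]


def choose_activity(extracted_without_template, required):
    """Suggest the template activity only when required fields are missing."""
    if missing_fields(extracted_without_template, required):
        return "Extract Data With Template"
    return "Extract Data Without Template"


required = ["invoiceNumber", "total"]
result = {"invoiceNumber": "INV-001", "total": ""}  # hypothetical result
print(choose_activity(result, required))
```

Here the empty total counts as missing, so the sketch suggests switching to the template-based activity.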

For More Info: SAP Help


      5 Comments
      Satinder Kaur

      Hi Sandeep,

       

      After the upload is complete, opening the document in a new tab gives the error below. I have assigned IRPAProjectDelegate, IRPAProjectMember, IRPAOfficer, IRPAAgentUser, and IRPAParticipant. Is any other role or permission required?

       

      ERROR

      "Please contact your administrator to provide access to the application by assigning relevent roles.

      Note: If you already have permission, log out and log in once again."

      Thanks ,

      Satinder Kaur

      sandeep Pantangi
      Blog Post Author

      Hello Satinder,

       

      In the IRPA tenant, go to the Security tab and select Role Collections. Assign the following roles there, then log out and log in again.

      For a productive account:

      Document_Information_Extraction_UI_Templates_Admin

      Document_Information_Extraction_UI_Admin_User

      For a trial account:

      Document_Information_Extraction_UI_Admin_User_trial

      Document_Information_Extraction_UI_Templates_Admin_trial

       

       

      Regards,

      Sandeep.

      Marcus Schiffer

      Hi,

       

      We created a flow similar to the one in the blog, with a custom template (just 3 fields in the header for testing). The document extraction service is activated in the same tenant as RPA.

      The automation, which contains the single step "Extract data with template", fails with "Cannot read property 'message' of undefined".

      What might be the reason? Is there anything left to configure or customize?

       

      Any help is highly appreciated.

       

      Regards

      Marcus

      sandeep Pantangi
      Blog Post Author

      Marcus Schiffer

       

      Can you please try the same with a different invoice template, and share a screenshot of the template you are currently using?

      Marcus Schiffer

      Hi Sandeep,

       

      Thanks for the fast response!

      I have also tried with just one step, "Extract Data (Pretrained)", using a standard PDF containing an invoice. That also gives an error: "Could not upload document for information extraction: 500 "undefined"".

       

      Is there anything I might be missing in the setup? A connection to the document extraction service? Or a wrong agent version (I am using 2.0.18.63)?

       

      The document exists and is accessible when using the PDF OCR activity.

       

      I also tried to create a binding from the document extraction service to the RPA app that is subscribed to the tenant, but BTP tells me "No application found".

      Regards

      Marcus