SAP Intelligent RPA – PDF Extraction and posting to SAP ECC system using BAPI Bot via API Trigger
Co-Authored with Naresh R
Business Requirement/Use case
The goal is to extract recognized entity information from PDF documents and post it to an ERP system, which can be either SAP S/4HANA or SAP ECC. The bot responsible for posting the extracted information into the ERP system is triggered via API.
Many PDF libraries can help extract content from documents. This blog specifically covers an efficient way to extract information from one specific PDF format. Restricting the bot to a specific format improves the quality of the extracted information.
- You must have access to the SAP IRPA application. If you don't have access to an existing system, please create a trial tenant of SAP IRPA by following this link
- You must have the service key details for SAP IRPA service.
If you are creating a trial tenant, you can get the service key details while configuring IRPA in the trial tenant.
If you are using an existing system, you can get those details from your admin.
Steps in Workflow Implementation
- Start the bot.
- Read required cloud variables which are configured in the system.
- Read all Outlook mails with a specific subject that have attachments.
For the demo, we read all mails with a specific subject, but this can be extended by adding date ranges and other query parameters to the email query.
- Save all PDF documents from the filtered mails to a local path.
- Loop through all the PDF documents and extract the information from each one. This complete step can be replaced with your own custom code, such as SAP DOCX or any vendor-provided OCR extraction method. For the demo, we implemented the extraction mechanism in both of the ways described below, to capture as many fields as possible.
Extraction via text processing
Since the scope of the bot is to work on a specifically formatted PDF, we used the inbuilt method to extract text from a specific area of the page. This method suits header information well when the document has a parent-detail structure.
For the demo, we used only one format. If you have more than one format, you can configure the respective co-ordinates as cloud variables, one set per format.
Below is the sample code snippet for text extraction with specific co-ordinates
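The following is not the SAP Intelligent RPA SDK call itself, only a minimal Python sketch of the idea behind co-ordinate based extraction. It assumes some PDF library has already returned each word with its bounding box. The `Word` structure, the `REGIONS` table, the field names and the co-ordinate values are all hypothetical; in the bot, the regions would come from cloud variables.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One word with its bounding box (hypothetical structure)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


# Hypothetical regions for one PDF format: field name -> (x0, y0, x1, y1).
# In the bot these values would be read from cloud variables.
REGIONS = {
    "invoice_number": (400, 40, 580, 60),
    "vendor": (40, 40, 300, 60),
}


def text_in_region(words, region):
    """Join, in reading order, the words whose box lies fully inside the region."""
    rx0, ry0, rx1, ry1 = region
    inside = [
        w for w in words
        if w.x0 >= rx0 and w.y0 >= ry0 and w.x1 <= rx1 and w.y1 <= ry1
    ]
    # Top-to-bottom, then left-to-right.
    inside.sort(key=lambda w: (w.y0, w.x0))
    return " ".join(w.text for w in inside)


def extract_header(words, regions=REGIONS):
    """Return one extracted string per configured region."""
    return {field: text_in_region(words, box) for field, box in regions.items()}
```

Supporting a second PDF format would then only mean supplying a different `regions` mapping, without changing the extraction logic.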
Extraction via Custom OCR
Although we got the required information via text processing, there can be scenarios where the content of the PDF is NOT machine-readable text but an image. Hence, we implemented OCR processing as well.
The OCR output for a PDF is an array of strings. Combine the array into a single text and apply suitable custom regular expressions to extract the required information.
Note: A multi-page PDF is captured as a 2D array (one array of strings per page), so that all pages can be processed at once.
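The combine-and-regex step can be sketched as follows. This is a minimal Python illustration, not the bot's actual code. The field names and regular expressions are hypothetical and would need to be adapted to the labels in your own PDF format.

```python
import re

# Hypothetical patterns; adjust them to the labels used in your documents.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:Number|No\.?)\s*[:#]?\s*(\w[\w-]*)", re.IGNORECASE
    ),
    "amount": re.compile(r"\bTotal\s*:?\s*([\d.,]+)", re.IGNORECASE),
}


def flatten_pages(pages):
    """Combine a 2D array (pages x lines) of OCR strings into a single text."""
    return "\n".join(line for page in pages for line in page)


def extract_fields(text):
    """Apply each pattern and keep the first captured group of every match."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            result[name] = match.group(1)
    return result
```

A field whose pattern does not match is simply left out of the result, which makes it easy to detect missing values later.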
- Collate the data processed in the above steps, giving priority to data obtained via text processing. If a field is not available via text processing, check whether it was extracted via OCR and, if so, include that value.
- After collating the data from both methods, the final version of the extracted object is pushed to an array. Transfer control back to the loop and repeat steps 5 and 6 for all PDF documents.
- At this point, all PDF documents are processed. The final output is an array of objects. Each object represents one record, which in turn corresponds to one PDF document.
- As a failsafe step, separate the records with incomplete information into their own array and notify the stakeholders, so that they are aware that a few records were not processed.
- The records with complete information can be processed, saved to a file and posted into the ERP system. We triggered the Dynamic BAPI bot via API. You can find this bot here.
Call SAP Business Application Programming Interface using SAP Intelligent RPA | SAP Blogs
API Triggers & Notifiers in SAP Intelligent RPA | SAP Blogs
API Triggers & Notifiers
Execute a Trigger of Type API
- End the bot.
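The collation and failsafe steps of the workflow above can be sketched in a few lines of Python. This is only an illustration under assumptions: the list of required fields is hypothetical, and each extraction method is assumed to return a plain dictionary of field values.

```python
# Hypothetical list of fields a record needs before it can be posted to ERP.
REQUIRED_FIELDS = ("invoice_number", "vendor", "amount")


def collate(text_fields, ocr_fields):
    """Prefer the text-processing value; fall back to OCR when it is missing."""
    merged = {}
    for field in REQUIRED_FIELDS:
        value = text_fields.get(field) or ocr_fields.get(field)
        if value:
            merged[field] = value
    return merged


def split_by_completeness(records):
    """Separate records that have every required field from those that do not."""
    complete, incomplete = [], []
    for record in records:
        missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
        (incomplete if missing else complete).append(record)
    return complete, incomplete
```

The `complete` list is what would be written to the file and posted, while the `incomplete` list is what the stakeholders would be notified about.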
Important points to take care of while calling the API Trigger
- Set the proper invocation context, input and output variables when making the call to the API bot. For this implementation, we did not have any input parameters, since we supplied the input as an Excel file.
- Get three parameters from the service key: client id, client secret and URL (the plain URL and NOT sburl or apiurl).
- With the above parameters, get an OAuth 2.0 access token, which in turn needs to be sent in the request header as the Authorization parameter. Follow the steps below to get the access token.
- Append /oauth/token to the URL, so that it looks like https://XXXX.authentication.eu10.hana.ondemand.com/oauth/token
- Open the Postman application. If it is not installed on your system, please install it.
- Under the Authorization tab, select OAuth 2.0 as the Type and give the modified URL, client secret and client id as input to get the access token.
- Create the IRPA API key while creating the API Trigger in Cloud Foundry and make a note of it, as it also needs to be passed as a parameter in the header.
- With this setup, you are now ready to call the automation bot like an API.
Don't get confused by the URL in the screenshot. It is the URL generated while creating the API Trigger and NOT the URL that you used to get the access token.
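For readers who prefer a script over Postman, the two calls can be sketched in Python with only the standard library. This is a hedged sketch, not a definitive client. It assumes the client credentials grant with HTTP Basic authentication for the token call, an `irpa-api-key` header name for the API key, and a request body with `invocationContext` and `input` keys. Please verify all three against the details shown when you create your API Trigger. The function names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def token_request_parts(base_url, client_id, client_secret):
    """Build URL, headers and body for the access token call.

    base_url is the plain `url` value from the service key.
    Assumption: client credentials grant with HTTP Basic authentication.
    """
    url = base_url.rstrip("/") + "/oauth/token"
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return url, headers, body


def trigger_request_parts(trigger_url, access_token, api_key, input_data=None):
    """Build URL, headers and body for the API Trigger call.

    trigger_url is the URL shown when the API Trigger was created.
    Assumption: header name `irpa-api-key` and the body layout below.
    """
    headers = {
        "Authorization": f"Bearer {access_token}",
        "irpa-api-key": api_key,
        "Content-Type": "application/json",
    }
    payload = {"invocationContext": {}, "input": input_data or {}}
    return trigger_url, headers, json.dumps(payload).encode("utf-8")


def post(url, headers, body):
    """Send a POST request and decode the JSON response."""
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (placeholders only, requires real credentials):
# token = post(*token_request_parts(BASE_URL, CLIENT_ID, CLIENT_SECRET))["access_token"]
# result = post(*trigger_request_parts(TRIGGER_URL, token, IRPA_API_KEY))
```

Keeping request construction separate from sending makes it possible to inspect the URL and headers before anything goes over the network.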
This concludes the implementation of the bot for specific-format PDF extraction and posting to an SAP ECC system using the BAPI bot via API Trigger. We hope you enjoyed reading it and found the content useful. Thanks for your time!