SAP RPA 2.0: No Code – Low Code Project Demo for PDF Data Extraction Using Conditions and Loops
Introduction:
We are going to see how SAP Intelligent RPA can help simplify the invoice consolidation process in a few simple steps. We will start by creating a project, then build the automation, and finally test it.
For the demo, we consider a business that receives multiple invoices by email. Consolidating them into a single sheet for verification and review is tedious and time-consuming, and it sometimes leads to human error. We will see how easy it is to use complex functions such as reading PDFs and looking up values, together with loops and conditional statements in the workflow, to show the codeless experience that SAP Intelligent RPA provides.
This framework can be used to build and test rapid prototypes. Data from any number of PDF files can be processed within minutes: create a project, build the automation with easy, very readable configuration steps, and test it to see the results.
We can also see the different options, such as processes, web capture and automations, that are available as a centrally accessible bundle in SAP Intelligent RPA 2.0.
PDF is one of the formats most used by individuals and organizations to exchange information. It is widely used to create business-related documents and therefore plays an important part in most process automations.
Intelligent RPA 2.0 introduced the PDF SDK, which lets you extract data from documents with user-friendly and convenient activities. It is part of the Cloud Studio and can extract text from machine-readable (generated) PDFs.
As a member of the SAP Community, do participate in the SAP Intelligent RPA Tutorials Challenge 2021 (reference link: SAP IRPA Tutorials Challenge 2021) to share more and learn more!
Overview:
At the start, “Get File Collection” returns all the files in a specified folder path. A “For Each” loop then executes a set of commands until every file returned by “Get File Collection” has been processed for the required data.
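As a rough mental model of these two activities, the Python sketch below lists the PDF files in a folder and visits them one at a time. It is not SAP Intelligent RPA code: `get_file_collection` and `process_folder` are hypothetical stand-ins, and the `*.pdf` filter and the sorted order are assumptions made for the example.

```python
from pathlib import Path


def get_file_collection(folder: str, pattern: str = "*.pdf") -> list[Path]:
    """Hypothetical stand-in for "Get File Collection": list matching files in a folder."""
    # sorted() only makes the example deterministic; the order used by the
    # real activity is not stated in this post.
    return sorted(p for p in Path(folder).glob(pattern) if p.is_file())


def process_folder(folder: str) -> list[str]:
    """Visit every file once, as the "For Each" loop does, and return the names visited."""
    processed = []
    for pdf_path in get_file_collection(folder):
        # The per-file steps (open, search, extract, close) would go here.
        processed.append(pdf_path.name)
    return processed
```

The loop body is deliberately empty: in the automation, the steps described next are what run inside it for each file.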
Inside the loop, for each file in the collection, we use “Open PDF” to open an instance of the PDF document and then “Search Text Items” to look for text items that match the search string. The activity returns the positions of the matches; if no match is found, the result has a length of 0. If a match is found, the values are extracted from the opened PDF; otherwise the PDF instance is closed and released.
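The search-then-decide logic can be pictured with a small Python sketch. It assumes a hypothetical in-memory `FakePdf` class in place of a real PDF instance and treats a document as a plain list of words. The word positions it returns and the choice to close the document on both branches are assumptions for illustration, not the behaviour of the PDF SDK.

```python
class FakePdf:
    """Hypothetical in-memory stand-in for an opened PDF instance."""

    def __init__(self, text: str):
        self.words = text.split()
        self.is_open = True

    def search_text_items(self, search: str) -> list[int]:
        """Return the word positions where `search` starts; empty list when absent."""
        target = search.split()
        n = len(target)
        if n == 0:
            return []
        return [
            i
            for i in range(len(self.words) - n + 1)
            if self.words[i:i + n] == target
        ]

    def close(self) -> None:
        """Mark the document as closed (models releasing the instance)."""
        self.is_open = False


def check_document(text: str, search: str) -> str:
    """Open, search, branch on the result length, and always close."""
    pdf = FakePdf(text)
    try:
        text_items = pdf.search_text_items(search)
        # A non-empty result means the search string was found.
        if len(text_items) != 0:
            return "valid"
        return "invalid document"
    finally:
        # Assumption: the document is released whichever branch was taken.
        pdf.close()
```

With made-up sample text, `check_document("Invoice Order Id: 1001", "Order Id:")` takes the "valid" branch, while a document without that label takes the "invalid document" branch.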
To fetch the “Order Id” and “Grand Total”, we use the “Get Text After” step, which retrieves one or more words after the given search string. The “numWords” parameter, entered by the user, sets how many words are returned after the search string is found in the document.
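To make the role of “numWords” concrete, here is a minimal Python sketch of the same idea on plain text. The function name `get_text_after`, the whitespace-based definition of a "word", and the empty string returned when nothing is found are all assumptions for illustration; the real activity works on the text items of the PDF.

```python
def get_text_after(text: str, search: str, num_words: int) -> str:
    """Return up to `num_words` whitespace-separated words that follow `search`."""
    start = text.find(search)
    if start == -1:
        return ""  # assumption: an empty string signals "not found"
    following = text[start + len(search):].split()
    return " ".join(following[:num_words])


# Made-up sample text, not taken from the sample document in this post.
sample = "Order Id: 1001 Date: 2021-06-01 Grand Total: USD 250.00 Thank you"
order_id = get_text_after(sample, "Order Id:", 1)        # "1001"
grand_total = get_text_after(sample, "Grand Total:", 2)  # "USD 250.00"
```

Choosing 1 word for the order number and 2 for the total shows why the parameter matters: a value that spans several words, such as a currency code plus an amount, needs a larger “numWords”.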
This process repeats for every file in the collection, and the automation ends once the required information has been extracted from all the documents.
The final automation workflow:
First Step: Project and Package
We will create a new project and generate a package from it.
Create a new project.
The SAP Intelligent RPA Core SDK is required, so make sure this package is added to your project. If the Core SDK package is not available on your tenant, you can acquire it from the Store.
Next step: select Dependencies, then Manage Dependency and Add Dependency.
After adding the dependencies, the package list appears as shown in the image below.
Next step: select Create, then Automation.
The Configure agent version dialog is displayed. Select your latest version and confirm.
The screen below then opens, and we can drag and drop the required activities onto it.
Sample Document
First step in the automation: drag and drop “Get File Collection” from the Activities and enter, as its input parameter, the path of the folder in which the PDFs are saved.
Next step in the automation: drag and drop “For Each” from the Activities. Inside the loop, add “Open PDF” to open an instance of the PDF document; this opens each file in the given folder in turn.
Next step in the automation: drag and drop “Search Text Items (PDF)”. Give the search string in the input parameters to search for matching text items in the PDF; the activity returns the position of the string in the document.
Next step in the automation: drag and drop “Condition” from the Activities. According to the condition expression “Step4.textitems.length != 0”, if the expression is true the automation executes the “Get Text After” steps; otherwise it displays a message saying the document is invalid.
Next step in the automation: drag and drop “Get Text After (PDF)” from the Activities. Give the search string in the input parameters and the number of words to retrieve from the PDF document in “numWords”. Do the same for the second “Get Text After (PDF)”.
Next step in the automation: drag and drop “Open Message Dialog” from the Activities; it is used to display the message.
Last step in the automation: drag and drop “Close and Release PDF” from the Activities.
Final results of this automation
Link to the running bot with details
Link to the running bot in simplified version
Conclusion:
The data that we display on screen can just as easily be sent to other targets, such as an Excel spreadsheet or an email, using the SAP Intelligent RPA bundle of features with a codeless experience.
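To picture what a consolidated sheet could look like, the Python sketch below writes one row per invoice to CSV, a format that Excel can open. This is plain Python rather than an SAP Intelligent RPA activity, and the function name and the column names `file`, `order_id` and `grand_total` are hypothetical choices for the example.

```python
import csv


def write_consolidated_csv(rows: list[dict], out) -> None:
    """Write one header line and one line per invoice to a text stream."""
    writer = csv.DictWriter(out, fieldnames=["file", "order_id", "grand_total"])
    writer.writeheader()
    writer.writerows(rows)
```

To write to disk, `out` would be a file opened with `open(path, "w", newline="")`, as the `csv` module documentation recommends; each dictionary in `rows` would hold the values extracted from one PDF.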
With this blog post, the intention is to get customers, business managers and RPA developers thinking about applying SAP Intelligent RPA 2.0 to various automation opportunities by showcasing the simple steps required to achieve great results. It is also meant to start the conversation about using the various options available within SAP Intelligent RPA 2.0 to make these scenarios a reality.
This automation will benefit the following roles: Business Process Lead, Business Process Analyst, Business Process SME, and Process Executioner.
I hope you found this tutorial helpful. Please provide feedback in the comment section and feel free to ask any questions in the SAP Intelligent RPA Q&A area (link for reference: https://answers.sap.com/tags/73554900100800002142).
Thanks Ajeya! Great blog!
I really like a blog that shows "how to do it", not just that you can do it. Nice job.
What I wonder when I look at something pulling in a PDF is whether it can be done more easily. I am in a very small IT group, so when we look at something like third-party PDFs, it's so much easier to bring in a company to do it. There are several really good ones out there.
My point is simple. I love this blog for the step-by-step, and perhaps for a smaller integration. But I would also explore other available options. There are companies out there that offer a service when it comes to PDFs. Some of them even have "SAP" in their name. 😉
Hi Ajeya,
Thanks for the article, it was helpful.
If we have to read line items, as in the sample PDF, how would you achieve that?
In your sample, the line-item section has just 1 line, but the number could vary (3, 5, 7 etc.). It could also continue onto the next page of the PDF.
Have you tried this use case? Can you please share more details on this?
Regards,
Nikhil
Nice blog. I would prefer to use a machine learning process for data extraction. Please build one with this process.
Thanks
Arghadip