Challenge Submission – How to Extract Data from a Web-page to Excel using SAP Intelligent RPA
This is a submission for the SAP Intelligent RPA Tutorial Challenge
In recent months, Robotic Process Automation (RPA) has been a hot topic among business process experts because it helps automate repetitive tasks in an organization. A free trial account for SAP Intelligent RPA is now available, so you have a chance to automate your own scenario with it.
If you are starting fresh with SAP Intelligent RPA, I strongly recommend getting familiar with the components and activating your trial account here.
If you are trying to install it for the first time, please click here.
Many of us know how to use Excel to extract data from web pages. It has long been common practice to write macros and run them as required. In my experience, we often fail to capture or filter some of the data and have to repeat the extraction whenever something changes. I have also seen us forget to run the Excel macro in a timely manner.
This blog post focuses on extracting basic data from a web page and saving it to an Excel spreadsheet. My next blog post will demonstrate how to deploy to the cloud services and trigger this automation in a timely manner.
Pre-requisites for this blog post:
- Desktop Studio
- Desktop Agent
- Cloud Factory
- The zoom level of the web page (Internet Explorer) and the Display setting (i.e. Change the size of text, apps, and other items) should be 100%.
Note: Trial accounts or services always come with limitations.
In this blog post, you will learn how to:
- Select a declaration on the web page.
- Add an extension to the project.
- Create the workflow for the web page.
- Build and debug the project.
- Extract data from the web page to Excel.
Before you begin, you may want to look at
- How to Build Bots with SAP Intelligent Robotic Process Automation
- SAP Intelligent Robotic Process Automation in a Nutshell
1. Create a new project [File->New project].
2. Open https://help.sap.com/viewer/index in Internet Explorer.
3. Open Add Application in the Desktop Studio.
4. Select the above URL, https://help.sap.com/viewer/index (it will appear on the screen). If the URL does not appear, click on the refresh button. Edit and change the Name as highlighted below. Here it is “SAPHELPPORTAL”. Click on the Save option.
5. Right-click on SAPHELPPORTAL and click on Capture a new Page.
6. Select the URL. If the page URL does not appear, click on the refresh icon. Don’t click on Scan and Capture yet.
Note: If the target zoom level is more than 100 percent, the page will not be fully captured. As mentioned in the prerequisites, the zoom level of the web page (Internet Explorer) and the Display setting (i.e. Change the size of text, apps, and other items) should always be 100%.
7. Go to https://help.sap.com/viewer/index, which is already open in Internet Explorer, hold the CTRL key and hover the mouse over the page. A red outline will select the page. If the entire page is not selected, check the zoom level (see the prerequisites).
8. Click on Scan and Capture. Once the capture is complete, the page below will be displayed.
9. Double click on the highlighted domain in the “Captured Data” panel to set the criteria. Here we have set Domain = help.sap.com as the criteria.
Once the criteria are set, the highlighted page will turn green.
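To make the idea of a page criterion more concrete, here is a minimal Python sketch of what a check like Domain = help.sap.com amounts to conceptually. This is only my illustration of the idea, not how Desktop Studio evaluates criteria internally; the function name matches_domain is hypothetical.

```python
from urllib.parse import urlparse


def matches_domain(url, expected_domain):
    """Return True when the host part of `url` equals `expected_domain`."""
    # urlparse(...).hostname is None for strings without a host, hence the fallback.
    host = urlparse(url).hostname or ""
    return host.lower() == expected_domain.lower()


# The page captured in this walkthrough satisfies the criterion.
print(matches_domain("https://help.sap.com/viewer/index", "help.sap.com"))
# A page on a different host does not.
print(matches_domain("https://open.sap.com/courses/rpa2/overview", "help.sap.com"))
```

A criterion on the domain is deliberately broad: any page on that host matches it, which is why the second captured page later in this post can use the same criterion.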
10. Select the “Enter Keywords or a product name” search field by clicking on it. Upon selection, it will turn dark blue. An item named oSearchKeywords is then generated for “Enter Keywords or a product name”; it can be renamed if you wish. Criteria must be set for this item as well. Here the criterion is name = searchKeywords. Once the criterion is set, the highlighted item will turn green.
Alternatively, it can be selected from the DOM by selecting the “Both” radio button at the top.
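If you have not worked with the DOM before, the following Python sketch may help to picture what an attribute criterion such as name = searchKeywords selects. The HTML fragment is hypothetical and invented for this illustration (the real page markup differs), and CriteriaFinder and find_by_criteria are names I made up; the sketch only uses the standard library HTML parser.

```python
from html.parser import HTMLParser

# Hypothetical markup, invented for illustration; the real page differs.
SAMPLE_HTML = """
<form>
  <input name="searchKeywords" type="text">
  <input type="submit" value="Search">
</form>
"""


class CriteriaFinder(HTMLParser):
    """Collect the attributes of every tag that satisfies all criteria."""

    def __init__(self, criteria):
        super().__init__()
        self.criteria = criteria
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # A tag matches only if every criterion equals the tag's attribute value.
        if all(attr_map.get(key) == value for key, value in self.criteria.items()):
            self.matches.append(attr_map)


def find_by_criteria(html, criteria):
    finder = CriteriaFinder(criteria)
    finder.feed(html)
    return finder.matches


print(len(find_by_criteria(SAMPLE_HTML, {"name": "searchKeywords"})))
print(len(find_by_criteria(SAMPLE_HTML, {"type": "submit"})))
```

In this fragment each criterion matches exactly one element. That is the situation you want for the search field and the search icon: a criterion that matches several elements, or none, would not identify the item reliably.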
11. Select the Search icon by clicking on it. An item named oInput is generated for it; it can be renamed if you wish. Criteria must be set for this item as well. Here the criterion is type = submit. The item is red initially and turns green after the criterion is set.
12. Now capture the next page of help.sap.com
13. Follow the same steps that we used to capture the first page of help.sap.com:
- Open the above page link
- Click on capture a new page
- Refresh and then select the page
- Use Ctrl key and hover over the page. The red outline will select the page
- Click on Scan and Capture
14. Once the capture is finished, it will appear as shown below. The captured page can be renamed in the parameters as shown below.
15. Set the criteria for the captured page highlighted in red. To set the criteria, double click on the captured data. Here we have set Domain = help.sap.com. Once the criteria are set, the page will turn green.
16. Select the first div of the search results, shown in green, by clicking on it. If it is not clear, select it using the Document Object Model (DOM): check the Both option and go to the DOM to select the first search div, as mentioned earlier. Here the criterion is class=search-result ng-scope
17. To select all the search result divs, check the Occurs checkbox.
18. Select the first link of the search result as highlighted below by clicking on the header. If it is not clear, select it using the Document Object Model (DOM): check the Both option and go to the DOM to select the first search header. Here the criterion is contentEditable=inherit
19. To select all the search headers, enter ancestor=oSearch_ct_0 in the parameter section.
20. Now click on the workflow.
21. Right-click on SAPHELPPORTAL and then click on New Workflow
22. Now enter the Workflow Name and Comments and click on Save.
23. Choose the workflow SAPHELPPORTAL. Then, in the Activities pane, select Start Application, drag and drop it into the workflow, and connect it with Start as shown below.
24. After this, enter the description and application details in the properties.
25. Now drag and drop the first page in the workflow and connect it with arrows.
26. Double click on the page activity.
27. Drag the Set activity to the highlighted item.
28. The Set activity is added as shown below.
29. Click on the highlighted Set activity.
30. Enter the details below in the properties.
31. Right-click on the highlighted item (search icon) and select Click on ‘the item’.
32. “Click on Oinput(2)” will appear in Activities section.
33. Drag and drop “Wait exist” activity as shown below.
Note: Don’t confuse “Wait Until” with “Wait Exist”. “Wait exist” is used throughout this scenario.
34. Double click on the ‘Wait exist’ activity highlighted in the above step and enter the details below.
35. Now drag and drop “Set irpa in oSearchKeywords(1)” inside “Wait until”.
36. Now drag and drop the ‘click on Oinput(2)’ activity inside ‘Wait until exist’ as shown below.
37. Now drag the next page and connect it with an arrow to the first page.
38. Now double click on the page and drag the activity ‘Wait exist’ in the first header. After completing this, it will appear below Activity/Description.
39. Now double click on “Wait until exist” and enter the details in the properties.
Note: Item may differ according to your project.
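The reason for nesting activities inside a wait is that the page may not have finished loading when the workflow reaches it. As a rough mental model only, here is a Python sketch of the polling pattern: check whether an item exists, run the nested action once it does, and give up after a timeout. The function wait_until_exists, its parameters, and the default timeout are my assumptions for illustration and are not the Desktop Studio implementation.

```python
import time


def wait_until_exists(exists, on_found, timeout_s=10.0, poll_interval_s=0.1):
    """Poll `exists` until it returns True, then run `on_found`.

    Returns True if the item appeared before the timeout, otherwise False.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if exists():
            on_found()
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


# Simulated page item that "appears" on the third check.
checks = {"count": 0}


def header_exists():
    checks["count"] += 1
    return checks["count"] >= 3


actions = []
found = wait_until_exists(
    header_exists,
    lambda: actions.append("get table"),
    timeout_s=2.0,
    poll_interval_s=0.01,
)
print(found, actions)
```

The nested action runs only after the item has been seen, which is the same ordering the workflow gets by placing the Set, Click, and Get Table activities inside the wait.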
40. Similarly, drag the ‘Get Table’ activity in the first header.
41. Now drag the “Get Table” inside ‘Wait until exist’.
Now, the alignment will appear as shown below:
42. Now click on the project Web Page Excel Automation to return to the workflow.
43. Go to File->Edit Project->Libraries and check Excel Integration checkbox.
44. Now drag and drop “Initial Excel” and “Create Excel” from the activities into the workflow as shown (do it in the same way for “Initial Excel” too).
45. Now add the Set range table activity and connect it in the workflow.
46. Double click on the Set range table activity and enter the details below.
47. Now drag and drop “Release Excel” and connect it as shown below.
48. The final workflow is shown below.
49. Now click on the build option to build the project.
50. Build phase is completed.
51. Now click on the debug option.
52. Debugging has started.
53. The Desktop Agent will now appear in the taskbar. Click on the highlighted Desktop Agent.
54. Click on the Test WebPage_Excel_Automation.
55. The end result is below.
The URL https://help.sap.com/viewer/index will open.
In the search bar, irpa will be entered and then the search icon will be clicked as shown.
All the search headers will be selected.
Finally, all the search result links will be imported into Excel.
56. Stop the debugger.
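For readers who want to picture the data flow of the Get Table and Set range table steps outside the tool, here is a small Python sketch under stated assumptions. The link titles are hypothetical placeholders that I invented, I assume the result is a single column of texts, and I write CSV (which Excel can open) instead of a real workbook so that only the standard library is needed.

```python
import csv
import io

# Hypothetical link texts standing in for what the Get Table step returns.
search_headers = [
    "SAP Intelligent Robotic Process Automation",
    "Desktop Studio Developer Guide",
    "Desktop Agent User Guide",
]


def to_rows(values):
    """Turn a flat list into a one-column table (a list of single-cell rows)."""
    return [[value] for value in values]


def write_csv(rows, stream):
    """Write the rows as CSV so that a spreadsheet program can open them."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerows(rows)


buffer = io.StringIO()
write_csv(to_rows(search_headers), buffer)
print(buffer.getvalue())
```

To produce a file instead of an in-memory string, the same write_csv function can be given a file opened with open("results.csv", "w", newline="").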
Conclusion: I hope the above example makes it easy to extract data from a web page using SAP Intelligent Robotic Process Automation.
In the next blog post, I will walk you through deploying the bots on SAP Intelligent Robotic Process Automation Factory. Stay tuned.
Thanks for going through the blog post. Please feel free to drop comments.
Disclaimer: I learnt IRPA from the openSAP course How to Build Bots with SAP Intelligent Robotic Process Automation (https://open.sap.com/courses/rpa2/overview). This blog post is built on one of the hands-on examples discussed during the course.