Challenge Submission – How to Extract Data from a Web-page to Excel using SAP Intelligent RPA
This is a submission for the SAP Intelligent RPA Tutorial Challenge.
In recent months, Robotic Process Automation (RPA) has been a hot topic among business process experts because it helps automate repetitive tasks in an organization. A free trial account for SAP Intelligent RPA is now available, so you can try automating your own scenario with it.
If you are starting fresh with SAP Intelligent RPA, I strongly recommend getting familiar with the components and activating your trial account here.
If you are installing it for the first time, please click here.
Many of us know how to use Excel to extract data from web pages. The usual practice has been to write macros and run them as required. In my experience, we often fail to capture or filter some of the data and have to repeat the extraction whenever something changes. I have also seen that we forget to run the Excel macro in a timely manner.
This blog post focuses on extracting basic data from a web page and saving it to an Excel spreadsheet. My next blog post demonstrates how to deploy cloud services and trigger this automation in a timely manner.
Pre-requisites for this blog post:
- Desktop Studio
- Desktop Agent
- Cloud Factory
- The zoom level of the web page (Internet Explorer) and the Display scale (i.e. Change the size of text, apps, and other items) should both be 100%.
Note: Trial accounts or services always come with limitations.
In this blog post, you will learn how to:
- Select a declaration on the web page.
- Add an extension to the project.
- Create the workflow for the web page.
- Build and debug the project.
- Extract data from the web page to Excel.
Before you begin, you may want to look at
- How to Build Bots with SAP Intelligent Robotic Process Automation
- SAP Intelligent Robotic Process Automation in a Nutshell
1. Create a new project [File->New project].
2. Open https://help.sap.com/viewer/index in Internet Explorer.
3. Open Add Application in the Desktop Studio.
4. Select the https://help.sap.com/viewer/index URL shown on the screen. If the URL does not appear, click on the refresh button. Change the Name as highlighted below (here it is “SAPHELPPORTAL”), then click on Save.
5. Right-click on SAPHELPPORTAL and click on Capture a new Page.
6. Select the URL. If the page URL does not appear, click on the refresh icon. Don’t click on Scan and Capture yet.
Note: If the target zoom level is more than 100 percent, the page will not be captured fully. The zoom level of the web page (Internet Explorer) and the Display scale (i.e. Change the size of text, apps, and other items) should always be 100%, as mentioned in the prerequisites.
7. Go to https://help.sap.com/viewer/index, which is already open in Internet Explorer, then hold the Ctrl key and hover the mouse over the page. A red outline will select the page. If the entire page is not selected, check the zoom level (see the prerequisites).
8. Click on Scan and Capture. Once capturing is complete, the page below will be shown.
9. Double click on the highlighted domain in the “Captured Data” panel to set the criteria. Here we have set Domain = help.sap.com as the criteria.
Once the criteria are set, the highlighted page will turn green.
10. Select the “Enter Keywords or a product name” search field by clicking on it. Upon selection, it will turn dark blue, and an item named oSearchKeywords is generated for it. The item can be renamed. Criteria must be set for this item as well; here the criterion is name = searchKeywords. Once the criteria are set, it will turn green.
Alternatively, it can be selected from the DOM by choosing the “Both” radio button at the top.
11. Select the Search icon by clicking on it. An item named oInput is generated for it, which can also be renamed. Criteria must be set for this item as well; here the criterion is type = submit. The item is red at first and turns green once the criteria are set.
12. Now capture the next page of help.sap.com
13. Follow the same steps we used to capture the first page of help.sap.com:
- Open the above page link
- Click on capture a new page
- Refresh and then select the page
- Use Ctrl key and hover over the page. The red outline will select the page
- Click on Scan and Capture
14. Once the capture is finished, it will be seen as shown below. The captured page can be renamed in the parameters as shown below.
15. Set the criteria for the captured page highlighted in red. To set the criteria, double click on the captured data. Here we have set Domain = help.sap.com. Once the criteria are set, the page will turn green.
16. Select the first div of the search results (in green) by clicking on it. If it is not clear, select it through the Document Object Model (DOM): check the Both option and go to the DOM to select the first search div, as mentioned earlier. Here the criterion is class=search-result ng-scope.
17. To select all the search result divs, check the Occurs checkbox.
18. Select the first link of the search results, as highlighted below, by clicking on the header. If it is not clear, select it through the Document Object Model (DOM): check the Both option and go to the DOM to select the first search header. Here the criterion is contentEditable=inherit.
19. To select all the search headers, enter ancestor=oSearch_ct_0 in the parameter section.
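To make the selection logic of steps 16 to 19 more concrete, here is a minimal Python sketch of the same idea outside Desktop Studio: match every div that carries the search-result class (what the Occurs checkbox does for repeated items), then collect the link inside each matched div (what the ancestor parameter expresses). This is not SAP Intelligent RPA code. The sample HTML, its href values and the SearchHeaderCollector class are assumptions made up for illustration; only the class name search-result ng-scope comes from step 16.

```python
from html.parser import HTMLParser


class SearchHeaderCollector(HTMLParser):
    """Collect [text, href] for every <a> nested inside a div.search-result."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # nesting depth inside a matching result div
        self.in_link = False  # True while parsing a link inside a result div
        self.results = []     # one [text, href] entry per collected link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Enter a result div, or go one level deeper inside one.
            if self.div_depth or "search-result" in classes:
                self.div_depth += 1
        elif tag == "a" and self.div_depth:
            self.in_link = True
            self.results.append(["", attrs.get("href", "")])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.results[-1][0] += data.strip()


# Hypothetical markup standing in for the captured search page.
SAMPLE = """
<div class="search-result ng-scope"><a href="/viewer/p/IRPA">SAP Intelligent RPA</a></div>
<div class="search-result ng-scope"><a href="/viewer/p/OTHER">Another result</a></div>
<div class="footer"><a href="/legal">Legal</a></div>
"""

collector = SearchHeaderCollector()
collector.feed(SAMPLE)
for text, href in collector.results:
    print(text, href)
```

The footer link is skipped because its parent div does not match the class criterion, which mirrors why the ancestor setting restricts the header item to the repeated search result divs.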
20. Now click on the workflow.
21. Right-click on SAPHELPPORTAL and then click on New Workflow
22. Now enter the Workflow Name and Comments and click on Save.
23. Choose the workflow SAPHELPPORTAL and then in the Activities pane, select Start Application and drag and drop in the workflow to connect with the Start as shown below.
24. After this, enter the description and application details in the properties.
25. Now drag and drop the first page in the workflow and connect it with arrows.
26. Double click on the page activity.
27. Drag the Set activity onto the highlighted item.
28. The Set activity is added as shown below.
29. Click on the highlighted Set activity
30. Enter the below detail in the properties.
31. Right-click on the highlighted item (search icon) and select Click on ‘the item’.
32. “Click on Oinput(2)” will appear in Activities section.
33. Drag and drop “Wait exist” activity as shown below.
Note: Don’t confuse “Wait Until” with “Wait exist”. “Wait exist” is used for the entire scenario.
34. Double click on the “Wait exist” activity highlighted in the above step and enter the below details.
35. Now drag and drop “Set irpa in oSearchKeywords(1)” inside “Wait until exist”.
36. Now drag and drop the “click on Oinput(2)” activity inside “Wait until exist” as shown below.
37. Now drag the next page and connect it with an arrow to the first page.
38. Now double click on the page and drag the “Wait exist” activity onto the first header. After completing this, it will appear below Activity/Description.
39. Now double click on “Wait until exist” and enter the detail in the properties.
Note: Item may differ according to your project.
40. Similarly, drag the “Get Table” activity onto the first header.
41. Now drag “Get Table” inside “Wait until exist”.
Now, the alignment will appear as shown below:
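The reason for nesting activities inside “Wait until exist” can be sketched in a few lines of Python: the nested activity only runs once the element is present, and the wait gives up after a timeout. This is an illustration of the polling pattern under stated assumptions, not how Desktop Studio implements it. The wait_exist function, its timeout and interval parameters, and the simulated header_exists check are all hypothetical.

```python
import time


def wait_exist(exists, timeout=5.0, interval=0.1):
    """Poll exists() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if exists():
            return True
        time.sleep(interval)
    return False


# Simulated page: the first search header "appears" on the third check.
checks = {"count": 0}


def header_exists():
    checks["count"] += 1
    return checks["count"] >= 3


if wait_exist(header_exists, timeout=2.0, interval=0.01):
    print("header found, nested activities such as Get Table can run")
else:
    print("timed out waiting for the header")
```

Running the nested activity without the wait would risk reading the table before the search results have loaded.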
42. Now click on the project Web Page Excel Automation to return to the workflow.
43. Go to File->Edit Project->Libraries and check Excel Integration checkbox.
44. Now drag and drop “Initial Excel” and “Create Excel” from the activities into the workflow as shown (both are added in the same way).
45. Now add “Set range table” and connect it in the workflow.
46. Double click on the Set range table and enter the below detail.
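Conceptually, writing a table to a range means mapping each value of a two-dimensional result onto a cell address, starting from a top-left cell. The Python sketch below shows only that mapping; it does not call Excel or the SAP Intelligent RPA Excel library. The function names column_letter and place_table, and the sample rows, are hypothetical.

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column index to an Excel column name (1 -> A, 27 -> AA)."""
    name = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        name = chr(ord("A") + remainder) + name
    return name


def place_table(rows, start_row=1, start_col=1):
    """Map a list of rows onto cell addresses, starting at the top-left cell."""
    cells = {}
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            cells[f"{column_letter(start_col + c)}{start_row + r}"] = value
    return cells


# Hypothetical stand-in for the search headers returned by Get Table.
headers = [["SAP Intelligent RPA"], ["Another result"]]
print(place_table(headers))
```

With the defaults, the first header lands in A1 and the second in A2; a different start cell shifts every address by the same offset.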
47. Now drag and drop “Release Excel” and connect it as shown below.
48. The Final workflow is below.
49. Now click on the build option to build the project.
50. Build phase is completed.
51. Now click on the debug option.
52. Debugging has started.
53. The Desktop Agent will now appear in the taskbar. Click on the highlighted Desktop Agent.
54. Click on the Test WebPage_Excel_Automation.
55. The end result is below.
The URL https://help.sap.com/viewer/index will open.
In the search bar, irpa will be entered and then the search icon will be clicked, as shown.
The URL https://help.sap.com/viewer/search?q=irpa&language=en-US&state=PRODUCTION&format=standard,html,pdf,others will open.
All the search headers will be selected.
Finally, all the search result links will be imported into Excel.
56. Stop the debugger.
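The search URL that opens in step 55 carries the keyword as the q query parameter. As a small Python sketch, the same URL can be assembled and taken apart with the standard library, which may help when checking what the bot typed into the search field. Whether the site accepts hand-built URLs like this is an assumption; the build_search_url helper is hypothetical and the parameter values are copied from the URL shown above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://help.sap.com/viewer/search"


def build_search_url(keyword: str) -> str:
    """Assemble the search URL from step 55 for a given keyword."""
    params = {
        "q": keyword,
        "language": "en-US",
        "state": "PRODUCTION",
        "format": "standard,html,pdf,others",
    }
    # safe="," leaves the commas unescaped, matching the URL shown in step 55.
    return f"{BASE}?{urlencode(params, safe=',')}"


url = build_search_url("irpa")
print(url)

# Reading the keyword back out of the URL.
print(parse_qs(urlsplit(url).query)["q"])
```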
Conclusion: I hope the above example makes it easy to extract data from a web page using SAP Intelligent Robotic Process Automation.
In the next blog post, I will walk you through deploying bots on the SAP Intelligent Robotic Process Automation Factory. Stay tuned.
Thanks for going through the blog post. Please feel free to drop comments.
Disclaimer: I learned IRPA from the openSAP course How to Build Bots with SAP Intelligent Robotic Process Automation (https://open.sap.com/courses/rpa2/overview). This blog post is built on one of the hands-on examples discussed during the course.
Excellent blog Sonu...!!
Very Informative and Useful !
Hi Sonu Agarwal
I have set my browser zoom to 100% but am still facing an issue capturing Fiori pages.
Kindly help in resolving this issue.
The Scale and layout setting may be 150% on your system.
Go to Display settings >> Scale and layout and set it to 100%. By default, it is 150%.
After this restart your Desktop Studio and capture the page again.
Yes, the problem was with the system display setting. Thank you for helping me resolve this issue.
In step 46, I am not getting the option to fetch data from pHttpsHelpSapCom1. Do I need to configure something for that?
Right-click on pHttpsHelpSapCom1 and click on Create Item. Store everything in the new item created.
Very Informative ! Thanks.
Very Nice Article.
Great Article! Very informative and helpful for anyone starting with iRPA .
I was able to open the web page, but the value is not getting populated and the button is also not working. What could be the issue? Thanks in advance.
Did you add the Excel library in the Desktop Studio?
Please check that the Excel library is added.
How can I extract a particular attribute of an element, such as the href link, from the web page?
Thanks in advance.
I've followed your steps and I face just one problem that I'm not able to solve.
Even after I have set a criterion (DOMAIN = help.sap.com), my page doesn't turn green.
Do you know what I've probably done wrong, or what I am supposed to do?
Thank you very much!
Hi! I'm facing the same issue, did you manage to solve it?
I was wondering the same, and I just added the TITLE criteria to both pages and then it turned green. I suspect it's due to the fact that both pages' DOMAIN criteria are the same, and TITLE (or any unique tag) just adds uniqueness.
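A minimal Python sketch of why this works, assuming a page is recognised only when its criteria single it out: with Domain alone both captured pages match, while adding a title criterion leaves exactly one. This is an illustration, not Desktop Studio's actual matching logic. The PAGES dictionary, the page titles and the matching_pages function are invented for the example.

```python
from urllib.parse import urlsplit

# Hypothetical stand-ins for the two captured pages; the titles are invented.
PAGES = {
    "index": {"url": "https://help.sap.com/viewer/index", "title": "Index page"},
    "search": {"url": "https://help.sap.com/viewer/search?q=irpa", "title": "Search page"},
}


def matching_pages(criteria: dict) -> list:
    """Return the names of all pages whose properties satisfy every criterion."""
    matches = []
    for name, page in PAGES.items():
        actual = {"domain": urlsplit(page["url"]).hostname, "title": page["title"]}
        if all(actual.get(key) == value for key, value in criteria.items()):
            matches.append(name)
    return matches


print(matching_pages({"domain": "help.sap.com"}))
print(matching_pages({"domain": "help.sap.com", "title": "Search page"}))
```

The first call prints both page names, the second only the search page.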
Step 44: Create Excel
Step 45: Set range table
Why does my Studio have no Create Excel and Set range table activities? Is it because it is a trial version?
I think Set Values is similar to Set range table in the newer version.
You can find inside Excel Lib -> Data -> Set Values.
When I changed it to a lower version, it displayed fine.
I set the IE zoom to 100% and can capture the whole screen,
but then there is an issue: the field positions do not match on the captured screen.
If I set 125%, the positions are OK but I cannot capture the whole screen.
In step 15, when setting the same domain as criteria again, you may face the problem of the text not turning green. If so, follow this link.
In step 34, since ct_0 appears on the next page, it is better to use the search button as the wait until option.
If I need to extract the href from an a tag, how could I do that?
This is the page:
Following the steps, I can get the information that is visible (the text), but the process needs the URL from the href.