Challenge Submission: Web data extraction and Sentiment Analysis using SAP Intelligent RPA
Welcome, everyone, to my first tutorial blog post in the SAP world!
This is a submission for the SAP Intelligent RPA Tutorial Challenge.
In this tutorial, we will create a bot which logs into Twitter, extracts a tweet from the account, and assigns it a sentiment polarity score. Although we could download a bulk dump of tweets instead, this tutorial deliberately works through the front end of the application in order to exercise the tool's different features and demonstrate its functionality.
Before you begin, please ensure you have the following prerequisites:
- Desktop Studio (On-premise component)
- Cloud Factory
- SAP IRPA extension enabled in the browser (IE or Chrome, whichever you plan to use)
- All environment variables have been set
- A Python script to perform sentiment analysis, which takes a piece of text as input and writes the sentiment polarity to a TXT file (you can download the script from GitHub: https://github.com/RichaPandit/SentimentAnalyser)
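To make the expected contract of that script concrete, here is a minimal sketch of the same interface: text in as a command-line argument, polarity out to a TXT file. The word lists, the scoring rule, and the default file name `sentiment_output.txt` are illustrative assumptions of mine, not the logic of the actual script in the repository.

```python
# Minimal stand-in for the sentiment script's interface:
# usage: python sentiment_sketch.py "<text>" [<output file>]
# The tiny lexicon below is illustrative only; the real script
# in the GitHub repo implements its own analysis.
import sys

POSITIVE = {"good", "great", "happy", "love", "excellent"}
NEGATIVE = {"bad", "sad", "hate", "terrible", "awful"}

def polarity(text):
    """Return a score in [-1, 1]: +1 if all sentiment words are
    positive, -1 if all are negative, 0 if none are found."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [1 if w in POSITIVE else -1
            for w in words if w in POSITIVE or w in NEGATIVE]
    return sum(hits) / len(hits) if hits else 0.0

if __name__ == "__main__":
    text = sys.argv[1] if len(sys.argv) > 1 else ""
    out = sys.argv[2] if len(sys.argv) > 2 else "sentiment_output.txt"
    with open(out, "w") as f:
        f.write(str(polarity(text)))
```

Whatever implementation you use, keeping this command-line shape (text argument in, number in a TXT file out) is what lets the bot drive it blindly through the Command Prompt later in the tutorial.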
- Open Desktop Studio and press Ctrl+N to create a new project.
- Enter the project details and click on Save.
- Click on ‘+Add Application’. Meanwhile, open this URL in an Internet Explorer window: https://twitter.com/login?lang=en.
- Select technology as Web, click on the Refresh icon to show the list of web applications. Select the Twitter login page window, enter a valid name for the application and click on Save.
- Right click on the application which you created and select ‘Capture a New Page…’ to open a Capture Page pop-up.
- Click on the Refresh icon to select the IE page. Hold Ctrl and hover the mouse over the same IE page until you see a red border around it. Now move back to the Capture Page pop-up and click on Scan And Capture.
- Once the page is scanned, you can see the screen in the Capture Panel. Double-click on the ‘DOMAIN=twitter.com’ in the Captured Data panel to move it to the criteria and you can see the page color change from Red to Green in the Recognition Tool Panel (i.e., the Application sub-tree).
- Select the Username and Password text boxes and the Login button with appropriate criteria, and ensure that their names change from red to green, which means reliable attributes have been selected to recognize the elements.
- Now, log in to your account manually and capture another page for your home page to extract the tweets. Here, after traversing the DOM tree, a DIV tag was found which houses the tweet text. Selecting appropriate criteria for this HTML control ensures its color changes from red to green in the Applications tree.
- For extracting pattern-based data, since this element occurs multiple times on the page, go to the Parameters section for this control and select the checkbox against ‘4 – Occurs’.
- Click on ‘+Add Application’. Meanwhile, open Command Prompt window. In the Capture Page pop-up, select the technology as UIAutomation. Click on Refresh to select the command prompt window and hit Save.
- Right-click on the application which you created and select ‘Capture a New Page…’ to open a Capture Page pop-up. Click on the Refresh icon to select the window. Hold Ctrl and hover the mouse over the same window until you see a red border around it. Now move back to the Capture Page pop-up and click on Scan And Capture.
- Go to the parameters section of the application and page to ensure the parameter ‘Launch Path’ is populated with the right value. If not, fill it manually.
- Once the page is scanned, you can see the screen in the Capture Panel. Double-click on the ‘Name= C:\Windows\System32\cmd.exe’ in the Captured Data panel to move it to the criteria and you can see the page color change from Red to Green in the Recognition Tool Panel (i.e., the Application sub-tree). Capture the text-area control of the prompt as was done earlier for the web-controls by selecting appropriate criteria.
- Move to the ‘Workflows’ section and right-click on the application created. Select ‘New workflow…’.
- From the Activities panel, select the Start activity under Application and connect it to the workflow.
- Set the parameters for this activity by selecting the application.
- After opening the web-application, the next action you would want to perform is entering the credentials. So, go to Pages panel and drag and drop the login page to your workflow area and sequence it.
- Double-click the newly created activity to enter its scope. Add activities in the sequence of actions that need to be performed. For example, here I wait for the username text box to appear, then enter the credentials, and finally click the Login button. You could fetch the credentials from the Get Credential activity, but here a variable $vPassword$ has been created and its value hard-coded in the bot itself.
- Go back to the workflow, add a similar page for your home page, and get the data using the Get activity. Your workflow would look somewhat like the screenshot below.
- Go to File -> Edit Project…. Select ‘Libraries’ tab in the pop-up. Check the Excel Integration check-box, and click on Save button.
- From the Activities panel, select the Start activity under Application and connect it to the workflow. Set the parameters for this activity by selecting the Command Prompt application. Here, additionally, I wished to open the prompt in the path where my python script exists and hence, the value in the arguments field.
- After opening the Prompt, I want to call the Python script to give me the sentiment polarity for the text extracted from the tweet DIV. So, go to the Pages panel, drag and drop the Command Prompt page into your workflow area, and sequence it. Double-click the page and use the Set activity to enter the command into the text area you identified earlier in the application scope, then hit ENTER.
- The output of this workflow will be stored in the directory you mentioned as the second argument when entering the command into the Command Prompt. By the end of these actions, your workflow will look somewhat like the screenshot below.
- Build your solution by pressing Ctrl+B, then debug it to view the output. The output trace will be as shown below, with your Twitter account logged in and the Python script run via Command Prompt.
In a nutshell, we have used both web and Windows automation in this tutorial, and much more can be added to this existing functionality, as listed below:
- Adding a loop to iterate through all the tweets
- Adding a machine learning component to train and self-adjust the polarity
- Maintaining a corpus of the tweets in a database and updating it regularly for the ML component
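The first enhancement, looping through all the tweets, is straightforward once the ‘Occurs’ parameter gives you a list of tweet elements instead of a single one. In plain Python terms it amounts to something like the sketch below, where `tweets` stands in for the list of extracted texts and `score` stands in for the sentiment call; both names are illustrative, not IRPA APIs.

```python
# Sketch of the tweet-iteration enhancement: score every extracted
# tweet instead of only the first. score() is a trivial stand-in for
# the real sentiment call.
def score(text):
    # Stand-in scorer: +1 if the tweet mentions "love", -1 for "hate".
    t = text.lower()
    return 1 if "love" in t else (-1 if "hate" in t else 0)

def score_all(tweets):
    """Return (tweet, polarity) pairs for every extracted tweet."""
    return [(t, score(t)) for t in tweets]
```

In the bot itself, the same idea would be a loop activity over the occurrences of the captured DIV, invoking the sentiment script once per tweet and collecting the results.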
Looking forward to your feedback/queries/comments!
Video of the bot run: