Website to Visuals – Scraping Website into Data within SAP Lumira
Digital business increasingly depends on extracting information and drawing useful conclusions from live, real-time data on the web. Manufacturing, sales, and every other department need to base their decisions on real-time data. Building visualizations from static data is simple, but it does not help, for example, a sales executive who wants to visualize real-time sales by product category. That calls for a way to extract and visualize live data. SAP Lumira handles the visualization well, but we still need a way to get this live data into it.
Import.io seems to be the best fit for this requirement. It is free, easy to use, and powerful, with a lot of features we can use for extracting data into SAP Lumira. Read on to find out how to scrape a website and bring in live data for creating visualizations.
I came across this great blog post by Ronald Konijnenburg about using import.io to extract data from the web into Lumira: http://scn.sap.com/community/lumira/blog/2014/06/08/using-importio-to-extract-any-data-from-the-web-into-lumira-shuffle-up-and-deal-edition. You can follow it to set up your data and expose it as an API inside import.io.
We need live data: Lumira has to fetch the updated data from the website every time, so the CSV method will not work. The Data Access (DA) extensions of SAP Lumira help us connect to the data source, so let’s build a DA extension for SAP Lumira. To keep things simple, we use the same WSOP data from http://www.wsop.com/players/index.asp?pagenum=1
The DA extension process is simple: the extension calls the API, converts the returned data into CSV, and passes it to Lumira for visualization. Once built, the extension is installed into Lumira.
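To make the conversion step concrete, here is a minimal Python sketch of turning an API response into CSV. It is only an illustration, not the code of the actual extension (that is in the GitHub repository linked at the end). The `results` key, the column names, and the sample values are all assumptions made up for this example; check the real API response for its actual shape.

```python
import csv
import io
import json


def rows_to_csv(payload, results_key="results"):
    """Convert a JSON payload holding a list of row objects into CSV text.

    Assumption: the rows sit under a top-level key (here "results").
    """
    rows = json.loads(payload).get(results_key, [])

    # Collect column names in first-seen order so the header is stable.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample payload, standing in for a real API response.
sample = json.dumps({
    "results": [
        {"player": "Player A", "bracelets": 2, "rings": 1},
        {"player": "Player B", "bracelets": 0, "rings": 3},
    ]
})

print(rows_to_csv(sample), end="")
```

Rows that lack a column get an empty cell (`restval=""`), so every CSV line has the same number of fields.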
Figure 3: Print data
You can then build the extension into a binary file and use it in Lumira.
To learn more about DA extensions and how to develop them for SAP Lumira, check out the post "Lumira – Open Source Data Access Extensions".
Installing and Executing the Extension
1. Installing the extension is simple. Place the binary file in the Desktop\daextensions folder located inside the SAP Lumira install folder.
Figure 4: installing extension
2. Open SAPLumira.ini and add the following two lines at the end
-Dhilo.externalds.folder=C:\Program Files\SAP Lumira\Desktop\daextensions
Figure 5: Editing SAPLumira.ini
3. Open a new Document inside SAP Lumira. Select External Datasource and click on Next
Figure 6: Accessing the External Datasource
4. Select the importio-da-extensions and click on Next
Figure 7: import.io extension
6. You will be prompted for the number of pages to fetch. Websites often paginate their data across multiple pages, so you can set how many pages you want to fetch.
Figure 9: Page Number
7. Lumira fetches all the data, which you can then use to create a dataset.
Figure 8: creating dataset
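The page count from step 6 can be pictured as a list of page URLs to fetch. The sketch below is a hedged illustration in Python, not the extension's actual logic: it assumes the `pagenum` query parameter seen in the WSOP URL above selects the page, and `page_urls` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# URL taken from the post; pagenum is assumed to select the page.
BASE_URL = "http://www.wsop.com/players/index.asp?pagenum=1"


def page_urls(base_url, pages, param="pagenum"):
    """Return one URL per page, from page 1 up to and including `pages`."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    urls = []
    for page in range(1, pages + 1):
        query[param] = str(page)  # overwrite the page number
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls


for url in page_urls(BASE_URL, 3):
    print(url)
```

Each URL would then be fetched in turn and its rows appended to the same dataset.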
Creating the Visualization
Here, we create a couple of visualizations that give us insight into the World Series of Poker rankings.
The tree chart compares the number of bracelets and rings won by players in the WSOP, and the tag cloud displays player names by their WSOP ITM. Together, these two charts give the viewer an idea of which players have the most victories in poker circuits and in the World Series itself.
Figure 9: Tree chart- Bracelets vs Rings
The DA extension for import.io is complete. Feel free to download the code and the executable from my GitHub: https://github.com/sgsshankar. With SAP Lumira and DA extensions, going from website to visuals is now just a button away.