First of all, I would like to thank Mayank Mishra for the inspiration and for the background knowledge of the World Bank API described in his article ‘“Socio-Economic Analysis” on World Bank’s Live Data’ on SCN, written as part of the Data Geek III challenge. I recommend reading it before you go any further.
This article has also been submitted to the Data Genius Contest 2015 in the ‘Health & Social Impact Alliance’ allegiance.
“Make sense of the world around you” – the value of knowledge that we have
I think that we, as humans living in the 21st century, could avoid many mistakes if only we started using the knowledge that we already have. I don’t mean, of course, that we should hand the steering wheel of the future to data scientists... that would be an exaggeration. On the other hand, it is not wise to rely only on ‘common knowledge’ or on opinions presented by the media or politicians.
What we need is… reliable facts and figures presented in an easy-to-understand way. One potential candidate for a reliable source of facts is The World Bank. Why? Well… just read The World Bank Group A to Z.
Easy integration and fast processing is key
Nowadays, I find that the biggest and most difficult part of drawing knowledge from figures is integrating data and processing large amounts of it quickly, and then presenting the results visually in a human-readable way. I consider SAP Lumira, with its great user experience and capabilities, one of the best tools for taking on that challenge.
Therefore, when I read Mayank Mishra’s article a few months ago, I said to myself: “Houston, we have a … tool”.
As I said, Mayank’s article was an inspiration. The proposed data extension was good, it worked, and it had many other advantages. But… hmm… well, I was missing something in the user experience. Primarily, you could not choose indicators other than those predefined by Mayank directly from within the GUI. It was possible to look up indicator codes on your own and then update the configuration file, that’s true... but it is time-consuming and not easy if you don’t know where to look.
Last but not least… I couldn’t find a working version of Mayank’s extension released for public use.
“Simplicity is the ultimate sophistication”
For my own use, I consider simplicity to be what differentiates nice-to-know solutions from those that are handy on a day-to-day basis. The world is rich in tools, and you have to choose what works best for you.
So I decided to build my own World Bank data extension for Lumira, a little differently.
I started with a similar approach to Mayank’s. I used the same World Bank API for indicators as described in Mayank’s article (section 1), and the architecture was similar as well (section 2). I based my code on the sample from the DAE (Data Access Extension) development guide.
Note: SAP has recently released a new version of the DAE framework (V2). My solution was written using version 1 of the framework.
I changed the workflow. I was not keen on predefining indicators in a configuration file; instead, I decided that users should be free to choose exactly what to analyze each time they run the extension.
Step 1. Read the configuration file
Read the ‘config.properties’ file from the same directory as the DAE executable to get the proxy settings and the debug flag. The file is self-explanatory; sample content below:
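Step 1 can be sketched with the standard java.util.Properties class. This is a minimal sketch, not the author’s code: the key names proxyHost, proxyPort and debug are assumptions, and the file is located next to the running jar through the class’s code source, which may or may not be the wrapped executable’s directory depending on how the wrapper starts Java.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

class DaeConfig {
    // Hypothetical key names; adjust them to whatever your config.properties uses.
    static final String KEY_PROXY_HOST = "proxyHost";
    static final String KEY_PROXY_PORT = "proxyPort";
    static final String KEY_DEBUG = "debug";

    final Proxy proxy;    // Proxy.NO_PROXY when no proxy host is configured
    final boolean debug;

    DaeConfig(Proxy proxy, boolean debug) {
        this.proxy = proxy;
        this.debug = debug;
    }

    /** Parses the settings from any Reader (a file in production, a string in tests). */
    static DaeConfig load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        String host = props.getProperty(KEY_PROXY_HOST, "").trim();
        Proxy proxy = Proxy.NO_PROXY;
        if (!host.isEmpty()) {
            int port = Integer.parseInt(props.getProperty(KEY_PROXY_PORT, "8080").trim());
            // createUnresolved avoids a DNS lookup while reading the configuration.
            proxy = new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
        }
        boolean debug = Boolean.parseBoolean(props.getProperty(KEY_DEBUG, "false").trim());
        return new DaeConfig(proxy, debug);
    }

    /** Reads config.properties from the folder that contains the running jar. */
    static DaeConfig loadFromExecutableDir() throws IOException, URISyntaxException {
        Path location = Paths.get(DaeConfig.class.getProtectionDomain()
                .getCodeSource().getLocation().toURI());
        // A jar (or wrapped exe) location is a file, so use its parent folder;
        // when running from a classes folder, use that folder as-is.
        Path dir = Files.isDirectory(location) ? location : location.getParent();
        try (Reader r = Files.newBufferedReader(dir.resolve("config.properties"),
                StandardCharsets.UTF_8)) {
            return load(r);
        }
    }

    public static void main(String[] args) throws IOException {
        DaeConfig cfg = load(new StringReader(
                "proxyHost=proxy.example.com\nproxyPort=3128\ndebug=true\n"));
        System.out.println("proxy=" + cfg.proxy + ", debug=" + cfg.debug);
    }
}
```

Keeping the parsing in a Reader-based method makes it easy to test without touching the file system.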
Step 2. Make an HTTP request to grab the list of indicators
When the DAE runs in preview mode, it makes an HTTP request for the list of all indicators. To get all records in a single page, I added the ‘per_page’ parameter set to 15,000 (the World Bank database currently holds 12,917 indicators, so 15,000 should be enough, with some room for growth).
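A minimal sketch of Step 2, using only java.net.HttpURLConnection. The endpoint http://api.worldbank.org/indicators is an assumption based on the v1 API of the time (newer versions of the API live under a /v2/ path), and the Proxy argument would come from the configuration read in Step 1.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class IndicatorListRequest {
    // Assumed v1 endpoint for the full list of indicators.
    static final String INDICATORS_ENDPOINT = "http://api.worldbank.org/indicators";
    // 15,000 > 12,917 indicators, so one page should hold the whole list.
    static final int PER_PAGE = 15000;

    /** Builds the request URL that asks for all indicators on page 1 as XML. */
    static String indicatorListUrl(int perPage) {
        return INDICATORS_ENDPOINT + "?page=1&per_page=" + perPage + "&format=xml";
    }

    /** Downloads the response body as a UTF-8 string, optionally through a proxy. */
    static String fetch(String url, Proxy proxy) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection(proxy);
        conn.setConnectTimeout(30_000);
        conn.setReadTimeout(120_000);   // the full list is a large document
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String url = indicatorListUrl(PER_PAGE);
        System.out.println(url);
        // Pass "fetch" as an argument to actually download the list.
        if (args.length > 0 && args[0].equals("fetch")) {
            System.out.println(fetch(url, Proxy.NO_PROXY).length() + " characters received");
        }
    }
}
```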
Step 3. Show the GUI and let the user choose the indicator to grab data for
Once the list of indicators has been retrieved in preview mode, the DAE displays a graphical user interface that lets the user choose the indicator to grab data for.
At the top there is a search field that filters the list of results. With more than 12,000 entries, it was a must-have feature.
After selecting an indicator, click the ‘Submit’ button.
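The search field in Step 3 boils down to a case-insensitive substring match on the indicator code and name. The sketch below shows that logic in plain Java rather than the author’s JavaFX GUI; the Indicator class is a hypothetical holder. In JavaFX, the same predicate could be plugged into a FilteredList through its setPredicate method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class IndicatorFilter {
    /** Hypothetical holder for one entry of the indicator list. */
    static final class Indicator {
        final String id;
        final String name;

        Indicator(String id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public String toString() {
            return id + " - " + name;
        }
    }

    /** Keeps indicators whose code or name contains the query, ignoring case. */
    static List<Indicator> filter(List<Indicator> all, String query) {
        String q = query == null ? "" : query.trim().toLowerCase(Locale.ROOT);
        if (q.isEmpty()) {
            return new ArrayList<>(all);   // empty search box shows everything
        }
        return all.stream()
                .filter(i -> i.id.toLowerCase(Locale.ROOT).contains(q)
                        || i.name.toLowerCase(Locale.ROOT).contains(q))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Indicator> all = Arrays.asList(
                new Indicator("SP.POP.TOTL", "Population, total"),
                new Indicator("NY.GDP.MKTP.CD", "GDP (current US$)"),
                new Indicator("SP.DYN.LE00.IN", "Life expectancy at birth, total (years)"));
        System.out.println(filter(all, "population"));   // prints [SP.POP.TOTL - Population, total]
    }
}
```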
Step 4. Make an HTTP request to get the data for the selected indicator
Once you submit a valid indicator, the DAE makes an HTTP request to:
'http://api.worldbank.org/countries/all/indicators/' + <Selected_Indicator> + '?page=1&per_page=32500&format=xml'
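A small sketch of how the Step 4 URL could be assembled. The validity rule (letters, digits, dots and underscores, as in SP.POP.TOTL) is a hypothetical check, not necessarily what the author’s DAE does; because it only admits URL-safe characters, the code can be appended without further encoding.

```java
import java.util.regex.Pattern;

class IndicatorDataUrl {
    static final String BASE = "http://api.worldbank.org/countries/all/indicators/";
    // One large page, so that all rows for the indicator arrive in a single response.
    static final String QUERY = "?page=1&per_page=32500&format=xml";
    // Hypothetical validity rule; it only allows URL-safe characters.
    static final Pattern INDICATOR_ID = Pattern.compile("[A-Za-z0-9._]+");

    static boolean isValidIndicator(String id) {
        return id != null && INDICATOR_ID.matcher(id).matches();
    }

    /** Builds the data URL for one indicator, rejecting anything that is not a plain code. */
    static String dataUrl(String indicatorId) {
        if (!isValidIndicator(indicatorId)) {
            throw new IllegalArgumentException("Invalid indicator: " + indicatorId);
        }
        return BASE + indicatorId + QUERY;
    }

    public static void main(String[] args) {
        System.out.println(dataUrl("SP.POP.TOTL"));
    }
}
```

If a response ever reported more than one page, a more robust version would loop over the page parameter instead of relying on a single large page.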
The returned XML file is then processed, and a preview of about 300 records is generated for Lumira.
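The XML processing can be sketched with the JDK’s DOM parser. The record layout assumed here (a root wb:data element containing one nested wb:data element per record, each with wb:country carrying an id attribute, wb:date and wb:value) is my understanding of the v1 response format and should be checked against a real response. The limit parameter covers both the preview (about 300 rows) and a full read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class WbXmlParser {
    static final class Row {
        final String countryId, country, date, value;

        Row(String countryId, String country, String date, String value) {
            this.countryId = countryId;
            this.country = country;
            this.date = date;
            this.value = value;
        }
    }

    /** Reads at most 'limit' records; element names are matched by local name only. */
    static List<Row> parse(String xml, int limit)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<Row> rows = new ArrayList<>();
        NodeList records = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < records.getLength() && rows.size() < limit; i++) {
            Node n = records.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE || !"data".equals(n.getLocalName())) {
                continue;   // skip whitespace and unexpected elements
            }
            Element rec = (Element) n;
            Element country = child(rec, "country");
            rows.add(new Row(country == null ? "" : country.getAttribute("id"),
                    text(country), text(child(rec, "date")), text(child(rec, "value"))));
        }
        return rows;
    }

    /** First direct child element with the given local name, or null. */
    private static Element child(Element parent, String localName) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && localName.equals(c.getLocalName())) {
                return (Element) c;
            }
        }
        return null;
    }

    private static String text(Element e) {
        return e == null ? "" : e.getTextContent().trim();   // empty value stays ""
    }

    public static void main(String[] args) throws Exception {
        // Illustrative record only, not real World Bank data.
        String xml = "<wb:data xmlns:wb=\"http://www.worldbank.org\"><wb:data>"
                + "<wb:country id=\"AA\">Example country</wb:country>"
                + "<wb:date>2014</wb:date><wb:value>123</wb:value></wb:data></wb:data>";
        for (Row r : parse(xml, 300)) {
            System.out.println(r.countryId + " " + r.country + " " + r.date + " " + r.value);
        }
    }
}
```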
Step 5. Process the XML data file and send the results to standard output
Once you click ‘Create’, the same XML file containing the indicator’s data is processed in REFRESH mode. In this mode, all rows are sent to standard output. Once this is done, you can start playing with the data in Lumira.
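The difference between preview and REFRESH output can be sketched as a CSV writer with a row limit. This is a hedged sketch: the Mode enum is hypothetical (in practice the mode comes from the arguments Lumira passes to the DAE), the column names beyond country_ID and date are assumptions, and the framing that the DAE framework expects around the data is not shown here; follow the DAE development guide for that part.

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

class CsvOutput {
    enum Mode { PREVIEW, REFRESH }   // hypothetical; derive it from the DAE arguments
    static final int PREVIEW_ROWS = 300;

    /** Quotes a field when it contains a comma, quote or line break (e.g. "Korea, Rep."). */
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Writes a header plus either the first 300 rows (preview) or all rows (refresh). */
    static void write(List<String[]> rows, Mode mode, PrintStream out) {
        out.println("country_ID,country,date,value");
        int limit = mode == Mode.PREVIEW ? Math.min(PREVIEW_ROWS, rows.size()) : rows.size();
        for (int i = 0; i < limit; i++) {
            String[] r = rows.get(i);
            StringBuilder line = new StringBuilder();
            for (int j = 0; j < r.length; j++) {
                if (j > 0) {
                    line.append(',');
                }
                line.append(escape(r[j]));
            }
            out.println(line);
        }
        out.flush();
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"AA", "Example country", "2014", "123"},
                new String[] {"KR", "Korea, Rep.", "2014", ""});
        write(rows, Mode.REFRESH, System.out);
    }
}
```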
Interestingly, you can also create new datasets and then merge them using the ‘country_ID’ and ‘date’ fields.
This lets you integrate data from different indicators. That way, you can create your own indicators based on calculated fields and analyze the data from different perspectives.
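Lumira performs the merge itself; the sketch below only illustrates what merging two indicator datasets on country_ID and date means, and how a calculated field such as GDP per capita (GDP divided by population) would follow from it. It is written as an inner join, so keys present in only one dataset drop out; Lumira’s own merge options may behave differently. Country codes and numbers are illustrative.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class IndicatorMerge {
    /** Merge key built from the two fields named in the article: country_ID and date. */
    static String key(String countryId, String date) {
        return countryId + "|" + date;
    }

    /**
     * Inner-joins two indicator series on (country_ID, date) and computes the
     * calculated field numerator / denominator. Missing or zero denominators are skipped.
     */
    static Map<String, Double> ratio(Map<String, Double> numerator,
                                     Map<String, Double> denominator) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : numerator.entrySet()) {
            Double d = denominator.get(e.getKey());
            if (e.getValue() != null && d != null && d != 0.0) {
                result.put(e.getKey(), e.getValue() / d);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not real World Bank figures.
        Map<String, Double> gdp = new LinkedHashMap<>();
        gdp.put(key("AA", "2014"), 1000.0);
        gdp.put(key("BB", "2014"), 500.0);   // no population row, so it drops out
        Map<String, Double> population = new HashMap<>();
        population.put(key("AA", "2014"), 10.0);
        System.out.println(ratio(gdp, population));   // prints {AA|2014=100.0}
    }
}
```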
Checking the working solution … still not perfect
See the working solution in action in the YouTube™ video.
You can get the solution from here: https://ideas.sap.com/ct/getfile.bix?a=OD5268&f=E9649E4A-AF25-43C9-99BC-61616BCBF06D
The proposed tool is not perfect. Some issues are still present:
a) Not all country names in the World Bank data are recognized by Lumira. There is a manual workaround for this, but I will try to improve it in my next article.
b) The solution was written in Java 1.8 (JavaFX 8) and therefore requires JRE 1.8.x. Unfortunately, Lumira comes with a bundled JRE 1.7. To make it work, I had to wrap the jar file in an executable configured to use the JRE from a hardcoded path: C:\Program Files\Java\jre1.8.0_45.
c) Merging datasets is not an ideal solution. I am looking forward to Lumira 1.27, which should bring data blending features.
d) The GUI could be a little better. In particular, it would be great to be able to choose more than one indicator at a time and to show a description of the selected indicator. But well... I am not a developer by any means, and this was a budget solution.
I publish the solution as a .jar file. You can use it for any non-commercial purpose.
To use it in SAP Lumira, you have to create an executable out of it; I recommend Launch4j for that purpose. Remember to set the main class to “application.WbdDAE” in Launch4j, as in the picture below.
Next week I will try to write an article about improving geographic hierarchy creation in Lumira, if I find the time.
Thanks for reading and have a nice day and weekend!