Data Geek III : “Socio-Economic Analysis” on World Bank’s Live Data
Hello,
This is my entry to the Data Geek III challenge under the House of Spirit “Caring for the Social Good”. I am very new to SAP Lumira, having used it for only about a month. Lumira allows us to experiment through Visualization and Data Access extensions, which makes it one of the best loosely coupled tools. A Data Access Extension (DAE) lets us code in the language of our choice (Java, C#, Python etc.), so there is little limit to what we can do in Lumira. It enables us to access anything available on the open internet and create amazing visualizations, standard or custom.
Idea: My idea is to showcase the power of the data access extension (DAE) by getting live data from the World Bank RESTful APIs in XML format, creating visualizations using standard charts, infographics and a minor visualization extension, and taking a few steps towards building a generic DAE. The story is called “Socio Economic Analysis”. It allows country-wise analysis of indicators for economy, growth, population and social development. Along with the DAE, I have shown integration with ESRI, a visualization extension to load country flags, and lots of charts and infographics.
1. World Bank Indicators & APIs:
The World Bank Indicators API lets you programmatically access more than 13000 indicators and query the data in several ways, using parameters to specify your request. Many data series date back 50 years and can be used to create interesting visualizations. The APIs implement RESTful interfaces that allow users to query the available data using selection parameters. For the Indicators API, XML and JSON representations are available (I have handled XML format responses).
I have used only 25 of the 13000+ indicators, for the years 2000-2013 (for details please refer to the attachment). This resulted in a total of 90300 records being pulled from World Bank live data into Lumira (25 indicators x 3612 records per indicator).
To see all available indicators from the World Bank, use the link below.
XML Response: http://api.worldbank.org/indicators?format=xml
Website: Indicators | Data
E.g. the indicator NY.GDP.MKTP.CD returns GDP (current US$) for each country per year. (For details of the indicators accessed live for this analysis, please refer to the attachment.)
2. Data Access Extension Architecture
3. Data Access Extension Workflow
3.1 Steps 1 & 2 (Dynamic Generation of URL)
There are two configuration files, config.txt and param.txt. They are used to form the URLs at run time and avoid hard-coding URLs in the Java program.
config.txt
URL=http://api.worldbank.org/countries/all/indicators/"VAR1"?page="VAR2"&per_page="VAR3"&format="VAR4"&date="VAR5"
VAR1=NY.GDP.MKTP.CD||NY.GDP.MKTP.KD.ZG||NE.EXP.GNFS.ZS||NY.GDP.PCAP.CD
VAR2=1
VAR3=300
VAR4=xml
VAR5=2000:2013
A double pipe (||) separates multiple values of a VAR. When the program parses the VAR1 entry in config.txt, it therefore generates four different URLs for the HTTP connections. The extension processes all VAR entries from top to bottom and generates the URLs accordingly.
URLs formed from the above config file entries:
- http://api.worldbank.org/countries/all/indicators/NY.GDP.MKTP.CD?page=1&per_page=300&format=xml&date=2000:2013
- http://api.worldbank.org/countries/all/indicators/NY.GDP.MKTP.KD.ZG?page=1&per_page=300&format=xml&date=2000:2013
- http://api.worldbank.org/countries/all/indicators/NE.EXP.GNFS.ZS?page=1&per_page=300&format=xml&date=2000:2013
- http://api.worldbank.org/countries/all/indicators/NY.GDP.PCAP.CD?page=1&per_page=300&format=xml&date=2000:2013
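The URL generation described above can be sketched roughly as follows. This is only an illustration and not the extension's actual code: the class and method names (ConfigUrlExpander, parseConfig, expand) are hypothetical, and it assumes each placeholder in the template is the VAR name wrapped in straight double quotes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the author's implementation.
final class ConfigUrlExpander {

    // Reads "KEY=value" lines into an ordered map. Only the first '=' splits
    // the line, because the URL template itself contains '=' characters.
    static Map<String, String> parseConfig(List<String> lines) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                entries.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return entries;
    }

    // Substitutes the VAR entries from top to bottom. A value containing "||"
    // fans out into one URL per alternative.
    static List<String> expand(Map<String, String> config) {
        String template = config.get("URL");
        if (template == null) {
            throw new IllegalArgumentException("config has no URL entry");
        }
        List<String> urls = new ArrayList<>();
        urls.add(template);
        for (Map.Entry<String, String> entry : config.entrySet()) {
            if (entry.getKey().equals("URL")) {
                continue;
            }
            // Assumption: the placeholder is the VAR name in straight double quotes.
            String placeholder = "\"" + entry.getKey() + "\"";
            String[] values = entry.getValue().split("\\|\\|");
            List<String> next = new ArrayList<>();
            for (String partial : urls) {
                for (String value : values) {
                    next.add(partial.replace(placeholder, value.trim()));
                }
            }
            urls = next;
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "URL=http://api.worldbank.org/countries/all/indicators/\"VAR1\"?page=\"VAR2\""
                + "&per_page=\"VAR3\"&format=\"VAR4\"&date=\"VAR5\"",
            "VAR1=NY.GDP.MKTP.CD||NY.GDP.MKTP.KD.ZG||NE.EXP.GNFS.ZS||NY.GDP.PCAP.CD",
            "VAR2=1",
            "VAR3=300",
            "VAR4=xml",
            "VAR5=2000:2013");
        for (String url : expand(parseConfig(lines))) {
            System.out.println(url);
        }
    }
}
```

In a real extension the lines would be read from config.txt instead of being listed in main. Keeping the expansion separate from file reading makes it easy to try out new VAR combinations without touching the connection code.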
So far param.txt has only one entry; I have yet to decide which other parameters would make the extension more generic. The DEBUG=Y flag turns on writing of a log file for troubleshooting. The default is DEBUG=N.
Log file entries:
****Start of Logging: Mon Nov 10 19:28:59 IST 2014 *****
[StartBlock]: -mode preview -locale en -size 300
[ConnectBlock]: Success
[ParseBlock]: Success
[ConsoleWrite]: Success
****End of Logging: Mon Nov 10 19:28:59 IST 2014 ****
Note: Logging in DEBUG mode and error handling should be exhaustive, so that an extension running on some remote desktop can still be troubleshot.
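A DEBUG switch of this kind could look roughly like the sketch below. It is an assumption-laden illustration rather than the extension's real logger: the class name DebugLog and its methods are hypothetical, and the only thing taken from the text is that DEBUG=Y enables logging and anything else behaves like DEBUG=N.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Date;

// Hypothetical logger for illustration; names are not from the original extension.
final class DebugLog {
    private final boolean enabled;
    private final Writer out;

    // paramLine is the single entry read from param.txt, e.g. "DEBUG=Y".
    DebugLog(String paramLine, Writer out) {
        // Only an explicit DEBUG=Y turns logging on; everything else is treated as DEBUG=N.
        this.enabled = paramLine != null && paramLine.trim().equalsIgnoreCase("DEBUG=Y");
        this.out = out;
    }

    void start() {
        write("****Start of Logging: " + new Date() + " *****");
    }

    void end() {
        write("****End of Logging: " + new Date() + " ****");
    }

    // Records the outcome of one processing block, e.g. block("ConnectBlock", "Success").
    void block(String name, String status) {
        write("[" + name + "]: " + status);
    }

    private void write(String line) {
        if (!enabled) {
            return;
        }
        try {
            out.write(line + System.lineSeparator());
            out.flush();
        } catch (IOException e) {
            // A logging failure should not stop the extension; report it on stderr only.
            System.err.println("Could not write log entry: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        StringWriter buffer = new StringWriter();
        DebugLog log = new DebugLog("DEBUG=Y", buffer);
        log.start();
        log.block("ConnectBlock", "Success");
        log.block("ParseBlock", "Success");
        log.end();
        System.out.print(buffer);
    }
}
```

In practice the Writer would be a file writer opened in append mode. Passing the Writer in keeps the logger testable and keeps log output away from the console, which the extension uses for its data.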
3.2 Steps 3, 4 & 5
The step below is repeated for every dynamically formed URL.
import java.net.HttpURLConnection;
import java.net.URL;

URL url = new URL(urlFormed);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("GET");
conn.setRequestProperty("Accept", "application/xml");
// Optional proxy settings:
// System.setProperty("http.proxyHost", "proxy");
// System.setProperty("http.proxyPort", "8080");
// System.setProperty("http.proxyUser", "ItsMe");
// System.setProperty("http.proxyPassword", "Hellopassword");
// Read the status once and fail fast on anything other than HTTP 200.
int status = conn.getResponseCode();
if (status != 200) {
    throw new RuntimeException("Failed : HTTP error code : " + status);
}
3.3 Step 6
All XML responses are merged into a single XML response, which is then converted into comma-separated format for writing to the console. To merge, we need to add the child elements from each subsequent response to the root node of the first response. You may use third-party jars for this or write your own code.
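The merge-and-convert step can be sketched with the DOM classes that ship with the JDK, as below. This is a minimal illustration, not the extension's code: the class name XmlMergeToCsv is hypothetical, and the element names and values in the sample XML (data, record, indicator, country, date, value) are simplified stand-ins rather than the exact World Bank response layout.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not the author's implementation.
final class XmlMergeToCsv {

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Appends the child elements of every later response to the root of the first one.
    static Document merge(List<String> responses) throws Exception {
        Document merged = parse(responses.get(0));
        Element root = merged.getDocumentElement();
        for (String xml : responses.subList(1, responses.size())) {
            NodeList children = parse(xml).getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                    // importNode copies the node into the target document.
                    root.appendChild(merged.importNode(children.item(i), true));
                }
            }
        }
        return merged;
    }

    // Writes one CSV line per record; each child element of a record becomes one field.
    static String toCsv(Document doc) {
        StringBuilder csv = new StringBuilder();
        NodeList records = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < records.getLength(); i++) {
            if (records.item(i).getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            NodeList fields = records.item(i).getChildNodes();
            StringBuilder row = new StringBuilder();
            for (int j = 0; j < fields.getLength(); j++) {
                if (fields.item(j).getNodeType() != Node.ELEMENT_NODE) {
                    continue;
                }
                if (row.length() > 0) {
                    row.append(',');
                }
                row.append(quote(fields.item(j).getTextContent().trim()));
            }
            csv.append(row).append('\n');
        }
        return csv.toString();
    }

    // Quotes every field so that commas inside names do not break the columns.
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) throws Exception {
        // Dummy values for illustration only, not real World Bank figures.
        String first = "<data><record><indicator>NY.GDP.MKTP.CD</indicator>"
            + "<country>India</country><date>2013</date><value>100</value></record></data>";
        String second = "<data><record><indicator>NY.GDP.PCAP.CD</indicator>"
            + "<country>India</country><date>2013</date><value>200</value></record></data>";
        System.out.print(toCsv(merge(Arrays.asList(first, second))));
    }
}
```

Merging first and converting afterwards means the CSV writer only has to deal with one document, whatever the number of URLs generated from config.txt.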
4. Lumira Story (Desktop Version used 1.19)
Selecting WorldBankClient.exe
Execution of extension
Page 1: The first page is the welcome page. It lists the indicator groups to be analyzed: Economy and Growth, Population, Social Development, Health & Gender, Agriculture.
Page 2: This page integrates Lumira with ESRI geo charts and shows the income level spread of all countries. Africa has the highest concentration of low-income countries and Europe the highest concentration of high-income countries.
Page 3: This is the main dashboard for economy and growth indicators for the selected country. A visualization extension loads the country flag depending on the country selection. Highlights:
- Shows the current GDP in USD, GDP growth % and GDP per capita.
- Draws a trend of GDP growth % for a country and compares it with aggregated values at region and world level. E.g. for Australia it shows a trend comparison between the country, East Asia & Pacific and the world.
- Shows the trend of export/import % contribution to the overall GDP.
Page 4: Introduces the next indicator group, population, with a population cloud.
Page 5: This is the main dashboard for population. Highlights:
- Shows the total, urban and rural population.
- Draws a trend of population growth % for a country and compares it with aggregated values at region and world level. E.g. for India it shows a trend comparison between the country, South Asia and the world.
- Shows a stacked bar of the age-wise % of total population. Age tiers are 0-14, 15-64 and 65+.
Page 5 InfoGraph: An infographic showing the social development scorecard of the African country Ethiopia. It mainly presents the indicators life expectancy at birth, HIV prevalence %, access to water sources in urban/rural areas, access to sanitation facilities, health expenditure, cellular subscriptions etc.
Thanks for reading it. Hope you found it interesting.
Best Regards,
Mayank Mishra
Hi Mayank
Very inspiring story you have made!
Br Michael
Thanks a lot Michael.
I used just 25 of the 13000+ indicators provided by the World Bank. We could do wonders with that amount of meaningful data covering the last 50 years. I am planning to create a few more stories and will publish them soon. 🙂
Best Regards,
Mayank
Hi Mayank,
Could you possibly share the Java code of the DAE that you created? I am not the best at Java programming, and your solution is perfect when you need to do some interesting comparison of data.
/Marcin
Let me know the step where you are facing a problem.
My code is pretty complex to begin with. I plan to write a blog soon on creating a DAE with both server and client side components, to help the community experiment with and understand DAEs. In it I will add the Java code for both the server and client classes. I should be able to publish it by the middle of next week.
Thanks.
Hi Mayank,
That would be very helpful. The DAE developer's guide has no information on which IDE to use for a Java-based DAE or on how to convert a Java class into an exe file.
Thanks, Marcin
very nice!
Thanks Jens.
Really Nice use of the WB data and smart analytics shown by you.
Thanks Santosh.
Hi Mayank,
This is really great .... now power of Lumira is getting visible.
Regards,
Deepak
Thanks Deepak.
Nice blog Mayank ! very nicely explained.
Thank you.
Nicely done
Thanks Miha.
Hey Mayank,
nice to read the comprehensive blog.
Lot to learn.
Best Regards
Vaibhav
Thanks for your feedback.
Good one Mayank. Nice to know that Lumira has such placeholders for writing open-source program interfaces with data systems. It's a compelling capability beyond just visualization.
Thanks Sudheer.