Data Geek Entry – Literacy Rate Comparison in India
SAP Lumira is an easy to use BI tool that helps you to manipulate and enrich data and create visualizations by using one of the multiple charts available in the Visualization Pane. I used SAP Lumira to analyse the Literacy rates in India based on data for years 1991, 2001 and 2011. For this, I have used the data available publicly from the site http://data.gov.in/ .
In this blog, I share my findings and how I used SAP Lumira’s data enrichment and manipulation capabilities to do the analysis. The dataset contained literacy rates of Males and Female for various states and union territories and India for years 1991, 2001 and 2011 as shown below:
1. Gender Wise Literacy Comparison:
The first visualization I decided to create was to draw comparison between Literacy Rates for Male and Female. But after acquiring the data, I realised that the data type for columns 1991 Male and 1991 Female of my dataset was Text instead of Number. So, I needed to correct this to Number and bring it in line with other literacy rate columns. I could do it very easily with SAP Lumira. I right clicked on the column and choose the option Convert to Number as shown below:
This created a new column with the same data but with data type as Numeric.
After correcting or cleansing my data, I created the below visualization to analyse the gender-wise literacy growth in India from 1991 to 2011. 2 things can be inferred from this visualization at a glance:
- The Male literacy rate in India is higher than the female literacy rate for all the years.
- The Female literacy rate has grown more than the male literacy rate during the period 1991 to 2011. The Female literacy rate has grown from mere 39 in 1991 to 65 in 2011 while male literacy rate has grown from 64 in 1991 to 82 in 2011.
For generating this visualization, I used a column chart with 2 Y-Axis and kept the Measures for Female Literacy Rates(renamed from 1991,2001 and 2011 Female) for years 1991, 2001 and 2011 on Y Axis 1 and those for Males on Y-Axis 2. I kept State/Union Territory attribute (renamed to Location) as X- axis and filtered the values for Location = India.
2. State/Union territory wise Literacy Growth
Next, I wanted to analyse the overall growth of literacy in different states in India. For this, I needed the overall (average of male & female) literacy rate for these states.
I used the data manipulation capabilities of SAP Lumira for this. I selected the 1991 Male(2) column (which was generated when I converted the column of type text to number) and entered the below condition in the Formula Bar which is above the Data Pane and clicked Do It:
It generated a new Column which contained the average of literacy rates of Male and Female for each state/union territory for the year 1991. I repeated the same step for years 2001 and 2011. This gave me the overall (average) literacy rate for all the time periods.
Next, I created a measure for each of these columns and renamed them to Literacy in 1991, 2001 and 2011 for the corresponding years.Then I plotted a line chart with Literacy in 1991, 2001 and 2011 in Y-axis and State/Union Territory (renamed to Location) in X-axis.
This chart gave clear comparison of literacy rates between different states. One can easily find out which state has highest or lowest literacy, which states have higher improvements in literacy and in which states more efforts are needed to increase the level of literacy.
Next, I created a Geo Choropeth Chart for literacy rate of India in 2011. For this, I used the data enrichment capability of Lumira to create a geographic hierarchy – By Names for State/Union territory attribute (renamed to Region) as below:
This opened a pop-up for Geographic Enrichment. SAP Lumira retrieves the values from its internal Geographic database which helps in identifying the geographic location. Since, I chose the option of creation of hierarchy by names, it provides me the option to select the identifying criteria from Country, Region, Sub-Region and City. Since my column contains data for State/Union Territories, I selected the option to identify the location by Region:
On clicking Ok, Lumira displayed me the following result. It was able to identify most of the states as 100% and couple of them as 90% and as expected could not find India in the Region. I chose the suggested option for the 2 locations for which I had 90% match and clicked Ok.
This created the geographic hierarchy and updated it in the Hierarchies Section. Then I choose Geo Choropeth Chart and added the Hierarchy I had just created as Geography and Literacy in 2011 as the Measure to the chart. This displayed the below visualization:
This chart provided a better visualization to reach to analyse which states/union territories in India have good literacy rates and in which states/union territories, more efforts need to be put to increase the literacy rate.