Data Geek Challenge – Aadhaar Data Analysis Using SAP Lumira
Ever since I saw the Data Geek challenge, I had been thinking about participating with my own data story, and today I am going to share it with you.
I chose the Aadhaar dataset, which is available on the Aadhaar public data portal at https://data.uidai.gov.in/uiddatacatalog/dataCatalogHome.do
I selected this dataset because I was looking for data that would let me do geographical analysis 🙂 and because I wanted answers to a few questions, such as:
- Which states are generating the most Aadhaar cards?
- Which enrolment agencies are doing better?
- How many residents are providing an email address or a mobile number?
- Which districts have the highest enrolment numbers?
- How many Aadhaar cards are generated by state and gender?
- Which districts in Maharashtra state have the highest enrolments? And so on…
To get answers, I first downloaded the “Enrolment Processed in Detail” dataset from https://data.uidai.gov.in/uiddatacatalog/getDatsetInfo.do?dataset=UIDAI-ENR-DETAIL
Here I provided a date range from 1st August to 15th August and selected the day with the maximum number of records, which turned out to be 6th August with around 1 million records!
After importing this data, I selected the option to create a geography hierarchy. There I found that, at the state and district level, name resolution could not find proper matches for many names, so I cleaned the data, at least for Maharashtra state. After the data cleaning, I re-imported the CSV file as shown below.
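For readers who would rather script the cleaning step than do it by hand, here is a minimal sketch in Python. It is not what I did in Lumira. The column names `State` and `District` and the entries in `DISTRICT_FIXES` are assumptions for illustration only, and the real file may need different ones.

```python
import csv
import io

# Hypothetical mapping from spellings found in the raw file to the names
# a geographic lookup might expect. These entries are illustrative only.
DISTRICT_FIXES = {
    "poona": "Pune",
    "bombay": "Mumbai",
}

def normalize_name(raw, fixes):
    """Trim and collapse whitespace, then apply a known-spelling fix if any."""
    cleaned = " ".join(raw.split())
    return fixes.get(cleaned.lower(), cleaned.title())

def clean_rows(rows, state="Maharashtra"):
    """Yield rows with the District column cleaned for one state only."""
    for row in rows:
        if row["State"].strip().lower() == state.lower():
            row = dict(row, District=normalize_name(row["District"], DISTRICT_FIXES))
        yield row

# Tiny in-memory sample standing in for the downloaded CSV file.
sample = io.StringIO(
    "State,District\n"
    "Maharashtra,  poona \n"
    "Maharashtra,PUNE\n"
    "Karnataka,bangalore\n"
)
cleaned = list(clean_rows(csv.DictReader(sample)))
```

Only rows for the chosen state are touched, which mirrors the "at least for Maharashtra" scope of the manual cleaning. The cleaned rows can then be written back out with `csv.DictWriter` before re-importing.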
From the screen below, it is clear that the number of residents providing a mobile number is much higher than the number providing an email ID.
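The same comparison can be cross-checked outside Lumira by summing the two columns. This is only a sketch: the column names below are my assumption of how the CSV headers look, and the sample rows are made up, not real figures.

```python
import csv
import io

# Assumed header names; adjust them to match the downloaded file.
EMAIL_COL = "Residents providing email"
MOBILE_COL = "Residents providing mobile number"

def contact_totals(rows):
    """Return (email_total, mobile_total) summed over all rows."""
    email_total = 0
    mobile_total = 0
    for row in rows:
        email_total += int(row[EMAIL_COL])
        mobile_total += int(row[MOBILE_COL])
    return email_total, mobile_total

# Made-up sample rows standing in for the real file.
sample = io.StringIO(
    "State,Residents providing email,Residents providing mobile number\n"
    "Maharashtra,1,1\n"
    "Karnataka,0,1\n"
    "Maharashtra,0,0\n"
)
email_total, mobile_total = contact_totals(csv.DictReader(sample))
```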
Maharashtra has the maximum number of residents providing an email during Aadhaar registration.
But when the data is broken down by the combination of enrolment agency, registrar and state, Karnataka has the maximum number of residents providing an email.
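It may look odd that the top state changes with the grouping, but this can happen whenever one state's total is spread across many agencies while another state's is concentrated in a single one. The sketch below shows the effect with made-up numbers. The agency and registrar names are hypothetical and the counts are not taken from the dataset.

```python
from collections import Counter

# Made-up rows: (enrolment agency, registrar, state, residents providing email)
rows = [
    ("Agency A", "Registrar X", "Maharashtra", 40),
    ("Agency B", "Registrar X", "Maharashtra", 35),
    ("Agency C", "Registrar Y", "Maharashtra", 30),
    ("Agency D", "Registrar Z", "Karnataka", 60),
    ("Agency E", "Registrar Z", "Karnataka", 20),
]

def top_by(rows, key_func):
    """Sum the email counts per group and return the largest (group, total)."""
    totals = Counter()
    for row in rows:
        totals[key_func(row)] += row[3]
    return totals.most_common(1)[0]

# Grouped by state alone, the state with the biggest overall total wins.
top_state = top_by(rows, lambda r: r[2])

# Grouped by agency + registrar + state, the biggest single combination wins.
top_combo = top_by(rows, lambda r: (r[0], r[1], r[2]))
```

Here `top_state` is Maharashtra with 105, while `top_combo` is a Karnataka combination with 60, so both statements can be true at the same time.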
In the screenshot below, we can easily identify the top districts where enrolment numbers are high. Warangal is the topmost district, followed by Pune. Lumira took a long time to render this view; I guess this is because of the large number of rows.
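A top-districts ranking like this can also be computed in one pass over the file, which keeps memory use low even with around a million rows. This is a rough sketch, assuming header names `State`, `District` and `Aadhaar generated`. The sample counts are made up.

```python
import csv
import io
from collections import Counter

GENERATED_COL = "Aadhaar generated"  # assumed header name

def top_districts(rows, n=3):
    """Stream rows once and return the n (state, district) pairs with the most enrolments."""
    totals = Counter()
    for row in rows:
        # Key on (state, district) because district names can repeat across states.
        totals[(row["State"], row["District"])] += int(row[GENERATED_COL])
    return totals.most_common(n)

# Made-up sample rows standing in for the real file.
sample = io.StringIO(
    "State,District,Aadhaar generated\n"
    "Andhra Pradesh,Warangal,5\n"
    "Maharashtra,Pune,3\n"
    "Maharashtra,Thane,1\n"
    "Andhra Pradesh,Warangal,2\n"
    "Maharashtra,Pune,1\n"
)
top_two = top_districts(csv.DictReader(sample), n=2)
```

To restrict the ranking to one state, such as Maharashtra, filter the rows on the `State` column before passing them in.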
Within Maharashtra state, Pune is at the top of the list!
Now if we go one level down, i.e. to the district level: in Pune district, the Pune City sub-division has the maximum number of residents providing an email. Since Pune City also has the maximum number of residents, I am not surprised by this outcome.
In Maharashtra state, Mumbai is the district where the maximum number of residents are providing an email.
Please note – This analysis is based on data generated on 6th August 2013.
I also tried my hand at SAP Predictive Analysis and was curious to know the difference between SAP Lumira and SAP Predictive Analysis. Below is what I can say about it.
SAP Lumira = Prepare + Share
SAP Predictive Analysis = Prepare + Predict + Share
And hence SAP Predictive Analysis = SAP Lumira + Predict 😉
I hope you enjoyed reading this blog. Please feel free to share your comments! Have a great day and Happy Analyzing 🙂 🙂