Data geek entry – Analyze crop production in India using SAP Lumira
Even though India has made huge progress in sectors such as industry and services over the past few decades, agriculture remains an important part of the Indian economy and is still one of the chief contributors to Indian GDP. So, being an Indian, I decided to analyze a few aspects of this sector 🙂. In this blog, I have tried to analyze crop production and crop covered area across the states and districts of India from 1998 to 2010 with the help of SAP Lumira.
Before going further, I would like to define the two terms I use extensively in this blog:
i) Production : It refers to crop production in tonnes.
ii) Area : It refers to crop covered area in hectares i.e. the area in hectares where a crop is grown.
Production helps in identifying the output (and, divided by Area, the yield per hectare), while Area helps in identifying the crop growing pattern.
I had the following questions in mind, which drove me to analyze this data:
1. What is the trend of crop production and crop covered area in India? Are they increasing every year? Are they always in line with each other?
2. Which crop has the highest production in India?
3. Which state contributes the highest to crop production in India?
4. Which crop has the maximum covered area?
5. What is the impact of season and region on crop production?
SAP Lumira helped me find answers to all these questions quickly and with ease. For my analysis, I downloaded the publicly available dataset “District-wise, season-wise crop production statistics from 1998” from http://data.gov.in/ . The dataset contains Area and Production data for different seasons, districts and states for the years 1998 to 2010, as shown below:
My first concern on previewing this data was the number of records in the file: 1,62,092 records. I was not sure how much time Lumira would take to acquire them. But when I clicked the Acquire button, the entire dataset was acquired within a few seconds 🙂 So now I was all set to find answers to my questions.
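For readers who want to sanity-check the record count outside Lumira, here is a minimal Python sketch of loading such a file. The column names and the three sample rows are my own assumptions for illustration, not the real headers or figures from the data.gov.in file.

```python
import csv
import io

# Made-up sample rows; the column names are assumed, not taken from the real file.
SAMPLE_CSV = """State,District,Crop_Year,Season,Crop,Area,Production
Andhra Pradesh,Guntur,2008,Kharif,Rice,100,250
Andhra Pradesh,Guntur,2009,Kharif,Rice,90,260
Bihar,Patna,2008,Rabi,Wheat,80,
"""

def load_records(text):
    """Parse CSV text into dict rows, converting Area and Production to floats."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Skip rows where either measure is missing, in case the file has gaps.
        if not row["Area"] or not row["Production"]:
            continue
        row["Area"] = float(row["Area"])
        row["Production"] = float(row["Production"])
        rows.append(row)
    return rows

records = load_records(SAMPLE_CSV)
print(len(records), "usable records")
```

With the real file, `io.StringIO(text)` would be replaced by an opened file handle.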
1. Trend of crop production and covered area:
To analyze this, I created measures for Production and Area. I placed these measures on the two Y-axes and year on the X-axis of a ‘2 Y-axis line chart’, which gave me the visualization below:
I could easily conclude from the chart that crop production and crop covered area did not have any consistent relation with time, i.e. they did not steadily increase or decrease from year to year. Also, for all years other than 2009, crop production and crop covered area were in line with each other, i.e. either both increased or both decreased during a year.
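The "in line with each other" check can also be expressed in a few lines of code. The sketch below sums area and production per year and flags the years in which the two moved in opposite directions. The rows are made-up numbers chosen only to show the idea, and the function names are hypothetical.

```python
from collections import defaultdict

# Made-up (year, area_ha, production_t) rows for illustration only.
rows = [
    (2007, 100.0, 300.0),
    (2007, 50.0, 120.0),
    (2008, 170.0, 480.0),
    (2009, 150.0, 490.0),
]

def yearly_totals(rows):
    """Sum area and production per year, returned in ascending year order."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for year, area, production in rows:
        totals[year][0] += area
        totals[year][1] += production
    return dict(sorted(totals.items()))

def years_out_of_line(totals):
    """Return years where area and production moved in opposite directions."""
    years = list(totals)
    diverging = []
    for prev, cur in zip(years, years[1:]):
        d_area = totals[cur][0] - totals[prev][0]
        d_prod = totals[cur][1] - totals[prev][1]
        # A negative product means one measure rose while the other fell.
        if d_area * d_prod < 0:
            diverging.append(cur)
    return diverging

totals = yearly_totals(rows)
print(years_out_of_line(totals))
```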
So I was curious to know why, in 2009, production didn’t decrease despite the decrease in covered area. To find the reason, I decided to compare crop production and covered area for 2008 and 2009 across the different states of India. I plotted a ‘column chart’ with Production and Area as measures on the two Y-axes, state on the X-axis and year as the Legend Color. Below is the chart output:
3 things can be easily inferred from this chart:
i) Even though the covered area in Andhra Pradesh decreased in 2009 compared to 2008, the production increased there.
ii) The covered area in Bihar decreased drastically in 2009 to almost nil, and the production in Bihar wasn’t significant in 2008 either.
iii) For all other states, there wasn’t any significant change in production and covered area.
Due to these 3 factors, I could understand why the production in 2009 did not decrease much despite the decrease in covered area.
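The same state-by-state comparison can be sketched in Python as a per-state difference between two years. All figures below are invented to mirror the pattern described above (they are not the real values), and `state_changes` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up (state, year, area_ha, production_t) rows; not real figures.
rows = [
    ("Andhra Pradesh", 2008, 100.0, 300.0),
    ("Andhra Pradesh", 2009, 90.0, 320.0),
    ("Bihar", 2008, 60.0, 40.0),
    ("Bihar", 2009, 1.0, 1.0),
    ("Kerala", 2008, 30.0, 80.0),
    ("Kerala", 2009, 30.0, 81.0),
]

def state_changes(rows, before, after):
    """Return {state: (area change, production change)} between two years."""
    totals = defaultdict(lambda: {before: [0.0, 0.0], after: [0.0, 0.0]})
    for state, year, area, production in rows:
        if year in (before, after):
            totals[state][year][0] += area
            totals[state][year][1] += production
    return {
        state: (t[after][0] - t[before][0], t[after][1] - t[before][1])
        for state, t in totals.items()
    }

changes = state_changes(rows, 2008, 2009)
for state, (d_area, d_prod) in sorted(changes.items()):
    print(f"{state}: area {d_area:+.1f} ha, production {d_prod:+.1f} t")
```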
2. Crops that have the highest production and covered area in India
To determine this, I plotted a ‘Tree Map’ with the measure Production as the Area Weight, Area as the Area Color and the attribute Crop as the Area Name. It generated the following visualization, from which I could easily conclude that ‘Sugarcane’ was the most produced crop in India, followed by coconut, rice and wheat, and that ‘Rice’ was the crop with the most covered area, followed by wheat, bajra and cotton.
Further, I could find the 10 states that contributed the most to Sugarcane production in the last 5 years, along with their production, with the help of a ‘3D Column chart’ as below:
For creating this visualization, I used a filter on crop and ranked the values of the Production measure, as shown in the visualization. Similarly, I could view the 10 states that had grown ‘Rice’ the most and the covered area under ‘Rice’ in these states.
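What the tree map encodes (one total for size, another for color) boils down to two aggregations per crop. Here is a small sketch of that aggregation; the rows are made-up values and do not reproduce the real totals.

```python
from collections import Counter

# Made-up (crop, area_ha, production_t) rows for illustration only.
rows = [
    ("Sugarcane", 10.0, 700.0),
    ("Rice", 100.0, 250.0),
    ("Wheat", 80.0, 200.0),
    ("Sugarcane", 5.0, 350.0),
    ("Rice", 60.0, 150.0),
]

production = Counter()  # would drive the rectangle size in a tree map
area = Counter()        # would drive the rectangle color
for crop, area_ha, production_t in rows:
    production[crop] += production_t
    area[crop] += area_ha

print("Most produced:", production.most_common(1))
print("Most covered area:", area.most_common(1))
```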
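Filtering on a crop and ranking by a measure is easy to mimic in code as well. The sketch below uses placeholder state names and invented numbers; `top_states` is a hypothetical helper, not anything provided by Lumira.

```python
from collections import Counter

# Made-up (state, year, crop, production_t) rows with placeholder state names.
rows = [
    ("State A", 2009, "Sugarcane", 900.0),
    ("State B", 2009, "Sugarcane", 600.0),
    ("State C", 2005, "Sugarcane", 999.0),  # outside the year window
    ("State A", 2010, "Rice", 100.0),       # different crop
    ("State B", 2010, "Sugarcane", 200.0),
    ("State D", 2008, "Sugarcane", 300.0),
]

def top_states(rows, crop, first_year, last_year, n):
    """Rank states by total production of one crop within a year range."""
    totals = Counter()
    for state, year, row_crop, production in rows:
        if row_crop == crop and first_year <= year <= last_year:
            totals[state] += production
    return totals.most_common(n)

print(top_states(rows, "Sugarcane", 2006, 2010, 2))
```

Changing the last argument to 10 would give the top 10 states, as in the chart.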
3. Which state contributes the highest to crop production in India?
Now, I wanted to find out which state contributes the most to crop production in India. For this, I did not want to check the data for all years from 1998 to 2010. I was keener to find the contribution in recent years, say the last 5 years. I could determine it very easily by creating a ‘Tag Cloud’ visualization, which clearly indicated that Andhra Pradesh was, by far, the state with the most crop production (as it has the maximum word weight) for the years 2006 to 2010.
For creating this visualization, I used the Production measure as the word weight, state as the Word and added year as the filter to limit the data to the 5 years.
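To give an idea of what "word weight" means, here is a sketch of one common way a tag cloud can turn a measure into a font size: linear scaling between a minimum and maximum size. This is only my assumption about the general technique; I do not know the exact scaling Lumira uses, and the weights below are made up.

```python
def font_sizes(weights, min_pt=10, max_pt=40):
    """Linearly map each weight to a font size between min_pt and max_pt."""
    low, high = min(weights.values()), max(weights.values())
    span = high - low
    sizes = {}
    for word, weight in weights.items():
        # If all weights are equal, draw every word at the maximum size.
        scale = (weight - low) / span if span else 1.0
        sizes[word] = round(min_pt + scale * (max_pt - min_pt))
    return sizes

# Made-up production totals for placeholder states.
weights = {"State A": 500.0, "State B": 300.0, "State C": 100.0}
print(font_sizes(weights))
```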
The previous visualization showed me which state has the most production. But if I wanted to see year-wise production in all the states (for the last 5 years), I could do it easily by using the ‘Geo Pie chart’ as below:
To create this visualization, I created a geographic hierarchy by name for the attribute State and used it as the geography, with year as the overlay data.
Then, I wanted to check crop production in Andhra Pradesh in the last 5 years. I created a ‘Pie with variable slice depth’ visualization to view the details:
It was clear from this that ‘Coconut’ accounts for the majority of crop production in Andhra Pradesh, while ‘Rice’ is the crop with the maximum covered area.
Also, I could view the relative contribution to production by different districts in Andhra Pradesh and the crops grown in these districts with the help of the ‘Heat Map’ visualization as shown below:
4. Impact of season and region on crop production
To analyze this, I plotted an ‘area chart’ with Production as Y-axis, State as X-axis and season as the Legend Color:
Looking at this visualization, I could easily deduce that:
i) The Kharif season is the most productive season.
ii) Andhra Pradesh, Uttar Pradesh, Tamil Nadu, Maharashtra, Karnataka and Gujarat are the states that are productive for the whole year.
iii) Most of the states do not have notable production in the winter, summer and autumn seasons.
iv) Delhi, Goa, Jammu & Kashmir, Meghalaya, Mizoram, Nagaland and Chandigarh are not very conducive to crop production.
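The first two deductions above can be sketched as a small aggregation: total production per season, plus the set of seasons in which each state produces anything. The state names are placeholders and the numbers are made up, so the output only demonstrates the mechanics.

```python
from collections import defaultdict

# Made-up (state, season, production_t) rows with placeholder state names.
rows = [
    ("State A", "Kharif", 500.0),
    ("State A", "Rabi", 200.0),
    ("State A", "Summer", 50.0),
    ("State B", "Kharif", 300.0),
    ("State B", "Rabi", 100.0),
    ("State C", "Kharif", 20.0),
]

by_season = defaultdict(float)
seasons_by_state = defaultdict(set)
for state, season, production in rows:
    by_season[season] += production
    if production > 0:
        seasons_by_state[state].add(season)

# Season with the largest total production across all states.
top_season = max(by_season, key=by_season.get)

# States with some production in every season present in the data.
all_seasons = set(by_season)
year_round = sorted(s for s, seen in seasons_by_state.items() if seen == all_seasons)

print(top_season, year_round)
```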
Now, since Kharif is the most productive season of the year, let us see which 10 districts have the most production in this season and which crops are produced during this time. I could do that pretty easily by drawing a column chart with District and Crop as the 2 columns on the X-axis, as shown below:
Also, I could view the crop covered area of different states with the help of a ‘Geographic Bubble Chart’ as shown below:
This visualization showed me the relative crop covered area for all the states of India.
Thus, with the help of the various charts/patterns provided by SAP Lumira, I was able to analyze crop production in India and find answers to all the questions I had about it 🙂
I hope that you like this blog, and I look forward to your comments and suggestions.