# Data analysis using R

With advances in technology, we have seen an unprecedented explosion of data, be it in terms of volume, velocity, variety, veracity or value (the 5 V’s). It is often claimed that more data has been generated since 2009 than in all of human history before it. This data deluge presents a sea of opportunities which, if leveraged well, can prove to be game changers and help businesses run like never before. This is the Big Data challenge, which is a buzzword these days.

R, a free software programming language and environment for statistical data analysis and graphics, can be used to explore datasets and gain insights. Though I was initially skeptical about being able to pick up R, I took a few tutorials, found it interesting, and thought of sharing my learning experience. You can check http://www.r-project.org/ for detailed information on R.

I would like to walk through a simple analysis and visualization example using a dataset that ships with R: UKLungDeaths {datasets}, three time series giving the monthly deaths from bronchitis, emphysema and asthma in the UK, 1974–1979. You can check out the dataset at the following link: http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/UKLungDeaths.html

1. Launch RGui. There is nothing to load in our case: UKLungDeaths belongs to the built-in datasets package, so its series are available as soon as R starts.
2. You can check what the dataset looks like by typing the following in the R console:

ldeaths # dataset of Monthly Deaths from Lung Diseases in the UK – both sexes

mdeaths # dataset of Monthly Deaths from Lung Diseases in the UK – Males

fdeaths # dataset of Monthly Deaths from Lung Diseases in the UK – Females
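Printing a series shows its raw values. To look at its structure instead, something like the following can help. This is a small sketch using only base R functions; the variable name `total_check` is introduced here for illustration and is not part of the dataset.

```r
# Inspect the structure of the built-in series (base R only)
class(ldeaths)       # the object's class (a time series, "ts")
start(ldeaths)       # year and month of the first observation
end(ldeaths)         # year and month of the last observation
frequency(ldeaths)   # observations per year (12 for monthly data)
summary(ldeaths)     # minimum, quartiles, mean and maximum

# ldeaths is documented as "both sexes"; this checks whether it equals
# the male and female series added together.
total_check <- all(ldeaths == mdeaths + fdeaths)
total_check
```
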

3. To plot the visualization, type the following:

par(mfrow=c(1,3)) # arrange the next three plots side by side in one row

plot(ldeaths, xlab="Year", ylab="Both sexes", main="Monthly Deaths from Lung Diseases in the UK – both sexes") # line chart of monthly deaths for both sexes

plot(mdeaths, xlab="Year", ylab="Males", main="Monthly Deaths from Lung Diseases in the UK – Males") # line chart of monthly deaths for males

plot(fdeaths, xlab="Year", ylab="Females", main="Monthly Deaths from Lung Diseases in the UK – Females") # line chart of monthly deaths for females

The plot looks as shown below:

Thus you can see for yourself how simple it is to visualize a dataset in R. From the above visualization you can draw various insights, such as:

• The highest and the lowest monthly death counts were both recorded in 1976 (total for both sexes)
• The numbers fluctuate from month to month within each year, but the overall pattern has stayed consistent from year to year
• Towards the end of 1979, the numbers appear to be on the decreasing side
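These observations can also be checked numerically rather than read off the chart. The following is a minimal sketch in base R; names such as `peak_idx`, `trough_idx`, `monthly_means` and `yearly_totals` are illustrative and do not come from the dataset.

```r
# Locate the highest and lowest monthly totals (both sexes)
peak_idx   <- which.max(ldeaths)
trough_idx <- which.min(ldeaths)

# time() gives the decimal year of each observation, cycle() its month (1-12)
peak_year    <- floor(time(ldeaths)[peak_idx])
trough_year  <- floor(time(ldeaths)[trough_idx])
peak_month   <- month.name[cycle(ldeaths)[peak_idx]]
trough_month <- month.name[cycle(ldeaths)[trough_idx]]

cat("Highest:", ldeaths[peak_idx], "in", peak_month, peak_year, "\n")
cat("Lowest: ", ldeaths[trough_idx], "in", trough_month, trough_year, "\n")

# Seasonal pattern: average deaths per calendar month across all years
monthly_means <- tapply(ldeaths, cycle(ldeaths), mean)
names(monthly_means) <- month.abb
round(monthly_means)

# Year-level view: total deaths per calendar year
yearly_totals <- aggregate(ldeaths, nfrequency = 1, FUN = sum)
yearly_totals
```

Comparing `monthly_means` across months shows the within-year fluctuation, while `yearly_totals` gives a quick view of how the overall level changes from one year to the next.
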

This is, however, a very simple example of data visualization using R.

The R language has a large number of packages that support a wide range of statistical methods and functions, which can be used for more complex scenarios and use cases.
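As one illustration of such statistical functions, the sketch below splits the both-sexes series into trend, seasonal and random components using `decompose()` from the `stats` package that ships with R. The choice of `decompose()` and the names `parts` and `seasonal_effect` are assumptions made for this example; other methods and packages may suit a given use case better.

```r
# Classical additive decomposition: series = trend + seasonal + random
parts <- decompose(ldeaths)

# One seasonal effect per calendar month
# (positive = more deaths than the trend level, negative = fewer)
seasonal_effect <- setNames(round(parts$figure), month.abb)
seasonal_effect

# Moving-average trend for the first year; the first six months are NA
# because the centered moving average needs data on both sides
parts$trend[1:12]
```

Calling `plot(parts)` afterwards draws the observed series and its three components in stacked panels.
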

SAP HANA already has integration with R.

I hope this post encourages you to explore R further and come up with interesting analyses and visualizations.