# Data analysis using R

With advances in technology, we have seen an unprecedented explosion of data in terms of its volume, velocity, variety, veracity, and value (the 5 V's). Research suggests that more data has been generated since 2009 than in the entire prior history of mankind. This data deluge presents a sea of opportunities which, if leveraged well, can prove to be game changers for businesses. This is the *Big Data* challenge, a buzzword these days.

**R**, a free software programming language and environment for statistical data analysis and graphics, can be used to explore datasets and gain insights. Though I was initially skeptical about being able to comprehend R, I took a few tutorials, found it interesting, and thought of sharing my learning experience. You can check http://www.r-project.org/ for detailed information on R.

I would like to walk through a simple analysis and visualization example using a dataset that ships with R: **UKLungDeaths** {datasets}, three time series giving the monthly deaths from bronchitis, emphysema and asthma in the UK, 1974–1979. You can check out the dataset at the following link: http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/UKLungDeaths.html

As a prerequisite, download and install **R**, and then follow these steps:

1. Launch RGui and load the dataset. In our case it is a predefined dataset, so it is already available in the session without any extra loading step.
2. Check what the dataset looks like by typing the following in the R console:

*ldeaths* # dataset of Monthly Deaths from Lung Diseases in the UK – both sexes

*mdeaths* # dataset of Monthly Deaths from Lung Diseases in the UK – Males

*fdeaths* # dataset of Monthly Deaths from Lung Diseases in the UK – Females

3. To plot the visualization, type the following:

*par(mfrow=c(1,3))* # arrange the next three plots side by side in one row

*plot(ldeaths, xlab="Year", ylab="Both sexes", main="Monthly Deaths from Lung Diseases in the UK – both sexes")* # plots a line chart of monthly deaths for both sexes

*plot(mdeaths, xlab="Year", ylab="Males", main="Monthly Deaths from Lung Diseases in the UK – Males")* # plots a line chart of monthly deaths for males

*plot(fdeaths, xlab="Year", ylab="Females", main="Monthly Deaths from Lung Diseases in the UK – Females")* # plots a line chart of monthly deaths for females
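The steps above can also be run as one short script. This is only a minimal sketch: the calls to `class()`, `start()`, `end()`, `frequency()` and `summary()` are my additions for inspecting the series, the shortened panel titles are a choice of mine, and `old_par` is just an illustrative variable name for saving the previous plot layout.

```r
# Inspect the built-in monthly time series (datasets package, attached by default)
class(ldeaths)       # the series is stored as a "ts" (time series) object
start(ldeaths)       # year and month of the first observation
end(ldeaths)         # year and month of the last observation
frequency(ldeaths)   # observations per year (12 for monthly data)
summary(ldeaths)     # minimum, quartiles, mean and maximum

# Draw the three series in one row, then restore the previous layout.
# par() returns the settings it replaces, so they can be put back afterwards.
old_par <- par(mfrow = c(1, 3))
plot(ldeaths, xlab = "Year", ylab = "Both sexes", main = "Both sexes")
plot(mdeaths, xlab = "Year", ylab = "Males", main = "Males")
plot(fdeaths, xlab = "Year", ylab = "Females", main = "Females")
par(old_par)
```

Restoring the layout at the end means that any plot you draw afterwards fills the whole window again instead of landing in a one-third panel.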

The resulting figure contains the three line charts side by side, one panel per series.

Thus you can see for yourself how simple it is to visualize a dataset in R. From the above visualization you can draw various insights, such as:

- Both the highest and the lowest monthly death counts were recorded in 1976 (totals for both sexes)
- The numbers fluctuate from month to month within each year, but the overall pattern repeats consistently from year to year
- Towards the end of 1979, the numbers appear to be declining
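Observations like these can be checked against the numbers rather than read off the chart. The snippet below is a rough sketch of one way to do that. The data frame `monthly` and the variables `obs_year` and `yearly_totals` are names I made up for illustration, and the year is derived from the position of each observation rather than taken from the plot.

```r
# Work out the calendar year of each observation from its position in the series
first_year  <- start(ldeaths)[1]
first_month <- start(ldeaths)[2]
obs_year <- first_year +
  (seq_along(ldeaths) - 1 + first_month - 1) %/% frequency(ldeaths)

# One row per month: year, month name and total deaths (both sexes)
monthly <- data.frame(
  year   = obs_year,
  month  = month.abb[as.integer(cycle(ldeaths))],
  deaths = as.numeric(ldeaths)
)

monthly[which.max(monthly$deaths), ]   # month with the highest total
monthly[which.min(monthly$deaths), ]   # month with the lowest total

# Yearly totals give a rough view of the longer-term direction
yearly_totals <- tapply(monthly$deaths, monthly$year, sum)
yearly_totals
```

Yearly totals smooth out the month-to-month swings, so they are a simple first check on whether the numbers really are heading down towards 1979.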

This is, however, a very simple example of data visualization using R.

R has a large number of packages that support various statistical methods and functions, which can be used for more complex scenarios and use cases.
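As a small taste of those statistical functions, here is a sketch that uses `decompose()` from the `stats` package, which ships with R and is attached by default. Choosing `decompose()` is my own pick for illustration and `parts` is a hypothetical variable name. The function splits the series into trend, seasonal and random components using moving averages.

```r
# Classical additive decomposition of the both-sexes series
parts <- decompose(ldeaths)

# One figure with four panels: observed, trend, seasonal and random
plot(parts)

# The seasonal component repeats every 12 months; show one year's worth
round(parts$seasonal[1:12])
```

The trend panel is a convenient way to look at the longer-term direction separately from the repeating yearly pattern.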

SAP HANA already integrates with R.

I hope this post encourages you to explore R further and come up with interesting analyses and visualizations.

Thanks for reading!