Discovering patterns of “Melbourne Pedestrian Volume” using Predictive Analytics
Written By: Shivali Aggarwal – https://au.linkedin.com/in/shelly-aggarwal-558b86a7 and Ankita Mehani – https://www.linkedin.com/in/ankita-mehani-412bb62b
We would like to start with a quote:
“The holy grail of marketing is to proactively pounce upon every individual customer opportunity by predicting beforehand who will respond and to preemptively intervene each customer loss by predicting who will defect” – Dr. Eric Siegel.
We are very fortunate to be graduate students at Victoria University, where we are supervised by knowledgeable and hard-working mentors. One of them is our lecturer, Shah Miah, who asked us to write this blog as part of our course assessment. Writing it has helped us take our skills to another level.
According to Albert Einstein, “We can’t solve problems by using the same kind of thinking we used when we created them”. In other words, we need to look beyond the present and think ahead to get an idea of the consequences our decisions are likely to have.
This blog is about predictive analysis and how it can be used by various companies to achieve better results.
Predictive analysis has been around for decades, but demand for it has risen sharply in the past few years. More and more organizations are turning to predictive analytics to go beyond knowing what has happened and to get a better assessment of what will happen in the future.
To understand the importance and benefits of predictive analysis, it helps to first understand what predictive analysis actually is.
Predictive analysis is a form of advanced analytics used to make predictions about unknown future events. It draws on techniques such as data mining, statistics, modelling, machine learning and artificial intelligence to analyse current and historical data. There are three types of analysis. Descriptive analysis describes relationships in current and historical data. Predictive analysis applies the techniques listed above to that data to estimate what is likely to happen in the future. Prescriptive analysis goes beyond prediction: it focuses on full workforce optimization and on influencing future events.
Predictive analysis has proven very useful for many of the biggest concerns organizations face, such as improving operations, reducing risk, optimizing marketing campaigns, detecting fraud and, most importantly, increasing revenue. In short, predictive analysis is a key to competitive advantage.
The biggest strength of predictive analysis is that any industry can use it to understand its customers, its business value and its competitive market. Predictive analysis relies heavily on current and historical data, and working with that data involves three steps. The first is data exploration, which means summarizing the main characteristics of the dataset. The second is data manipulation, which means rearranging, re-sorting or changing the appearance of the data without changing it fundamentally. The last is data representation, which means communicating the identified information using tools such as diagrams, distribution charts, histograms and graphs.
We have used an example to show how predictive analytics works using the R language. In this example we assume a scenario in which a government department, together with a construction company, wants to decide whether to build a bridge because of the increasing pedestrian population. We use an open dataset called “Melbourne pedestrian volume” to look for patterns in the data using R.
This decision requires two main questions to be answered:
- a) Should the bridge be constructed at all? and
- b) If yes, when would be the right time to start the construction so that the work and the pedestrians interfere with each other as little as possible?
This is the structure of the dataset we are going to use. It contains Date, Year, Month, Day, Time, Sensor name (that is, the location) and Hourly counts from January 2012 to December 2015. It is very difficult to identify patterns in the raw data, because this is a very large dataset with over 1.5 million rows.
So we imported this dataset into RStudio to analyse it using R. Since the dataset contains the pedestrian volume for each hour across all the years, we had to aggregate it to find the total volume by various dimensions such as year, month, day and hour.
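The aggregation step can be sketched as follows. The analysis in this post was done in R; the snippet below is only a minimal Python illustration of the same idea, not the code we ran. The column names (`Year`, `Month`, `Day`, `Time`, `Sensor_Name`, `Hourly_Counts`) are an assumed spelling of the fields listed above, and the four sample rows are invented for the example.

```python
from collections import defaultdict

# Invented sample rows; the real dataset has over 1.5 million of these.
rows = [
    {"Year": 2012, "Month": "January", "Day": "Monday", "Time": 8, "Sensor_Name": "Sensor A", "Hourly_Counts": 100},
    {"Year": 2012, "Month": "January", "Day": "Monday", "Time": 9, "Sensor_Name": "Sensor A", "Hourly_Counts": 200},
    {"Year": 2012, "Month": "June", "Day": "Friday", "Time": 8, "Sensor_Name": "Sensor B", "Hourly_Counts": 50},
    {"Year": 2013, "Month": "January", "Day": "Friday", "Time": 8, "Sensor_Name": "Sensor A", "Hourly_Counts": 150},
]


def total_by(rows, dimension):
    """Sum the hourly counts for each distinct value of one dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row["Hourly_Counts"]
    return dict(totals)


print(total_by(rows, "Year"))   # {2012: 350, 2013: 150}
print(total_by(rows, "Month"))  # {'January': 450, 'June': 50}
```

Calling `total_by` once per dimension gives the totals by year, month, day of week and hour that the charts in the rest of this post are based on.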
After aggregating the data, we first plotted it by year using the ggplot function from R's ggplot2 package to understand the yearly pattern in the dataset. This helps answer whether Melbourne pedestrian traffic is going to increase in the coming years and whether a bridge is required.
As the chart shows, the number of pedestrians was around 100 million in 2012, rising to roughly 120 million in 2013, 170 million in 2014 and 210 million in 2015. The numbers increase every year, which forms a pattern and leads to the initial prediction that volume will keep increasing in the coming years and that a bridge is required. This would definitely need further drill-down before any decision is made.
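To make the yearly trend concrete, here is a small worked calculation on the four approximate totals quoted above. It is a Python sketch for illustration only (our analysis was done in R), and the straight-line extrapolation is our own simplifying assumption: four data points are far too few for a reliable forecast, so treat the projected figure as a rough indication and nothing more.

```python
# Approximate yearly totals in millions of pedestrians, as quoted in the text.
yearly_totals = {2012: 100, 2013: 120, 2014: 170, 2015: 210}
years = sorted(yearly_totals)

# Year-over-year growth in percent.
growth = {
    year: (yearly_totals[year] - yearly_totals[prev]) / yearly_totals[prev] * 100
    for prev, year in zip(years, years[1:])
}

# Ordinary least-squares straight line through the four points.
n = len(years)
mean_x = sum(years) / n
mean_y = sum(yearly_totals[y] for y in years) / n
slope = sum((y - mean_x) * (yearly_totals[y] - mean_y) for y in years) / sum(
    (y - mean_x) ** 2 for y in years
)


def predict(year):
    """Naive linear extrapolation of the yearly total (millions)."""
    return mean_y + slope * (year - mean_x)


for year, pct in growth.items():
    print(f"{year}: {pct:.1f}% growth over the previous year")
print(f"Fitted increase per year: {slope:.0f} million")
print(f"Naive projection for 2016: {predict(2016):.0f} million")
```

On these figures the growth is about 20% in 2013, 42% in 2014 and 24% in 2015, the fitted line rises by 38 million per year, and the naive projection for 2016 is about 245 million.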
Next we plotted the volume by month, which gives a clear idea of which months are the busiest and helps answer the second question: if a bridge is required, when is the right time to start construction?
In this chart, January, February and March are the busiest months compared to the others, whereas October, November and December are less clear-cut. May, June, July and August are the least busy months, which makes them a good time to start construction of the bridge.
Even though May to August has comparatively the least pedestrian traffic, the totals are still in the millions. So we plotted the volume by day of the week, which shows the busiest and least busy days.
According to this chart, Friday is the busiest day of the week, and the least busy days are Saturday, Sunday and Monday. So construction can be scheduled for those days.
To drill down further, we plotted the volume by time of day. This indicates a suitable time window for the construction work.
This chart shows that it is busy almost all day, from 7 am to 11 pm, which means it would be easier to plan the construction work at night, from 11 pm to 7 am.
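Reading the quiet period off a chart works well here, but the same choice can also be made programmatically. The sketch below (Python, for illustration only; not the R code behind this post) slides a fixed-length window around the 24-hour clock, wrapping past midnight, and picks the window with the lowest total volume. The function name `quietest_window` is hypothetical, and the hourly values are invented placeholders shaped to mimic the chart: low from 11 pm to 7 am, high otherwise.

```python
def quietest_window(hourly_totals, length):
    """Return (start_hour, end_hour) of the contiguous window with the lowest total.

    The window may wrap past midnight; end_hour is exclusive.
    """
    hours = len(hourly_totals)
    best_start = min(
        range(hours),
        key=lambda start: sum(hourly_totals[(start + i) % hours] for i in range(length)),
    )
    return best_start, (best_start + length) % hours


# Invented placeholder values: index = hour of day (0 to 23).
# Hours 0-6 and hour 23 are quiet, hours 7-22 are busy.
hourly_totals = [5] * 7 + [50] * 16 + [5]

start, end = quietest_window(hourly_totals, 8)
print(f"Quietest 8-hour window: {start}:00 to {end}:00")  # 23:00 to 7:00
```

The modulo arithmetic is what lets the window run from late evening into early morning; a plain left-to-right scan over hours 0 to 23 would miss a window that crosses midnight.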
This was an example of basic analysis using R. It gave us initial answers to the two questions raised earlier:
- The rising yearly totals suggest that Melbourne requires a bridge, subject to the further drill-down mentioned above.
- It would be easier to construct it from May to August.
- Saturday, Sunday and Monday are the most suitable days for the construction.
- Night time, from 11 pm to 7 am, is the most suitable time for the construction.
This dataset can be analysed further by location, depending on the requirements, and related datasets can be combined with it to add more information. This example showed only the basic kind of analysis; predictive analysis is capable of much more, including working with data in real time.
This blog shows our work on how predictive analysis can be used by a construction company to understand the environment before planning and making decisions. It can be used in any industry, small or large, to understand, plan and make better decisions, which leads to competitive advantage and to the ultimate goal of increased revenue and sales growth.
Thanks for reading.