DataGeek III Challenge: Bicycle Trend Analysis in Vancouver | By: Ren Horikiri & Trevor Duong
Preface:
Vancouver aims to become the greenest city in the world by 2020, so naturally we wanted to see how that goal is progressing in one of its most visible elements: traffic. It is widely known that cars are a major source of pollution, whereas the bicycle generates none (except maybe a sweaty shirt or two).
As a recently converted bicyclist, I wanted to know whether bicycle usage is increasing significantly year over year or staying relatively flat. Another question of mine is whether weather affects bicycle usage across the year, because I myself do not bike when the temperature is below ~10 degrees Celsius.
Staring at an endless table full of data is not the best way to spot trends, so we turned to Lumira to answer the questions asked above.
Finding the Right Data:
The most critical element of a successful Lumira analysis is having a reliable source of good, clean data to work with. First, we needed a statistic to measure trends in bicycle usage, and the best source for this was the City of Vancouver’s statistics on separated bike lanes, found here. The City of Vancouver has installed devices on four bike lanes to count the number of bicycle users, and this data proved to be almost perfect for our analysis.
The second important set of data is the weather data, more specifically the average temperature for each month in Vancouver over the past three years. Through a quick Google search, we were able to uncover accurate temperature data provided by Environment Canada, found here. We chose temperature because it rises and falls steadily through the year, whereas measurements such as rain are more sporadic and harder to capture accurately.
Preparing the Data:
One of the more tedious tasks was cleaning the data to the point where it could be pulled into Lumira. For the bike lane data, we needed to convert the PDF file provided by the City of Vancouver into XLSX using Adobe Acrobat. The next step was to clean the data by deleting extra text, converting the month field to the numerical month of the year (1, 2, 3… so that it would show in the right order), and ensuring that each bicycle lane count was mapped to the correct month.
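We did this cleanup by hand in Excel, but the same steps can be sketched in a few lines of Python. This is only an illustration under assumptions: the `clean_row` helper, the row layout (a month label followed by a count with thousands separators), and the sample values are all hypothetical placeholders, not the City of Vancouver's actual figures.

```python
# English month names in calendar order; index + 1 is the month number.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)
MONTH_NUMBER = {name: i + 1 for i, name in enumerate(MONTHS)}


def clean_row(raw_month, raw_count):
    """Turn one extracted row into (month_number, count).

    raw_month may carry stray whitespace or extra text after the month name;
    raw_count may carry thousands separators.
    """
    month_name = raw_month.strip().split()[0].capitalize()
    count = int(raw_count.replace(",", "").strip())
    return MONTH_NUMBER[month_name], count


# Placeholder rows for illustration only, NOT the real counts.
raw_rows = [
    ("March 2012", "45,000"),
    (" january 2012", "30,000"),
    ("February 2012 ", "32,500"),
]

# Sorting on the numeric month puts the rows in calendar order.
cleaned = sorted(clean_row(month, count) for month, count in raw_rows)
print(cleaned)  # [(1, 30000), (2, 32500), (3, 45000)]
```

Sorting on the month number rather than the month name is what keeps January ahead of February in the chart, instead of the alphabetical order a text field would give.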
PDF file provided by the City of Vancouver on bike lane usage:
Cleaned bike lane usage data and average temperature:
Determine the Best Chart to Use:
Since there were multiple bike lanes, we wanted to show each bike lane as a distinct object so we could compare the trends in each. The ability to try different chart types with just a click of the mouse was very rewarding, as we could see and compare which chart gave us the best view. In the end, we agreed that the line chart made it easiest to compare bike lane usage by month. To incorporate temperature data, we chose a “Line Chart with 2 Y-axes” to plot the bicycle lane usage counts and average temperatures together.
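To see why a second Y-axis helps here, consider how far up the chart each point would sit. The sketch below is not how Lumira draws its charts; it is a hypothetical illustration, with an assumed `axis_position` helper, assumed axis ranges, and placeholder values rather than our real data.

```python
def axis_position(values, axis_min, axis_max):
    """Return each value's position along an axis as a fraction from 0 to 1."""
    span = axis_max - axis_min
    return [(v - axis_min) / span for v in values]


# Placeholder monthly values for illustration only, NOT the real data.
bike_counts = [30000, 45000, 90000]   # bicycle trips per month
avg_temps_c = [4.0, 7.0, 16.0]        # average temperature, degrees Celsius

# Bike counts on an axis sized for counts use most of the chart height.
counts_pos = axis_position(bike_counts, 0, 100000)

# Temperatures forced onto that same axis are squashed against the bottom.
shared = axis_position(avg_temps_c, 0, 100000)

# Temperatures on their own axis (0 to 20 degrees) show the seasonal swing.
own_axis = axis_position(avg_temps_c, 0, 20)

print(counts_pos)
print(shared)
print(own_axis)
```

With one shared axis the temperature line would look flat, because the counts are thousands of times larger than the temperatures. Giving each measure its own scale lets both seasonal curves be compared by shape.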
2 Y-Axes showing bike lane usage and average temperature:
Challenges:
We relied on Excel more than expected, so we hope that in the future Lumira can take a larger role in cleansing and organizing the data prior to analysis.
Another challenge came with the Line Chart with 2 Y-Axes: the line colors for the bike lane usage counts no longer showed as distinct, but instead appeared as shades of blue.
1 Y-Axis showing only bike lane usage (with distinct colors):
Next Steps:
The first challenge is to gather Vancouver SAP employee data on how many people use the bike room per month over a one-year period and then compare it to the Vancouver biking community.
The second is to challenge the City of Vancouver to take multiple bike usage counts along each of the bike lanes. With this data, the physical bike lanes can be mashed up with the local map, with the bike lane usage per month overlaid onto it.
In either challenge, we expect Lumira visualizations to help convey the message that biking is good for you.