Data Geek III – Analysis of Delays on Chicago Buses
This blog post is the third and final part of my entry for the 2014 Data Geek Challenge.
As described in my previous blog post about the Bus Tracker Dataset (step 2), the City of Chicago in general, and the Chicago Transit Authority in particular, have made major efforts to leverage IT and deliver a better experience for their customers. In addition, these data are available to developers like us. Let's analyze bus delays in the city of Chicago (the final infographic is available as an attachment).
100% of buses are on-time!
I collected information on the position and status (delayed / on time) of buses in Chicago over the month of October 2014 (see step 2). My data collection system wasn't really reliable, so the data is pretty spotty. To sum it up, out of 4,261 checks, only 36 buses were delayed. Believe it or not, this represents only about 0.85%, which means that, rounded to the nearest percent, 100% of the buses were on time!
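As a quick sanity check, the delay rate above can be recomputed from the raw counts. The two numbers come straight from this post; the calculation itself is plain arithmetic.

```python
# Recompute the delay rate reported above: 36 delayed buses out of 4,261 checks.
total_checks = 4261
delayed = 36

delay_rate = delayed / total_checks * 100  # percentage of checks that were delayed
on_time_rate = 100 - delay_rate            # percentage of checks that were on time

print(f"Delayed: {delay_rate:.2f}% of checks")   # well under 1%
print(f"On time: {on_time_rate:.1f}% of checks") # effectively 100% once rounded
```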
Which routes? What day?
Drilling down into the available information, we need to figure out whether these delays were isolated exceptions or whether any outlier could be found. If you select all the delayed vehicles, group them by route, and visualize them in a bar chart, you'll notice that there are no real outliers. An outlier would potentially have helped identify a troublesome intersection or district, but that's not the case here.
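The group-by-route aggregation behind that bar chart can be sketched outside Lumira with nothing but the standard library. This is a minimal stand-alone example on made-up records; the field layout (`route`, `status`) is an assumption, not the actual schema of the collected dataset.

```python
from collections import Counter

# Hypothetical records mimicking the collected checks: (route, status) pairs.
records = [
    ("22", "on time"), ("22", "delayed"), ("6", "on time"),
    ("147", "delayed"), ("22", "on time"), ("6", "delayed"),
]

# Keep only the delayed vehicles, then count them per route --
# the same aggregation the bar chart in Lumira performs.
delays_by_route = Counter(route for route, status in records if status == "delayed")

for route, count in delays_by_route.most_common():
    print(f"Route {route}: {count} delay(s)")
```

With real data, a route whose count towers over the others would be the outlier this analysis was looking for.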
However, if you reproduce the same analysis by day, the result clearly identifies Sunday as a sore spot. Looking a little closer, we can see that only 3 out of 7 days appear at all. In this particular instance, I believe the reason is that the source data is inconsistent: since I worked on the data source mostly on Sundays, most of the data was collected on that day, which doesn't mean most delays happened on weekends.
I leveraged the Google Maps Custom Extension detailed in step 1 to identify the location of incidents. There's an old saying in the Windy City: "There are only two seasons: snow and construction." Therefore, I searched for pockets of delays that could be explained by local events. Again, as you can see here, there are no clear clusters of delays in the data I collected.
Since I couldn't find a clear pattern in the routes, the days, or the locations of the delays, I tried to perform a text analysis using the service bulletins. There is no measurable correlation between delays in bus transit and the events described in the bulletins, but we can easily see here that "Reroute" and "Bus Stop Relocation" are the event types with the most impact.
Conclusion and Learning Points
Overall, this data visualization exercise didn't reveal unexpected insights, except for the fact that buses in Chicago are almost always on time. That's pretty good news.
On the other hand, I believe the idea of the Data Geek Challenge is less in the insights than in the learning process, which was my biggest achievement this year. Here are some of my learning points:
- In every data analytics project, data is king. In this particular case, I was able to identify holes when drilling down by day, but I was limited by the data quality.
- The best feature of Lumira in this situation is the ability to simply update the data and refresh the analysis. I will improve my data collection system and rerun the analysis next month; maybe I will be able to identify something else.
- Custom visualizations are very powerful on the desktop, but they cannot be shared with Lumira Cloud. They can, however, be shared within an organization by leveraging Lumira Server.
- It was my first time using pictograms in the Compose section of Lumira. I was surprised to discover that it needs SVG files. You can either download data sets like SVG Map Icons, create your own with a good old text editor (like I did), or use a specialized graphics tool like Inkscape.
- I am by no means a designer, so I used a great trick I learned from Mico Yuk's webinars: check some design websites for inspiration, or buy a theme or template. My favorite is GraphicRiver.
- The Lumira Team is amazing and was always there to help me when I ran into some bumps, especially with the custom extension.
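Since Lumira's pictograms must be SVG, the "good old text editor" approach from the list above really amounts to writing a small XML file by hand. The sketch below generates a minimal bus-style pictogram with the standard library; the shapes, colors, and file name are made up for illustration, not taken from the actual infographic.

```python
# Hand-write a minimal SVG pictogram (a stylized bus) -- no graphics tool needed.
svg = """<svg xmlns="http://www.w3.org/2000/svg" width="64" height="64" viewBox="0 0 64 64">
  <rect x="8" y="12" width="48" height="32" rx="6" fill="#2b6cb0"/>  <!-- bus body -->
  <rect x="14" y="18" width="36" height="12" fill="#ffffff"/>        <!-- windows -->
  <circle cx="20" cy="48" r="5" fill="#222222"/>                     <!-- front wheel -->
  <circle cx="44" cy="48" r="5" fill="#222222"/>                     <!-- rear wheel -->
</svg>
"""

# Save it so it can be imported as a pictogram.
with open("bus_pictogram.svg", "w") as f:
    f.write(svg)
```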
I would encourage everybody to give the next Data Geek Challenge a try. It’s a great way to have fun while learning.