This blog post is the second part of my entry for the 2014 Data Geek Challenge.
Part 1: SAP Lumira Extension: Google Maps
Let’s play timo.elliott's Buzzwords Bingo Game: in this blog post, watch out for the following keywords and once you reach 5, shout “Bingo!” from your cubicle. See your prize in the conclusion. Enjoy!
Chicago, City of Big Data
My company, Alta Via Consulting, is headquartered in Chicago. Every time I wind up downtown, I take the opportunity to visit the Chicago Architecture Foundation. Their current exhibition is called “Big Data Chicago” and is definitely worth a look: http://bigdata.architecture.org/
This event is part of a larger initiative by the City of Chicago to embrace the open data movement. One example is the Chicago Transit Authority (CTA), which communicates the real-time position of its buses and trains, publishes all service bulletins and even lets developers access this data via an API. Sounds like an opportunity for geospatial analytics and text analytics with Lumira, right?
More information here:
The business questions
While looking at the available data, one question stuck with me: I see the picture of transit in real-time but how do the events develop over time? How can I find patterns if all I can see is “right-now”? For instance, focusing on delays: is there a correlation with weather? With construction? With time of day? Should some route schedules be changed if they constantly run late?
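To make the "time of day" question concrete, here is a minimal Python sketch of the kind of aggregation it boils down to once snapshots have been accumulated. The record shape (an ISO timestamp paired with a delayed flag) is my assumption for illustration, not the actual layout of the CTA feed, and in this project the analysis itself is done in Lumira rather than in code.

```python
from collections import defaultdict
from datetime import datetime


def delay_share_by_hour(records):
    """Return {hour_of_day: share of observations flagged as delayed}.

    `records` is an iterable of (iso_timestamp, is_delayed) pairs.
    This shape is hypothetical, chosen only for the example.
    """
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for timestamp, is_delayed in records:
        hour = datetime.fromisoformat(timestamp).hour
        totals[hour] += 1
        if is_delayed:
            delayed[hour] += 1
    # One ratio per hour that has at least one observation.
    return {hour: delayed[hour] / totals[hour] for hour in sorted(totals)}
```

An hour whose share stays high across many days would be a candidate for a closer look at the route schedule.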
Data Collection
To answer these questions, we need to collect the real-time feed into the cloud on a regular basis and, later, analyze the collected data with Lumira. Here’s the process:
The CTA Bus Tracker API documentation details all the entities that are shared. I collected some of them into a MySQL database through a PHP script. A WebCron task runs every hour to read the vehicle positions. Then, I created PHP scripts to export the data as CSV files.
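For readers who want to see the shape of this pipeline, here is a minimal sketch of the store-then-export steps. It is not my actual code: it uses Python and SQLite in place of PHP and MySQL so that it runs on its own, it takes already-parsed vehicle records instead of calling the API, and the column names are assumptions (the real field list is in the CTA Bus Tracker API documentation).

```python
import csv
import io
import sqlite3

# Hypothetical column names, chosen for the example only.
COLUMNS = ("collected_at", "vehicle_id", "route", "lat", "lon", "delayed")


def open_store(path=":memory:"):
    """Open the database and create the snapshot table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vehicle_positions ("
        "collected_at TEXT, vehicle_id TEXT, route TEXT, "
        "lat REAL, lon REAL, delayed INTEGER)"
    )
    return conn


def store_snapshot(conn, collected_at, vehicles):
    """Append one polling run (a list of dicts) and return the row count."""
    rows = [
        (collected_at, v["vehicle_id"], v["route"],
         v["lat"], v["lon"], int(v["delayed"]))
        for v in vehicles
    ]
    conn.executemany(
        "INSERT INTO vehicle_positions VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


def export_csv(conn):
    """Return every stored position as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(
        conn.execute(
            "SELECT collected_at, vehicle_id, route, lat, lon, delayed "
            "FROM vehicle_positions ORDER BY collected_at, vehicle_id"
        )
    )
    return buffer.getvalue()
```

Each scheduled run would call `store_snapshot` once with the time of the poll; stamping every row with that collection time is what turns a "right-now" feed into a history that can be analyzed later.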
Links:
Technical Limitations
While building my architecture for this project, I had to work around several technical limitations, like:
Conclusion
Overall, this data collection process took way more time than I expected and I’m lucky the Data Geek Challenge deadline was postponed :smile:. Feel free to use the data from the Costing Geek links, but before you publish anything, please refer to the terms and conditions in the CTA Bus Tracker API Documentation. Also, please share your results in the comments so I know sharing was worth it ;o)
Before I forget: if (and only if) you found all the keywords, here’s your prize: http://goo.gl/KmkP6u