Big data and the NFL
The National Football League has officially hit pre-season. Around the US, fantasy football drafts are being assembled. Personally, I think the rise of fantasy football leagues has been a big driver of the amount and availability of statistics for professional football. But I digress.
Did you see this article over on TechCrunch? http://techcrunch.com/2013/08/04/how-data-changes-preconceptions-about-nfl-football-the-weather-and-the-parallel-universe/ Author Alex Williams was trying to evaluate the impact of inclement weather on the outcome of NFL games. He analyzed data from “471,392 plays in 2,898 games played since 2002” and brought in climate data from the Climate Data Center as well. You can check out the article to see the process they went through. (I like to think the data never lies, but it’s hard to believe the Frozen Tundra of Lambeau Field doesn’t amount to more of a competitive advantage in late December.)
I love this example of using historical data and seemingly unrelated data sets to posit some correlations. Imagine what you could do if you extended the data captured. You could answer questions like the following:
- Do stadiums with retractable roofs completely alleviate any weather advantage/disadvantage?
- What kinds of concessions historically sell better or worse depending on both the weather outside and the temperature in the stadium?
- How does the weather impact the number of no-shows among ticket holders? Are scalping prices impacted?
- Are there significant differences in injury reports during inclement weather between the last 15 years (benefiting from modern equipment) and the previous 15 years? Or are more injuries simply being reported, due to changes in league rules?
- Do specific coaches prepare better for inclement weather? What are the cumulative records as they move from team to team, stadium to stadium?
There are a million more interesting questions to ask. And not just about football. Every company can rattle off questions like this about interesting correlations where they are not sure of the exact impact. The only way to answer these questions is with data. Data, data, data.
Not only do you need reliable access to the data, you also need to model it. You need to store it in a structure that can support massive, ad hoc queries by your data scientists. That’s just the beginning. You also need to clean and transform the data from all of these different sources into relatable chunks of data. (What does “retractable” mean? Is it the same in every case?)
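To make the cleaning step a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the field names, the roof labels, and the sample records are made up and do not come from the TechCrunch analysis or any real data set. It shows two small ideas: mapping inconsistent labels from different sources onto one vocabulary, and joining game records to weather records on a shared key.

```python
# Hypothetical label variants; a real project would build this map
# by inspecting what each source actually contains.
ROOF_SYNONYMS = {
    "retractable": "retractable",
    "retractable roof": "retractable",
    "open/closed": "retractable",
    "dome": "fixed",
    "fixed": "fixed",
    "open": "open",
    "outdoor": "open",
}


def normalize_roof(label):
    """Map a free-text roof label onto one shared vocabulary."""
    return ROOF_SYNONYMS.get(label.strip().lower(), "unknown")


def join_games_with_weather(games, weather):
    """Attach a temperature to each game, matching on (stadium, date)."""
    weather_by_key = {(w["stadium"], w["date"]): w for w in weather}
    joined = []
    for game in games:
        match = weather_by_key.get((game["stadium"], game["date"]))
        joined.append({
            "stadium": game["stadium"],
            "date": game["date"],
            "roof": normalize_roof(game["roof"]),
            # None marks a game with no matching weather record.
            "temp_f": match["temp_f"] if match else None,
        })
    return joined


# Made-up sample records, purely for illustration.
games = [
    {"stadium": "Stadium A", "date": "2012-12-16", "roof": "Outdoor"},
    {"stadium": "Stadium B", "date": "2012-12-16", "roof": " Retractable Roof "},
    {"stadium": "Stadium C", "date": "2012-12-23", "roof": "canopy"},
]
weather = [
    {"stadium": "Stadium A", "date": "2012-12-16", "temp_f": 18},
    {"stadium": "Stadium B", "date": "2012-12-16", "temp_f": 41},
]

rows = join_games_with_weather(games, weather)
for row in rows:
    print(row)
```

Labels the map does not recognize come back as "unknown" rather than being guessed at, and games without a weather match keep a missing temperature, so the gaps in the data stay visible to whoever queries it later.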
Luckily, we have some technology to help you with that. Check out SAP HANA, SAP Data Services, and SAP Lumira for starters. Then, perhaps, do some cool things with big data and enter the Data Geek challenge!