Kicking Off College Football with Analytics
With the start of College Football around the corner, I thought that I would kick off the season with some football analytics. Compared with my previous baseball app, college football analysis has proven to be much more challenging. For me, the first set of challenges stemmed from the following:
* No Single Source of Data. Unlike Sean Lahman’s baseball database, college football doesn’t seem to have a freely available source of raw and detailed data. Getting access to detailed game-level college football statistics over the course of many years involves going to many different individual websites, compiling and aggregating their data, and then combining it all together.
* Data Management Issues. Like most organizations, football has its fair share of data management issues. For example, there’s no single definition of a school’s name. A school like TCU in one source can be ‘Texas Christian University’ in another. Additionally, slowly changing dimensions, such as teams changing conferences, impact conference hierarchies and roll-ups.
* Lack of advanced analytics and insights. Most college football websites provide basic metrics and basic views of the data, such as overall standings and win/loss records. But they don’t provide any advanced insights on the key metrics that drive success for these different teams.
Fortunately, modern analytics tools, like SAP Analytics Cloud, resolve many of these issues.
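To make the school-name problem above concrete, here is a minimal Python sketch of how records from two sources could be reconciled before combining them. It is not how SAP Analytics Cloud does it internally; the alias table, the function names (canonical_school, combine_wins), and the win counts in the usage note are all hypothetical and only for illustration.

```python
# Hypothetical alias table: every spelling seen in a source maps to one
# canonical school name. A real table would be much longer.
SCHOOL_ALIASES = {
    "tcu": "TCU",
    "texas christian": "TCU",
    "texas christian university": "TCU",
}


def canonical_school(raw_name: str) -> str:
    """Return the canonical name for a school, or the cleaned input if unknown."""
    # Lower-case and collapse whitespace so "  TCU " and "tcu" match the same key.
    key = " ".join(raw_name.lower().split())
    return SCHOOL_ALIASES.get(key, raw_name.strip())


def combine_wins(*sources: dict) -> dict:
    """Add up win counts from several sources after normalizing school names."""
    totals = {}
    for source in sources:
        for name, wins in source.items():
            school = canonical_school(name)
            totals[school] = totals.get(school, 0) + wins
    return totals
```

For example, with made-up numbers, combine_wins({"TCU": 3}, {"Texas Christian University": 2}) rolls both rows up under the single key "TCU" instead of treating them as two different schools.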
Who Are the Best Teams?
One source of data that I was able to compile was every school’s all-time records since 1935, when the first official AP poll was released. Over this time, we can rank the top 10 teams. The majority of these teams are from the Big Ten and SEC.
Fun Fact #1: One of these teams doesn’t seem to belong. Boise State has the 5th highest winning percentage and is the only team in the top 5 that doesn’t have a long history of success.
Fun Fact #2: Alabama is the only team that is in the Top 5 across all four of these categories – and can probably be deemed the most successful team in College Football history. Interestingly enough, USC and Ohio State have proven to be the most successful at driving player success in the NFL.
Who Are the Best Players?
Since 2000, we can compare the top QBs, RBs, and WRs – based on TDs and Yards-per-Game.
Fun Fact #3: Success in college doesn’t equal success in the NFL. Despite these prolific college statistics, only a small group of these QBs, RBs, and WRs have had success at the NFL level (Jordy Nelson, LaDainian Tomlinson, and Matt Forte).
Why Do Teams Win?
While the history of football is interesting, the game has changed a lot over the past decade – making it more relevant to look at recent statistics. Over the past 8 seasons, we can see the following in the data.
* You need to score more than the other team to win.
* Teams that are the most efficient (yards-per-play on offense and defense) win.
* Turnovers, Time of Possession, and Penalties (while important) matter much less.
Fun Fact #4: If you can manage a differential of 2 yards-per-play (yards-per-play gained minus yards-per-play allowed), you will win.
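The yards-per-play differential in Fun Fact #4 is simple arithmetic, and a small Python sketch may help show exactly what is being compared. The function names and the yardage figures in the usage note are assumptions made for illustration, not values from the data set behind this post.

```python
def yards_per_play(yards: float, plays: int) -> float:
    """Average yards gained (or allowed) per play."""
    if plays <= 0:
        raise ValueError("plays must be positive")
    return yards / plays


def ypp_differential(yards_gained: float, plays_run: int,
                     yards_allowed: float, plays_faced: int) -> float:
    """Yards-per-play gained on offense minus yards-per-play allowed on defense."""
    offense = yards_per_play(yards_gained, plays_run)
    defense = yards_per_play(yards_allowed, plays_faced)
    return offense - defense
```

As an illustration, a team that gains 420 yards on 70 plays (6.0 per play) and allows 280 yards on 70 plays (4.0 per play) has a differential of 2.0, which is the threshold mentioned above.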
Does Home Field Matter?
The obvious answer is yes, but let’s look at the numbers. Home teams win 58% of the time, score about 2 points more, average 0.4 yards more per play, and differ only slightly from road teams in turnovers and penalty yards.
However, these statistics are aggregated across all teams. When we drill into this data, we can see that it’s less about playing on the road and more about the team that you’re playing.
Fun Fact #5: Good teams seem to win wherever they play whereas bad teams just seem to lose wherever they play.
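The drill-down behind Fun Fact #5 amounts to splitting each team's win rate by venue instead of averaging across all teams. Here is a rough Python sketch of that idea, assuming each game is a tuple of (home team, away team, home points, away points). That record layout and the function name are my own assumptions, not the structure of the underlying data.

```python
from collections import defaultdict


def win_rates_by_venue(games):
    """Return {team: {"home": rate, "away": rate}} from game results.

    games: iterable of (home_team, away_team, home_points, away_points).
    A rate is None when the team played no games at that venue.
    """
    # tally[team][venue] holds [wins, games_played]
    tally = defaultdict(lambda: {"home": [0, 0], "away": [0, 0]})
    for home, away, home_pts, away_pts in games:
        tally[home]["home"][1] += 1
        tally[away]["away"][1] += 1
        if home_pts > away_pts:
            tally[home]["home"][0] += 1
        elif away_pts > home_pts:
            tally[away]["away"][0] += 1
        # Ties count as a game played with no win for either side.
    return {
        team: {
            venue: (wins / played if played else None)
            for venue, (wins, played) in venues.items()
        }
        for team, venues in tally.items()
    }
```

A team whose home and away rates are both high (or both low) fits the pattern in Fun Fact #5, while a large gap between the two would point to a real home-field effect for that team.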
What Does This All Mean?
For me, the analytics told me the following:
(1) College Football has a data problem. Who would have thought that common data challenges like slowly changing dimensions, many-to-many joins, and master data would live in college football? But just like most enterprises, college football has data challenges that need to be resolved.
(2) Old Stats Can Be Invalidated. When I watch football, announcers preach things like winning the time of possession, winning the turnover battle, and minimizing penalties. While these are obviously important, offensive and defensive efficiency matters far more, as measured by statistics like completion percentage, yards-per-pass, and yards-per-rush (for and allowed).
(3) People Need Analytics, Not Data. While our culture loves college football and there are many sports sites that deliver statistics, they provide very little analytics and insight into this data, like we’re seeing above. This prevents fans from seeing the whole story in the data.
Got an analytics question or just want to know more?
Follow or message me on LinkedIn or drop a comment below.
Nice blog and fun analytics, Jason!
If I understand it right, you looked at all games through the seasons. Are there any different conclusions if you focus only on late season, playoff, or championship games when teams are more equally skilled?
That's correct. I could break it down by different times, early/late season, recruits, etc. I just built this pretty quickly to show the capabilities with SAC. But all possible. Good question.
wow, interesting information!