Can We Use Machine Learning to Predict Box Office Success?
Machine learning, sentient artificial intelligence, humanoid robotics: all of a sudden these terms don’t feel as strictly ‘sci-fi’ as they once did. Films like Her and Ex Machina offered visions of a digital future that felt almost close enough to touch, as if the same technology could soon be in our own hands.
Machine learning in particular has made strong progress in recent years, with the likes of Google, Amazon and SAP breaking new ground in algorithms that learn from data. In the spirit of the Toronto International Film Festival (TIFF) and SAP’s Our Digital Future film series, why don’t we get down and dirty with a little machine learning and try to predict the success of a soon-to-be-released movie?
Let’s use La La Land, a comedy-drama in which a jazz pianist falls for an aspiring actress in Los Angeles, as our test case. Ahead of the film’s Canadian premiere at TIFF ’16 and its full release in December, how can we determine which factors will matter most to its box office performance?
The answer lies in predictive analytics, which applies machine learning and statistical techniques to historical data. Today we can pull historical data about movies from a variety of sources. Key data points for our test include the starring cast, genre, MPAA rating (PG-13 in La La Land’s case), production budget, country of origin and runtime.
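Before a model can use those data points, the categorical ones have to be turned into numbers. Here is a minimal sketch, assuming each film is a simple record with hypothetical field names (`Movie`, `budget_musd`, `runtime_min`) and placeholder films and actors rather than real box office data. It one-hot encodes genre, rating and country, multi-hot encodes the cast, and passes budget and runtime through unchanged:

```python
from dataclasses import dataclass


@dataclass
class Movie:
    # Hypothetical record type mirroring the data points listed above
    title: str
    genre: str
    mpaa_rating: str
    country: str
    budget_musd: float  # production budget in millions of US dollars
    runtime_min: int
    cast: list


def encode(movies):
    """Turn Movie records into a list of column names and numeric feature rows."""
    genres = sorted({m.genre for m in movies})
    ratings = sorted({m.mpaa_rating for m in movies})
    countries = sorted({m.country for m in movies})
    actors = sorted({a for m in movies for a in m.cast})
    columns = ([f"genre={g}" for g in genres]
               + [f"rating={r}" for r in ratings]
               + [f"country={c}" for c in countries]
               + [f"cast={a}" for a in actors]
               + ["budget_musd", "runtime_min"])
    rows = []
    for m in movies:
        row = [1.0 if m.genre == g else 0.0 for g in genres]          # one-hot genre
        row += [1.0 if m.mpaa_rating == r else 0.0 for r in ratings]  # one-hot rating
        row += [1.0 if m.country == c else 0.0 for c in countries]    # one-hot country
        row += [1.0 if a in m.cast else 0.0 for a in actors]          # multi-hot cast
        row += [m.budget_musd, float(m.runtime_min)]                  # numeric pass-through
        rows.append(row)
    return columns, rows


# Placeholder films and actors, not real data
movies = [
    Movie("Film A", "Drama", "R", "USA", 5.0, 100, ["Actor 1", "Actor 2"]),
    Movie("Film B", "Comedy", "PG-13", "USA", 20.0, 120, ["Actor 2", "Actor 3"]),
]
columns, rows = encode(movies)
for name, value in zip(columns, rows[0]):
    print(name, value)
```

Multi-hot encoding the cast is one simple way to let individual stars enter the model as inputs; a production pipeline would also need a policy for categories (such as new actors) that never appeared in the training data.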
Another factor is the film’s critical reception, both from the media and from movie database users. There are also technical data points such as sound mix, aspect ratio, camera, laboratory, negative format, cinematographic process and printed film format. The target variable is movie revenue. Using this data, I built a classification model in SAP Predictive Analytics. Here is one of the model’s most crucial outputs, the contributing variables, which rank how much each input explains the target:
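A classification model needs a categorical target, so revenue presumably has to be bucketed into classes first; the post doesn’t say how, nor how SAP Predictive Analytics computes its contributing variables. As a stand-in, and explicitly not SAP’s method, the sketch below labels each film “hit” or “miss” against a hypothetical $100M threshold and ranks input variables by information gain: the drop in label entropy after splitting the films on that variable. The field names (`star_power`, `genre`, `rating`) and the records are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def revenue_class(revenue_musd, threshold_musd=100.0):
    # Hypothetical cut-off that turns revenue into a hit/miss class label
    return "hit" if revenue_musd >= threshold_musd else "miss"


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction from grouping the labels by one categorical variable."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_variables(records, target):
    """Return (variable, information gain) pairs, most informative first."""
    labels = [r[target] for r in records]
    variables = [k for k in records[0] if k != target]
    scores = {v: information_gain([r[v] for r in records], labels) for v in variables}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented example films, not real box office data
raw = [
    {"star_power": "A-list", "genre": "Drama", "rating": "R", "revenue_musd": 150},
    {"star_power": "A-list", "genre": "Comedy", "rating": "PG-13", "revenue_musd": 220},
    {"star_power": "A-list", "genre": "Drama", "rating": "PG-13", "revenue_musd": 130},
    {"star_power": "other", "genre": "Drama", "rating": "R", "revenue_musd": 40},
    {"star_power": "other", "genre": "Comedy", "rating": "PG-13", "revenue_musd": 60},
    {"star_power": "other", "genre": "Comedy", "rating": "R", "revenue_musd": 90},
]
records = []
for r in raw:
    row = {k: v for k, v in r.items() if k != "revenue_musd"}
    row["outcome"] = revenue_class(r["revenue_musd"])  # categorical target
    records.append(row)

ranking = rank_variables(records, "outcome")
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Information gain scores one variable at a time, so it misses interactions (a star who only draws crowds in one genre, say); a trained model’s own importance measure can reflect more of that.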
La La Land is written and directed by Damien Chazelle of Whiplash fame; it is of the same genre, has a similar MPAA rating and shares some of the same technical data points. With star power being the most important variable, however, we are left with the burning question: has the director put together the right cast, and was he right to pair Emma Stone with Ryan Gosling?
To find out, I used a technique called social network analysis. I began by collecting tweets that mention #EmmaStone through Twitter’s publicly available APIs, then filtered them down to tweets that also tag male lead actors. Below is the graph I created to show the strongest recommendations to play Emma Stone’s love interest.
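Social network analysis in this setting comes down to a weighted graph: Emma Stone is one node, each candidate actor is another, and an edge’s weight (drawn as line width) counts how often the two are mentioned together. Here is a minimal sketch, assuming the tweets have already been collected as plain strings and that actors appear as hashtags. `CANDIDATES`, `#ActorB` and `#ActorC` are hypothetical placeholders, and the post doesn’t say which Twitter API endpoint was used, so collection is left out.

```python
from collections import Counter

# Hypothetical filter list of male lead actors, written as hashtags
CANDIDATES = ["#RyanGosling", "#ActorB", "#ActorC"]


def hashtags(text):
    """Lower-cased hashtags in a tweet, with trailing punctuation stripped."""
    return {w.rstrip(".,!?:;").lower() for w in text.split() if w.startswith("#")}


def co_mention_weights(tweets, focus="#EmmaStone", candidates=CANDIDATES):
    """Edge weight = number of tweets tagging both the focus actor and a candidate."""
    weights = Counter()
    for text in tweets:
        tags = hashtags(text)
        if focus.lower() not in tags:
            continue  # keep only tweets that mention the focus actor
        for actor in candidates:
            if actor.lower() in tags:
                weights[(focus, actor)] += 1
    return weights


# Invented example tweets, not real data
tweets = [
    "Loved #EmmaStone and #RyanGosling together!",
    "#EmmaStone + #RyanGosling again please",
    "#EmmaStone should star with #ActorB.",
    "Just watched something with #ActorC",
]
weights = co_mention_weights(tweets)
for (source, target), count in weights.most_common():
    print(f"{source} -> {target}: weight {count}")
```

Passing these counts to a graph-drawing tool as edge widths gives the kind of chart described above. Note that co-mention counts measure how often fans pair two actors, not how a pairing will perform at the box office.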
The width of the line from Emma Stone to Ryan Gosling doesn’t lie: Twitter users pair the two more strongly than any other candidate, which makes it look like a perfect match. Is the casting director simply a genius, or have they been using machine learning too? Either way, our little glimpse into the world of machine learning points to success for La La Land. Now we just have to wait for the film’s release to test the prediction.
This article first appeared on the SAP BusinessObjects Analytics blog: http://blog-sap.com/analytics/