
Predictive Thursdays: Can We Use Machine Learning to Predict Box Office Success?

by Surya Kunju, Senior Product Expert, Predictive Analytics, CoE

Machine learning, sentient artificial intelligence, humanoid robotics: all of a sudden these terms don’t feel as strictly ‘sci-fi’ as they once did. Films like Her and Ex Machina offered visions of a digital future that felt almost close enough to touch, in the sense that the very same technology could feasibly be in our own hands soon.

Machine learning in particular has seen strong progress in recent years, with the likes of Google, Amazon and SAP breaking new ground in creating algorithms that learn from data. In the spirit of the Toronto International Film Festival (TIFF) and SAP’s Our Digital Future film series, why don’t we get down and dirty with a little machine learning to help us predict the success of a soon-to-be-released movie?

Let’s use La La Land, a comedy drama in which a jazz pianist falls for an aspiring actress in Los Angeles, as our test case. Ahead of the film’s Canadian premiere at TIFF ’16 and its full release in December, how can we determine the biggest factors in how it will perform at the box office?


From Lionsgate Movies and Writer/Director Damien Chazelle, “La La Land” with Ryan Gosling and Emma Stone (Source: )

The answer lies in predictive analytics, an aspect of machine learning that depends greatly on historical data. Today we can pull historical data about movies from sources such as IMDb (an online database for movies) and others. Some of the key data points for our test include the starring cast, genre, the film’s MPAA rating (in this case PG-13), production budget, country of origin and runtime.

Another factor is the film’s critical reception, both from the media and from movie database users. There are also technical data points such as sound mix, aspect ratio, camera, laboratory, negative format, cinematographic process and printed film format. The target variable is movie revenue.
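To make the shape of this data concrete, here is a minimal Python sketch of a single film record and of how a revenue figure could be turned into a class label for a classification model. The field names, the `revenue_class` rule (revenue of at least twice the budget counts as a "hit") and the example values are all assumptions made for illustration; they are not taken from the actual dataset or from SAP Predictive Analytics. The technical data points are left out for brevity.

```python
from dataclasses import dataclass, asdict


@dataclass
class FilmRecord:
    # Descriptive inputs named in the text (field names are hypothetical)
    lead_cast: tuple
    genre: str
    mpaa_rating: str
    budget_usd: float
    country: str
    runtime_min: int
    critic_score: float
    user_score: float
    # Target variable
    revenue_usd: float


def revenue_class(record, hit_multiple=2.0):
    """Hypothetical labeling rule: 'hit' if revenue >= hit_multiple * budget."""
    if record.revenue_usd >= hit_multiple * record.budget_usd:
        return "hit"
    return "miss"


def to_training_row(record):
    """Turn a record into a flat row: inputs plus a class label, no raw revenue."""
    row = asdict(record)
    row.pop("revenue_usd")
    row["outcome"] = revenue_class(record)
    return row


# Made-up placeholder values, not real film data
example = FilmRecord(
    lead_cast=("Actor A", "Actor B"),
    genre="comedy drama",
    mpaa_rating="PG-13",
    budget_usd=10_000_000.0,
    country="USA",
    runtime_min=120,
    critic_score=8.0,
    user_score=7.5,
    revenue_usd=25_000_000.0,
)
row = to_training_row(example)
```

Dropping the raw revenue from the training row keeps the target from leaking into the inputs.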

Using the above data, I created a classification model using SAP Predictive Analytics. Here is one of the most crucial outputs of the model, the contributing variables:
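For readers who want a feel for what a ranking of contributing variables measures, the sketch below ranks categorical variables by information gain against the class label. This is a swapped-in textbook technique, not the contribution measure SAP Predictive Analytics computes, and the toy rows and the names `star_power`, `genre` and `outcome` are invented for the example.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """Reduction in label entropy obtained by splitting rows on one feature."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - remainder


def rank_variables(rows, features, target):
    """Return (feature, gain) pairs, most informative feature first."""
    gains = {f: information_gain(rows, f, target) for f in features}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Made-up toy rows, not real film data
rows = [
    {"star_power": "high", "genre": "drama", "outcome": "hit"},
    {"star_power": "high", "genre": "comedy", "outcome": "hit"},
    {"star_power": "low", "genre": "drama", "outcome": "miss"},
    {"star_power": "low", "genre": "comedy", "outcome": "miss"},
]
ranking = rank_variables(rows, ["star_power", "genre"], "outcome")
```

In this toy data `star_power` separates hits from misses perfectly while `genre` carries no information, so `star_power` ranks first.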


La La Land is written and directed by Damien Chazelle of Whiplash fame, is of the same genre, has a similar MPAA rating and has some of the same technical data points. With star power being the most important variable, however, we are left with the burning question: Has the director put together the right cast, and was he right to pair Emma Stone with Ryan Gosling?

To find out, I used a technique called social network analysis. I began by scraping data using publicly available Twitter APIs for mentions of #EmmaStone. I then filtered the data to keep only male lead actors mentioned alongside the hashtag. Below is the graph I created to show the strongest recommendations to play Emma Stone’s love interest.
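As a rough sketch of how the edge weights in a co-mention graph like this could be derived, the snippet below counts how often each candidate hashtag appears in the same tweet as an anchor hashtag. It assumes the tweets have already been collected as plain strings; the function name `co_mention_weights`, the placeholder hashtags and the sample tweets are hypothetical, and no Twitter API call is shown.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def co_mention_weights(tweets, anchor, candidates):
    """Count tweets mentioning the anchor hashtag together with each candidate.

    Hashtags are compared case-insensitively; keys in the result are lower-case.
    """
    anchor = anchor.lower()
    wanted = {c.lower() for c in candidates}
    weights = Counter()
    for text in tweets:
        tags = {t.lower() for t in HASHTAG.findall(text)}
        if anchor in tags:
            # Each tweet adds at most 1 to the edge anchor -> candidate
            weights.update(tags & wanted)
    return weights


# Made-up sample tweets with placeholder actor hashtags
tweets = [
    "#EmmaStone and #ActorA would be great together",
    "Loved #EmmaStone opposite #ActorA",
    "#EmmaStone with #ActorB?",
    "#ActorB in a new trailer",
]
weights = co_mention_weights(tweets, "EmmaStone", ["ActorA", "ActorB"])
```

Each count can be drawn as the width of the line between the two names. Raw co-mention counts say nothing about why two names appear together, so the weights need interpretation before they are read as casting recommendations.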

The width of the line from Emma Stone to Ryan Gosling doesn’t lie: it’s a perfect match. Is it simply that the casting director is a genius, or has he been making use of machine learning himself? Either way, our little glimpse into the world of machine learning has resulted in the prediction of success for La La Land. Now we just have to wait for the film’s release to test this theory out.


      1 Comment
      Former Member

      You are aware that Emma Stone and Ryan Gosling previously starred together in Crazy, Stupid, Love, and that this is probably what is generating the data volume on Twitter? Analytics is also about interpreting the outcomes...