I spent two years in Africa and was given the chance to learn how to play rugby in an “international team”, which was easy to join as it was the only team in the country 😉 .
Now the 2015 Rugby World Cup is just 46 days away!
I was wondering if we can find out the reasons why a player can become a rugby all-star during the World Cup.
From the site here, I extracted the statistics for all the players who have ever participated in a World Cup.
That is 2440 players: 50 all-star players, but also unsung heroes from Ivory Coast or Portugal.
The value of my target variable is 0 for a normal player, 1 for an all-star player.
(Apologies to my colleague Pierpaolo VEZZOSI for not shortlisting Italian players 😕 )
This becomes a classification problem. I am going to see how the different input variables can help explain and predict the output variable. The full final dataset is attached to the post.
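To make the class imbalance concrete, here is a small Python sketch that uses only the two counts quoted above (2440 players, 50 all-stars). The variable names are mine and are not column names from the dataset. It shows why plain accuracy would be a poor yardstick for this problem: a model that labels every player as "normal" is already right almost 98% of the time.

```python
# Counts quoted in the post; the variable names are illustrative only.
total_players = 2440
all_stars = 50

# Share of the positive class (target = 1).
positive_rate = all_stars / total_players

# Accuracy of a trivial model that always predicts 0 (normal player).
baseline_accuracy = (total_players - all_stars) / total_players

print(f"All-star share: {positive_rate:.2%}")
print(f"Always-predict-0 accuracy: {baseline_accuracy:.2%}")
```

Any useful model therefore has to be judged on how well it ranks the rare all-stars, not on raw accuracy.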
I open SAP Predictive Analytics 2.2, click on Modeler and Create a Classification/Regression Model.
I load the Excel file containing all the players and their stats.
I need to describe the data. Here is the final screen, with all the information filled in (I attached the description file to the post).
I set the target variable and exclude the Player variable, as it is not useful for the model.
I check the Compute Decision Tree check box.
The model overview gives me useful information:
- The Predictive Power and Prediction Confidence indicators are both very good.
- Only 6 variables were kept in the model.
There is one suspicious variable, Matches Won, as it is a very strong predictor of the output on its own. SAP Predictive Analytics warns us about the strong correlation between this input and the output variable.
The more you and your team win World Cup matches, the higher the chances are that you become a rugby legend. Makes sense, right?
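Outside the tool, one simple way to sanity-check a dominant input like this is to measure how strongly it tracks the target on its own. The sketch below computes a plain Pearson correlation on a made-up toy sample. The numbers are hypothetical and not taken from the dataset, and this is not how SAP Predictive Analytics computes its own indicators.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy sample: matches won per player and the 0/1 all-star flag.
matches_won = [0, 1, 1, 2, 3, 4, 6, 9, 11, 14]
is_all_star = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

r = pearson(matches_won, is_all_star)
print(f"Correlation between matches won and target: {r:.2f}")
```

A value close to 1 on real data would be a hint to ask whether the variable is a genuine driver or simply another way of stating the outcome.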
Let’s move to the variable contributions.
Some more facts:
- The more matches you start on the field as a first-choice player, the better your chances of becoming a rugby legend. This is true if the player took part in 8, 9 or 10 matches, and even truer if he played 11 or more matches (which means that he probably participated in several World Cups). To be noted: to start matches, you must be a consistent, strong player.
- “History is written by the victors”. It’s not only about starting the matches but about winning them.
- As an individual player and an all-star you have to bring some decisive points to your team. For exceptional individuals, this would be more than 26 points and could even reach 277 points for one all-star player (quiz time: who is this guy?)
- Self-explanatory, right? What can make you a star is to bring the Webb Ellis Cup (aka “Bill”) back home. And if you can do it twice…
- Scoring tries is also key to being recognized as an all-star player. Like Jonah Lomu, who scored the record number of 15 tries in two World Cup editions.
- Being a captain and leading the team to victory is key to staying in people’s hearts & memories.
I move to the Decision Tree and use the two most important variables in my model.
Here is my interpretation:
- At the top, we see the overall player population, including the 50 all-star players (2.05% of the total).
- If the player starts at least 11 matches, the share of legends climbs to 29% within this group of 93 players. 27 of my 50 greatest fall into this category.
- If I add the criterion “player has won at least 10 matches”, the share climbs to 44%, and 22 of the greatest players fall into this category.
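The arithmetic behind these tree nodes can be checked with a few lines of Python. The sketch below uses only the figures quoted in the bullets; the size of the last node is not stated in the post, so the value computed here is an inference from "22 players at 44%". "Lift" is my own label for the ratio between a node's all-star rate and the overall rate.

```python
# Figures quoted in the decision tree walkthrough.
total_players, total_all_stars = 2440, 50
starters_11_plus, all_stars_in_starters = 93, 27
all_stars_in_winners, winners_rate = 22, 0.44  # node size is not stated

base_rate = total_all_stars / total_players
starter_rate = all_stars_in_starters / starters_11_plus

# Lift: how much more frequent all-stars are in a node than overall.
starter_lift = starter_rate / base_rate
winner_lift = winners_rate / base_rate

# Node size implied by 22 all-stars at 44% (an inference, not a quoted figure).
implied_winner_node_size = round(all_stars_in_winners / winners_rate)

# Share of the 50 all-stars captured by each node.
starter_coverage = all_stars_in_starters / total_all_stars
winner_coverage = all_stars_in_winners / total_all_stars

print(f"Base rate: {base_rate:.2%}, starters node: {starter_rate:.0%}")
print(f"Lift: {starter_lift:.1f}x (starters), {winner_lift:.1f}x (starters + winners)")
print(f"Implied size of the last node: {implied_winner_node_size} players")
print(f"Coverage: {starter_coverage:.0%} and {winner_coverage:.0%} of all-stars")
```

In other words, all-stars are roughly 14 times more frequent among players with at least 11 starts than in the full population, and about 21 times more frequent once the wins criterion is added, at the cost of covering fewer of the 50 all-stars.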
I will keep this first blog post short and leave the rest for the second part (teaser: it’s all about predictions 😉 ).
Thanks for reading! Your comments are welcome!