Who Could Be the Top Rugby World Cup Players in 2015?
This is the first part of a three-blog story. Part 2 is here. Part 3 is there.
I spent two years in Africa and was given the chance to learn how to play rugby in an “international team” – easy as it was the only one across the country 😉 .
Now the 2015 Rugby World Cup is coming in just 46 days!
I was wondering if we can find out the reasons why a player can become a rugby all-star during the World Cup.
From the site here, I extracted the statistics for all the players who have ever participated in a World Cup.
This represents 2440 players, including 50 all-star players but also unsung heroes from Ivory Coast or Portugal.
The value of my target variable is 0 for a normal player, 1 for an all-star player.
My 50 all-star players originate from 9 countries, basically the Six Nations (Italy set apart) and the 4 nations.
(Apologies to my colleague Pierpaolo VEZZOSI for not shortlisting Italian players 😕 )
This becomes a classification problem. I am going to see how the different input variables can help explain and predict the output variable. The full final dataset is attached to the post.
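With only 50 all-stars among 2440 players, the target is heavily imbalanced. The short Python sketch below (my own illustration, not something produced by SAP Predictive Analytics) uses just the two counts from this post to show why raw accuracy would be a poor yardstick here: a "model" that labels everybody as a normal player is already right almost all the time.

```python
# Counts quoted in the post; everything else is derived arithmetic.
TOTAL_PLAYERS = 2440
ALL_STARS = 50


def base_rate(positives: int, total: int) -> float:
    """Share of rows whose target equals 1 (all-star)."""
    return positives / total


def majority_baseline_accuracy(positives: int, total: int) -> float:
    """Accuracy of a trivial model that always predicts 0 (normal player)."""
    return (total - positives) / total


if __name__ == "__main__":
    share = base_rate(ALL_STARS, TOTAL_PLAYERS)
    baseline = majority_baseline_accuracy(ALL_STARS, TOTAL_PLAYERS)
    print(f"All-star share of the dataset: {share:.2%}")
    print(f"Accuracy of always predicting 0: {baseline:.2%}")
```

That is why it is more interesting to look at how well the model ranks all-stars above normal players than at how often it is simply "right".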
I open SAP Predictive Analytics 2.2, click on Modeler and Create a Classification/Regression Model.
I load the Excel file containing all the players and their stats.
I need to describe the data. Here is the final screen, with all the information filled in (I attached the description file to the post).
I set the target variable and exclude the Player variable as it is not useful for the model.
I check the Compute Decision Tree check box.
The model overview gives me useful information:
- The Predictive Power and Prediction Confidence indicators are both very good.
- Only 6 variables were kept in the model.
There is one suspicious variable, Matches Won, as it is on its own a very strong predictor of the output. SAP Predictive Analytics is warning us about the strong correlation between this input variable and the output variable.
The more you and your team win World Cup matches, the higher the chances are that you become a rugby legend. Makes sense, right?
Let’s move to the variable contributions.
Some more facts:
- The more matches you start on the field as a main player, the better your chances of becoming a rugby legend. This is true if the player took part in 8, 9 or 10 matches, and even truer if the player played more than 11 matches (which means that he probably participated in several World Cups). To be noted: to start the matches, you must be a consistent, strong player.
- “History is written by the victors”. It’s not only about starting the matches but about winning them.
- As an individual player and an all-star you have to bring some decisive points to your team. For exceptional individuals, this would be more than 26 points and could even reach 277 points for one all-star player (quiz time: who is this guy?)
- Self-explanatory, right? What can make you a star is to bring back the Webb Ellis cup (aka “Bill”) home. And if you can do it two times…
- Scoring tries is also key to being recognized as an all-star player. Like Jonah Lomu, who scored the record number of 15 tries in two World Cup editions.
- Being a captain and leading the team to victory is key to staying in people’s hearts & memories.
I move to the Decision Tree and use the two most important variables in my model.
Here is my interpretation:
- On top, we see the overall player population, including 50 all-star players (2.05% of the total).
- If the player starts at least 11 matches, the percentage of legend players in this population of 93 individuals climbs to 29%. 27 out of my 50 greatest fall into this category.
- If I add the criterion “player has won at least 10 matches”, the percentage climbs to 44%, and 22 of my greatest players are part of this category.
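To double-check the figures read off the tree, here is a small Python sketch of the arithmetic. It only uses the counts quoted above; the `lift` helper and the `is_likely_all_star` rule are my own illustrative names, not something exported by SAP Predictive Analytics, and the size of the second node is not shown in this post, so it is inferred from the rounded 44%.

```python
# Figures quoted in the post.
TOTAL_PLAYERS = 2440
TOTAL_ALL_STARS = 50
NODE_STARTS_PLAYERS = 93      # players who started at least 11 matches
NODE_STARTS_ALL_STARS = 27    # all-stars among them
NODE_WINS_ALL_STARS = 22      # all-stars who also won at least 10 matches
NODE_WINS_RATE = 0.44         # rounded percentage read off the tree


def rate(all_stars: int, players: int) -> float:
    """Share of all-stars inside a tree node."""
    return all_stars / players


def lift(node_rate: float, root_rate: float) -> float:
    """How many times more frequent all-stars are in a node than overall."""
    return node_rate / root_rate


def is_likely_all_star(matches_started: int, matches_won: int) -> bool:
    """Hypothetical hand-written version of the two splits described above."""
    return matches_started >= 11 and matches_won >= 10


if __name__ == "__main__":
    root = rate(TOTAL_ALL_STARS, TOTAL_PLAYERS)
    starts = rate(NODE_STARTS_ALL_STARS, NODE_STARTS_PLAYERS)
    print(f"Root node:        {root:.2%} all-stars")
    print(f">= 11 starts:     {starts:.2%} all-stars, lift {lift(starts, root):.1f}")
    print(f"... and >= 10 wins: lift {lift(NODE_WINS_RATE, root):.1f}")
    # Inferred, not stated in the post: 22 all-stars at roughly 44%.
    print(f"Approximate size of the last node: {round(NODE_WINS_ALL_STARS / NODE_WINS_RATE)} players")
    print(f"All-stars captured by the first split: {NODE_STARTS_ALL_STARS / TOTAL_ALL_STARS:.0%}")
```

In other words, the first split alone concentrates all-stars to about 14 times their base rate while still holding more than half of them, which is why these two variables make such a readable tree.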
I will keep this first blog post short and leave the rest for the second part (teaser: it’s all about predictions 😉 ).
Thanks for reading! Your comments are welcome!
Very nice article! That's the only reason why you are forgiven for not adding Italian players 😉 (but we will review this after the World Cup)
Thanks Pierpaolo. My #1 pick for Italy would be Sergio Parisse, just type "best italian rugby player" in Google and you will see all his information.
The RWC is an example of where stats don't tell the whole story; for example, the first one was run in Australia and New Zealand in 1987 (won by the good guys, of course), but in Rugby terms the most notable thing was the absence of South Africa, due to the international sporting sanctions then in place. In fact, they also missed the 1991 World Cup, so South Africa (and South African players) have only played at five of the seven Rugby World Cup tournaments held so far.
Another issue is the selection of the statistics used, both the time frame (the RWC is "only" 2 months of a four-year cycle), and the types of statistics that apply to Front Row forwards (think Line backer) against Winger (think Wide Receiver). As a rabid All Blacks supporter, even though I live in Australia, I do have to note that their results in the World Cup very rarely match the expectations raised by the rugby played in the other 46 months of the four-year cycle.
And of course, there is the real reason for sporting statistics...
Even though there are specialist positions (Rugby is justly famous for having a position for any body shape), overall, Rugby is a team game, so however great you are as a player, if you play for, say, Italy, who have never progressed past the pool round, you would be hard pressed to make a claim as THE greatest player using this methodology. You need to include a statistic that measures the quintessential italianness of people like Sergio Parisse 😛
PS One of my favorite rugby videos, starring a teacher from my son's school.
Thanks for your comment, for retracing some of the RWC history and for sharing this impressive video!
Here are two videos I like, the first one to go back to some RWC finest moments, the second one just because I am French 🙂
Funnily, the RWC 1987 final you mention in your answer involves 8 of my all-time greatest players, 5 being from New Zealand, 3 being from France.
As for statistics, here are some assumptions that I made when building this example:
- The players have to be present during at least one RWC edition. Bad luck for the South Africans, who were not part of RWC 1987/1991.
- The selection of greatest players is done according to my tastes (even if I managed to keep it strict and select only 7 French players out of 50 😉 ). I think you are right that my perception of Sergio Parisse (and Italian players in general) is influenced by the overall performance of their team. The same might be true for Fiji or other teams that are not part of the major rugby nations. I did include a country variable in the dataset though, and it is interesting to see that it does not come out as a relevant one in the final model.
- No information about the positions was included in the dataset, mostly because it was missing from the dataset source and, with just one day to work, I would have had a hard time adding this information by hand to the 2440 rows. But this could be a nice addition.
Thanks again, so what are your predictions:
- for the team that will be winning the RWC this year? (All Blacks, right?)
- for the greatest players that will emerge or confirm their status this year?
(BTW, it seems the official Twitter account for the RWC is asking people to vote in order to build a "RWC dream team". See https://twitter.com/rugbyworldcup)
Interesting Analysis Antoine