To Be or not to Be…a Rugby World Cup 2015 Star!
This is the second part of my blog series on predictive analytics and rugby.
In part 1, I explained the data preparation, the creation of the predictive model and the findings. Part 3 details the application of the predictive model to new data (RWC 2015 players).
The tragedy of modern rugby players is that, despite their amazing performances, few of them will be remembered in the long run.
Didier MAZOUE drew my attention to the fact that a performance data set does not capture all of the reasons why certain players become legends.
Call it charisma, leadership, fair play, beauty or sometimes evil character traits (this is rugby after all!), but it all contributes to our collective memory.
Sebastien “Caveman” Chabal is a retired French player who has featured heavily in the sports media over the past years.
I will first test my model on that player, to see whether his fame comes from his performance or from his “beardy” look 😉
I fill in the 6 variables that are part of the model, using the data I collected.
The score I get is quite low (0.2 out of 1), which means that Mr. Chabal’s popularity will probably never extend beyond France’s borders in the coming decades.
Now the fun part starts – trying to predict the future 😎
I am going to place a bet on Jonathan Sexton becoming an all-star of the next world cup, so as to please my Dublin colleagues 😉 and also because I feel he deserves it!
Let’s back this up with some degree of rationale:
- Jonathan Sexton was nominated for Player of the Year in 2014. The guy is young and talented.
- Ireland is ranked third in the world in the IRB rankings.
- Google and the Irish Times seem to agree with me (oh, that would be a tie with Paul O’Connell, right?)
Let’s be positive and say Ireland will reach the semi-finals (I don’t want to know what that would mean for France 🙁 ).
If that is the case, Jonathan Sexton will play 6 additional matches, win 5 of them and unfortunately lose the 6th one.
Let’s assume he scores an average of 8 points per match, which means a total of 48 additional points.
Here are his current RWC stats:
- 2 matches started on the field
- 21 points scored
- 4 matches won (with the team).
- He never scored a try during a Rugby World Cup.
- He is not captain and will not be this year – that role goes to Paul O’Connell
- He never won the Rugby World Cup.
Jonathan Sexton’s stats after the World Cup would be:
- 8 matches started on the field
- 69 points scored (in the two RWC editions)
- 9 matches won (with the team).
- Let’s assume he would score 3 tries.
- He is not captain and will not be this year.
- Ireland (and he) would not win the Rugby World Cup (in that scenario)
There is roughly a one-in-two chance that Jonathan Sexton will become one of my greatest Rugby World Cup players of all time.
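The arithmetic behind this projected stat line can be sketched in a few lines of Python. This is only an illustration of the additions above, not the predictive model itself; the names `RwcStats` and `project` are made up for the example, and it assumes he starts all 6 additional matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RwcStats:
    """Hypothetical container for the stats quoted in this post."""
    matches_started: int
    points: int
    matches_won: int
    tries: int


def project(current: RwcStats, extra_matches: int, extra_wins: int,
            points_per_match: int, extra_tries: int) -> RwcStats:
    """Add a hypothetical tournament run to the current RWC totals."""
    return RwcStats(
        # Assumption: every additional match is a start.
        matches_started=current.matches_started + extra_matches,
        points=current.points + extra_matches * points_per_match,
        matches_won=current.matches_won + extra_wins,
        tries=current.tries + extra_tries,
    )


# Current RWC stats as listed in the post.
current = RwcStats(matches_started=2, points=21, matches_won=4, tries=0)

# Semi-final scenario: 6 more matches, 5 wins, 8 points per match, 3 tries.
projected = project(current, extra_matches=6, extra_wins=5,
                    points_per_match=8, extra_tries=3)
print(projected)
```

The captaincy and "won the World Cup" variables stay at "no" in this scenario, so they are left out of the sketch.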
Let’s now unleash our imagination and explore different scenarios.
- Twice as many points per game. Mr Sexton would then reach 117 points in total, at 16 points per game. The score barely changes (0.53).
- Let’s say he also scores twice as many tries. The score improves (0.58), but it is not there yet.
- Total Fantasy Scenario (or…could that be 😕 ? that’s the glorious uncertainty of sport). Ireland beats England and New Zealand and becomes world champion for the next four years. Mr Sexton is a key architect of the victory with 15 points per game and 6 tries overall. Go Ireland! After such a successful campaign, I will certainly make some room for Jonathan Sexton in my personal rugby hall of fame!
Thanks for reading this blog post. I hope you enjoyed it, and please comment on who you think will be the RWC 2015 rock stars 😎
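For readers who want to check the point totals in the first scenario, here is a small Python sketch. It only reproduces the arithmetic on points (21 existing points plus 6 additional matches); the function name `total_points` is hypothetical, and the model scores themselves (0.53, 0.58) come from the predictive model and are not computed here.

```python
CURRENT_POINTS = 21  # RWC points scored so far, as listed in the post
EXTRA_MATCHES = 6    # additional matches in the semi-final scenario


def total_points(points_per_match: int) -> int:
    """Total RWC points after the tournament for a given scoring average."""
    return CURRENT_POINTS + EXTRA_MATCHES * points_per_match


baseline = total_points(8)   # initial assumption of 8 points per match
doubled = total_points(16)   # first scenario: twice as many points per game
print(baseline, doubled)
```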
I wish all of you a nice summer break, see you on SCN in September!
Antoine
Hi Antoine
This is an interesting analysis. I am not very knowledgeable about rugby and am a newbie to PA as well, so be patient if I make a silly statement. I would try to compare this with sports I play. I read both of your parts. I feel your prediction is a little influenced by the number of victories, and that is a fair assumption. However, how would you tackle the high-potential candidates? I mean candidates who could not feature in a lot of victories (but in losses) and yet are as skillful as the top-rated players.
Another interesting aspect would be the expected dip in performance (loss of form, etc.), which is affected by age, ability to handle fame, competition and technology. The prediction could be incorrect if we look only at the performance trend, which could be excellent and lead us to believe that the player is going to be phenomenal in the upcoming tournament. What is being missed is the point of inflection.
Great work though!
Hello Gajendra,
you make a very good point here. The analysis performed by Antoine can provide a list of 'good players' only within the space of variables he has chosen. If we added more variables to the original dataset (something about potential, aging or other factors), the model could improve because it could rely on additional information which, maybe, has more influence on the making of a 'good player'. Moreover, apart from simple variables you might think of calculated variables (aka derived attributes) which could influence the result (e.g. think of [played matches this year - played matches last year], which could show whether the player is rising or going downhill).
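As a rough illustration of such a derived attribute, here is a minimal Python sketch. It is plain Python, not SAP Predictive Analytics or its Data Manager, and the player names and match counts are made up purely for the example.

```python
# Hypothetical data: player -> (matches played last year, matches played this year).
matches_played = {
    "Player A": (6, 10),
    "Player B": (9, 4),
    "Player C": (7, 7),
}


def matches_delta(last_year: int, this_year: int) -> int:
    """Derived attribute: played matches this year minus played matches last year."""
    return this_year - last_year


def trend(delta: int) -> str:
    """Label the direction suggested by the derived attribute."""
    if delta > 0:
        return "rising"
    if delta < 0:
        return "declining"
    return "stable"


# Compute the derived attribute for every player.
derived = {
    player: matches_delta(last_year, this_year)
    for player, (last_year, this_year) in matches_played.items()
}

for player, delta in derived.items():
    print(player, delta, trend(delta))
```

The new column could then be added to the dataset next to the original variables, so the model can decide whether it carries any useful signal.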
Usually before building a model and choosing a dataset you need to discuss with business people who can tell you what could be the variables and possible calculations to take into account.
If we now come to SAP Predictive Analytics there is some good news:
1. There is a specific module called "Data Manager" which enables you to create derived attributes very easily. In a few clicks you could create hundreds of calculations on the existing variables.
2. In PA Automated Analytics, the more variables you use, the better the model can be, without adding complexity to the debriefing or cost to the preparation. This is not true for all predictive analytics tools on the market. Most tools require you to use the smallest possible number of input variables, because you need to prepare them correctly first and debrief the model afterwards (which can be complex if you have many variables). With Automated Analytics, the data preparation phase is mostly automated, so you don't spend too much time on it, and the learning algorithms automatically find the most important variables in your dataset. You could start with 1000 variables, and PA, after building the model, might tell you that only 20 of them are really important.
Anyway, you raised a very good point. Welcome to the world of SAP Predictive Analytics.
Thanks for your valuable comment, Gajendra, and thank you to Pierpaolo for the reply!
Great start for Ireland against Canada... and for Jonathan Sexton as well! Go Ireland, let us see what France will do tonight!
<Teaser> Part 3 coming soon <Teaser>