Data Geek III – The universities continue
It took me some time to do another analysis with Lumira and to get my hands on predictive analysis. Based on my last post Data Geek III – Where should I send my children (if I had any)?, I’ll continue analyzing and writing about universities. This time, prompted by a comment from Jelena Perfiljeva, I’ll look more closely at the reputation and the different scores the universities received in this year’s ranking. The data comes from QS Top Universities, which provides a worldwide ranking of universities based on several aspects. I assumed that this data might also be a good source for predicting values and clustering. The rankings can be found here: http://www.topuniversities.com/
The dataset I used is located here: http://www.iu.qs.com/product-category/wur/2014-wur/
All sets here: http://www.iu.qs.com/product-category/wur/
So let’s start with the analysis.
For the preparation, I started by creating a geographic dimension for the country:
In addition, I created groups for AGE, FOCUS, RESEARCH and SIZE, just to make them easier to understand, according to the descriptions provided by QS:
Furthermore, I created a custom calculated measure that shows the improvement in rank compared to the year before:
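The Lumira formula itself is not reproduced in the text, so here is only a minimal Python sketch of the idea. It assumes the improvement is simply last year's rank minus this year's rank, so a positive value means the university climbed. The function name and the sample rows are made up for illustration and are not taken from the QS data.

```python
def rank_improvement(previous_rank, current_rank):
    """Positive result: the university climbed; negative: it dropped."""
    return previous_rank - current_rank


# Hypothetical rows: (institution, rank last year, rank this year).
sample = [
    ("University A", 12, 9),
    ("University B", 40, 47),
    ("University C", 5, 5),
]

# Map each institution to its change in rank.
improvements = {name: rank_improvement(prev, cur) for name, prev, cur in sample}
print(improvements)
```

With this sign convention, gains and losses can be sorted and charted directly, which is how the rank changes per country and per university are read further down.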
I started by getting an overview of the 20 best-ranked universities, comparing the individual rankings with the overall one. QS provides individual rankings for academic reputation, citations per faculty, employer reputation, faculty/student ratio, international faculty and international students.
In addition, I added the countries to have a look at the top-rated countries. What is interesting here is the “huge” difference that universities show across the different rankings.
Take MIT, for example: it is the best-ranked university worldwide this year, as it was in 2013. Nevertheless, if we look at the international students ranking, University College London and the university in Lausanne, for example, rank much better. In the academic reputation ranking, Cambridge is also ahead. Even so, the overall score computation puts MIT at rank 1. Looking at the countries, there are 2 UK universities among the top 3, 4 among the top 6 and 4 among the top 10.
Assuming that a higher overall score leads to a better rank, we could conclude from the following picture that most top universities are based in the USA and the UK:
A possible source of error here is the number of institutions per country. If the UK had 100 times as many universities, but all with lower scores, the country could still reach a high total. So everything is up to the interpretation of the data and the diagrams 🙂
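To make that pitfall concrete, here is a small Python sketch that compares the summed score per country with the average score per country. The countries and scores are invented for illustration; only the effect matters: many low-scoring institutions can out-total a few excellent ones.

```python
from collections import defaultdict


def summarize_by_country(rows):
    """Return {country: (institution count, total score, mean score)}."""
    grouped = defaultdict(list)
    for country, score in rows:
        grouped[country].append(score)
    return {
        country: (len(scores), sum(scores), sum(scores) / len(scores))
        for country, scores in grouped.items()
    }


# Hypothetical data: ten mediocre institutions versus two excellent ones.
rows = [("Country X", 40.0)] * 10 + [("Country Y", 95.0), ("Country Y", 90.0)]

summary = summarize_by_country(rows)
for country, (count, total, mean) in summary.items():
    print(country, count, total, mean)
```

Country X wins on the total, Country Y on the average, so the choice of aggregation decides which country looks "best" in a chart.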
China, Italy and Taiwan lost some ranks.
Breaking this down further and filtering on the USA, we see that Washington University in St. Louis and the University of California, Davis in particular gained the most.
It is interesting to see that UCLA, UCSD and NYU lost some ranks.
In the UK, the Universities of London and Kent gained the most:
My alma mater, the University of Liverpool, lost quite a few ranks:
But rank 123 worldwide is still not that bad:
I think it is the international students (IS) rank that lifts the score. Looking only at the IS rank, the UoL comes quite close to MIT.
Thinking about the UoL, I decided to compare the Russell Group (20 universities, 18 of them among the top 20 in the UK) with the Ivy League (some of the top universities in the US).
The overall scores of the Ivy League are quite high. Apart from two institutions (Brown and Dartmouth), which are the ones scoring below 80, all members rank in the top 20:
Looking at the UK, the rankings and scores are much more spread out:
So I added a running average, where my alma mater (the UoL) scored well below the average of 79. But looking at the international students, it scored well above the average:
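For readers who have not used a running average before: it is the mean of all values seen so far, recomputed at every step. The following Python sketch shows the idea, assuming Lumira's running average works the same way (a cumulative mean over the displayed order). The scores are made up and are not the Russell Group values.

```python
from itertools import accumulate


def running_average(values):
    """Cumulative mean: element i is the mean of values[0..i]."""
    return [total / (i + 1) for i, total in enumerate(accumulate(values))]


# Hypothetical overall scores, ordered from best to worst rank.
scores = [90.0, 80.0, 70.0, 60.0]

averages = running_average(scores)
print(averages)
```

Comparing a single institution's score with the last value of this series shows at a glance whether it sits above or below the group average.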
I tried out predictive analysis with this dataset and started by predicting ranks with the InfiniteInsightRegression algorithm:
I deselected the overall ranking, the country and the institution so that those values would not influence the result. Running the analysis gave a confidence of .99, well above .95:
Looking at the variables the prediction used, the overall score has the largest effect on the ranking:
Looking at the grid, the prediction puts Harvard in second place. This can be explained by the fact that Harvard has two scores of 100 in the individual rankings, just like the university actually ranked second, but ranks higher in those individual rankings.
The second analysis is comparable to the one above, but predicts the overall score instead of the rank:
The confidence here is almost the same:
And the individual scores are the main contributors to this value:
The data grid shows that there is not much difference between the actual and the predicted overall score:
The difference can be explained as above: it lies in the ranks that are considered.
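To illustrate what "actual versus predicted" means, here is a minimal Python sketch using plain ordinary least squares with a single input variable. This is not the InfiniteInsight algorithm, whose internals are not described here; it is a simple stand-in technique, and the sub-scores and overall scores are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: one sub-score and the overall score per university.
sub_scores = [60.0, 70.0, 80.0, 90.0]
overall = [55.0, 65.0, 75.0, 85.0]

slope, intercept = fit_line(sub_scores, overall)
predicted = [slope * x + intercept for x in sub_scores]

# Mean absolute error: average gap between actual and predicted values.
mae = sum(abs(a - p) for a, p in zip(overall, predicted)) / len(overall)
print(slope, intercept, mae)
```

When the target is almost a linear combination of the inputs, as an overall score built from sub-scores tends to be, the gap between actual and predicted values stays small.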
In my opinion, clustering the data is more interesting.
In a diagram of the overall score against the rank, the clusters are well spread:
Even more interesting is the spread, or frequency, of the different sub-scores among the clusters. The grid and the diagram above show that the top-ranked universities are in cluster 2 and the lowest-ranked ones in cluster 6.
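Which clustering algorithm Lumira ran is not stated here, so the following is only a sketch of the general idea using plain k-means on a single variable. The scores and the starting centroids are made up, and three clusters are used instead of the six or more in the analysis to keep the example short.

```python
def kmeans_1d(values, centers, iterations=20):
    """Plain k-means on numbers; `centers` holds the starting centroids."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value into the cluster of its nearest centroid.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # Converged: the centroids no longer move.
        centers = new_centers
    return centers, clusters


# Hypothetical overall scores and starting centroids.
scores = [98.0, 95.0, 92.0, 61.0, 58.0, 55.0, 31.0, 29.0]
centers, clusters = kmeans_1d(scores, centers=[90.0, 60.0, 30.0])
print(centers)
print(clusters)
```

The cluster numbers themselves carry no meaning, which is why the "top" cluster can be number 2 and the "lowest" one number 6: only the members and their sub-scores tell them apart.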
Looking at the sub-scores, one can see that the academic reputation is much higher in the “high” cluster than in the “lower” ones:
The scatter matrix also shows some interesting grouping when comparing the academic reputation (AR) rank, the AR score and the overall score:
So much for university analysis and predictive analysis. I just tried out this new feature, and it seems quite interesting. I still have some survey data left from my master’s thesis; maybe I could use it to try out more predictive analysis.