# Data Geek III – Analyzing Game of Thrones Data for the GoT Challenge

Note: I know I am too late for the competition, and unfortunately I have no excuse.

I will publish the blog anyway, in case some of the tips presented here are useful to someone. I’ll just try to buy the T-shirt on eBay afterwards 😉

This blog is meant as a tutorial and demonstration: nothing incredible, but all the main steps are described so that you can reproduce everything (all in Lumira, except the last step, which uses Predictive Analysis).

I also wanted to highlight how easily you can create a meaningful analysis from a very simple data set.

After some research, I found a very nice blog by Jordan Schermerhorn, with a free GoT data set extracted mainly from wikis. Jordan has already done some very interesting analysis of the data, so my goal was to find new ways to use the existing data set.

http://jordanschermer.wordpress.com/2014/08/06/valar-morghulis/

Here is a sample of the data you can download (the file contains 366 GoT characters). This blog walks through all the steps, so if you download the original file, you can follow it as a tutorial.

(The “affiliation” dimension groups characters together, much like an extended family.)

Now let’s start (and if you have not read or seen GoT, beware: this blog may contain spoilers!)

Step 1: Data preparation

• Modify the default configuration of the Dimensions and Measures.
Select the element on the left side of the screen (the “object picker”), and use the “display formatting” and “change aggregation” options.
• Create a CHARACTER measure to count occurrences (click on CHARACTER and choose “create calculated dimension”).
• Remove the unnecessary elements.

Before / After

• To analyze the characters by group, we already have the “affiliation” dimension.
But it would be interesting to extract the family name from the character name, in order to have a real “family” dimension.

You just need to create a calculated dimension and enter the magic formula:
if Contain({Character}, "Frey") then "Frey" else if Contain({Character}, "Stark") then "Stark" else if Contain({Character}, "Lannister") then "Lannister" else if …(you’ve got the idea I think…)…….. else "none"
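For readers who want to check the logic outside Lumira, here is a minimal Python sketch of the same first-match rule. The `FAMILIES` list and the `family_of` helper are hypothetical names introduced for illustration, and the list only contains the three houses spelled out in the formula above; the rest are elided in the original and are not filled in here.

```python
# Hypothetical list: only the three families named in the formula.
# Order matters, because the first matching name wins (as in the if/else chain).
FAMILIES = ["Frey", "Stark", "Lannister"]


def family_of(character: str) -> str:
    """Return the first family name contained in the character name, else "none"."""
    for family in FAMILIES:
        # Plain substring test; note that Python's `in` is case-sensitive.
        if family in character:
            return family
    return "none"


print(family_of("Arya Stark"))
print(family_of("Hodor"))
```

Keeping the names in a list rather than a long if/else chain makes it easy to add a house later without touching the logic.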

• To classify the characters by age, we can also create Age Groups.
Select the Age column, and then on the right-hand side (the “Data Manipulation” panel), select “group by selection”.
The screen is very easy to use: sort the “values” column in ascending order, select the values with the mouse (shift+click for a range), click “Add” and give the group a name. Then repeat for each remaining group by selecting “new Group”.
Very convenient: the “Wiser Adults” group is simply created last with the “Group Remaining Values as” field.
Note: I just made up the Age Groups, so please don’t take the range values too seriously.
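The grouping above can be sketched in Python as a list of upper bounds plus a catch-all. All boundaries below are assumptions, since the actual ranges only appear in the Lumira screen, and the “Adult” group is a hypothetical filler; only “Child”, “Teen”, “Young Adult” and “Wiser Adults” are named in this post.

```python
# Hypothetical (exclusive) upper bounds; the real ranges are not given in the post.
AGE_GROUPS = [
    (13, "Child"),
    (20, "Teen"),
    (35, "Young Adult"),
    (50, "Adult"),  # assumed intermediate group, not named in the post
]

# Plays the role of the "Group Remaining Values as" field.
REMAINING = "Wiser Adults"


def age_group(age: int) -> str:
    """Return the name of the first group whose upper bound exceeds the age."""
    for upper, name in AGE_GROUPS:
        if age < upper:
            return name
    return REMAINING


for age in (8, 16, 70):
    print(age, age_group(age))
```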

Step 2: Visualize

• First, let’s compare the Age Group distribution among the families. We can create a Stacked Column Chart, and we get this:

Not bad, but the Age Groups are not sorted in the right sequence, and I did not find a way to force the display order of the values (Child -> Teen -> Young Adult…).

So I just went back to the Prepare Room, selected the Age Groups, and did a “replace” for each value: Child becomes 1Child, Teen becomes 2Teen, and so on.
It took me less than a minute. Back in the Visualize Room, I can now see that the Starks are indeed much younger than the Lannisters
(SPOILER: maybe because they often die young…)
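The renaming trick works presumably because the chart orders the values as plain text, so a leading digit forces the sequence. Here is a small Python sketch of that idea; the `ORDER` list is an assumption (it contains only the group names mentioned in this post), and `prefixed` is a hypothetical helper mapping.

```python
# Intended display order; assumed list using only the group names from the post.
ORDER = ["Child", "Teen", "Young Adult", "Wiser Adults"]

# Map each label to its prefixed version: Child -> 1Child, Teen -> 2Teen, ...
prefixed = {name: f"{i}{name}" for i, name in enumerate(ORDER, start=1)}

labels = ["Teen", "Wiser Adults", "Child", "Young Adult"]

# Alphabetical order puts "Wiser Adults" before "Young Adult".
print(sorted(labels))

# With the numeric prefix, text order matches the intended order.
print(sorted(prefixed[label] for label in labels))
```

With ten or more groups a single digit is no longer enough ("10…" sorts before "2…" as text), so a zero-padded prefix such as 01, 02 would be needed.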

• Then let’s see whether the GoT world compares to the real world: in which age categories are there more dead characters than living ones?
You just need to create a simple Line Chart with the Number of Characters measure by Age Group, and use Dead as the legend (dead=1 meaning… you’re dead).
The result seems surprisingly realistic, considering the number of murders in the books: the older you get, the more likely you are to be dead…
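What the line chart computes is just a count of characters per (Age Group, Dead) pair. A minimal Python sketch of that aggregation follows; the `rows` are made-up sample values for illustration only, not figures from the real data set, and `dead_vs_alive` is a hypothetical helper.

```python
from collections import Counter

# Made-up sample rows (age_group, dead); NOT taken from the real data set.
rows = [
    ("1Child", 0), ("1Child", 0), ("1Child", 1),
    ("2Teen", 0), ("2Teen", 1),
    ("4Wiser Adults", 1), ("4Wiser Adults", 1), ("4Wiser Adults", 0),
]

# (age_group, dead) -> number of characters, i.e. one point per line in the chart.
counts = Counter(rows)


def dead_vs_alive(group: str) -> tuple:
    """Return (number dead, number alive) for one age group."""
    return counts[(group, 1)], counts[(group, 0)]


for group in sorted({g for g, _ in rows}):
    dead, alive = dead_vs_alive(group)
    print(group, "dead:", dead, "alive:", alive)
```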

• Lastly, let’s try just one algorithm, as I have the SAP Predictive Analysis tool.
I will use the InfiniteInsight Classification algorithm directly on my data (no preparation step). Here are the parameters used:

And the result is quite interesting: we saw just before that the death rate seemed linked to age, as expected, but what we see here is that the main factor for death is in fact the family, with age only the second factor!

Now it would be interesting to go back and analyze this, but I’ll let you continue: your turn!

And remember: “Winter Is Coming”!

Eric

1 Comment

Your blogs are always very interesting! How about a Data Geek IV?