Data Tells The Titanic
The Titanic sank in the North Atlantic Ocean in the early morning of 15 April 1912 after colliding with an iceberg during her maiden voyage from Southampton to New York City. It is one of the most tragic shipwrecks in history.
Let’s analyse the Titanic passenger data. Perhaps the data we extract from the dusty pages of history will tell us some details of the story.
I believe many of us have seen the movie or read books about the Titanic, so we already know the details of this tragic event. If we didn’t, could the data tell us some of them? We will see.
The dataset contains each passenger’s information and the outcome of the event for them, in other words whether they survived or not.
You can see the variable descriptions of the passenger data below.
survived -> (0 = No; 1 = Yes)
pclass -> Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
name -> Passenger Name
gender -> Passenger Gender
age -> Passenger Age
sibsp -> Number of Siblings/Spouses Aboard
parch -> Number of Parents/Children Aboard
ticket -> Ticket Number
fare -> Passenger Fare
cabin -> Cabin No
embarked -> Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
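To make the variable list concrete, here is a minimal sketch of how one row of such a dataset could be read in Python. The `Passenger` class, the `read_passengers` helper and the two sample rows are assumptions made up for illustration; they are not real passengers and not part of the SAP Predictive Analytics workflow used in this post.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class Passenger:
    survived: Optional[int]  # 0 = No, 1 = Yes; None when unknown (test dataset)
    pclass: int              # 1 = 1st, 2 = 2nd, 3 = 3rd
    name: str
    gender: str
    age: Optional[float]     # missing for some passengers
    sibsp: int               # siblings/spouses aboard
    parch: int               # parents/children aboard
    ticket: str
    fare: Optional[float]
    cabin: str
    embarked: str            # C = Cherbourg, Q = Queenstown, S = Southampton


def _opt(value, cast):
    """Convert a CSV cell, mapping empty or absent cells to None."""
    return cast(value) if value not in ("", None) else None


def read_passengers(handle):
    """Hypothetical helper: parse CSV rows into Passenger records."""
    return [
        Passenger(
            survived=_opt(row.get("survived"), int),
            pclass=int(row["pclass"]),
            name=row["name"],
            gender=row["gender"],
            age=_opt(row["age"], float),
            sibsp=int(row["sibsp"]),
            parch=int(row["parch"]),
            ticket=row["ticket"],
            fare=_opt(row["fare"], float),
            cabin=row["cabin"],
            embarked=row["embarked"],
        )
        for row in csv.DictReader(handle)
    ]


# Toy rows invented for illustration only (not real passengers).
SAMPLE = """survived,pclass,name,gender,age,sibsp,parch,ticket,fare,cabin,embarked
1,1,Example A,female,29,0,0,T1,80.0,C1,S
0,3,Example B,male,,1,0,T2,7.25,,Q
"""

passengers = read_passengers(io.StringIO(SAMPLE))
```

Missing values such as an unknown age are kept as `None` rather than guessed, so later steps can decide how to treat them.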
Now I will explore the data using some algorithms. To do this I will use the SAP Predictive Analytics tool.
First I want to build a clustering model and analyse the clusters. I think the variable ranges of the clusters can tell us something.
After running the algorithm, I got 10 clusters (I set this number during modelling). Each cluster contains a distinct group of passengers, formed with respect to whether they survived or not. I sorted the clusters by survival rate. You can see the rates below:
Cluster 8: 97%
Cluster 6: 89%
Cluster 5: 58%
Cluster 2: 42%
Cluster 9: 35%
Cluster 10: 26%
Cluster 3: 20%
Cluster 7: 18%
Cluster 4: 16%
Cluster 1: 11%
Clusters 6 and 8 have the highest survival rates. We can see what they contain by looking at the variable ranges of the clusters in the bottom left corner.
Cluster 8 contains 1st class female passengers. Its survival rate is 97%.
Cluster 6 contains 2nd class female passengers. Its survival rate is 89%.
So we can say that a female passenger in 1st or 2nd class most likely survived. If you remember the movie, the crew tried to save the women and children first.
Unfortunately, the survival rate of 3rd class female passengers is under 50%.
Cluster 1 contains young male 3rd class passengers. As you can see above, most of them died: the cluster's survival rate is only 11%. So a young male 3rd class passenger most likely did not survive.
Cluster 1 members are young, male and physically stronger than the others. So I think they may have tried to save others (women, children and old people) and could not survive because of it, or maybe they got stuck on the lower decks.
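The reading of the clusters above can be cross-checked without any clustering at all, simply by grouping passengers by class and gender and ranking the groups by survival rate. The sketch below does that in plain Python. It is not the clustering algorithm of SAP Predictive Analytics; `survival_rate_by_group` is a hypothetical helper and the rows are toy data invented for illustration, so the resulting rates are not the real Titanic figures.

```python
from collections import defaultdict


def survival_rate_by_group(rows):
    """Rank (pclass, gender) groups by survival rate, highest first.

    rows: iterable of (pclass, gender, survived) tuples, survived is 0 or 1.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [survivors, total]
    for pclass, gender, survived in rows:
        counts[(pclass, gender)][0] += survived
        counts[(pclass, gender)][1] += 1
    rates = {group: s / n for group, (s, n) in counts.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Toy rows invented for illustration only (not real passengers).
toy_rows = [
    (1, "female", 1), (1, "female", 1),
    (3, "female", 1), (3, "female", 0),
    (3, "male", 0), (3, "male", 0), (3, "male", 0), (3, "male", 1),
]

ranked = survival_rate_by_group(toy_rows)
for (pclass, gender), rate in ranked:
    print(f"class {pclass}, {gender}: {rate:.0%}")
```

On the real train dataset, the same grouping should show whether the pattern found by the clusters (high rates for 1st and 2nd class women, low rates for 3rd class men) also holds in a plain cross-tabulation.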
I have two datasets. One contains the survival information (the train dataset); the other does not (the test dataset). Now I want to use a classification model. The system will learn how to predict survival status from the train dataset, and then I will apply the model to the test dataset. The system will predict, for each passenger in the test dataset, whether they survived or not.
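The train-then-apply workflow described above can be sketched with a deliberately simple stand-in model: for every (gender, pclass) group seen in the train dataset, remember the majority outcome, and use it as the decision for test passengers in the same group. This is not the classification algorithm used by SAP Predictive Analytics; `fit_group_majority`, `predict` and the toy rows are assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def fit_group_majority(train_rows):
    """Learn the majority outcome per (gender, pclass) group.

    train_rows: iterable of (gender, pclass, survived) tuples.
    Returns (table, default), where default is the overall majority outcome.
    """
    votes = defaultdict(Counter)
    overall = Counter()
    for gender, pclass, survived in train_rows:
        votes[(gender, pclass)][survived] += 1
        overall[survived] += 1
    default = overall.most_common(1)[0][0]
    table = {group: c.most_common(1)[0][0] for group, c in votes.items()}
    return table, default


def predict(model, test_rows):
    """Apply the learned table; unseen groups fall back to the default."""
    table, default = model
    return [table.get((gender, pclass), default) for gender, pclass in test_rows]


# Toy rows invented for illustration only (not real passengers).
toy_train = [
    ("female", 1, 1), ("female", 1, 1), ("female", 3, 0),
    ("male", 3, 0), ("male", 3, 0), ("male", 1, 0),
]
toy_test = [("female", 1), ("male", 3), ("male", 2)]  # no survival column

model = fit_group_majority(toy_train)
decisions = predict(model, toy_test)  # one 0/1 decision per test passenger
```

A real classifier would use more variables (age, fare, family size) and would handle missing values, but the shape is the same: fit on rows that have the `survived` column, then produce a decision column for rows that do not.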
As you can see above, Gender and Pclass are the variables most strongly related to survival status.
Now I am applying the model to the test dataset, which does not contain the survival status. There are 418 passengers in the test dataset.
You can see the decision column, which contains the predicted survival status of each passenger (0 = No; 1 = Yes).
I compared the predictions with the real data. 338 of 418 (81%) were correctly predicted by the system. If you had a CEO whose predictions were correct 81% of the time, would you hold him/her in high esteem? I would.
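The 81% figure is simply the share of correct decisions. A minimal sketch of that calculation, assuming the predicted and real survival values are available as two equally long lists (the `accuracy` helper and the short example lists are hypothetical):

```python
def accuracy(predicted, actual):
    """Share of positions where the prediction equals the real value."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy of an empty list")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Tiny made-up example: 3 of 4 decisions match.
example = accuracy([1, 0, 1, 1], [1, 0, 0, 1])

# The figures reported in this post: 338 correct out of 418.
reported = 338 / 418
print(f"{reported:.1%}")
```

With the numbers from this post, the ratio rounds to 81%.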
When I analysed the data, I noticed that all of the 1st class child passengers under 15 years old survived, except one: Helen Loraine Allison, who was two years old. She was in cabin C22 C26 with her father, mother and brother. Of the Allison family, only baby Hudson Trevor (11 months old) was saved.
From the data to the real world. After a little searching on Google I found this photo of baby Trevor and his mother.
Baby Trevor lost his sister, mother and father. He lost his entire family in this accident and, unfortunately, 17 years later Trevor died on 7 August 1929 at the age of 18 in Maine, USA, of ptomaine poisoning. He was buried beside his father in Chesterville, Ontario.