Analysing the heart disease data set from the IBM Intelligent Miner with SAP Lumira.
My first entry for the Data Geek Challenge 2.0 was the analysis of the Titanic accident with SAP Predictive Analysis. It became featured content in the SCN Predictive Analysis area and motivated me to create a second entry, this time with SAP Lumira.
I had come into contact with the IBM Intelligent Miner earlier and remembered a nice data set which describes several properties that might be related to the occurrence of heart disease.
The data was recorded from 270 people (240 training records and 30 test records). When and where the data was collected is not known.
What is possible with the IBM Intelligent Miner should be possible with SAP Lumira or SAP Predictive Analysis as well.
You can find the data sets on the homepage of the University of Applied Sciences in Wismar, Germany.
|Age||Age of the person|
|Sex||male or female|
|chest pain type||Four types are recorded, from low (1) to heavy (4)|
|serum cholestoral||cholesterol in mg/dl|
|fasting blood sugar||blood sugar > 120 mg/dl (yes or no)|
|Resting electrocardiographic results||0,1,2|
|Maximum heart rate achieved||Integer|
|Exercise induced angina||Yes or no|
|the slope of the peak exercise ST segment||1,2,3|
|number of major vessels||0 to 4|
|thal||Thalassemia: diseases of the red blood cells in which, due to a genetic defect, the hemoglobin is not formed in sufficient quantity or is broken down at an increased rate. It can damage the heart, kidneys and liver of the person.|
|heart disease||“heart disease” is the target feature that indicates whether the person is suffering from heart problems or not.|
I opened the acquired data directly in SAP Lumira to get a better overview of its composition. After enriching the data, the analysis could begin.
First of all, I had to check how many of the recorded people had a heart disease.
So 103 of the 240 people had a heart disease. Maybe it depends on their age: older people commonly have heart disease more often than younger people. Let’s check this by setting a filter with the value Y for the attribute heart disease.
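The count and the filter can also be reproduced outside Lumira. Here is a minimal Python sketch, assuming the data has been exported as rows with a `heart_disease` column holding `Y` or `N`. The column names and the five sample records are invented for illustration and are not taken from the real data set.

```python
from collections import Counter

# Hypothetical sample records for illustration only; NOT the real data set.
records = [
    {"age": 58, "sex": "male", "heart_disease": "Y"},
    {"age": 44, "sex": "female", "heart_disease": "N"},
    {"age": 62, "sex": "female", "heart_disease": "Y"},
    {"age": 51, "sex": "male", "heart_disease": "N"},
    {"age": 60, "sex": "male", "heart_disease": "Y"},
]


def count_by(rows, column):
    """Count how often each value of `column` occurs."""
    return Counter(row[column] for row in rows)


def filter_equal(rows, column, value):
    """Keep only the rows where `column` equals `value`, like a filter in Lumira."""
    return [row for row in rows if row[column] == value]


counts = count_by(records, "heart_disease")
diseased = filter_equal(records, "heart_disease", "Y")
share = counts["Y"] / len(records)

print(counts)
print(f"{share:.1%} of the sample records have heart_disease = Y")
```

Applied to the real figures, 103 of 240 corresponds to roughly 42.9 % of the recorded people.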
As you can see, there is a peak between the ages of 58 and 62. Heart disease seems to be related to a person’s age. But what else is responsible for this diagnosis? Maybe it depends on the gender of the persons.
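Finding such a peak amounts to grouping the filtered ages into bands and picking the fullest one. The following is only a sketch: the list of ages is made up, and the band width of 5 years and the starting age of 38 are my own assumptions, chosen so that one band covers 58 to 62.

```python
from collections import Counter

# Hypothetical ages of persons with heart_disease = "Y"; NOT the real data.
ages = [41, 52, 58, 58, 59, 60, 61, 62, 62, 67, 70]


def age_band(age, width=5, start=38):
    """Return the inclusive (low, high) band of `width` years that contains `age`.

    `width` and `start` are assumed values, not settings taken from Lumira.
    """
    low = start + ((age - start) // width) * width
    return (low, low + width - 1)


# Count how many persons fall into each band and pick the fullest one.
bands = Counter(age_band(a) for a in ages)
peak_band, peak_count = bands.most_common(1)[0]

for band in sorted(bands):
    print(band, bands[band])
print("peak band:", peak_band, "with", peak_count, "persons")
```

Note that the position of a peak depends on where the bands start and how wide they are, so it is worth trying more than one setting.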
This also appears to be true. A clear majority of the affected people are male. But how many of the women are part of the peak between 58 and 62 that we saw before?
While the men peak at an age of 58, the women peak at an age of 62. What else is responsible for heart diseases? Let’s check the relationship to exercise-induced angina.
Lumira shows that a small majority of the people with heart disease also have exercise-induced angina. Looking closer, only the male persons are responsible for this majority; women seem to be unaffected. Another attribute we can use to predict whether a person has a serious heart disease is the type of chest pain. With values from 1 (low) to 4 (heavy), the result is not surprising: a large majority of those with heavy chest pain (4) have a heart disease.
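The comparisons so far (gender, angina, chest pain type) are all cross-tabulations of one attribute against the heart disease flag. A minimal Python sketch of that idea follows; the records and column names are hypothetical and do not come from the real data set.

```python
from collections import Counter

# Hypothetical records for illustration only; NOT the real data set.
records = [
    {"sex": "male", "chest_pain_type": 4, "heart_disease": "Y"},
    {"sex": "male", "chest_pain_type": 4, "heart_disease": "Y"},
    {"sex": "female", "chest_pain_type": 4, "heart_disease": "Y"},
    {"sex": "male", "chest_pain_type": 3, "heart_disease": "N"},
    {"sex": "female", "chest_pain_type": 2, "heart_disease": "N"},
    {"sex": "male", "chest_pain_type": 1, "heart_disease": "N"},
    {"sex": "female", "chest_pain_type": 4, "heart_disease": "N"},
]


def crosstab(rows, row_key, col_key):
    """Count the rows for every (row_key value, col_key value) combination."""
    return Counter((r[row_key], r[col_key]) for r in rows)


def share_with_disease(table, value):
    """Share of rows with the given attribute value that are flagged "Y".

    Returns None if the value does not occur at all.
    """
    yes = table[(value, "Y")]
    total = yes + table[(value, "N")]
    return yes / total if total else None


table = crosstab(records, "chest_pain_type", "heart_disease")
for pain_type in (1, 2, 3, 4):
    print(pain_type, share_with_disease(table, pain_type))
```

Comparing the share within each group, instead of the raw counts, avoids being misled when one group is simply larger than the others.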
If you look at the number of detected major vessels in relation to the gender of the affected persons, you cannot see any peculiarities on the women’s side. For the male subjects, most values lie between 0 and 2.
Let’s check the role of the thal diagnosis. Thalassemia is a disease of the red blood cells in which, due to a genetic defect, the haemoglobin is not formed in sufficient quantity or is broken down at an increased rate. It can severely affect the heart, liver and kidneys of the person. Lumira shows the persons with heart disease in relation to gender and the genetic defect. At first glance, the defect does not seem to be responsible for heart disease.
But it is striking that a great many of the people who are ill are those whose genetic defect was later repaired. Judging by this graph, it is possible that the method for eliminating the defect weakens the heart and makes it more prone to such diseases, or is even directly responsible for them. But this judgement is left to people who are familiar with the subject.
To identify future heart disease at an early stage, it would be useful to carry out further analysis with SAP Predictive Analysis. But that is another story.