This example illustrates how a neural network can group iris flowers into classes based on each flower's measurements, providing insight into the types of flowers. I have used the ‘iris’ dataset that ships with R. Each iris is described by four features:
1. Sepal length in cm
2. Sepal width in cm
3. Petal length in cm
4. Petal width in cm
This is an example of a clustering problem, where we would like to group samples into classes based on the similarity between samples. I have used the R-NNet algorithm to create a neural network that not only learns class definitions from the known inputs but also helps classify unknown inputs. I also show how the results of the neural network differ from those of clustering with the R K-Means algorithm. Finally, we will see how to make the neural network learn more effectively by changing the algorithm's parameters so that it predicts better output.
Step 1 – Preparing the Data Set
For people who have no experience in R, here are the steps to get the ‘iris’ dataset on your desktop.
Assuming that you have installed and configured the R distribution that comes with SAP PA, open the R console.
Type the below in your console:
Please note that \\ is used as the separator in the file path.
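The original console command is not reproduced here. As a minimal sketch, assuming the goal is simply to write the built-in iris data frame to a CSV file, something like the following should work. The output location is hypothetical (a temporary directory is used so the snippet runs anywhere), and the commented line shows how a Windows path with doubled backslashes would look.

```r
# Export the built-in iris data set to a CSV file.
# The output location is an assumption; tempdir() keeps the sketch portable.
out_file <- file.path(tempdir(), "iris.csv")
write.csv(iris, file = out_file, row.names = FALSE)

# On Windows, an explicit path needs doubled backslashes, for example
# (hypothetical folder):
# write.csv(iris, file = "C:\\Data\\iris.csv", row.names = FALSE)

# Read the file back to confirm what was written
check <- read.csv(out_file)
str(check)
```

Setting row.names = FALSE keeps R's row index out of the file, so the CSV contains only the four measurements and the Species column.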
Import the CSV file in SAP PA:
The Species column tells us the class to which each iris belongs. The actual data is visualized below:
Step 2 – Predictive Modeling and Analysis
I have used the R-NNet Neural Network component delivered with SAP PA 1.0.11. The underlying nnet package, created by Ripley, fits a feed-forward neural network with a single hidden layer. I set the parameters below and ran the model:
No advanced properties were set. The number of hidden layer neurons was 5. The results obtained are shown below. The graph shows considerable misclassification between the Versicolor and Virginica species, while Setosa is classified correctly. The main reason, as I see it, is that the characteristics of the Setosa species are very different from those of Versicolor and Virginica, so even a simple NN without Softmax or a large number of iterations is able to classify it. The same results are obtained when we cluster using K Means. The results of K Means clustering with 300 iterations are shown below. They are similar to those obtained by the NN with 5 neurons and no Softmax. Even after increasing the number of iterations to 500 and the Initial Set to 3, the results did not change much.
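For readers working directly in R rather than in SAP PA, the first run can be approximated as follows. This is only a sketch: it assumes the SAP PA component maps onto a plain nnet call with five hidden units, softmax switched off and all other settings left at the package defaults. The seed is arbitrary, and results vary with it because the starting weights are random.

```r
library(nnet)  # recommended package, included with standard R installs

set.seed(42)   # arbitrary seed; nnet starts from random weights

x <- as.matrix(iris[, 1:4])    # the four measurements
y <- class.ind(iris$Species)   # one 0/1 indicator column per species

# 5 hidden neurons, no softmax, remaining settings at nnet defaults
fit <- nnet(x, y, size = 5, softmax = FALSE, trace = FALSE)

# Predicted class = output unit with the largest activation
pred <- colnames(y)[max.col(predict(fit, x))]

# Confusion table of actual versus predicted species
table(actual = iris$Species, predicted = pred)
```

The matrix interface is used here because nnet's formula interface switches softmax on automatically for a factor response with more than two levels.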
Output using Neural Network with 5 Neurons and No Softmax
Output using K Means Clustering (500 Iterations and initial 1 set)
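A comparable K-Means run in plain R might look like the sketch below. It assumes that the component's iteration count corresponds to iter.max and that its "Initial Set" option corresponds to nstart in R's kmeans function; neither mapping is confirmed by SAP PA documentation here. The seed is arbitrary.

```r
set.seed(42)  # arbitrary seed; initial cluster centers are chosen at random

# Cluster on the four measurements only; Species is held back for comparison
km <- kmeans(iris[, 1:4], centers = 3, iter.max = 300, nstart = 1)

# Cross-tabulate true species against assigned cluster.
# Cluster numbers are arbitrary labels, so read the table row by row.
table(species = iris$Species, cluster = km$cluster)
```

Because K-Means never sees the Species column, the cluster numbers carry no meaning of their own; what matters is whether each species lands mostly in a single cluster.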
Neither of the two models above gave me results close to what was expected. Both models classified Setosa perfectly but could not properly separate the other two species, because their characteristics are very similar. Hence I decided to train the network further by increasing the number of neurons, enabling Softmax and increasing the number of iterations to 25. I changed the parameters shown below:
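In plain R, the tuned run could be sketched as below. The post does not state the new neuron count, so size = 10 is a purely hypothetical value; softmax = TRUE and maxit = 25 follow the description above, on the assumption that the component's iteration setting corresponds to nnet's maxit argument. The seed is arbitrary.

```r
library(nnet)

set.seed(42)  # arbitrary seed; starting weights are random

x <- as.matrix(iris[, 1:4])
y <- class.ind(iris$Species)

# size = 10 is a hypothetical neuron count (not given in the post);
# softmax output units and 25 iterations follow the text
fit_sm <- nnet(x, y, size = 10, softmax = TRUE, maxit = 25, trace = FALSE)

pred_sm <- colnames(y)[max.col(predict(fit_sm, x))]

# Confusion table and share of correctly classified flowers
table(actual = iris$Species, predicted = pred_sm)
mean(pred_sm == iris$Species)
```

With softmax enabled, the three outputs for each flower sum to 1 and can be read as class probabilities, which suits a problem where each flower belongs to exactly one species.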
The results obtained are strikingly similar to the original data set, with almost zero error (considering the small size of the data set).
Hence we can see that, for this data set, the NNet algorithm gives more accurate results than the K Means algorithm, and the model can be used to predict other values with a high level of accuracy. To test the predictive capability further, I kept the actual values for only 100 entries of my data set and removed them for the other 50. The model still produced output with a high level of accuracy.
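The hold-out test can be sketched in R as follows. The post does not say which 50 entries had their values removed, so a random sample is assumed here, and size = 10 is again a hypothetical neuron count. The seed is arbitrary.

```r
library(nnet)

set.seed(42)  # arbitrary seed for the split and the starting weights

# Assumption: the 50 entries without known species are chosen at random
holdout <- sample(nrow(iris), 50)
train <- iris[-holdout, ]   # 100 flowers with known species
test  <- iris[holdout, ]    # 50 flowers treated as unknown

# size = 10 is hypothetical; softmax and maxit = 25 follow the tuned run
fit_ho <- nnet(as.matrix(train[, 1:4]), class.ind(train$Species),
               size = 10, softmax = TRUE, maxit = 25, trace = FALSE)

# Score the held-out flowers and pick the most probable species
scores  <- predict(fit_ho, as.matrix(test[, 1:4]))
pred_ho <- levels(iris$Species)[max.col(scores)]

# Compare predictions with the species that were withheld
table(actual = test$Species, predicted = pred_ho)
```

Scoring rows the network never saw during training gives a fairer picture of predictive accuracy than re-scoring the training data.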
The results shown below again illustrate how we can achieve better, more accurate results using the R-NNet component in SAP PA 1.0.11. I am now testing the component with a much larger data set and more independent variables to see how it performs.
Results from Predictive Model with missing values in Initial Data Set