There is a well-known example of a pattern recognition problem: the Iris data set. Let’s replicate it with SAP HANA.

This example is about classifying some plants.

According to the “Iris data set” site:

The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.

This is the kind of pattern recognition problem that a simple perceptron (SP) can work with. The SP is intended for linearly separable data sets. Specifically, a single SP can only separate two classes, so if we want to solve the iris plant recognition problem, we have to build three SPs.

As we know, the SP is a mathematical model of a neuron. It is the most basic artificial neuron.

Through its learning algorithm, it can classify two linearly separable classes. The training paradigm is “supervised”, i.e., we need a training set in which the class of each element is known. After iterating over the training set, the SP is able to separate new elements that it has never seen before. So, for the Iris data set, the plan is as follows:

- Build three SPs, one for each class.
- Build three training sets, one for learning to identify each class.
- Build one function that, given an element, evaluates which class it belongs to.
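To make the first step concrete, here is a minimal sketch of a single perceptron and its supervised training loop in plain Python. It is not the SQLScript used for the HANA version; the function names, the learning rate `lr`, the epoch limit, and the toy data are all assumptions made for illustration.

```python
def predict(weights, bias, x):
    """Step activation over the weighted sum: 1 if w.x + b > 0, else 0."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0


def train_perceptron(samples, labels, lr=0.1, epochs=100):
    """Classic perceptron rule: nudge the weights only on misclassified samples."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(samples, labels):
            delta = target - predict(weights, bias, x)  # -1, 0 or +1
            if delta != 0:
                errors += 1
                weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
                bias += lr * delta
        if errors == 0:  # a full pass without mistakes: training is done
            break
    return weights, bias


# Tiny made-up, linearly separable data (NOT taken from the Iris data set)
X = [[1.0, 1.0], [1.5, 0.5], [4.0, 4.5], [5.0, 4.0]]
y = [0, 0, 1, 1]
w, b = train_perceptron(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

Building three SPs then amounts to calling `train_perceptron` three times, once per training set.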

The attribute information consists of:

- Sepal length in cm
- Sepal width in cm
- Petal length in cm
- Petal width in cm
- Class: {Iris setosa, Iris versicolour, Iris virginica}

That is what the original training set looks like, and from it we derive the three training sets that we need. For example, for “train_setosa”, we set the class value to 0 for every row that does not belong to the “Iris setosa” class. After that, we randomize the row order, because the original file comes in blocks of 50 rows per class.
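The derivation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class label strings, the `one_vs_rest` helper, the fixed shuffle seed, and the in-memory row layout are hypothetical, and labeling the matching rows with 1 is assumed from the fact that a trained perceptron “responds” 1 for its own class.

```python
import random

# Assumed label spellings; adjust them to match the CSV actually used
CLASSES = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]


def one_vs_rest(rows, positive_class, seed=0):
    """Relabel rows as 1 (positive_class) or 0 (any other class), then shuffle."""
    relabeled = [
        (features, 1 if label == positive_class else 0)
        for features, label in rows
    ]
    # Shuffle so the rows no longer come in per-class blocks
    random.Random(seed).shuffle(relabeled)
    return relabeled


# Three illustrative rows in the original layout: four measurements plus class
rows = [
    ([5.1, 3.5, 1.4, 0.2], "Iris-setosa"),
    ([7.0, 3.2, 4.7, 1.4], "Iris-versicolor"),
    ([6.3, 3.3, 6.0, 2.5], "Iris-virginica"),
]

# One derived training set per class, e.g. train_sets["Iris-setosa"]
train_sets = {name: one_vs_rest(rows, name) for name in CLASSES}
```

Each derived set keeps every row of the original data; only the class column changes.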

For the HANA training, we need three tables per class. For the Iris setosa case they are TRAIN_SETOSA, W_SETOSA, and PARAM_SETOSA: the training set, the weights for the weighted sum fed to the activation function, and the final tuned parameters, respectively.

With these three files, we’re ready to learn to classify Iris setosa.

The TRAIN_x table is imported from the corresponding CSV file; the W_x and PARAM_x tables are initialized with the SQLScript file that is also available.

After the whole training process, with the three perceptrons trained, we need to evaluate all three SPs to find out which class a given vector belongs to. The one that “responds” with 1 indicates the iris class.
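A minimal Python sketch of that evaluation step follows. The `classify` function and the dictionary layout are assumptions, and the weights below are placeholder values picked by hand so the example runs; they are not the trained contents of the PARAM_x tables.

```python
def step(weights, bias, x):
    """Perceptron output: 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def classify(perceptrons, x):
    """Evaluate every perceptron and return the classes that respond with 1."""
    return [
        name
        for name, (weights, bias) in perceptrons.items()
        if step(weights, bias, x) == 1
    ]


# Placeholder (weights, bias) pairs, hand-picked for illustration only.
# Feature order: sepal length, sepal width, petal length, petal width.
perceptrons = {
    "Iris setosa": ([0.0, 0.0, -1.0, 0.0], 2.5),
    "Iris versicolour": ([0.0, 0.0, 1.0, -1.0], -2.0),
    "Iris virginica": ([0.0, 0.0, 0.0, 1.0], -1.75),
}

sample = [5.1, 3.5, 1.4, 0.2]  # illustrative measurement vector
result = classify(perceptrons, sample)
print(result)
```

Returning a list rather than a single name keeps the cases where no perceptron, or more than one, responds with 1 visible to the caller.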

All the necessary files are in this GitHub repository.