In this blog (Part 1), let's see how to implement an ANN (Artificial Neural Network) using XI. The next blog (Part 2) will focus on the technical implementation of a BPN (Back Propagation Network), an ANN that learns the XOR gate, using XI (Exchange Infrastructure). The final blog (Part 3) will focus on applications that use ANNs and how to implement them in XI from both a technical and a business perspective.
A neural network is a computer network designed to function in a similar way to natural neural structures such as a human brain. The processing ability of the network is stored in the inter-unit connection strengths, or weights, obtained by a process of adaptation to, or learning from, a set of training patterns. Computer models of such networks are often called “artificial neural networks”. Neural nets are used in bioinformatics to map data and make predictions.
Artificial Neural Networks (ANN)
Artificial Neural Networks are relatively crude electronic models based on the neural structure of the brain. The brain basically learns from experience. It is natural proof that some problems that are beyond the scope of current computers are indeed solvable by small energy efficient packages. This brain modeling also promises a less technical way to develop machine solutions. This new approach to computing also provides a more graceful degradation during system overload than its more traditional counterparts.
A simple neuron
Within humans there are many variations on this basic type of neuron, further complicating man’s attempts at electrically replicating the process of thinking. Yet, all natural neurons have the same four basic components. These components are known by their biological names – dendrites, soma, axon, and synapses. Dendrites are hair-like extensions of the soma which act like input channels. These input channels receive their input through the synapses of other neurons. The soma then processes these incoming signals over time. The soma then turns that processed value into an output which is sent out to other neurons through the axon and the synapses.
A basic artificial neuron
In the above figure, the various inputs to the network are represented by the mathematical symbol x(n). Each of these inputs is multiplied by a connection weight. These weights are represented by w(n). In the simplest case, these products are simply summed, fed through a transfer function to generate a result, and then output. This process lends itself to physical implementation on a large scale in a small package. This electronic implementation is still possible with other network structures which utilize different summing functions as well as different transfer functions. In SAP XI, we can implement this as part of a message mapping in the Integration Builder, using either simple user-defined functions or the basic arithmetic functions that SAP XI provides.
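The weighted sum and transfer step can be sketched in a few lines. This is only an illustration in plain Python, not XI mapping code (in XI the same logic would sit in a message-mapping user-defined function, which is written in Java). The names `neuron_output` and `step`, and the 0.5 threshold, are assumptions made for the example.

```python
from typing import Callable, Sequence


def neuron_output(inputs: Sequence[float],
                  weights: Sequence[float],
                  transfer: Callable[[float], float]) -> float:
    """Multiply each input x(n) by its weight w(n), sum, apply the transfer function."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return transfer(weighted_sum)


def step(u: float) -> float:
    """Hypothetical threshold transfer function: fire when the sum reaches 0.5."""
    return 1.0 if u >= 0.5 else 0.0


if __name__ == "__main__":
    # Three inputs with illustrative weights; the weighted sum is 0.2 + 0.4.
    print(neuron_output([1.0, 0.0, 1.0], [0.2, 0.9, 0.4], step))
```

Passing the transfer function as a parameter mirrors the point made above that other summing and transfer functions can be swapped in without changing the structure.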
A simple neural network diagram
Although there are useful networks which contain only one layer, or even one element, most applications require networks that contain at least the three normal types of layers: input, hidden, and output. The layer of input neurons receives the data either from input files (here in XI, consider an XML file) or directly from electronic sensors in real-time applications, in which case the modules convert the data into an XML stream and feed it into the Adapter Engine of SAP XI. The output layer sends information directly to the outside world, to a secondary computer process, or to other devices such as a mechanical control system; for simplicity we will use an XML file here. Between these two layers there can be many hidden layers. These internal layers contain many of the neurons in various interconnected structures. The inputs and outputs of each of these hidden neurons simply go to other neurons. In most networks each neuron in a hidden layer receives the signals from all of the neurons in the layer above it, typically an input layer. After a neuron performs its function it passes its output to all of the neurons in the layer below it, providing a feed-forward path to the output. All of this flow of information from one layer to another can be implemented in the message mapping of SAP XI, and the overall flow can be designed as transformation steps in the BPM (Business Process Management) of SAP XI.
Backpropagation Network (BPN)
The Backpropagation Net was popularized by D.E. Rumelhart, G.E. Hinton and R.J. Williams in 1986 and is one of the most powerful neural net types. It has the same structure as the Multi-Layer Perceptron and uses the backpropagation supervised learning algorithm.
A BPN is a feedforward network, and its input values here are binary. The activation function used is the sigmoid. It is mainly used in complex logical operations, pattern classification and speech analysis.
A sigmoid function is a mathematical function that produces a sigmoid curve, a curve having an “S” shape. Often, sigmoid function refers to the special case of the logistic function, defined by the formula:
$$g(u) = \frac{1}{1 + e^{-u}}$$
Sigmoid functions are often used in neural networks to introduce nonlinearity in the model and/or to make sure that certain signals remain within a specified range. A popular neural net element computes a linear combination of its input signals, and applies a bounded sigmoid function to the result; this model can be seen as a “smoothed” variant of the classical threshold neuron. This sigmoid function can be directly used as a custom function or with basic mathematics in message mapping.
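As a small sketch of how such a custom function might look, the logistic sigmoid and its derivative can be written as below. The function names are assumptions for illustration, and the two-branch form is a common trick to avoid floating-point overflow for large negative inputs rather than anything specific to XI.

```python
import math


def sigmoid(u: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-u), written to avoid overflow in exp()."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)  # u is negative here, so exp(u) cannot overflow
    return e / (1.0 + e)


def sigmoid_derivative_from_output(s: float) -> float:
    """Slope of the sigmoid expressed through its own output s = sigmoid(u)."""
    return s * (1.0 - s)


if __name__ == "__main__":
    for u in (-4.0, 0.0, 4.0):
        s = sigmoid(u)
        print(u, s, sigmoid_derivative_from_output(s))
```

The derivative is written in terms of the output because backpropagation already has the output of each neuron at hand, so no second call to `exp` is needed.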
Rules for the problems
There are only general rules, picked up over time and followed by most researchers and engineers applying this architecture to their problems.
1. As the complexity in the relationship between the input data and the desired output increases, then the number of the processing elements in the hidden layer should also increase.
2. If the process being modeled is separable into multiple stages, then additional hidden layer(s) may be required. If the process is not separable into stages, then additional layers may simply enable memorization and not a true general solution.
3. The amount of training data available sets an upper bound for the number of processing elements in the hidden layers. To calculate this upper bound, take the number of input-output pair examples in the training set and divide that number by the total number of input and output processing elements in the network. Then divide that result again by a scaling factor between five and ten. Larger scaling factors are used for relatively noisy data. Extremely noisy data may require a factor of twenty or even fifty, while very clean input data with an exact relationship to the output might drop the factor to around two. It is important that the hidden layers have few processing elements. With too many artificial neurons, the training set will simply be memorized. If that happens, no generalization of the data trends will occur, making the network useless on new data sets.
In our case we are implementing an XOR gate, so the network will have two input neurons, two hidden neurons and one output neuron.
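The arithmetic in rule 3 can be sketched as a small helper. The function name and the example figures (1000 training pairs, 8 inputs, 2 outputs) are assumptions chosen only to show the calculation.

```python
def hidden_units_upper_bound(n_examples: int,
                             n_inputs: int,
                             n_outputs: int,
                             scaling_factor: float = 5.0) -> float:
    """Rule-of-thumb upper bound on hidden processing elements.

    training pairs / (input elements + output elements) / scaling factor
    """
    if n_examples <= 0 or n_inputs + n_outputs <= 0 or scaling_factor <= 0:
        raise ValueError("all quantities must be positive")
    return n_examples / (n_inputs + n_outputs) / scaling_factor


if __name__ == "__main__":
    # Hypothetical data set: 1000 pairs, 8 input and 2 output elements.
    print(hidden_units_upper_bound(1000, 8, 2, scaling_factor=5))   # cleaner data
    print(hidden_units_upper_bound(1000, 8, 2, scaling_factor=10))  # noisier data
```

A larger scaling factor lowers the bound, which matches the advice to use fewer hidden elements when the data is noisy.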
The learning process
Once the above rules have been used to create a network, the process of teaching begins. This teaching process for a feedforward network normally uses some variant of the Delta Rule, which starts with the calculated difference between the actual outputs and the desired outputs. Using this error, connection weights are adjusted in proportion to the error times a scaling factor for global accuracy. Doing this for an individual node means that the inputs, the output, and the desired output all have to be present at the same processing element. The complex part of this learning mechanism is for the system to determine which input contributed the most to an incorrect output and how that element should be changed to correct the error. An inactive node would not contribute to the error and would have no need to change its weights. To solve this problem, training inputs are applied to the input layer of the network, and the actual outputs are compared with the desired outputs at the output layer. During the learning process, a forward sweep is made through the network, and the output of each element is computed layer by layer. The difference between the output of the final layer and the desired output is back-propagated to the previous layer(s), usually modified by the derivative of the transfer function, and the connection weights are normally adjusted using the Delta Rule. This process proceeds for the previous layer(s) until the input layer is reached. In SAP XI, the delta rule can be applied by using different interface mappings that share the same message type, with each transformation step in BPM modifying the elements accordingly.
Back-propagation requires lots of supervised training, with lots of input-output examples. At times, the learning gets stuck in a local minimum, limiting the best solution. This occurs when the network finds an error that is lower than the surrounding possibilities but does not finally get to the smallest possible error. In our application, implementing an XOR gate in XI, the training data is not a limitation, since there are only four input patterns of two bits each and each output pattern is just one bit.
General uses of BPN
Typical feedforward, back-propagation applications include speech synthesis from text, robot arms, evaluation of bank loans, image processing, knowledge representation, forecasting and prediction, and multi-target tracking.
Given the input vector X = (x1, x2, ..., xn), the output y_i of the ith hidden node will be as follows:
$$y_i = g\left(\sum_{j=1}^{N_{input}} a_{ij} x_j\right)$$
where j = 1..Ninput and aij is the weight of the ith node for the jth input. The outputs from the hidden nodes are the inputs to the next hidden layer (if there is more than one hidden layer) or to the output nodes. The outputs of the output nodes are calculated as follows:
$$z_k = g\left(\sum_{j} b_{jk} y_j\right)$$
where k = 1..Noutput, the sum runs over the hidden nodes j, and bjk is the weight of the jth node for the kth output. The transfer function g, most often a sigmoid or logistic function, gives values in the range (0, 1) and can be described as:
$$g(u) = \frac{1}{1 + e^{-u}}$$
The mean square error is the way of measuring the fit of the data and is calculated as:
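One common way to write this, using the symbols defined in the next sentence, is shown here as a sketch. The normalization by N·K is an assumption; some texts divide by N or by 2N instead, which rescales E without moving its minimum.

$$E = \frac{1}{N K} \sum_{n=1}^{N} \sum_{k=1}^{K} \left(z_{kn} - t_{kn}\right)^2$$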
where N is the number of patterns in the data set, K is the number of outputs of the network, z_kn is the kth actual output for the nth pattern and t_kn is the kth target output for the nth pattern.
In the learning mode, an optimization problem is solved to decrease the mean square error: it looks for the values of the weights a and b that bring E to a minimum. Using the slope of the error surface, the weights are adjusted after every iteration. As per the gradient descent rule the weights are adjusted as follows:
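A minimal sketch of this error measure in Python follows. The function name and the nested-list layout (`actual[n][k]`) are assumptions, and the sum of squared differences is divided by N·K, which is one common normalization rather than a confirmed detail of this blog's formula.

```python
from typing import Sequence


def mean_square_error(actual: Sequence[Sequence[float]],
                      target: Sequence[Sequence[float]]) -> float:
    """Average squared difference over N patterns and K outputs.

    actual[n][k] is the kth network output for pattern n,
    target[n][k] the matching desired output.
    """
    n_patterns = len(actual)
    n_outputs = len(actual[0])
    total = sum((z - t) ** 2
                for z_row, t_row in zip(actual, target)
                for z, t in zip(z_row, t_row))
    return total / (n_patterns * n_outputs)  # assumed normalization by N * K


if __name__ == "__main__":
    xor_targets = [[0.0], [1.0], [1.0], [0.0]]
    undecided = [[0.5], [0.5], [0.5], [0.5]]  # a network that always answers 0.5
    print(mean_square_error(undecided, xor_targets))
```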
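A standard form of gradient descent with momentum is sketched below. It is given as an assumption about the general shape of the rule; the symbol w (standing for any weight a_ij or b_jk) and the labels "old", "new" and "prev" are introduced here for illustration.

$$\Delta w = -\eta \frac{\partial E}{\partial w} + \mu \, \Delta w^{prev}, \qquad w^{new} = w^{old} + \Delta w$$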
where η is the learning rate and μ is the momentum value.
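The roles of the learning rate and the momentum can be sketched for a single weight as follows. This assumes the standard momentum form of gradient descent (new change = minus learning rate times gradient, plus momentum times previous change); the function and parameter names are illustrative only.

```python
from typing import Tuple


def momentum_update(weight: float,
                    gradient: float,
                    previous_delta: float,
                    learning_rate: float,
                    momentum: float) -> Tuple[float, float]:
    """Return (new weight, weight change) for one gradient descent step.

    The change follows the negative error gradient, plus a fraction of the
    previous change so that successive steps in the same direction build up.
    """
    delta = -learning_rate * gradient + momentum * previous_delta
    return weight + delta, delta


if __name__ == "__main__":
    w, delta = 0.5, 0.0
    for _ in range(3):
        # Hypothetical constant gradient, only to show the momentum effect.
        w, delta = momentum_update(w, gradient=-0.2, previous_delta=delta,
                                   learning_rate=0.5, momentum=0.9)
        print(w, delta)
```

The caller has to keep the returned change and pass it back in on the next iteration, which in an XI design means one more value to carry alongside each weight.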
Algorithm for XOR using BPN
1. Perform the forward propagation phase for an input pattern and calculate the output error
2. Change all weight values of each weight matrix using the formula
weight(new) = weight(old) + learning rate * output error * output(neurons i) * output(neurons i+1) * (1 - output(neurons i+1))
3. Go to step 1 and repeat for the next input pattern
4. The algorithm ends when all output patterns match their target patterns
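The steps above can be sketched in plain Python as follows. This is a minimal illustration under several assumptions that are not part of the algorithm as stated: each neuron gets an extra bias weight, starting weights are random in [-1, 1], "match" is taken to mean within a tolerance, and an epoch cap stops runs that settle in a local minimum. All names are hypothetical.

```python
import math
import random

XOR_PATTERNS = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
                ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(u):
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def forward(x, hidden_w, output_w):
    """Forward propagation; the last entry of every weight list is a bias (assumption)."""
    hidden = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in hidden_w]
    output = sigmoid(output_w[0] * hidden[0] + output_w[1] * hidden[1] + output_w[2])
    return hidden, output


def train_pattern(x, target, hidden_w, output_w, lr):
    """Steps 1 and 2 for one pattern; returns the absolute output error."""
    hidden, output = forward(x, hidden_w, output_w)
    error = target - output
    # error * out(i+1) * (1 - out(i+1)) for the output neuron
    out_delta = error * output * (1.0 - output)
    # Error passed back to each hidden neuron through the not yet updated weights
    hid_deltas = [out_delta * output_w[j] * hidden[j] * (1.0 - hidden[j])
                  for j in range(2)]
    for j, h in enumerate(hidden + [1.0]):        # hidden -> output weights
        output_w[j] += lr * out_delta * h
    for j in range(2):                            # input -> hidden weights
        for i, xi in enumerate(list(x) + [1.0]):
            hidden_w[j][i] += lr * hid_deltas[j] * xi
    return abs(error)


def train_xor(lr=0.5, max_epochs=20000, tolerance=0.1, seed=1):
    rng = random.Random(seed)
    hidden_w = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(2)]
    output_w = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    epochs_run = 0
    for epochs_run in range(1, max_epochs + 1):
        worst = max(train_pattern(x, t, hidden_w, output_w, lr)
                    for x, t in XOR_PATTERNS)
        if worst < tolerance:                     # step 4, with a tolerance
            break
    return hidden_w, output_w, epochs_run


if __name__ == "__main__":
    hw, ow, epochs = train_xor()
    print("epochs run:", epochs)
    for x, t in XOR_PATTERNS:
        print(x, t, forward(x, hw, ow)[1])
```

Depending on the starting weights, a run may stop early or reach the epoch cap without matching all four patterns, which is the local minimum behaviour described earlier.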
Planning: BPN using SAP-XI
A simple design
The above design has some advantages and some disadvantages. The advantage is that the XML format for training contains all the information, from hidden layer weights to learning rates, which makes it easy for a developer to debug if anything goes wrong or if the BPM implementing the BPN gives wrong results. The disadvantage is that the production dataflow cannot see the weights as they are modified during training. Container variables do not come to the rescue here either, since the two flows work independently. Hence the production flow must obtain the weights somehow: either they come along with the actual input, or the production receiver must read the training XML to update the weights in memory before processing the actual input.
Design having separate weight sender and receiver
In the above modified design, there is an extra sender and receiver which take care of the weights independently. Hence the weights are not pulled from the training data. Instead, they are read from a separate XML file containing only weights.
Design to keep container variable in memory
In this design the main aim is to keep the weights in container variables in memory rather than in the tags of the XML. Hence no weight tags are needed in the input XML; everything is kept in BPM memory. In my opinion this is the ideal design, but it too has its own advantages and disadvantages. The advantage is that the production flow is completely independent of the training flow, since the weights are fetched from the container variables rather than from the training XML or a separate XML file. The main disadvantage is that we had to use transaction SWWL to kill the BPM process, which is a little painful since the receiver and sender are within an infinite loop! Actually, the server which I am using went down when I tried to delete the workflow objects for two days.
Layer connectivity using MM (Message Mapping)
Each message type can be a layer, and the message mapping is the interconnectivity between two layers. All the calculations can be done using simple functions or even by using the arithmetic functions provided by SAP XI itself.
Let's take the following formulas, where y_i denotes the output of the ith hidden node.
Input to Hidden:
$$y_i = g\left(\sum_{j=1}^{N_{input}} a_{ij} x_j\right)$$
Hidden to Output:
$$z_k = g\left(\sum_{j} b_{jk} y_j\right)$$
The above formulas give the output value for the subsequent neuron, where the g(u) function is as follows.
$$g(u) = \frac{1}{1 + e^{-u}}$$
From the above formulas, summation of (a.x) is done on the input to hidden mapping and g(u) will be calculated on the hidden to output mapping.
Input layer to Hidden layer mapping on production dataflow
In the above diagram, S is the sigmoid function.
Hidden layer to Output layer mapping on production dataflow
The above two mappings are actually the same for the production dataflow and the training dataflow, since the output is calculated in both. For the learning process in the training dataflow, however, Output to Hidden and Hidden to Input mappings also have to happen, so that backpropagation can update the weights.
Consider the gradient descent weight updates for the hidden-layer weights a and the output-layer weights b, where η is the learning rate and μ is the momentum value. Backpropagation is all about updating the weights in the hidden layer as well as the output layer. In order to implement backpropagation after the feed-forward pass, the data type must contain the training pattern, since backpropagation occurs only during the learning process.
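The two forward mappings can be sketched as two plain Python functions, following the split described earlier: the summation of a·x in the first mapping, and g(u) plus the output calculation in the second. The weights below are hand-picked for illustration and are not the result of any training run; the extra bias weight at the end of each list is also an assumption, as are all the names.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def input_to_hidden(x, a):
    """First mapping: weighted sums only; a[i] holds the weights of hidden node i."""
    inputs = list(x) + [1.0]  # constant 1.0 feeds the assumed bias weight
    return [sum(w * xi for w, xi in zip(row, inputs)) for row in a]


def hidden_to_output(sums, b):
    """Second mapping: apply g(u) to the hidden sums, then compute the output."""
    hidden = [sigmoid(u) for u in sums] + [1.0]
    return sigmoid(sum(w * h for w, h in zip(b, hidden)))


# Hypothetical weights: hidden node 0 behaves like OR, hidden node 1 like AND,
# and the output fires for "OR but not AND", which is XOR.
A = [[20.0, 20.0, -10.0],
     [20.0, 20.0, -30.0]]
B = [20.0, -20.0, -10.0]

if __name__ == "__main__":
    for x in ((0, 0), (0, 1), (1, 0), (1, 1)):
        print(x, round(hidden_to_output(input_to_hidden(x, A), B)))
```

Keeping the two functions separate mirrors the two message mappings, with the list of hidden sums playing the role of the intermediate message passed between them.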
Output to Hidden Layer backpropagation
Hidden to Input backpropagation
Once this message mapping step in a BPM transformation is complete, learning is complete for one pattern. These steps have to be done four times, since there are four input patterns for the XOR gate: [0,0], [0,1], [1,0] and [1,1]. Once the same process has been done for all the patterns, one learning cycle is over. This learning cycle is repeated until the error falls to an acceptably small value. When it has, giving any one pattern as input produces the desired output using just the updated, trained weights.
This series of blogs (Parts 1 to 3) is not aimed at implementing an XOR gate as such, but at proving that the BPM (Business Process Management) of SAP XI can be used in advanced fields like neural networks, where the application data involves fuzziness and prediction.
I hope this blog gives an overall idea of how to implement a BPN that learns the XOR gate using SAP XI. The next blog (Part 2) will deal with the technical details and screenshots for implementing this BPN using one of the designs mentioned above.