## Practical recommendations on neural network computation algorithms. The backpropagation algorithm

In this video, I will explain the algorithm for computing a simple feedforward neural network and the error backpropagation algorithm used to train it.

A simple neural network is shown in the figure. The network has an input layer, which receives the data $x_1$, $x_2$, $x_3$, one hidden layer, and an output layer. The input layer contains four neurons, one of which is a bias neuron. The bias neuron always outputs the same value, for example 1, and supplies a constant bias to all neurons connected to it in the next layer. The bias neuron can be switched off by setting its value to 0. The hidden layer consists of three neurons, one of which is again a bias neuron. This bias neuron is connected only to the output layer and receives no connections from the input layer, so its state never changes. The result of the network computation appears on the output layer, which contains two neurons.

Each layer of the network is connected to the previous layer by connections with specific weights. A network may contain several hidden layers. The network is called feedforward because its layers are connected in sequence and information never flows backward, for example from the output layer to the input layer. Networks with feedback connections are called recurrent networks.

We number the input layer neurons with the index $i$, which runs from 0 to 3.

The hidden layer neurons are numbered with the index $j$, which runs from 0 to 2.

Finally, the output layer neurons are numbered with the index $k$, which runs from 1 to 2.

For convenience, the bias neuron always has number 0.

The weights between the input and hidden layers are denoted $w_{ij}$, and the matrix of these weights is called $W$. Thus, $w_{01}$ is the weight connecting the bias neuron of the input layer to the first neuron of the hidden layer, and $w_{32}$ is the weight connecting the third input neuron to the second hidden neuron.

The weights between the hidden and output layers are denoted $\epsilon_{jk}$, and the matrix of these weights is called $E$.

The input values $x_i$ form a vector $X$.

The values of the hidden layer neurons are denoted $h_j$ and form a vector $H$.

The output values $o_k$ form a vector $O$.

Information is processed sequentially: first the hidden layer values $h_j$ are calculated, and then the output layer values $o_k$.

Equation (1) is used to calculate the hidden layer values.

Each neuron computes a combined input, the sum of the input values multiplied by the corresponding weights, and then passes the result through its activation function $f_a$.

In our example, the activation function is the logistic (sigmoid) function.

The combined input is denoted $net_j$.
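Written out from the definitions above, Equation (1) takes the following form. The standard logistic function is assumed here, since the exact formula shown on screen is not part of the transcript.

$$
h_j = f_a(net_j), \qquad net_j = \sum_{i=0}^{3} w_{ij}\, x_i, \qquad j = 1, 2, \qquad f_a(x) = \frac{1}{1 + e^{-x}} \qquad (1)
$$

The hidden bias neuron $h_0$ is not computed by this formula; it keeps its constant value (for example, $h_0 = 1$).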

To illustrate, let us calculate the value of neuron $h_1$.

The bias value $x_0$ is multiplied by the weight $w_{01}$, $x_1$ is multiplied by $w_{11}$, and so on for each input. The products are summed, and the sum is fed to the activation function: $h_1 = f_a(w_{01}x_0 + w_{11}x_1 + w_{21}x_2 + w_{31}x_3)$.

This is an example of program code for calculating the hidden layer values.

The weights are stored as elements of the array W, indexed with square brackets.

The code uses `for` loops in the Pascal programming language.
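The Pascal listing itself appears only on screen. The following Python sketch is a rough equivalent of the same nested `for` loops. The function names `logistic` and `hidden_layer`, and the list-of-lists layout `W[i][j]`, are assumptions made for illustration.

```python
import math

def logistic(x):
    """Logistic activation f_a(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def hidden_layer(x, W):
    """Compute the hidden values h[0..2] from the inputs x[0..3].

    x[0] is the input bias neuron (for example 1.0); W[i][j] is the weight
    from input neuron i to hidden neuron j. Column j = 0 is never used,
    because the hidden bias neuron receives no input connections.
    """
    n_in, n_hid = len(x), len(W[0])
    h = [0.0] * n_hid
    h[0] = 1.0  # hidden bias neuron: constant output
    for j in range(1, n_hid):
        net_j = 0.0
        for i in range(n_in):
            net_j += W[i][j] * x[i]  # combined input net_j
        h[j] = logistic(net_j)
    return h

# Usage with hypothetical values
x = [1.0, 0.5, -0.2, 0.1]    # x[0] = 1 is the input bias neuron
W = [[0.1, 0.2, -0.3],
     [0.4, -0.1, 0.2],
     [-0.2, 0.3, 0.1],
     [0.0, 0.05, -0.4]]      # W[i][j]; column 0 is unused
print(hidden_layer(x, W))
```

Keeping column 0 in `W` preserves the index numbering used in the text, at the cost of one unused column.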


Next, we analyse how the values of the output layer neurons are calculated.

As with the hidden layer, Equation (2) calculates the output values. The combined input is the sum of the hidden layer values $h_j$ multiplied by the weights $\epsilon_{jk}$, and the result is fed to the activation function. The example on the screen shows the equation for the first output $o_1$.
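Reconstructed from the same definitions, Equation (2) and its first element read:

$$
o_k = f_a(net_k), \qquad net_k = \sum_{j=0}^{2} \epsilon_{jk}\, h_j, \qquad k = 1, 2 \qquad (2)
$$

$$
o_1 = f_a(\epsilon_{01} h_0 + \epsilon_{11} h_1 + \epsilon_{21} h_2)
$$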

In Pascal, the code for calculating the output values looks as follows.
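A Python sketch of the same calculation might look like this. The names `output_layer` and `E[j][k]` are assumptions for illustration. Index 0 of the output list is left unused so that the outputs keep the numbering $o_1$, $o_2$ from the text.

```python
import math

def logistic(x):
    """Logistic activation f_a(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def output_layer(h, E):
    """Compute the outputs o[1..2] from the hidden values h[0..2].

    h[0] is the hidden bias neuron (1.0); E[j][k] is the weight from
    hidden neuron j to output neuron k. o[0] and column 0 of E are unused.
    """
    n_hid, n_out = len(h), len(E[0])
    o = [0.0] * n_out
    for k in range(1, n_out):
        net_k = 0.0
        for j in range(n_hid):
            net_k += E[j][k] * h[j]  # combined input net_k
        o[k] = logistic(net_k)
    return o

# Usage with hypothetical values
h = [1.0, 0.6, 0.4]
E = [[0.0, 0.1, -0.2],
     [0.0, 0.3, 0.4],
     [0.0, -0.5, 0.2]]
print(output_layer(h, E))
```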

These calculations are called the forward pass; they determine the network outputs once input data is received.

For the network to work correctly, the weights in matrices $W$ and $E$ must be chosen appropriately.

The error backpropagation method is used to find these weights.

A training set of examples is fed to the network, and for each example the network output is compared with the target value to determine the errors of the output layer neurons, denoted $\delta$ (delta). Next, the output layer errors are propagated backward through the network to find the errors of the hidden layer neurons, denoted $\theta$ (theta). Finally, all weights are adjusted based on these errors.

Let us review the algorithm in action.

Before training, all weights are set randomly, for example in the range from -0.5 to 0.5. Examples of random initial weights for matrices $W$ and $E$ are shown on the screen.
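A minimal Python sketch of this initialization is shown below. It assumes a uniform distribution, which the lecture does not specify (it only says "randomly"), and a hypothetical helper `init_weights`. The optional seed makes runs reproducible.

```python
import random

def init_weights(rows, cols, low=-0.5, high=0.5, seed=None):
    """Return a rows x cols matrix of weights drawn uniformly from [low, high]."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(cols)] for _ in range(rows)]

W = init_weights(4, 3, seed=1)  # input i = 0..3 -> hidden j = 0..2
E = init_weights(3, 3, seed=2)  # hidden j = 0..2 -> output k = 1..2 (column 0 unused)
print(W)
print(E)
```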

Suppose we apply a training vector $X$ to the input and obtain the vector $O$ at the output.

The target output is the vector $T$.

To calculate the error of each output neuron, $\delta_1$ and $\delta_2$, we use Equation (3), in which the difference between the target and actual values is multiplied by the derivative of the activation function.

Since our activation function is the logistic function, its derivative can be expressed through the value of the function itself. Substituting it into Equation (3) gives Equation (4) for the output layer errors.
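A reconstruction of Equations (3) and (4) in the notation used here is given below. In it, $t_k$ are the components of the target vector $T$, and the standard logistic function is assumed.

$$
\delta_k = (t_k - o_k)\, f_a'(net_k) \qquad (3)
$$

$$
f_a(x) = \frac{1}{1 + e^{-x}} \;\Rightarrow\; f_a'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = f_a(x)\,\bigl(1 - f_a(x)\bigr)
$$

$$
\delta_k = (t_k - o_k)\, o_k\, (1 - o_k) \qquad (4)
$$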

Equation (4) is valid only for the logistic activation function; in the general case, Equation (3) must be used.

In Pascal, the calculation of the output layer errors may look as follows.
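In Python, a sketch of the same step (Equation (4), logistic activation only) could look like this. The function name `output_errors` is hypothetical, and index 0 is unused padding so that $\delta_1$, $\delta_2$ keep their numbering.

```python
def output_errors(o, t):
    """delta[k] = (t[k] - o[k]) * o[k] * (1 - o[k]) for k = 1..len(o) - 1.

    Implements Equation (4), so it is valid only for the logistic
    activation; delta[0] stays 0 because there is no output neuron 0.
    """
    delta = [0.0] * len(o)
    for k in range(1, len(o)):
        delta[k] = (t[k] - o[k]) * o[k] * (1.0 - o[k])
    return delta

# Usage with hypothetical outputs o1 = 0.5, o2 = 0.8 and targets t1 = 1, t2 = 0
print(output_errors([0.0, 0.5, 0.8], [0.0, 1.0, 0.0]))
```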

Next, we find the errors of the hidden layer neurons, $\theta_1$ and $\theta_2$. No error is calculated for the bias neuron, since it has no incoming connections whose weights could be adjusted.

Equation (5) is the general equation for calculating hidden layer errors.

Substituting the derivative of the logistic activation function gives Equation (6). The error of a hidden layer neuron combines the errors of all the neurons it affects: the larger the weight $\epsilon_{jk}$, the more the output layer error $\delta_k$ contributes to the error of hidden neuron $j$. In this way, the error propagates backward from the network output to the hidden layers.
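In the same notation, Equations (5) and (6) for the hidden neurons $j = 1, 2$ can be written as:

$$
\theta_j = f_a'(net_j) \sum_{k=1}^{2} \epsilon_{jk}\, \delta_k \qquad (5)
$$

$$
\theta_j = h_j\, (1 - h_j) \sum_{k=1}^{2} \epsilon_{jk}\, \delta_k \qquad (6)
$$

For example, $\theta_1 = h_1 (1 - h_1)(\epsilon_{11}\delta_1 + \epsilon_{12}\delta_2)$.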

In Pascal, the calculation of the hidden layer errors may look as follows.
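A Python sketch of Equation (6) follows. The function name `hidden_errors` is hypothetical, and it uses the same padded indexing as the sketches above. It must be called with the weights $\epsilon_{jk}$ as they were before any adjustment in the current step.

```python
def hidden_errors(h, E, delta):
    """theta[j] = h[j] * (1 - h[j]) * sum_k E[j][k] * delta[k]  (Equation (6)).

    theta[0] stays 0: no error is computed for the hidden bias neuron.
    delta[0] and column 0 of E are unused padding (outputs are numbered from 1).
    """
    theta = [0.0] * len(h)
    for j in range(1, len(h)):
        s = 0.0
        for k in range(1, len(delta)):
            s += E[j][k] * delta[k]  # weighted sum of output errors
        theta[j] = h[j] * (1.0 - h[j]) * s
    return theta

# Usage with hypothetical values
h = [1.0, 0.5, 0.5]
E = [[0.0, 0.0, 0.0], [0.0, 1.0, 2.0], [0.0, -1.0, 0.5]]
print(hidden_errors(h, E, [0.0, 0.1, 0.2]))
```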

Equation (6) is valid only for the logistic activation function; for other activation functions, Equation (5) must be used.

If the network contains multiple hidden layers, the same equations apply, except that the output layer errors $\delta$ (delta) are replaced by the errors of the next hidden layer, $\bar{\theta}$ (theta bar), and the weights $\epsilon_{jk}$ by the weights connecting the two hidden layers.

Adjusting the weights of matrices $W$ and $E$ is the final step of the method.

The weight changes are calculated using Equations (9) and (10). The change in the weight between two neurons is proportional to the output of the first neuron multiplied by the error of the second. The coefficient $\mu$ (mu) is called the learning rate and determines the speed of learning; in practice, its value usually lies in the range 0.1 to 0.4.
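In the notation of this section, the rule described above gives $\Delta\epsilon_{jk} = \mu\, h_j\, \delta_k$ for the hidden-to-output weights and $\Delta w_{ij} = \mu\, x_i\, \theta_j$ for the input-to-hidden weights. A minimal Python sketch of this step is below. The function name `update_weights` and the default $\mu = 0.2$ are assumptions.

```python
def update_weights(W, E, x, h, theta, delta, mu=0.2):
    """Adjust W and E in place using the errors found by backpropagation.

    E[j][k] += mu * h[j] * delta[k]  for hidden j = 0..2, output k = 1..2
    W[i][j] += mu * x[i] * theta[j]  for input i = 0..3, hidden j = 1..2
    Column 0 of both matrices is skipped: there is no output neuron 0,
    and the hidden bias neuron h[0] has no incoming connections.
    """
    for j in range(len(h)):
        for k in range(1, len(delta)):
            E[j][k] += mu * h[j] * delta[k]
    for i in range(len(x)):
        for j in range(1, len(theta)):
            W[i][j] += mu * x[i] * theta[j]
```

Both `theta` and `delta` are computed before this call, so the hidden errors use the old values of $\epsilon_{jk}$, as in the order of steps described above.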

The resulting flowchart of network training by backpropagation is as follows.

An array of $N$ training vectors $X_n$ is loaded into the network.

The maximum number of epochs, $Z$, is set. An epoch is one pass of weight training over all training vectors. Next, the weights are initialized randomly, and then they are trained in nested loops.
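The whole procedure can be sketched in Python as follows: an outer loop over at most $Z$ epochs and an inner loop over the $N$ training vectors, with the forward pass, Equations (4) and (6), and the weight corrections in each step. This is a minimal sketch under stated assumptions. It uses online (per-example) updates, a uniform random initialization in $[-0.5, 0.5]$, $\mu = 0.3$, and a hypothetical toy training set (OR and AND of $x_1$, $x_2$), none of which come from the lecture.

```python
import math
import random

def logistic(z):
    """Logistic activation f_a(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W, E):
    """Forward pass. x[0] = 1 is the input bias; h[0] = 1 is the hidden bias.
    o[0] is unused padding so that the outputs keep the numbering o1, o2."""
    h = [1.0] + [logistic(sum(W[i][j] * x[i] for i in range(len(x))))
                 for j in range(1, len(W[0]))]
    o = [0.0] + [logistic(sum(E[j][k] * h[j] for j in range(len(h))))
                 for k in range(1, len(E[0]))]
    return h, o

def train(X, T, n_hidden=2, n_out=2, epochs=2000, mu=0.3, seed=0):
    """Online backpropagation: random init, then nested loops over epochs
    (at most Z = epochs) and over the N = len(X) training vectors."""
    rng = random.Random(seed)
    n_in = len(X[0])  # number of inputs, including the bias x[0]
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_in)]
    E = [[rng.uniform(-0.5, 0.5) for _ in range(n_out + 1)] for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, t in zip(X, T):
            h, o = forward(x, W, E)
            # Output layer errors, Equation (4)
            delta = [0.0] + [(t[k] - o[k]) * o[k] * (1.0 - o[k])
                             for k in range(1, n_out + 1)]
            # Hidden layer errors, Equation (6); none for the bias neuron h[0]
            theta = [0.0] + [h[j] * (1.0 - h[j]) *
                             sum(E[j][k] * delta[k] for k in range(1, n_out + 1))
                             for j in range(1, n_hidden + 1)]
            # Weight corrections: hidden -> output, then input -> hidden
            for j in range(n_hidden + 1):
                for k in range(1, n_out + 1):
                    E[j][k] += mu * h[j] * delta[k]
            for i in range(n_in):
                for j in range(1, n_hidden + 1):
                    W[i][j] += mu * x[i] * theta[j]
    return W, E

def mse(X, T, W, E):
    """Mean squared output error over a data set."""
    total, count = 0.0, 0
    for x, t in zip(X, T):
        _, o = forward(x, W, E)
        for k in range(1, len(t)):
            total += (t[k] - o[k]) ** 2
            count += 1
    return total / count

# Hypothetical toy training set: [bias, x1, x2, x3] -> [unused, x1 OR x2, x1 AND x2]
X = [[1, 0, 0, 0], [1, 0, 1, 0], [1, 1, 0, 1], [1, 1, 1, 1]]
T = [[0, 0, 0], [0, 1, 0], [0, 1, 0], [0, 1, 1]]
print("MSE before training:", mse(X, T, *train(X, T, epochs=0)))
print("MSE after training: ", mse(X, T, *train(X, T)))
```

Calling `train` with `epochs=0` returns the untrained random weights for the same seed, which makes it easy to compare the error before and after training.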

Typically, more epochs give a better training result, although too many epochs can cause the network to overfit the training data. After training, the network is tested on data not included in the training set, and its classification accuracy is determined.