By Roberto Lopez, Artelnics.

Neural networks are one of the hottest topics in artificial intelligence and machine learning. They are computational models based on the structure of the brain: information processing structures whose most significant property is their ability to learn from data. These techniques have achieved great success in domains ranging from marketing to engineering.

There are many different types of neural networks, of which the multilayer perceptron is the most important. The characteristic neuron model in the multilayer perceptron is the so-called perceptron. In this article we explain the mathematics of this neuron model.

The neuron is the main component of a neural network, and the perceptron is the most widely used neuron model. The following figure is a graphical representation of a perceptron.

In the above neuron we can see the following elements:

- The inputs \( (x_1, \ldots, x_n) \).
- The bias \( b \) and the synaptic weights \( w_1, \ldots, w_n \).
- The combination function \( c(\cdot) \).
- The activation function \( a(\cdot) \).
- The output \( y \).

As an example, consider the neuron in the next figure, with three inputs. It transforms the inputs \( \mathbf{x}=(x_1,x_2,x_3) \) into a single output \( y \).

In the above neuron we can see the following elements:

- The inputs \( (x_1,x_2,x_3) \).
- The neuron parameters, which are the set \( b=-0.5 \) and \( \mathbf{w}=(1.0,-0.75,0.25) \).
- The combination function, \( c(\cdot) \), which merges the inputs with the bias and the synaptic weights.
- The activation function, which is set to be the hyperbolic tangent, \( \tanh(\cdot) \) , and takes that combination to produce the output from the neuron.
- The output \( y \).

The parameters of the neuron consist of a bias and a set of synaptic weights.

- The bias \( b \) is a real number.
- The synaptic weights \( \mathbf{w}=(w_1,\ldots,w_n) \) form a vector whose size is the number of inputs.

Therefore, the total number of parameters in this neuron model is \( 1+n \), where \( n \) is the number of inputs to the neuron.

Consider the perceptron of the example above. That neuron model has a bias and 3 synaptic weights:

- The bias is \( b = -0.5 \).
- The synaptic weight vector is \( \mathbf{w}=(1.0,-0.75,0.25) \).

The number of parameters in this neuron is \( 1+3=4 \).
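The parameter count above can be sketched in a few lines of Python. The class name `NeuronParameters` and its `count` property are hypothetical names chosen for this illustration; only the bias and weight values come from the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NeuronParameters:
    """Hypothetical container for the parameters of one perceptron."""

    bias: float                 # b, a single real number
    weights: Tuple[float, ...]  # w, one synaptic weight per input

    @property
    def count(self) -> int:
        # Total number of parameters: 1 bias + n synaptic weights.
        return 1 + len(self.weights)


# Parameters of the example neuron with three inputs.
example = NeuronParameters(bias=-0.5, weights=(1.0, -0.75, 0.25))
print(example.count)  # prints 4
```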

The combination function takes the input vector \( \mathbf{x} \) to produce a combination value, or net input, \( c \). In the perceptron, the combination is computed as the bias plus the linear combination of the synaptic weights and the inputs,

$$ c = b + \sum_{i=1}^{n} w_i \cdot x_i. $$

Note that the bias increases or reduces the net input to the activation function, depending on whether it is positive or negative, respectively. The bias is sometimes represented as a synaptic weight connected to an input fixed to \( +1 \).
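As a minimal sketch of the two views of the bias described above, the following Python code computes the combination directly and also with the bias treated as an extra synaptic weight attached to an input fixed to \( +1 \). The function names `combination` and `combination_augmented` are hypothetical, and the input vector is an arbitrary value chosen only for illustration; the bias and weights are those of the example neuron.

```python
from typing import Sequence


def combination(bias: float, weights: Sequence[float], inputs: Sequence[float]) -> float:
    """Net input: bias plus the weighted sum of the inputs."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return bias + sum(w * x for w, x in zip(weights, inputs))


def combination_augmented(bias: float, weights: Sequence[float], inputs: Sequence[float]) -> float:
    """Same net input, with the bias as a weight on a constant +1 input."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    augmented_weights = [bias, *weights]  # (b, w_1, ..., w_n)
    augmented_inputs = [1.0, *inputs]     # (+1, x_1, ..., x_n)
    return sum(w * x for w, x in zip(augmented_weights, augmented_inputs))


# Example neuron parameters; the input below is an assumed, illustrative value.
b = -0.5
w = (1.0, -0.75, 0.25)
x = (1.0, 2.0, 4.0)

print(combination(b, w, x))
print(combination_augmented(b, w, x))
```

Both calls should print the same value up to floating-point rounding, which is why many implementations fold the bias into the weight vector.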

Consider the neuron of our example. The combination value of this perceptron for an input vector \( \mathbf{x} = (-0.8,0.2,-0.4) \) is

$$ c = -0.5 + (1.0 \cdot (-0.8)) + (-0.75 \cdot 0.2) + (0.25 \cdot (-0.4)) = -1.55. $$

The activation function defines the output from the neuron in terms of its combination. In practice, we can consider many useful activation functions. Three of the most widely used are the logistic, the hyperbolic tangent and the linear functions. Other activation functions that are not differentiable, such as the threshold, are not considered here.

The logistic function has a sigmoid shape. This activation is a monotonically increasing function which exhibits a good balance between linear and non-linear behavior. It is defined by

$$ a = \frac{1}{1+\exp{(-c)}}. $$

The logistic function is represented in the next figure.

As we can see, the image of the logistic function is \( (0,1) \). This is a good property for classification applications, because the outputs here can be interpreted in terms of probabilities.

The hyperbolic tangent is also a sigmoid function that is widely used in the neural networks field. It is very similar to the logistic function. The main difference is that the image of the hyperbolic tangent is \( (-1, 1) \). The hyperbolic tangent is defined by

$$ a = \tanh{(c)}. $$

The hyperbolic tangent is represented in the next figure.

The hyperbolic tangent function is widely used in approximation applications.

For the linear activation function we have

$$ a = c. $$

Thus, the output of a neuron with a linear activation function is equal to its combination. The linear activation function is plotted in the following figure.

The linear activation function is also widely used in approximation applications.
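The three activation functions discussed above can be sketched with Python's standard `math` module. The function names are hypothetical and the sample combination values are arbitrary; the code only illustrates the definitions and the images \( (0,1) \) and \( (-1,1) \) mentioned in the text.

```python
import math


def logistic(c: float) -> float:
    """Logistic activation, with image (0, 1)."""
    # Straightforward form of the definition; math.exp raises OverflowError
    # for very large negative c (roughly c < -709).
    return 1.0 / (1.0 + math.exp(-c))


def hyperbolic_tangent(c: float) -> float:
    """Hyperbolic tangent activation, with image (-1, 1)."""
    return math.tanh(c)


def linear(c: float) -> float:
    """Linear activation: the output equals the combination."""
    return c


# Compare the three activations on a few illustrative combination values.
for c in (-2.0, 0.0, 2.0):
    print(c, logistic(c), hyperbolic_tangent(c), linear(c))
```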

In our example, the combination value is \( c = -1.55 \). As the chosen function is the hyperbolic tangent, the activation of this neuron is

$$ a = \tanh{(-1.55)} \approx -0.91. $$

The output calculation is the most important function in the perceptron. Given a set of input signals to the neuron, it computes the output signal from it. The output function is the composition of the combination and the activation functions. The next figure is an activity diagram of how the information is propagated in the perceptron.

Therefore, the final expression of the output from a neuron as a function of the input to it is

$$ y = a(b + \mathbf{w} \cdot \mathbf{x}). $$

Consider the perceptron of our example. If we apply an input \( \mathbf{x} = (-0.8,0.2,-0.4) \), the output \( y \) will be the following:

$$ y = \tanh{\left(-0.5 + (1.0 \cdot (-0.8)) + (-0.75 \cdot 0.2) + (0.25 \cdot (-0.4))\right)} = \tanh{(-1.55)} \approx -0.91. $$

As we can see, the output function merges the combination and the activation functions.
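The worked example can be reproduced with a short Python sketch. The function name `perceptron_output` and its signature are assumptions made for this illustration, not part of any particular library; the parameters and the input are those of the example neuron.

```python
import math
from typing import Callable, Sequence


def perceptron_output(
    bias: float,
    weights: Sequence[float],
    inputs: Sequence[float],
    activation: Callable[[float], float] = math.tanh,
) -> float:
    """Output of a perceptron: activation applied to the combination."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    # Combination: bias plus the weighted sum of the inputs.
    c = bias + sum(w * x for w, x in zip(weights, inputs))
    # Activation: hyperbolic tangent by default, as in the example neuron.
    return activation(c)


y = perceptron_output(-0.5, (1.0, -0.75, 0.25), (-0.8, 0.2, -0.4))
print(round(y, 2))  # prints -0.91
```

Passing a different callable as `activation` changes only the last step, which mirrors how the output function composes the combination with whichever activation is chosen.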

An artificial neuron is a mathematical model of the behavior of a single neuron in a biological nervous system.

A single neuron can solve some very simple learning tasks, but the power of neural networks comes when many of them are connected in a network architecture. The architecture of an artificial neural network refers to the number of neurons and the connections between them. The following figure shows a feed-forward network architecture of neurons.

Although this post has covered how the perceptron works, there are other neuron models which have different characteristics and are used for different purposes. Some of them are the scaling neuron, the principal components neuron, the unscaling neuron and the probabilistic neuron. In the above picture, scaling neurons are depicted in yellow and unscaling neurons in red.
