Understanding the perceptron neuron model

By Roberto Lopez, Artelnics. Revisited on March 21, 2020.

One of the hottest topics in artificial intelligence and machine learning is neural networks. These are computational models based on the brain's structure, whose most significant property is their ability to learn from data.

Neural networks are usually arranged as sequences of layers. In turn, layers are made up of individual neurons. Therefore, neurons are the basic information processing units in neural networks.

The most widely used neuron model is the perceptron. This is the neuron model behind perceptron layers (also called dense layers), which are present in the majority of neural networks.

In this post, we explain the mathematics of the perceptron neuron model:

  1. Perceptron elements.
  2. Neuron parameters.
  3. Combination function.
  4. Activation function.
  5. Output function.
  6. Conclusions.

1. Perceptron elements

The following figure is a graphical representation of a perceptron.

Neuron model

In the above neuron, we can see the following elements:

  - The inputs, \( x_1, \ldots, x_n \).
  - The neuron parameters, consisting of a bias \( b \) and the synaptic weights \( w_1, \ldots, w_n \).
  - The combination function, which produces the net input \( c \).
  - The activation function, which produces the activation \( a \).
  - The output, \( y \).

As an example, consider the neuron in the next figure, with three inputs. It transforms the inputs \( \mathbf{x}=(x_1,x_2,x_3) \) into a single output \( y \).

Neuron example

In the above neuron, we can see the following elements:

  - Three inputs, \( x_1 \), \( x_2 \), and \( x_3 \).
  - A bias \( b \) and three synaptic weights, \( w_1 \), \( w_2 \), and \( w_3 \).
  - The combination \( c \), the activation \( a \), and the output \( y \).

2. Neuron parameters

The parameters of the neuron consist of a bias and a set of synaptic weights. The bias \( b \) is a free term, and each synaptic weight \( w_i \) multiplies the corresponding input \( x_i \).

Therefore, the total number of parameters in this neuron model is \( 1+n \), where \( n \) is the number of inputs to the neuron.

Consider the perceptron of the example above. That neuron model has a bias and 3 synaptic weights:

  - The bias, \( b = -0.5 \).
  - The synaptic weights, \( w_1 = 1.0 \), \( w_2 = -0.75 \), and \( w_3 = 0.25 \).

The number of parameters in this neuron is \( 1+3=4 \).
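As a quick numerical check, here is a minimal Python sketch using the example values above (the variable names b and w are just illustrative):

```python
# Parameters of the example perceptron: one bias and three synaptic weights.
b = -0.5                  # bias
w = [1.0, -0.75, 0.25]    # synaptic weights w1, w2, w3

# The total number of parameters is 1 + n, where n is the number of inputs.
n = len(w)
print(1 + n)  # 4
```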

3. Combination function

The combination function takes the input vector \( \mathbf{x} \) and produces a combined value, or net input, \( c \). It is computed as the bias plus a linear combination of the synaptic weights and the inputs,

$$ c = b + \sum_{i=1}^{n} w_i \cdot x_i. $$

Note that the bias increases or reduces the net input to the activation function, depending on whether it is positive or negative. The bias is sometimes represented as a synaptic weight connected to an input fixed to \( +1 \).

Consider the neuron of our example. The combination value of this perceptron for an input vector \( \mathbf{x} = (-0.8,0.2,-0.4) \) is

$$ c = -0.5 + (1.0 \cdot (-0.8)) \\ + ((-0.75) \cdot 0.2) + (0.25 \cdot (-0.4)) \\ = -1.55. $$
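For illustration, here is a minimal Python sketch of the combination function, using the example bias, weights, and inputs (the helper name combination is just illustrative):

```python
def combination(b, w, x):
    """Net input: bias plus the linear combination of weights and inputs."""
    return b + sum(w_i * x_i for w_i, x_i in zip(w, x))

b = -0.5
w = [1.0, -0.75, 0.25]
x = [-0.8, 0.2, -0.4]

print(combination(b, w, x))  # -1.55 (up to floating-point rounding)

# Treating the bias as a weight on an extra input fixed to +1 gives the same value:
print(combination(0.0, [b] + w, [1.0] + x))  # -1.55
```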

4. Activation function

The activation function defines the output of the neuron in terms of its combination. In practice, we can choose among many useful activation functions. Four of the most commonly used are the following:

Hyperbolic tangent activation

The hyperbolic tangent is defined by

$$ a = \tanh{(c)}. $$

This activation function is represented in the next figure.

Hyperbolic tangent activation function

As we can see, the hyperbolic tangent has a sigmoid shape and takes values in the range \( (-1,1) \). This activation is a monotonically increasing function that exhibits a good balance between linear and non-linear behavior.

In our example, the combination value is \( c = -1.55 \). As the chosen function is the hyperbolic tangent, the activation of this neuron is

$$ a = \tanh{(-1.55)} = -0.91. $$

The hyperbolic tangent function is widely used in the hidden layers of neural networks for approximation and classification tasks.
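For reference, the same value can be checked numerically with Python's standard math module:

```python
import math

c = -1.55                # combination value from the example
a = math.tanh(c)         # hyperbolic tangent activation
print(round(a, 2))       # -0.91
```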

Rectified linear (ReLU) activation

The rectified linear activation function, also known as ReLU, is another non-linear activation function that has gained popularity in the machine learning domain. It is zero when the combination is negative and equal to the combination when it is zero or positive.

$$ a = \left\{ \begin{array}{ll} 0 & \textrm{if } c < 0 \\ c & \textrm{if } c \geq 0 \end{array} \right. $$

The ReLU function is represented in the next figure.

Rectified linear activation function

An advantage of the ReLU function is that it is more computationally efficient than other non-linear activation functions, due to its simplicity. The ReLU function is widely used in the hidden layers of neural networks for approximation and classification tasks.
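A minimal Python sketch of the ReLU rule (the function name relu is just illustrative):

```python
def relu(c):
    """Rectified linear activation: zero for negative net input, identity otherwise."""
    return c if c >= 0.0 else 0.0

print(relu(-1.55))  # 0.0  (the example combination is negative, so the output is zero)
print(relu(0.7))    # 0.7
```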

Linear activation

For the linear activation function, we have

$$ a = c. $$

Thus, the output of a neuron with a linear activation function is equal to its combination. The linear activation function is plotted in the following figure.

Linear activation function

The linear activation function is widely used in the output layer of approximation neural networks.

Logistic activation

Like the hyperbolic tangent, the logistic function has a sigmoid shape. It is defined by

$$ a = \frac{1}{1+\exp{(-c)}}. $$

This activation is represented in the next figure.

Logistic activation function

As we can see, the range of the logistic function is \( (0,1) \). This is a suitable property because the outputs can then be interpreted as probabilities. Therefore, the logistic function is widely used in the output layer of neural networks for binary classification.
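A minimal sketch of the logistic activation in Python (the function name logistic is just illustrative):

```python
import math

def logistic(c):
    """Logistic (sigmoid) activation; its outputs lie in the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-c))

print(round(logistic(-1.55), 2))  # 0.18
print(round(logistic(0.0), 2))    # 0.5
```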

5. Output function

The output function is the most important function in the perceptron. Given a set of input signals to the neuron, it computes the output signal from it. The output function is the composition of the combination and the activation functions.

The next figure is an activity diagram of how the information is propagated in the perceptron.

Propagation

Therefore, the final expression of the output from a neuron as a function of the input to it is

$$ y = a\,(b + \mathbf{w} \cdot \mathbf{x}). $$

Consider the perceptron of our example. If we apply the input \( \mathbf{x} = (-0.8,0.2,-0.4) \), the output \( y \) is the following

$$ y = \tanh{\left(-0.5 + (1.0 \cdot (-0.8)) + ((-0.75) \cdot 0.2) + (0.25 \cdot (-0.4))\right)} \\ = \tanh{(-1.55)} \\ = -0.91. $$

As we can see, the output function merges the combination and the activation functions.
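Putting the pieces together, here is a minimal Python sketch of the full output function for the example perceptron, assuming the hyperbolic tangent activation used above (the function name perceptron_output is just illustrative):

```python
import math

def perceptron_output(b, w, x, activation=math.tanh):
    """Output function: the activation applied to the combination b + w . x."""
    c = b + sum(w_i * x_i for w_i, x_i in zip(w, x))
    return activation(c)

b = -0.5
w = [1.0, -0.75, 0.25]
x = [-0.8, 0.2, -0.4]

print(round(perceptron_output(b, w, x), 2))  # -0.91
```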

6. Conclusions

The perceptron is a mathematical model inspired by the behavior of a single neuron in a biological nervous system.

A single neuron can solve some simple tasks, but the power of neural networks comes when many of them are arranged in layers and connected in a network architecture.

Deep Neural Network

In this post we have seen how the perceptron works, but other neuron models have different characteristics and are used for different purposes.

Some of them are the scaling neuron, the principal components neuron, and the unscaling neuron.

Some neuron models only make sense when they are contextualized in a layer and cannot be defined individually. Examples are the recurrent, long short-term memory (LSTM), and probabilistic layers.
