Artificial Neural Networks

Now here is where things are starting to get amped up.

Artificial Neural Networks are the most basic of Neural Network models, connecting layers of artificial neurons (read about them here to understand this article properly!) in order to make a prediction or classification. ANNs are feed-forward networks and are also referred to as Dense Neural Networks.

They form the basis for more complex Neural Network models such as Convolutional Neural Networks and Recurrent Neural Networks, which are used for image/video and sequential data respectively.

ANNs are called feed-forward because data moves in one direction through the network, from the input layer to the output layer.

The inputs to an ANN are represented by the feature vector x, and each neuron holds its own weight vector w; multiplying the two using the dot product gives a linear function of the inputs. The input vector is passed into the first layer of neurons.
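Concretely, for an input vector x with k features and a weight vector w, the dot product is:

$$w \cdot x = w^{T} x = \sum_{i=1}^{k} w_i x_i$$

which is a linear function of the inputs (the neuron's bias is then added on top of this).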

Forward Propagation

In a layer of neurons that all receive the same input vector, most will return a different output. This is because each neuron ends up looking at different features of the input data. A feature here represents a characteristic or trend within the data.

For example, in the case of an image classifier, one feature may be an eye, whereas another feature may be the nose.

Each neuron looks for different features because they all have their own weights and biases. Since these weights and biases are initialised randomly, the neurons each produce a different output from the same input data. This is an important part of how networks make predictions: by looking at the presence of certain features within the input data.

(If you would like to understand the following please go back and read the Linear Classification Theory and Artificial Neuron articles if you have not)

Since each neuron has the same inputs but different weights and biases, we can derive a general equation for the output of neuron n, calling that output Z_n:
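Here wₙ and bₙ are the weight vector and bias of neuron n:

$$Z_n = \sigma\left(w_n \cdot x + b_n\right)$$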

for n = 1, …, N, where N is the number of neurons in the layer, and where:
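σ is the activation function, in this case the sigmoid:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$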

(I will be using the sigmoid function here; however, the general convention now is to use the ReLU function. Read about them here.)

Say we wanted to store the outputs of all of the neurons in a layer. We could easily accomplish this by thinking of the output Z as a vector as well.

We can take Z as a column vector of size N by 1, X as a column vector of size k (the number of inputs) by 1, and W as a matrix of size k by N (one column of weights for each neuron in the layer).

Therefore Wᵀ (the transpose of W) is of size N by k.

When we compute WᵀX, the result is a vector of size N by 1, which is the same size as our output vector Z.

To add the bias we take the bias (B) as a column vector of size N by 1 (one bias for each neuron), allowing us to add it element-wise to the output of WᵀX.

Since activation functions like the sigmoid are element-wise operations, the function is applied to each element of the vector (the output of each neuron) and returns a vector of the same size and shape as its input, which in this case is N by 1. This is then equal to our output vector Z, of size N by 1.
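Putting all of this together, the whole layer computes Z = σ(WᵀX + B). As a rough sketch (with arbitrary sizes for k and N, chosen purely for illustration), this single-layer calculation looks like the following in NumPy:

```python
import numpy as np

def sigmoid(z):
    # element-wise sigmoid: applied to every entry of the vector
    return 1 / (1 + np.exp(-z))

k, N = 3, 4                     # k inputs, N neurons in the layer (arbitrary sizes)
X = np.random.randn(k, 1)       # input column vector, k by 1
W = np.random.randn(k, N)       # one column of weights per neuron, k by N
B = np.random.randn(N, 1)       # one bias per neuron, N by 1

Z = sigmoid(W.T @ X + B)        # (N by k)(k by 1) + (N by 1) -> N by 1
print(Z.shape)                  # (4, 1)
```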

Multiple Layered Models

Typically referred to as the ‘Multilayer Perceptron’ (MLP), this is what is really meant by a deep neural network. The more layers a neural network has, the deeper it is.

In a model with multiple layers, the outputs of the previous layer are taken as the inputs to the next layer. Therefore, for the second layer of a model like this, the equation for the output would look like this:
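Writing a bracketed superscript for the layer number, and calling the first layer's output Z^[1]:

$$Z^{[2]} = \sigma\left(W^{[2]T} Z^{[1]} + B^{[2]}\right)$$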

From here a general equation for the output of any layer in a network can be derived:
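In the same notation, for any layer:

$$Z^{[L]} = \sigma\left(W^{[L]T} Z^{[L-1]} + B^{[L]}\right)$$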

where L is the layer for which the output is being calculated.

Since the sigmoid returns the probability of y being 1 given a certain input x, we can say that:
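That is, using ŷ for the network's final output:

$$\hat{y} = Z^{[\text{output}]} = P(y = 1 \mid x)$$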

Regression

To perform regression tasks using an ANN we don’t require a probability; we need the prediction itself, the value. For this we simply remove the final sigmoid from the output layer. It’s that simple!

This returns, as you may have guessed, a vector of outputs. However, that is not what we want for a single prediction. This is easily solved: by specifying the output layer to have a single neuron, we get only a single output!
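As a rough sketch of what this might look like in practice, here is a small regression MLP defined with Keras; the layer sizes and the choice of three input features are arbitrary and only for illustration:

```python
from tensorflow import keras
from tensorflow.keras import layers

# A small regression MLP: hidden Dense layers with an activation,
# and a final Dense layer with a single neuron and no activation,
# so it outputs the raw predicted value instead of a probability.
model = keras.Sequential([
    keras.Input(shape=(3,)),                 # 3 input features (arbitrary for this sketch)
    layers.Dense(16, activation="sigmoid"),
    layers.Dense(16, activation="sigmoid"),
    layers.Dense(1),                         # one neuron, linear output -> the prediction
])

model.compile(optimizer="adam", loss="mse")
model.summary()
```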


Each layer of neurons used in this manner is usually referred to as a Dense layer, and these Dense layers all perform what is known as feature transformation. This is because each layer looks for different features within its input data, so later layers look for features within the outputs of earlier neurons that were each looking for different features, in other words combinations of features in the data!

Each layer of a neural network identifies features of higher complexity by looking at these combinations of features which are the outputs of the previous layer.

At this point you may have a question.

We have these networks with neurons all trying to make predictions, but how the hell can these predictions be right if they all have randomly initialized weights and biases?

The answer is they can’t, and they don’t.

This article was about how a basic Artificial Neural Network is structured and works. The question of how they learn is answered by an algorithm known as Gradient Descent. Gradient Descent modifies the weights and biases for every neuron in a network until they are perfectly tuned to make the most accurate predictions and classifications. However, this is not the focus of this article. I will go into a bit more depth on Gradient Descent in the next article; however, I will not explain the exact mathematics behind it but instead simply what it does, how it works, and the different versions of gradient descent.

If any of you are wondering about the purpose of activation functions such as the sigmoid: their purpose is to make the network non-linear. Without any activation functions, the entire calculation of a neural network can be reduced to a single linear function, which does not have much scope for finding complex relations or making complex predictions and classifications.
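You can check this yourself in a few lines of NumPy: stacking two layers with no activation in between collapses into a single linear layer (a toy demonstration with made-up sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 1))                                       # input vector (3 features)

W1 = rng.standard_normal((3, 4)); b1 = rng.standard_normal((4, 1))    # layer 1: 3 in, 4 neurons
W2 = rng.standard_normal((4, 2)); b2 = rng.standard_normal((2, 1))    # layer 2: 4 in, 2 neurons

# Two layers applied one after the other, with NO activation in between...
two_layers = W2.T @ (W1.T @ x + b1) + b2

# ...reduce to a single linear layer with combined weights and bias.
W_combined = W2.T @ W1.T
b_combined = W2.T @ b1 + b2
one_layer = W_combined @ x + b_combined

print(np.allclose(two_layers, one_layer))                             # True
```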

I would recommend checking out playground.tensorflow.org, which allows you to visualise a neural network learning to make predictions and classifications, and to see the different connections between neurons and layers. It’s really fun.

If you enjoyed this article please consider dropping your email down below to receive updates whenever I post an article, the frequency seems to be on an average of every 4-5 days. Also do check out my other articles on my blog and share this with friends, colleagues, and family members.

Thank you!
