Much like neurons in our brains, artificial neurons are the core components of an artificial NN. There are several different models of artificial neurons, but they all share the following core parts:
- Inputs: This is the data we “feed” a neuron to interpret. What kind of data it is, and how many inputs there are, can vary greatly depending on what kind of NN we’re training. Think of them as similar to the electro-chemical signals between neurons in our brains.
- Weights: Every input has an associated weight. Weights determine how much influence each input has on the neuron’s output; in other words, how important that input is.
- Bias: This value is an offset added to the sum of our weighted inputs. It’s used to shift that sum above or below a certain threshold.
- Activation function: This is what determines the final output that the neuron “fires.” It takes the sum of the weighted inputs and the bias, applies a function to it, and sends off a value. Different activation functions are used depending on the kind of output we want.
- Output: The final output of the neuron. This output could feed another neuron or be the final output of the NN.
So to recap: each input is multiplied by its corresponding weight, then the sum of all those products, plus a bias value, is fed into an activation function. The result is a value that is either our end result or an input to other neurons. Now let’s take a look at how these artificial neurons are arranged to form a network.
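The recap above can be sketched in a few lines of Python. This is a minimal illustration with a sigmoid activation and made-up weights, not any particular library’s implementation:

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum plus bias, through an activation."""
    # Multiply each input by its corresponding weight and sum the products.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation squashes the sum into the range (0, 1).
    return 1 / (1 + math.exp(-weighted_sum))

# Example "firing" with two inputs and made-up weights.
output = neuron([0.5, 0.3], [0.8, -0.2], bias=0.1)
```

Swapping in a different activation (ReLU, tanh, etc.) only changes the last line of the function; the weighted-sum-plus-bias structure stays the same.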
Neural Network Architecture
Artificial NNs have a layered design:
- Input layer: The input layer is where we feed in our data. How that data is fed to the network depends on the data itself. For example, let’s say our dataset consists of 28×28 pixel images. We would then have 784 input neurons, one for every pixel.
- Hidden layer(s): These layers of neurons do the actual processing. Unlike the input layer, each hidden layer has its own weights, biases, and an activation function applied to the data it is fed. There’s a bit of artistry as well as science involved in deciding how many hidden layers to use. For now, what’s important to understand is that the data can pass through one or more layers of processing. Having more than one hidden layer is what makes an NN “deep,” hence the term “deep learning.”
- Output layer: The output layer, as you may have guessed, is the layer of neurons responsible for representing the output of our NN. For example, let’s say we wanted to classify our images into 3 different categories (I’ll use the term labels going forward, as is common in this field). A simple output layer setup would be to have 3 neurons, each representing one of the three labels.
The process of feeding data forward through these layers is called forward-propagation.
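Forward-propagation through these layers can be sketched as repeated applications of the single-neuron computation. The layer sizes and random weights below are made up for illustration (a real image network would have 784 inputs, as described above):

```python
import math
import random

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    # Each neuron computes a weighted sum of all inputs from the
    # previous layer, adds its bias, and applies the activation.
    return [sigmoid(sum(x * w for x, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

random.seed(0)
# Toy network: 4 inputs -> 3 hidden neurons -> 2 output neurons.
hidden_w = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
hidden_b = [0.0, 0.0, 0.0]
output_w = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
output_b = [0.0, 0.0]

features = [0.2, 0.7, 0.1, 0.9]  # the input layer simply passes the data in
hidden = layer_forward(features, hidden_w, hidden_b)  # hidden layer
output = layer_forward(hidden, output_w, output_b)    # output layer
```

Each call to `layer_forward` is one layer of the network; stacking more calls is all it takes to make the network deeper.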
So now we have a model that takes in data, analyzes it, and outputs some result. But right now this model doesn’t learn. Let’s get into what the “learning process” actually is. We’ll start by talking a little bit about the data.
There are two main components of datasets for NNs:
- Features: The features are descriptions of the data we are looking at, and they can take many forms. Continuing with the image dataset above, the features of each image would be its individual pixel values.
- Labels: The labels are simply what we want our NN to output about our data; they are what our NN needs to learn to predict.
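As a tiny illustration, a dataset of this shape might look like the following (the “pixel” values and class indices here are made up):

```python
# A toy dataset: each example pairs a feature vector with a label.
# Features stand in for pixel intensities; labels are class indices.
dataset = [
    ([0.0, 0.9, 0.8, 0.1], 0),  # (features, label)
    ([0.7, 0.2, 0.1, 0.9], 1),
    ([0.3, 0.3, 0.9, 0.2], 2),
]
features, label = dataset[0]
```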
So ideally we want our NN to take the features for a given image, process them, and output the label assigned to that image. Now what do we do when it guesses the wrong label?
This is where the magic of learning happens. When the NN outputs an incorrect label, it needs to go through a process of correcting the weights and biases at every layer. There are essentially three steps to this process:
- Cost function: This is a function used to calculate just how far the output was from the correct label. Think of it as assigning an “error cost” to the output. It is often also referred to as a loss function. There are many different cost functions, and the right choice depends on what your NN is trying to do; for example, when building a classification model, we might choose a cross-entropy function.
- Backward-propagation: This is the process of going backwards through our NN to assign an individual error cost to each weight and bias. This is done using partial derivatives and something called the chain rule.
- Optimization: In this step we use a function (typically called an optimizer) to update each weight and bias based on the error costs from the backward-propagation. Two popular optimizers are stochastic gradient descent (SGD) and Adam.
So we calculate our error cost, backward-propagate it to every individual weight and bias, and then adjust those weights and biases before performing the next forward-propagation. This process is repeated until the error cost has been minimized as much as possible. If we’ve done everything correctly, our NN should then be able to perform its given task with high accuracy. A good practice is to keep two separate datasets: one for training and one for validation. This ensures that our model can handle data it hasn’t already seen, and is thus ready for potential real-world applications.
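The whole loop can be sketched for the simplest possible case: a single neuron trained with SGD on a made-up binary task. A full multi-layer backward pass is omitted for brevity; the gradient `pred - y` used below is what the chain rule yields for a sigmoid output paired with a cross-entropy cost:

```python
import math
import random

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Made-up binary task: label is 1.0 when the first feature exceeds the second.
random.seed(1)
data = []
for _ in range(200):
    x = [random.random(), random.random()]
    data.append((x, 1.0 if x[0] > x[1] else 0.0))

weights = [0.0, 0.0]
bias = 0.0
lr = 0.5  # learning rate

for epoch in range(100):
    for x, y in data:
        # Forward-propagation through our single neuron.
        pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        # Backward-propagation: for sigmoid + cross-entropy, the chain rule
        # collapses the error gradient at the neuron to simply (pred - y).
        grad = pred - y
        # Optimization (plain SGD): nudge each parameter against its gradient.
        weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
        bias -= lr * grad

# Accuracy on the training data after the loop.
correct = sum(
    (sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) > 0.5) == (y == 1.0)
    for x, y in data
)
accuracy = correct / len(data)
```

Note that here we only measure accuracy on the data we trained on; as the paragraph above explains, a held-out validation set is what you would use to check performance on unseen data.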
Read the original article at https://medium.com/things-i-could-never-make-up/neural-network-building-blocks-7ea6f8c790bf