If you could have any practical superpower, what would it be?
Many people would say the ability to predict the future or to make money easily (and no, this isn’t a get-rich-quick scam).
Well, you can add another item to your “Crazy Things AI Has Accomplished” list, because Artificial Intelligence can literally do both.
Mind B L O W N 🤯
By mimicking the way our brain works, artificial intelligence is actually capable of predicting fluctuations in future stock prices. And it’s not crazy hard to do, either!
My model’s predictions compared to the actual prices.
This type of AI falls under Deep Learning. Deep learning is a subset of machine learning that mimics the way the human brain learns. Different types of artificial neural networks are used to emulate parts of the human brain and create even stronger AI. If you need a refresher on deep learning, you can check out my article covering the basic concepts.
So, to understand deep learning and dig into the type of model the stock predictor uses (a Recurrent Neural Network), it only makes sense to remind ourselves how our own brains work.
If you happen to be a brainiac and want to skip this refresher, you can click here to jump into the types of RNNs.
For the lot who need to jog their memory on 6th-grade health class, let me Miss Frizzle it up for you, and take you on a trip into our brain.
The brain is broken up into 3 main parts: the Cerebrum, Cerebellum, and Brainstem. Deep learning focuses on the Cerebrum.
The cerebrum is broken up into Temporal, Frontal, Occipital and Parietal lobes, which have been imitated through different types of artificial neural nets.
- Temporal Lobe: Receives sensory information, processes it, and learns through prior experience (long-term memory) — Artificial Neural Networks
- Occipital Lobe: Visual Processing, basically image recognition — Convolutional Neural Networks
- Frontal Lobe: Short term memory, remembers info from previous observations and applies it forward — Recurrent Neural Networks
Why make our most powerful tech without a short term memory?
In many cases, it just doesn’t make sense to deny these powerful programs that luxury. Take the example of reading this article. It’d be pretty sad if you literally couldn’t remember the previous sentence you read.
Recurrent neural nets get this ability by using neurons with short-term memory. Basically, they can remember what was in previous neurons and pass it along the network to future neurons.
RNNs aren’t one-size-fits-all, though. There are a few different types:
1. One to many (one input and multiple outputs)
One Input (yellow) leads to multiple outputs (red)
With one input of, say, a picture, the network produces a cohesive sentence with multiple words as an output. The RNN is able to create a sentence that makes sense because it can base the next words in the sentence on the previous words.
2. Many to one (multiple inputs and one output)
Multiple inputs (yellow) lead to one output (red)
Something like a sentiment analyzer is an example of a many to one RNN. With an input of multiple words/values strung together and the understanding of what the words mean, the network classifies or gives a single output.
3. Many to many
Multiple inputs (yellow) lead to multiple outputs (red)
Think about a translator, with multiple inputs (some text), that leads to multiple outputs (another set of text). You need short-term information about the previous words to translate the next word.
But what’s the catch?
This sounds good, a little too good if you ask me. Of course, there has to be a catch.
You could say that for RNNs, our issue isn’t what’s there, but what’s not: aka the Vanishing Gradient.
*Poof* a gradient disappeared into thin air 🎩
Ok, well not exactly.
The vanishing gradient happens when the gradient ends up being very small, which causes inaccurate results and really long training times.
The gradient is the rate at which cost changes with respect to the weights or biases. In most cases, the gradient at any point is the product of all the previous gradients up to that point.
Cost (the difference between the net’s predicted output and the actual output from labelled training data) is lowered by adjusting weights and biases over and over through training until the lowest value is reached.
In the RNN, the recurring weights that connect the hidden layers end up in a temporal loop: these small weights are multiplied again and again, causing the small values to become even smaller.
The further back you go through the network, the lower the gradient is and the harder the weights are to train, which has a domino effect on all the later weights throughout the network. Basically, the early layers are responsible for the simple building blocks: if they get it wrong, the later layers built on their outputs will be wrong as well.
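To get a feel for how fast this shrinking happens, here’s a tiny sketch (the 0.5 recurrent weight and 30 timesteps are just assumed values for illustration):

```python
# Illustrative only: multiplying a gradient by a recurrent weight
# below 1 over and over shrinks it toward zero.
recurrent_weight = 0.5  # assumed value, below 1
gradient = 1.0

for step in range(30):  # 30 timesteps back through the network
    gradient *= recurrent_weight

print(gradient)  # about 9.3e-10 -- effectively vanished
```

After just 30 steps, the gradient is under a billionth of its original size, so the earliest layers barely learn anything.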
The solution? Long Short Term Memory
With the vanishing gradient problem, the recurring weights are less than one, so they basically vanish. The goal of long short-term memory is to make those weights equal to one and to filter the inputs down to what the RNN actually needs to know for the next output.
The LSTM cell has 4 gates that manage the previous state and the new input(s) to determine the outputs. The cell decides whether to forget, learn, remember, or use certain inputs based on whether they need to impact the output value or not.
Take the example of predicting stock prices. The prediction of a new stock price depends on:
- The upwards or downwards trend of previous days → previous cells state
- Price of stock on previous dates(in the short term) → output of the previous cell
- Other factors that may affect public opinion (a new company policy, a drop in profit, etc.) → input at the current time
The LSTM identifies this and passes the relevant info on, allowing the network to evolve and decide how to use its resources to best complete the task.
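If you’re curious what those gates look like in code, here’s a minimal NumPy sketch of a single LSTM step. The layer sizes and random weights are placeholder assumptions for illustration, not a trained model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [h_prev, x] to all 4 gates at once."""
    z = W @ np.concatenate([h_prev, x]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget, learn, and use gates
    g = np.tanh(g)                                # candidate new memory
    c = f * c_prev + i * g                        # remember: keep some old, add some new
    h = o * np.tanh(c)                            # output a filtered view of the memory
    return h, c

# Placeholder sizes: 4 hidden units, 1 input feature
rng = np.random.default_rng(0)
hidden, inputs = 4, 1
W = rng.standard_normal((4 * hidden, hidden + inputs))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.standard_normal(inputs), h, c, W, b)
print(h.shape)  # (4,)
```

Notice how the cell state `c` carries information forward with simple element-wise updates instead of repeated weight multiplications, which is exactly what keeps the gradient from vanishing.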
Building an RNN with LSTM
To test this out, I decided to replicate an RNN using LSTMs to predict the upwards and downwards trend of the Google stock price.
This model was trained using data from 2009–2017 and then predicted the trends of 2018 (and was then compared to the actual trends). You can see all my code on my GitHub page here.
The process to build this can be broken down into 3 steps:
1. Data preprocessing
2. Building the RNN (a stacked LSTM with dropout regularization)
3. Making the predictions and visualizing the results
Let’s do it!
We start by importing the libraries: NumPy allows us to make arrays, Matplotlib lets us visualize results using charts, and Pandas lets us import and manage the dataset easily.
# Importing the libraries
import numpy as np #allow to make arrays
import matplotlib.pyplot as plt #visualize results on charts
import pandas as pd #import dataset and manage easily
Then we import the training set. It’s important to note that we are only importing the training set right now (not the test set), because the RNN never actually uses the test set except to compare against its final predictions.
The columns in the dataset are turned into NumPy arrays (using “.values”) so they can be used as input values in Keras.
dataset_train = pd.read_csv('GOOGL_Stock_Price_Train.csv')
training_set = dataset_train.iloc[:, 1:2].values
Next, we have feature scaling. This can be done through standardization or normalization. In this case, we use normalization to scale the values to be between 0 and 1.
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (0, 1))
training_set_scaled = sc.fit_transform(training_set)
# fit gets the min and max of the data; transform applies the scaling formula to each stock price
We create a data structure with 60 timesteps for each output. At each time (t), the network will look at the 60 previous time steps to make a new prediction. The number 60 is an experimentally chosen value — with only 1 timestep the model overfits, and even 20 timesteps is still too low. 60 timesteps corresponds to roughly 3 months of trading days.
X_train = []
y_train = []
for i in range(60, 2168): # upper bound is the number of rows in the training set
    X_train.append(training_set_scaled[i-60:i, 0]) # the 60 previous stock prices
    y_train.append(training_set_scaled[i, 0]) # the stock price the net learns to predict
X_train, y_train = np.array(X_train), np.array(y_train) # convert to NumPy arrays
We also have to reshape the arrays to add a dimension — instead of only one indicator, the new dimension allows for more indicators and makes the arrays compatible with the input shape of the RNN.
# Reshaping - add a dimension to the numpy array
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1)) # (batch size, timesteps, number of indicators)
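With preprocessing done, step 2 is building the RNN itself. Here’s a sketch of what a stacked LSTM with dropout regularization could look like in Keras; the layer sizes (50 units, three stacked LSTM layers, 20% dropout) are assumptions on my part, so check the GitHub repo for the real values:

```python
# A sketch of step 2: a stacked LSTM with dropout regularization.
# Layer sizes (50 units, 3 LSTM layers, 0.2 dropout) are assumed
# values for illustration, not necessarily the original model's.
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout

regressor = Sequential()

# First LSTM layer: return_sequences=True passes the full sequence
# on to the next LSTM layer; input shape is (timesteps, indicators)
regressor.add(LSTM(units=50, return_sequences=True, input_shape=(60, 1)))
regressor.add(Dropout(0.2))  # randomly drop 20% of neurons to fight overfitting

regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units=50))  # last LSTM layer: no sequences returned
regressor.add(Dropout(0.2))

regressor.add(Dense(units=1))  # a single output: the predicted price

regressor.compile(optimizer='adam', loss='mean_squared_error')
```

From there, training would be a call like `regressor.fit(X_train, y_train, epochs=…, batch_size=…)`, followed by step 3: making the predictions and visualizing them against the actual 2018 prices.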