Initial attempt to predict short-term price changes using a Gated Recurrent Unit on Bitcoin price data
Introduction
In the last blog post, I discussed the possibility of using a recurrent neural network to detect patterns within the stock market. In this blog post I will show my findings from training an RNN on stock data, its practicality for real-world application, and a new method that could potentially yield higher accuracy.
Data Collection and Granularity
For this project I am pulling data from Yahoo Finance using the “yfinance” API package. Stock data is retrieved by passing in a ticker symbol, specifying a time frame, and declaring the level of granularity for the data. Data can be retrieved down to the minute; however, at that granularity only 30 days of data are available. Because RNNs are data-hungry, I wanted to find a granularity and time frame combination that would maximize the amount of data being pulled in while staying within the scope of analyzing short-term market trends. As a result, the data is at a 2-minute granularity for the previous 90 days, which gives us about 25,000 observations. I also chose to analyze Bitcoin data (BTC-USD) because I had a hypothesis that cryptocurrency prices may be more susceptible to showing patterns in the market than traditional equity-backed funds.
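A minimal sketch of the retrieval step, assuming standard yfinance usage; note that Yahoo enforces its own limits on how far back intraday data goes, so the period it actually honors for a given interval may be shorter than requested:

```python
import yfinance as yf

# BTC-USD at 2-minute granularity over the previous 90 days.
# Yahoo caps intraday history, so the window actually returned
# for a given interval may be shorter than requested.
btc = yf.download(tickers="BTC-USD", period="90d", interval="2m")
prices = btc["Close"]
```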
Data Cleaning
Because of the high volatility in cryptocurrency data, I wanted to make the data simpler for the model to ingest. To do this, I passed the data through a “Savitzky-Golay” smoothing filter. Below is an example of what the filter does on a segment of the stock data:
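For reference, this is roughly what that smoothing step looks like in code; the window length and polynomial order are illustrative assumptions, not necessarily the values used above:

```python
from scipy.signal import savgol_filter

# Smooth the closing prices. window_length must be odd; both
# parameters here are illustrative choices, not the post's exact values.
smoothed = savgol_filter(prices.values, window_length=51, polyorder=3)
```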
After the data is smoothed, we split it into training and testing sets, as seen below:
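A minimal sketch of that split, assuming an 80/20 ratio (my choice for illustration) and a chronological cut so the test set is the most recent data:

```python
# Chronological split: earlier data for training, most recent for testing.
split = int(len(smoothed) * 0.8)
train, test = smoothed[:split], smoothed[split:]
```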
Because we wanted to be able to predict short-term patterns in the market as opposed to long-term trends, I created a new dataset consisting of 24-hour stock price windows as an array for the x-value and the associated change in stock price over the following hour for the y-value. Here is an example of the x and y values at one instance:
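At 2-minute granularity, a 24-hour window is 720 steps and the following hour is 30 steps. A sketch of the windowing (normalization of the targets, mentioned in the Results section, is omitted here):

```python
import numpy as np

WINDOW, HORIZON = 720, 30  # 24 hours and 1 hour at 2-minute granularity

def make_windows(series):
    xs, ys = [], []
    for i in range(len(series) - WINDOW - HORIZON + 1):
        xs.append(series[i : i + WINDOW])
        # y: change in price over the hour following the window
        ys.append(series[i + WINDOW - 1 + HORIZON] - series[i + WINDOW - 1])
    return np.array(xs), np.array(ys)

x_train, y_train = make_windows(train)
x_test, y_test = make_windows(test)
```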
Model Setup
I then constructed a GRU (Gated Recurrent Unit) model with three layers (512 nodes, 1024 nodes, and 1 node, respectively). The model trained over 50 epochs with mean squared error as the loss function and the “adam” optimizer iteratively updating the network weights.
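The post only gives node counts, so this Keras sketch is one plausible reading of that architecture: two stacked GRU layers followed by a single-unit dense output:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense

# Two stacked GRUs (512 and 1024 units) and a 1-unit output layer.
model = Sequential([
    GRU(512, return_sequences=True, input_shape=(720, 1)),
    GRU(1024),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x_train.reshape(-1, 720, 1), y_train, epochs=50)
```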
Results
It quickly became apparent that the model was struggling to learn anything from the data: the loss decreased only slightly over 50 epochs (MSE fell from 0.076 to 0.073). Inspecting the results, we can see that the model did not do particularly well at predicting significant fluctuations in price. In the following visual, the x-axis represents randomized indices in the testing data and the y-axis represents the change in stock price over the following hour (normalized to a value between -1 and 1). The black line represents the prediction and the blue line represents the actual results.
As we can see, the model was incentivized to make conservative predictions near 0 in order to decrease the MSE: when the target is noisy and roughly zero-mean, predicting close to the average is what minimizes squared error. One metric I calculated was the percentage of predictions that were in the wrong direction (predicted positive when the price went negative, and vice versa). On average, 33% of the predictions were in the wrong direction.
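That wrong-direction rate reduces to a sign comparison; y_pred and y_test below follow the hypothetical names from the earlier sketches:

```python
import numpy as np

# Fraction of predictions whose sign disagrees with the actual move.
y_pred = model.predict(x_test.reshape(-1, 720, 1)).ravel()
wrong_direction = np.mean(np.sign(y_pred) != np.sign(y_test))
print(f"Wrong-direction rate: {wrong_direction:.0%}")
```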
Next Steps
I was not able to glean any valuable information from this model. However, it got me thinking about how I could help the model find more substantive patterns in the data, as opposed to feeding it raw sequences of values and expecting it to find a pattern on its own. For example, what if we fit a high-degree polynomial to each window and used calculus and principles of statistics and data distribution to describe the shape of the function in a tabular way? As I will discuss in my next blog post, I was able to implement this approach successfully and saw significantly better results.
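To make the idea concrete, here is a hypothetical sketch of what describing a window's shape in tabular form could look like; the polynomial degree and the chosen descriptors are illustrative, not the actual setup of the next post:

```python
import numpy as np

# Fit a high-degree polynomial to one 24-hour window, then derive
# shape descriptors from it. Degree 9 is an arbitrary illustration.
t = np.arange(WINDOW)
poly = np.poly1d(np.polyfit(t, x_train[0], deg=9))
slope = np.polyder(poly)         # first derivative: local trend
curvature = np.polyder(poly, 2)  # second derivative: acceleration
features = [slope(t[-1]), curvature(t[-1]),
            x_train[0].mean(), x_train[0].std()]
```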