Initial attempt to predict short-term stock changes using a Gated Recurrent Unit on Bitcoin price data



In the last blog post, I discussed the possibility of using a recurrent neural network (RNN) to detect patterns within the stock market. In this blog post I will show my findings from training an RNN on stock data, discuss how practical the approach is in the real world, and introduce a new method that could potentially yield higher accuracy.

Data Collection and Granularity

For this project I am pulling data from Yahoo Finance using the “yfinance” package. Stock data is retrieved by passing in a ticker symbol, specifying a time frame, and declaring the level of granularity for the data. Data can be retrieved down to the minute; however, at that granularity only 30 days of data can be retrieved. Because RNNs are data hungry, I wanted to find a granularity and time frame combination that would maximize the amount of data being pulled in while also staying within the scope of analyzing short-term market trends. As a result, the data is at a 2-minute granularity for the previous 90 days. This gives us about 25,000 observations. I also chose to analyze Bitcoin data (BTC-USD) because I hypothesized that cryptocurrency prices may show patterns more readily than traditional equity-backed funds.
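As a rough sketch, a pull like this could look as follows. The helper names `request_window` and `fetch_prices` are hypothetical; only the ticker, the 2-minute interval, and the 90-day window come from the description above. Yahoo restricts how far back intraday data goes, so the request may return fewer rows than the window implies.

```python
from datetime import datetime, timedelta, timezone

TICKER = "BTC-USD"
INTERVAL = "2m"        # 2-minute bars
LOOKBACK_DAYS = 90     # time frame described in the post


def request_window(now=None, lookback_days=LOOKBACK_DAYS):
    """Return (start, end) datetimes covering the last `lookback_days` days."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=lookback_days), now


def fetch_prices(ticker=TICKER, interval=INTERVAL, lookback_days=LOOKBACK_DAYS):
    """Download closing prices for the requested window (hypothetical helper)."""
    # Third-party package, imported here so the rest of the file can be
    # used without yfinance installed.
    import yfinance as yf

    start, end = request_window(lookback_days=lookback_days)
    frame = yf.download(ticker, start=start, end=end, interval=interval)
    # Depending on the yfinance version, "Close" is a Series or a
    # one-column DataFrame.
    return frame["Close"].dropna()
```

Calling `fetch_prices()` needs network access; `request_window` can be checked on its own.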

Data Cleaning

Because cryptocurrency data is highly volatile, I wanted to make the data simpler for the model to ingest. To do this, I passed the data through a Savitzky-Golay smoothing filter. Below is an example of what the filter does on a segment of the stock data:
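To make the smoothing step concrete, here is a minimal NumPy sketch of what a Savitzky-Golay filter does: it fits a low-degree polynomial to each sliding window by least squares and keeps the fitted value at the window's centre. The window length of 11 and polynomial order of 3 are assumptions chosen for illustration, not the settings used in this project, and the edge padding is one simple choice among several. In practice, `scipy.signal.savgol_filter` provides a ready-made version of this filter.

```python
import numpy as np


def savgol_coefficients(window_length, polyorder):
    """Weights that evaluate a least-squares polynomial fit at the window centre."""
    if window_length % 2 == 0 or polyorder >= window_length:
        raise ValueError("window_length must be odd and larger than polyorder")
    half = window_length // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Columns are offsets**0, offsets**1, ..., offsets**polyorder.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse yields the constant term of the fit,
    # which is the fitted polynomial's value at offset 0.
    return np.linalg.pinv(design)[0]


def savgol_smooth(prices, window_length=11, polyorder=3):
    """Smooth a 1-D series; the output has the same length as the input."""
    prices = np.asarray(prices, dtype=float)
    weights = savgol_coefficients(window_length, polyorder)
    half = window_length // 2
    # Repeat the first and last values so every point has a full window.
    padded = np.pad(prices, half, mode="edge")
    # np.convolve flips its second argument, so reverse the weights to
    # apply them in window order.
    return np.convolve(padded, weights[::-1], mode="valid")
```

Away from the edges, a series that is already a polynomial of degree at most `polyorder` passes through unchanged, which is a handy sanity check.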

After the data is smoothed, we then split the data into training data and testing data as seen below:
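One way to do the split is sketched below. It assumes a chronological cut with 80% of the series used for training; both the ratio and the function name are assumptions for illustration. Cutting by time before building any windows keeps overlapping windows from appearing on both sides of the split.

```python
import numpy as np


def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series into an earlier and a later part."""
    series = np.asarray(series)
    cut = int(len(series) * train_fraction)
    # Everything before `cut` is training data, the rest is held out.
    return series[:cut], series[cut:]
```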

Because I wanted to predict short-term patterns in the stock market as opposed to long-term trends, I built a new dataset in which each x-value is an array holding a 24-hour window of stock prices and each y-value is the associated change in stock price over the next hour. Here is an example of the x and y value at one instance:
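A sketch of how these pairs could be built with NumPy. At a 2-minute granularity, 24 hours is 720 observations and one hour is 30. The function name and the choice to measure the change from the last price in the window are assumptions, and any scaling of the y-values is left out.

```python
import numpy as np

BARS_PER_HOUR = 30            # 60 minutes / 2-minute bars
WINDOW = 24 * BARS_PER_HOUR   # 24-hour input window (720 bars)
HORIZON = BARS_PER_HOUR       # predict the change one hour ahead


def make_windows(prices, window=WINDOW, horizon=HORIZON):
    """Return (x, y): x[i] is a price window, y[i] the change after it."""
    prices = np.asarray(prices, dtype=float)
    n_samples = len(prices) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the chosen window and horizon")
    starts = np.arange(n_samples)
    # Row i holds prices[i : i + window].
    x = prices[starts[:, None] + np.arange(window)]
    last_seen = prices[starts + window - 1]
    future = prices[starts + window - 1 + horizon]
    y = future - last_seen
    return x, y
```

Each extra observation in the series adds one more (x, y) pair, so consecutive windows overlap almost entirely.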

Model Setup

I then constructed a GRU (Gated Recurrent Unit) model with three layers (512 nodes, 1024 nodes, and 1 node respectively). The model was trained for 50 epochs with mean squared error (MSE) as the loss function and the “adam” optimizer to iteratively update the network weights.
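For reference, a model of this shape could be defined in Keras roughly as follows. This is a sketch under assumptions: the description above does not pin down the framework or whether the middle layer is recurrent, so here the 512 and 1024 unit layers are both GRU layers and the single output node is a dense layer. The function name is hypothetical.

```python
LAYER_UNITS = (512, 1024, 1)  # layer sizes quoted above
EPOCHS = 50


def build_gru_model(window_length, units=LAYER_UNITS):
    """Build and compile a stacked GRU regressor (assumed architecture)."""
    # Imported here so the constants above can be used without TensorFlow.
    import tensorflow as tf

    first, second, output = units
    model = tf.keras.Sequential([
        # One feature (the smoothed price) per time step.
        tf.keras.Input(shape=(window_length, 1)),
        # return_sequences=True passes the full sequence to the next GRU.
        tf.keras.layers.GRU(first, return_sequences=True),
        tf.keras.layers.GRU(second),
        tf.keras.layers.Dense(output),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```

Training would then be something like `model.fit(x_train[..., None], y_train, epochs=EPOCHS)`, where the added trailing axis gives each time step a single feature.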


It quickly became apparent that the model was struggling to learn anything from the data: the loss decreased only slightly over 50 epochs (MSE fell from 0.076 to 0.073). Upon inspection of the results, we can see that the model did not do particularly well at predicting significant fluctuations in price. In the following visual, the x axis represents randomized indices in the testing data and the y axis represents the increase in stock price over the following hour (normalized to a value between -1 and 1). The black line represents the prediction and the blue line represents the actual results.

As we can see, the model was incentivized to make conservative predictions near 0 in order to decrease the MSE. One metric I calculated was the percentage of predictions that were in the wrong direction (a predicted increase followed by a decrease, or vice versa). On average, 33% of the predictions were in the wrong direction.
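A minimal sketch of how these checks could be computed. The function names are hypothetical. Comparing the model's MSE against the MSE of always predicting 0 is an extra check I am assuming here; it shows how much a model that hugs 0 has actually learned.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def wrong_direction_rate(y_true, y_pred):
    """Fraction of predictions whose sign differs from the actual change."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Skip samples where the price did not move; a prediction of exactly 0
    # on a sample that did move counts as wrong.
    moved = y_true != 0
    return float(np.mean(np.sign(y_pred[moved]) != np.sign(y_true[moved])))


def zero_baseline_mse(y_true):
    """MSE obtained by predicting 'no change' for every sample."""
    y_true = np.asarray(y_true, dtype=float)
    return mse(y_true, np.zeros_like(y_true))
```

If the model's test MSE is barely below `zero_baseline_mse`, the network is adding little beyond a constant guess.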

Next Steps

I was not able to glean any valuable information from this model. However, it got me thinking more deeply about how I could help the model pick up more substantive patterns in the data, rather than feeding it raw sequences of values and expecting it to find a pattern on its own. For example, what if we fit a high-degree polynomial to the window and used calculus and principles of statistics and data distribution to describe the shape of the function in a tabular way? As I will discuss in my next blog post, I was able to implement this approach successfully and saw significantly better results.
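To give a feel for the idea, here is a small sketch of turning one window into a row of shape descriptors. It is only an illustration: the degree, the particular features, and the function name are all assumptions, and the approach covered in the next post may differ.

```python
import numpy as np
from numpy.polynomial import Polynomial


def shape_features(window, degree=8):
    """Describe one price window with a few numbers taken from a polynomial fit."""
    window = np.asarray(window, dtype=float)
    # Rescale time to [-1, 1]; slopes below are per unit of this rescaled time.
    t = np.linspace(-1.0, 1.0, len(window))
    poly = Polynomial.fit(t, window, degree)
    slope = poly.deriv(1)
    curvature = poly.deriv(2)
    # Real roots of the first derivative inside the window are turning points.
    roots = slope.roots()
    real_roots = roots[np.isreal(roots)].real
    turning = real_roots[(real_roots > -1.0) & (real_roots < 1.0)]
    residual = window - poly(t)
    return {
        "end_slope": float(slope(1.0)),          # trend at the latest point
        "end_curvature": float(curvature(1.0)),  # is the trend bending up or down
        "n_turning_points": int(turning.size),
        "residual_std": float(residual.std()),   # noise the fit could not explain
    }
```

Applying this to every window gives a table with one row per window, which can be fed to a conventional tabular model instead of a recurrent network.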


