Time Series Analysis: KERAS LSTM Deep Learning - Part 1
Written by Matt Dancho
Time series prediction (forecasting) has experienced dramatic improvements in predictive accuracy as a result of the data science machine learning and deep learning evolution. As these ML/DL tools have evolved, businesses and financial institutions are now able to forecast better by applying these new technologies to solve old problems. In this article, we showcase the use of a special type of Deep Learning model called an LSTM (Long Short-Term Memory), which is useful for problems involving sequences with autocorrelation. We analyze a famous historical data set called “sunspots” (a sunspot is a solar phenomenon wherein a dark spot forms on the surface of the sun). We’ll show you how you can use an LSTM model to predict sunspots ten years into the future with an LSTM model.
Articles In This Series
Tutorial Overview
This code tutorial goes along with a presentation on Time Series Deep Learning given to SP Global on Thursday, April 19, 2018. The slide deck that complements this article is available for download.
This is an advanced tutorial implementing deep learning for time series and several other complex machine learning topics such as backtesting cross validation. For those seeking an introduction to Keras in R, please check out Customer Analytics: Using Deep Learning With Keras To Predict Customer Churn.
In this tutorial, you will learn how to:
- Develop a Stateful LSTM Model with the
keras
package, which connects to the R TensorFlow backend.
- Apply a Keras Stateful LSTM Model to a famous time series, Sunspots.
- Perform Time Series Cross Validation using Backtesting with the
rsample
package rolling forecast origin resampling.
- Visualize Backtest Sampling Plans and Prediction Results with
ggplot2
and cowplot
.
- Evaluate whether or not a time series may be a good candidate for an LSTM model by reviewing the Autocorrelation Function (ACF) plot.
The end result is a high performance deep learning algorithm that does an excellent job at predicting ten years of sunspots! Here’s the plot of the Backtested Keras Stateful LSTM Model.
Applications in Business
Time series prediction (forecasting) has a dramatic effect on the top and bottom line. In business, we could be interested in predicting which day of the month, quarter, or year that large expenditures are going to occur or we could be interested in understanding how the consumer price index (CPI) will change over the course of the next six months. These are common questions that impact organizations both on a microeconomic and macroeconomic level. While the data set used in this tutorial is not a “business” data set, it shows the power of the tool-problem fit, meaning that using the right tool for the job can offer tremendous improvements in accuracy. The net result is increased prediction accuracy ultimately leads to quantifiable improvements to the top and bottom line.
Long Short-Term Memory (LSTM) Models
A Long Short-Term Memory (LSTM) model is a powerful type of recurrent neural network (RNN). The blog article, “Understanding LSTM Networks”, does an excellent job at explaining the underlying complexity in an easy to understand way. Here’s an image depicting the LSTM internal cell architecture that enables it to persist for long term states (in addition to short term), which traditional RNN’s have difficulty with:
Source: Understanding LSTM Networks
LSTMs are quite useful in time series prediction tasks involving autocorrelation, the presence of correlation between the time series and lagged versions of itself, because of their ability to maintain state and recognize patterns over the length of the time series. The recurrent architecture enables the states to persist, or communicate between updates of the weights as each epoch progresses. Further, the LSTM cell architecture enhances the RNN by enabling long term persistence in addition to short term, which is fascinating!
In Keras, LSTM’s can be operated in a “stateful” mode, which according to the Keras documentation:
The last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch
In normal (or “stateless”) mode, Keras shuffles the samples, and the dependencies between the time series and the lagged version of itself are lost. However, when run in “stateful” mode, we can often get high accuracy results by leveraging the autocorrelations present in the time series.
We’ll explain more as we go through this tutorial. For now, just understand that LSTM’s can be really useful for time series problems involving autocorrelation and Keras has the capability to create stateful LSTMs that are perfect for time series modeling.
Sunspots Data Set
Sunspots is a famous data set that ships with R (refer to the datasets
library). The dataset tracks sunspots, which are the occurrence of a dark spot on the sun. Here’s an image from NASA showing the solar phenomenon. Pretty cool!
Source: NASA
The dataset we use in this tutorial is called sunspots.month
, and it contains 265 years (from 1749 through 2013) of monthly data on number of sunspots per month.
Implementing An LSTM To Predict Sunspots
Time to get to business. Let’s predict sunspots. Here’s our objective:
Objective:
Use an LSTM model to generate a forecast of sunspots that spans 10-years into the future.
1.0 Libraries
Here are the libraries needed for this tutorial. All are available on CRAN. You can install with install.packages()
if you do not have them installed already. NOTE: Before proceeding with this code tutorial, make sure to update all packages as previous versions of these packages may not be compatible with the code used. One issue we are aware of is upgrading to lubridate
version 1.7.4.
If you have not previously run Keras in R, you will need to install Keras using the install_keras()
function.
2.0 Data
The dataset, sunspot.month
, is available for all of us (it ships with base R). It’s a ts
class (not tidy), so we’ll convert to a tidy data set using the tk_tbl()
function from timetk
. We use this instead of as.tibble()
from tibble
to automatically preserve the time series index as a zoo
yearmon
index. Last, we’ll convert the zoo
index to date using lubridate::as_date()
(loaded with tidyquant
) and then change to a tbl_time
object to make time series operations easier.
3.0 Exploratory Data Analysis
The time series is long (265 years!). We can visualize the time series both full (265 years) and zoomed in on the first 50 years to get a feel for the series.
3.1 Visualizing Sunspot Data With Cowplot
We’ll make to ggplot
s and combine them using cowplot::plot_grid()
. Note that for the zoomed in plot, we make use of tibbletime::time_filter()
, which is an easy way to perform time-based filtering.
At first glance, it looks like this series should be easy to predict. However, we can see that the cycle (10-year frequency) and amplitude (count of sunspots) seems to change at least between years 1780 and 1800. This creates some challenges.
3.2 Evaluating The ACF
The next thing we can do is determine whether or not an LSTM model may be a good approach. The LSTM will leverage autocorrelation to generate sequence predictions. Our goal is to produce a 10-year forecast using batch forecasting (a technique for creating a single forecast batch across the forecast region, which is in contrast to a single-prediction that is iteratively performed one or several steps into the future). The batch prediction will only work if the autocorrelation used is beyond ten years. Let’s inspect.
First, we need to review the Autocorrelation Function (ACF), which is the correlation between the time series of interest in lagged versions of itself. The acf()
function from the stats
library returns the ACF values for each lag as a plot. However, we’d like to get the ACF values as data so we can investigate the underlying data. To do so, we’ll create a custom function, tidy_acf()
, to return the ACF values in a tidy tibble.
Next, let’s test the function out to make sure it works as intended. The function takes our tidy time series, extracts the value column, and returns the ACF values along with the associated lag in a tibble format. We have 601 autocorrelation (one for the time series and it’s 600 lags). All looks good.
Let’s plot the ACF with ggplot2
to determine if a high-autocorrelation lag exists beyond 10 years.
This is good news. We have autocorrelation in excess of 0.5 beyond lag 120 (the 10-year mark). We can theoretically use one of the high autocorrelation lags to develop an LSTM model.
Upon inspection, the optimal lag occurs at lag 125. This isn’t necessarily the one we will use since we have more to consider with batch forecasting with a Keras LSTM. With that said, here’s how you can filter()
to get the best lag.
4.0 Backtesting: Time Series Cross Validation
Cross validation is the process of developing models on sub-sampled data against a validation set with the goal of determining an expected accuracy level and error range. Time series is a bit different than non-sequential data when it comes to cross validation. Specifically, the time dependency on previous time samples must be preserved when developing a sampling plan. We can create a cross validation sampling plan using by offsetting the window used to select sequential sub-samples. In finance, this type of analysis is often called “Backtesting”, which takes a time series and splits it into multiple uninterupted sequences offset at various windows that can be tested for strategies on both current and past observations.
A recent development is the rsample
package, which makes cross validation sampling plans very easy to implement. Further, the rsample
package has Backtesting covered. The vignette, “Time Series Analysis Example”, describes a procedure that uses the rolling_origin()
function to create samples designed for time series cross validation. We’ll use this approach.
4.1 Developing A Backtesting Strategy
The sampling plan we create uses 50 years (initial
= 12 x 50 samples) for the training set and ten years (assess
= 12 x 10) for the testing (validation) set. We select a skip
span of twenty years (skip
= 12 x 20) to evenly distribute the samples into 11 sets that span the entire 265 years of sunspots history. Last, we select cumulative = FALSE
to allow the origin to shift which ensures that models on more recent data are not given an unfair advantage (more observations) that those on less recent data. The tibble return contains the rolling_origin_resamples
.
4.2 Visualizing The Backtesting Strategy
We can visualize the resamples with two custom functions. The first, plot_split()
, plots one of the resampling splits using ggplot2
. Note that an expand_y_axis
argument is added to expand the date range to the full sun_spots
dataset date range. This will become useful when we visualize all plots together.
The plot_split()
function takes one split (in this case Slice01), and returns a visual of the sampling strategy. We expand the axis to the range for the full dataset using expand_y_axis = TRUE
.
The second function, plot_sampling_plan()
, scales the plot_split()
function to all of the samples using purrr
and cowplot
.
We can now visualize the ENTIRE BACKTESTING STRATEGY with plot_sampling_plan()
! We can see how the sampling plan shifts the sampling window with each progressive slice of the train/test splits.
And, we can set expand_y_axis = FALSE
to zoom in on the samples.
We’ll use this Backtesting Strategy (11 samples from one time series each with 50/10 split in years and a 20 year offset) when testing the veracity of the LSTM model on the sunspots dataset.
5.0 Modeling The Keras Stateful LSTM Model
To begin, we’ll develop a single Keras Stateful LSTM model on a single sample from the Backtesting Strategy. We’ll then scale the model to all samples to investigate/validate the modeling performance.
5.1 Single LSTM
For the single LSTM model, we’ll select and visualize the split for the most recent time sample/slice (Slice11). The 11th split contains the most recent data.
5.1.1 Visualizing The Split
We can reuse the plot_split()
function to visualize the split. Set expand_y_axis = FALSE
to zoom in on the subsample.
5.1.2 Data Setup
First, let’s combine the training and testing data sets into a single data set with a column key
that specifies what set they came from (either “training” or “testing)”. Note that the tbl_time
object will need to have the index re-specified during the bind_rows()
step, but this issue should be corrected in dplyr
soon (please upvote it so RStudio focuses on it).
5.1.3 Preprocessing With Recipes
The LSTM algorithm requires the input data to be centered and scaled. We can preprocess the data using the recipes
package. We’ll use a combination of step_sqrt
to transform the data and reduce the presence of outliers and step_center
and step_scale
to center and scale the data. The data is processed/transformed using the bake()
function.
Next, let’s capture the center/scale history so we can invert the center and scaling after modeling. The square-root transformation can be inverted by squaring the inverted center/scale values.
5.1.4 LSTM Plan
We need a plan for building the LSTM. First, a couple PRO TIPS with LSTM’s:
Tensor Format:
- Predictors (X) must be a 3D Array with dimensions: [samples, timesteps, features]: The first dimension is the length of values, the second is the number of time steps (lags), and the third is the number of predictors (1 if univariate or n if multivariate)
- Outcomes/Targets (y) must be a 2D Array with dimensions: [samples, timesteps]: The first dimension is the length of values and the second is the number of time steps (lags)
Training/Testing:
- The training and testing length must be evenly divisible (e.g. training length / testing length must be a whole number)
Batch Size:
- The batch size is the number of training examples in one forward/backward pass of a RNN before a weight update
- The batch size must be evenly divisible into both the training an testing lengths (e.g. training length / batch size and testing length / batch size must both be whole numbers)
Time Steps:
- A time step is the number of lags included in the training/testing set
- For our example, our we use a single lag
Epochs:
- The epochs are the total number of forward/backward pass iterations
- Typically more improves model performance unless overfitting occurs at which time the validation accuracy/loss will not improve
Taking this in, we can come up with a plan. We’ll select a prediction of window 120 months (10 years) or the length of our test set. The best correlation occurs at 125, but this is not evenly divisible by the forecasting range. We could increase the forecast horizon, but this offers a minimal increase in autocorrelation. We can select a batch size of 40 units which evenly divides into the number of testing and training observations. We select time steps = 1, which is because we are only using one lag. Finally, we set epochs = 300
, but this will need to be adjusted to balance the bias/variance tradeoff.
5.1.5 2D And 3D Train/Test Arrays
We can then setup the training and testing sets in the correct format (arrays) as follows. Remember, LSTM’s need 3D arrays for predictors (X) and 2D arrays for outcomes/targets (y).
5.1.6 Building The LSTM Model
We can build an LSTM model using the keras_model_sequential()
and adding layers like stacking bricks. We’ll use two LSTM layers each with 50 units. The first LSTM layer takes the required input shape, which is the [time steps, number of features]. The batch size is just our batch size. We set the first layer to return_sequences = TRUE
and stateful = TRUE
. The second layer is the same with the exception of batch_size
, which only needs to be specified in the first layer, and return_sequences = FALSE
which does not return the time stamp dimension (2D shape is returned versus a 3D shape from the first LSTM). We tack on a layer_dense(units = 1)
, which is the standard ending to a keras sequential model. Last, we compile()
using the loss = "mae"
and the popular optimizer = "adam"
.
5.1.7 Fitting The LSTM Model
Next, we can fit our stateful LSTM using a for
loop (we do this to manually reset states). This will take a minute or so for 300 epochs to run. We set shuffle = FALSE
to preserve sequences, and we manually reset the states after each epoch using reset_states()
.
5.1.8 Predicting Using The LSTM Model
We can then make predictions on the test set, x_test_arr
, using the predict()
function. We can retransform our predictions using the scale_history
and center_history
, which were previously saved and then squaring the result. Finally, we combine the predictions with the original data in one column using reduce()
and a custom time_bind_rows()
function.
We can use the yardstick
package to assess performance using the rmse()
function, which returns the root mean squared error (RMSE). Our data is in the long format (optimal format for visualizing with ggplot2
), so we’ll create a wrapper function calc_rmse()
that processes the data into the format needed for yardstick::rmse()
.
We can inspect the RMSE on the model.
The RMSE doesn’t tell us the story. We need to visualize. Note - The RMSE will come in handy in determining an expected error when we scale to all samples in the backtesting strategy.
5.1.10 Visualizing The Single Prediction
Next, we make a plotting function, plot_prediction()
, using ggplot2
to visualize the results for a single sample.
We can test out the plotting function setting the id = split_id
, which is “Slice11”.
The LSTM performed relatively well! The settings we selected seem to produce a decent model that picked up the trend. It jumped the gun on the next up-trend, but overall it exceeded my expectations. Now, we need to Backtest to see the true performance over time!
5.2 Backtesting The LSTM On All Eleven Samples
Once we have the LSTM working for one sample, scaling to all 11 is relatively simple. We just need to create an prediction function that can be mapped to the sampling plan data contained in rolling_origin_resamples
.
5.2.1 Creating An LSTM Prediction Function
This step looks intimidating, but it’s actually straightforward. We copy the code from Section 5.1 - Single LSTM into a function. We’ll make it a safe function, which is a good practice for any long-running mapping processes to prevent a single failure from stopping the entire process.
We can test the custom predict_keras_lstm()
function out with 10 epochs. It returns the data in long format with “actual” and “predict” values in the key column.
5.2.2 Mapping The LSTM Prediction Function Over The 11 Samples
With the predict_keras_lstm()
function in hand that works on one split, we can now map to all samples using a mutate()
and map()
combo. The predictions will be stored in a “list” column called “predict”. Note that this will take 5-10 minutes or so to complete.
We now have the predictions in the column “predict” for all 11 splits!.
We can assess the RMSE by mapping the calc_rmse()
function to the “predict” column.
And, we can summarize the RMSE for the 11 slices. PRO TIP: Using the average and standard deviation of the RMSE (or other similar metric) is a good way to compare the performance of various models.
5.2.4 Visualizing The Backtest Results
We can create a plot_predictions()
function that returns one plot with the predictions for the entire set of 11 backtesting samples!!!
Here’s the result. On a data set that’s not easy to predict, this is quite impressive!!
5.3 Predicting The Next 10 Years
We can predict the next 10 years by adjusting the prediction function to work with the full data set. A new function, predict_keras_lstm_future()
, is created that predicts the next 120 time steps (or ten years).
Next, run predict_keras_lstm_future()
on the sun_spots
data set.
Last, we can visualize the forecast with the plot_prediction()
function, setting id = NULL
. We can use filter_time()
to zoom in on the dataset since 1900.
Conclusions
This article demonstrates the power of a Stateful LSTM built using the keras
package. It’s surprising how well the deep learning approach learned the trend given that the only feature provided was the lag at 120. The backtested model had an Average RMSE of 34 and a Standard Deviation RMSE of 13. While not shown in the article, we tested this against ARIMA and prophet, and the LSTM outperformed both by a more than 30% reduction in average error and 40% reduction in standard deviation. This shows the benefit of machine learning tool-application fit.
Beyond the deep learning approach used, the article also exposed methods to determine whether an LSTM is a good option for the given time series using ACF plot. We also exposed how the veracity of a time series model should be benchmarked by backtesting, a strategy that can be used for cross validation that preserves the sequential aspects of the time series.
Business Science University
Enjoy data science for business? We do too. This is why we created Business Science University where we teach you how to do Data Science For Busines (#DS4B) just like us!
Who is this course for?
Anyone that is interested in applying data science in a business context (we call this DS4B). All you need is basic R
, dplyr
, and ggplot2
experience. If you understood this article, you are qualified.
What do you get it out of it?
You learn everything you need to know about how to apply data science in a business context:
-
Using ROI-driven data science taught from consulting experience!
-
Solve high-impact problems (e.g. $15M Employee Attrition Problem)
-
Use advanced, bleeding-edge machine learning algorithms (e.g. H2O, LIME)
-
Apply systematic data science frameworks (e.g. Business Science Problem Framework)
New R Package: Anomalize For Scalable Time Series Anomaly Detection
We just released our package, anomalize
. You can learn more from the Anomalize Documentation or by watching our 2-minute introductory video!
About Business Science
Business Science specializes in “ROI-driven data science”. We offer training, education, coding expertise, and data science consulting related to business and finance. Our latest creation is Business Science University, which is coming soon! In addition, we deliver about 80% of our effort into the open source data science community in the form of software and our Business Science blog. Visit Business Science on the web or contact us to learn more!
Don’t Miss A Beat
Connect With Business Science
If you like our software (anomalize
, tidyquant
, tibbletime
, timetk
, and sweep
), our courses, and our company, you can connect with us: