tidyquant 0.3.0: ggplot2 Enhancements, Real-Time Data, and More
tidyquant, version 0.3.0, is a sizable release with a little something for everyone: new financial charting and moving average geoms for use with ggplot2, a new tq_get get option called "key.stats" for retrieving real-time stock information, and several integrations that make it easier to scale your analyses. If you’re not already familiar with tidyquant, it integrates the best resources for collecting and analyzing quantitative data (xts, zoo, quantmod and TTR) with the tidyverse, allowing for seamless interaction between each. I’ll briefly touch on some of the updates by going through some neat examples. The package is open source, and you can view the code on the tidyquant GitHub page.
v0.3.0 Updates
tidyquant: Bringing financial analysis to the tidyverse
When I said this was a big release, I wasn’t kidding. We have some major enhancements in tidyquant:
- Financial Visualizations for ggplot2: Candlestick charts, barcharts, moving averages and Bollinger Bands can be used in the ggplot “grammar of graphics” workflow. There’s a new vignette, Charting with tidyquant, that details the new financial charting capabilities.
- Key stats from Yahoo Finance: Users can now get 55 different key statistics in real time from Yahoo Finance with the new "key.stats" get option. The statistics include Bid, Ask, Day’s High, Day’s Low, Last Trade Price, Current P/E Ratio, and many more, most of which change throughout the day. With the addition of the key statistics, tq_get is now truly a one-stop shop for financial information. The user can now get:
  - Real-time key stock statistics with "key.stats"
  - Historical key ratios and financial information over the past 10 years with "key.ratios"
  - Quarterly and annual financial statement data with "financials"
  - Historical daily stock prices with "stock.prices"
  - Stock lists for 18 different indexes with "stock.index"
  - And more!
- Enhancements that Make Scaling Financial Analysis Simple:
  - tq_get now accepts multiple stocks in the form of either a character vector (e.g. c("AAPL", "GOOG", "FB")) or a data frame with the stocks in the first column. This means scaling is ridiculously simple now. A call to tq_get(c("AAPL", "GOOG", "FB"), get = "stock.prices") now gets 10 years of daily stock prices for all three stocks in one data frame!
  - tq_mutate and tq_transform now work with grouped data frames. This means that you can extend the xts, zoo, quantmod and TTR functions to grouped data frames the same way that you can with dplyr::mutate. In addition, you can now more easily rename the transformed / mutated columns with the col_rename argument. All of this saves you time and requires less code!
This concludes the major changes. Now, let’s go through some examples!
Prerequisites
First, update to tidyquant v0.3.0.
Next, load tidyquant.
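As a minimal sketch of those two steps, assuming you install from CRAN: note that CRAN serves the current release, which may be newer than the v0.3.0 described in this post, so some of the options shown later may behave differently.

```r
# Install (or update to) the current CRAN release of tidyquant.
# Assumption: the CRAN version may be newer than the v0.3.0 covered here.
install.packages("tidyquant")

# Attach the package and check which version you ended up with.
library(tidyquant)
packageVersion("tidyquant")
```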
I also recommend the open-source RStudio IDE, which makes R Programming easy and efficient.
Examples
We’ve got some neat examples to show off the new capabilities:
- Enhanced Financial Data Visualizations: We’ll check out how to use the new tidyquant geoms with ggplot2, which provide great visualizations for time-series and stock data!
- Working with Key Statistics: We’ll investigate the new tq_get get option, get = "key.stats", which enables access to real-time, intraday trading information!
- Scaling Your Analysis: We’ll test out some of the new scaling features that make it even easier to scale your analysis from one security to many!
Example 1: Enhanced Financial Data Visualizations
I absolutely love the new ggplot geoms that come packaged with tidyquant, and I’m really excited to show them off! Version 0.3.0 adds two new chart types: geom_candlestick and geom_barchart (not to be confused with geom_bar). In this post, we’ll focus on the candlestick chart, but the barchart works in a very similar manner.
Before we start, let’s get some data using tq_get. The first call gets a single stock (nothing new here), and the second call retrieves the FANG stocks using the new scaling functionality by piping (%>%) a character vector of symbols to tq_get (there are other ways too!).
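The two calls might look like the sketch below. The date range and object names are assumptions chosen for illustration, and the symbols are the tickers in use when this post was written.

```r
library(tidyquant)
library(dplyr)  # attached explicitly for %>%

# Illustrative date range (an assumption, not taken from the post).
from <- "2015-09-01"
to   <- "2016-12-31"

# Single stock: nothing new here.
AAPL <- tq_get("AAPL", get = "stock.prices", from = from, to = to)

# Multiple stocks: pipe a character vector of symbols to tq_get.
# Tickers are as of the time of this post and can change over time.
FANG <- c("FB", "AMZN", "NFLX", "GOOG") %>%
    tq_get(get = "stock.prices", from = from, to = to)
```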
Before v0.3.0, we used geom_line to create a line chart like so. Note that coord_x_date is a new tidyquant coordinate function that enables zooming in on part of the chart without out-of-bounds data loss (scale_x_date is similar but causes out-of-bounds data loss, which wreaks havoc on moving average geoms).
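A line chart along those lines could be sketched as follows. The date range, the six-week zoom window, and the labels are assumptions for illustration.

```r
library(tidyquant)
library(ggplot2)
library(lubridate)

AAPL <- tq_get("AAPL", get = "stock.prices",
               from = "2015-09-01", to = "2016-12-31")
end  <- as.Date("2016-12-31")

# Plain line chart of the closing price, zoomed to the last six weeks.
# coord_x_date zooms the view without dropping the out-of-bounds rows.
p_line <- ggplot(AAPL, aes(x = date, y = close)) +
    geom_line() +
    labs(title = "AAPL Line Chart", x = "", y = "Closing Price") +
    coord_x_date(xlim = c(end - weeks(6), end))
p_line
```

coord_x_date also takes a ylim argument, which is useful here because the y-axis is otherwise scaled to the full data rather than to the zoomed window.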
With tidyquant, we can replace the geom_line with geom_candlestick to create a beautiful candlestick chart that shows open, high, low, close, and direction visually. The only real difference is that we need to specify the aesthetic arguments open, high, low and close. Everything else can stay the same.
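Here is a sketch of that swap, with the same assumed date range and zoom window as an ordinary line chart would use; only the geom and its four aesthetics change.

```r
library(tidyquant)
library(ggplot2)
library(lubridate)

AAPL <- tq_get("AAPL", get = "stock.prices",
               from = "2015-09-01", to = "2016-12-31")
end  <- as.Date("2016-12-31")

# geom_candlestick needs the open, high, low and close aesthetics.
p_candle <- ggplot(AAPL, aes(x = date, y = close)) +
    geom_candlestick(aes(open = open, high = high, low = low, close = close)) +
    labs(title = "AAPL Candlestick Chart", x = "", y = "Closing Price") +
    coord_x_date(xlim = c(end - weeks(6), end))
p_candle
```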
Pretty sweet! Let’s take this a step further with moving averages. The moving average geom, geom_ma, quickly draws moving average lines using a moving average function, ma_fun, which can be any one of seven from the TTR package. We can use these to “rapid prototype” moving averages, enabling us to quickly identify changes in trends. Let’s add 15-day and 50-day moving averages. Note that geom_ma takes arguments to control the moving average function (ma_fun = SMA and n = 15) and arguments to control the line, such as color = "red" or linetype = 4.
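A sketch with both moving averages layered over the candlesticks; the 24-week window and the blue color for the 50-day line are assumptions for illustration.

```r
library(tidyquant)
library(ggplot2)
library(lubridate)

AAPL <- tq_get("AAPL", get = "stock.prices",
               from = "2015-09-01", to = "2016-12-31")
end  <- as.Date("2016-12-31")

# Each geom_ma layer computes its own moving average of the y aesthetic.
p_ma <- ggplot(AAPL, aes(x = date, y = close)) +
    geom_candlestick(aes(open = open, high = high, low = low, close = close)) +
    geom_ma(ma_fun = SMA, n = 15, color = "red", linetype = 4) +   # 15-day SMA
    geom_ma(ma_fun = SMA, n = 50, color = "blue", linetype = 4) +  # 50-day SMA
    labs(title = "AAPL: 15 and 50-Day Moving Averages", x = "", y = "Closing Price") +
    coord_x_date(xlim = c(end - weeks(24), end))
p_ma
```

Because coord_x_date keeps the out-of-bounds rows, the 50-day average is already defined at the left edge of the zoomed window.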
We can also use Bollinger Bands to help visualize volatility. BBands take a moving average, such as ma_fun = SMA from TTR, and a standard deviation, sd = 2 by default. Because BBands depend on the high, low and close prices, we need to add these as aesthetic arguments. Let’s use a 20-day simple moving average with two standard deviations. We can see that there were two periods, one in October and one in November, that had higher volatility.
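The Bollinger Bands chart described above might be sketched like this; the date range and zoom window are again assumptions.

```r
library(tidyquant)
library(ggplot2)
library(lubridate)

AAPL <- tq_get("AAPL", get = "stock.prices",
               from = "2015-09-01", to = "2016-12-31")
end  <- as.Date("2016-12-31")

# 20-day simple moving average with bands at two standard deviations.
# geom_bbands needs the high, low and close aesthetics.
p_bbands <- ggplot(AAPL, aes(x = date, y = close)) +
    geom_candlestick(aes(open = open, high = high, low = low, close = close)) +
    geom_bbands(aes(high = high, low = low, close = close),
                ma_fun = SMA, n = 20, sd = 2) +
    labs(title = "AAPL: Bollinger Bands", x = "", y = "Closing Price") +
    coord_x_date(xlim = c(end - weeks(24), end))
p_bbands
```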
Last, we can visualize multiple stocks at once by adding a group aesthetic and tacking on a facet_wrap at the end of the ggplot workflow. Note that the out-of-bounds data becomes important to the scale of each facet: with too much data the y-axis is off scale, and with too little data the moving average is thrown off. An easy way to adjust is to use filter() to keep only the data from double the moving average number of periods (2 * n) before the chart’s start date onward. This reduces the out-of-bounds data without eliminating data that the moving average function needs for its calculations.
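The following sketch puts the filter() trick together with facets. The dates are assumptions, the 2 * n padding is approximated in calendar days, and the grouping column is named symbol.x as described for v0.3.0 in this post (other releases may name it differently).

```r
library(tidyquant)
library(dplyr)
library(ggplot2)
library(lubridate)

FANG <- c("FB", "AMZN", "NFLX", "GOOG") %>%
    tq_get(get = "stock.prices", from = "2015-09-01", to = "2016-12-31")

n     <- 20                     # moving average periods
start <- as.Date("2016-07-01")  # left edge of the visible window (assumption)
end   <- as.Date("2016-12-31")

p_facets <- FANG %>%
    # Keep 2 * n extra calendar days before the window so the moving
    # average has enough history without distorting each facet's y scale.
    filter(date >= start - days(2 * n)) %>%
    ggplot(aes(x = date, y = close, group = symbol.x)) +
    geom_candlestick(aes(open = open, high = high, low = low, close = close)) +
    geom_bbands(aes(high = high, low = low, close = close),
                ma_fun = SMA, n = n, sd = 2) +
    coord_x_date(xlim = c(start, end)) +
    facet_wrap(~ symbol.x, ncol = 2, scales = "free_y") +
    labs(title = "FANG: Bollinger Bands", x = "", y = "Closing Price")
p_facets
```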
Example 2: Working with Key Statistics
New to tq_get is the get option get = "key.stats". So, what are key stats? Yahoo Finance has an amazing list of real-time statistics such as bid price, ask price, day’s high, day’s low, change, and many more features that change throughout the day. Key stats are our access to live data, the most current features of a stock / company, many of which are accurate to the second that they are retrieved. Pretty neat!
Getting Key Stats
Let’s get some key stats and see what’s inside. We get key stats using the tq_get function, setting get = "key.stats". When we show the data, it’s kind of messy (there’s a reason), so I’ve just listed the first ten column names. It comes in the form of a one-row tibble (tidy data frame) that has 55 columns, one for each key stat.
The data comes this way because, using the new scaling capability, we can get key stats for multiple stocks, and the rows get stacked on top of each other. This makes comparing key stats very easy!
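A minimal sketch of both calls, assuming tidyquant v0.3.0 and that the Yahoo Finance service behind "key.stats" is reachable. The object names are illustrative.

```r
library(tidyquant)
library(dplyr)

# One stock: a one-row tibble with one column per key stat.
AAPL_key_stats <- tq_get("AAPL", get = "key.stats")
colnames(AAPL_key_stats)[1:10]  # peek at the first ten column names

# Several stocks: one row per symbol, stacked for easy comparison.
key_stats <- c("AAPL", "FB", "GOOG") %>%
    tq_get(get = "key.stats")
key_stats
```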
Retrieve Real-Time Data at Periodic Intervals
Something great about real-time data is that it can be collected at periodic intervals while trading is in session! The following code chunk, when run, will retrieve stock prices at a periodic interval:
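One way to do this is a simple polling loop. The sketch below is an assumption about how such a collector could look, not the post's original chunk: the helper name collect_key_stats, its arguments, and the retrieved.at column are all hypothetical.

```r
library(tidyquant)
library(dplyr)

# Hypothetical helper: poll key stats n_obs times, interval_sec apart,
# and stack the snapshots into one tibble.
collect_key_stats <- function(symbols, interval_sec, n_obs) {
    snapshots <- vector("list", n_obs)
    for (i in seq_len(n_obs)) {
        snapshots[[i]] <- tq_get(symbols, get = "key.stats") %>%
            mutate(retrieved.at = Sys.time())  # timestamp each snapshot
        if (i < n_obs) Sys.sleep(interval_sec)  # wait before the next pull
    }
    bind_rows(snapshots)
}

# Five snapshots of AAPL, three seconds apart.
real_time <- collect_key_stats("AAPL", interval_sec = 3, n_obs = 5)
real_time
```

Preallocating the list and binding once at the end avoids growing a data frame inside the loop.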
Comparing Historical Data to Current Data
We now have get = "key.stats" for current stats, and with v0.2.0 we got get = "key.ratios" for 10 years of historical ratios. Combined, these let us compare current attributes to historical trends. To put this into perspective, we will investigate the P/E Ratio: Comparing Historical Trends Versus Current Value for AAPL. The P/E ratio is a measure of the stock valuation. Stocks are considered “expensive” when they trade above historical averages or above industry averages.
We already have the key stats from AAPL, so getting the current P/E Ratio is very easy.
Due to the amount of data and time-series nature, the key ratios come as a nested tibble, grouped by section type.
We need to get the historical P/E Ratios, which are in the “Valuation Ratios” section. We will do a series of filtering and unnesting to peel away the layers and isolate the “Price to Earnings” time-series data.
Now, we are ready to visualize the P/E Ratio: Comparing Historical Trends Versus Current Value for AAPL. The visualization below is inspired by r-statistics.co, an awesome resource for ggplot2 and R analysis. We add the following:
- Geoms:
  - geom_line() and geom_point() to chart the historical data
  - geom_ma() to chart the three-period simple moving average (the three-period average helps identify the trend through the noise)
  - geom_hline() to add a horizontal line at the current P/E Ratio obtained from key stats
- Legend: We manipulate the colors with scale_color_manual() and the position in the theme() function.
- Logo: A logo is generated as a grob (grid graphical object) using the grid and png packages. The function annotation_custom() allows us to simply add it to the ggplot workflow. See Add an Image to Background for a tutorial.
The chart shows that the current valuation is slightly above the recent historical valuation, indicating that the stock price is slightly “expensive”. However, given that the P/E ratio is below the current SP500 average of 25, courtesy of www.multpl.com, one could also consider this stock “inexpensive”. It just depends on your perspective. :)
Example 3: Scaling Your Analysis
Probably the single most important benefit of performing financial analysis in the tidyverse is the ability to scale. Based on some excellent feedback from @KanAugust, I have made scaling even easier. There are two new options for scaling:
New Option 1: Passing a character vector of symbols:
Send a character vector in the form c("X", "Y", "Z") to tq_get. A new column is generated, symbol.x, with the symbols that were passed to the x argument.
New Option 2: Passing a tibble with symbols in the first column:
We can combine tq_get calls using get = "stock.index" and get = "stock.prices" to pass a stock index to get stock prices. I’ve added slice(1:3) to get the first three stocks from the index, which reduces the download time. If you remove slice(1:3), you will get the historical prices for all stocks in an index in the next step!
First, get stocks from an index.
Then get stock prices. Note that symbols must be in the first column.
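The two steps might look like the sketch below, assuming tidyquant v0.3.0. The index name "SP500" and the object names are assumptions for illustration; any index supported by "stock.index" should work the same way.

```r
library(tidyquant)
library(dplyr)

# Step 1: get the stocks in an index and keep the first three
# to keep the download short. "SP500" is an assumed index name.
stock_list <- tq_get("SP500", get = "stock.index") %>%
    slice(1:3)

# Step 2: pass the tibble to tq_get; symbols must be in the first column.
stock_prices <- stock_list %>%
    tq_get(get = "stock.prices")
stock_prices
```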
We can also use tq_mutate and tq_transform with dplyr::group_by to scale analyses! Thanks to some great feedback from @dvaughan32, the col_rename argument is available to conveniently rename the newly transformed / mutated columns.
Here’s a powerful example: We can use group_by and tq_transform to collect annual returns for a tibble of stock prices for multiple stocks. The result can be piped to ggplot for charting.
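A sketch of that workflow follows. It assumes the v0.3.0 interface, where tq_transform takes ohlc_fun and transform_fun arguments (these argument names are my recollection of that release and may differ in other versions) and the grouping column is named symbol.x. The date range, column name annual.returns, and chart styling are illustrative.

```r
library(tidyquant)
library(dplyr)
library(ggplot2)
library(lubridate)

FANG <- c("FB", "AMZN", "NFLX", "GOOG") %>%
    tq_get(get = "stock.prices", from = "2013-01-01", to = "2016-12-31")

# Annual returns per stock from the adjusted close.
# Ad and periodReturn come from quantmod.
FANG_annual <- FANG %>%
    group_by(symbol.x) %>%
    tq_transform(ohlc_fun      = Ad,
                 transform_fun = periodReturn,
                 period        = "yearly",
                 col_rename    = "annual.returns")

# One bar per stock per year.
p_returns <- FANG_annual %>%
    ggplot(aes(x = year(date), y = annual.returns, fill = symbol.x)) +
    geom_col(position = "dodge") +
    scale_y_continuous(labels = scales::percent) +
    labs(title = "FANG: Annual Returns", x = "", y = "Annual Return", fill = "")
p_returns
```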
Conclusions
The tidyquant package has several enhancements for financial analysis:
- New ggplot2 geoms for candlestick charts, barcharts, moving averages, and Bollinger Bands, and a brand new vignette to help guide users on charting capabilities.
- New get = "key.stats" for current stats on stocks: 55 total are available. The key stats complement the key ratios (get = "key.ratios"), which contain 10 years of historical information on various key ratios and financial information.
- New capabilities for scaling financial analyses to many stocks:
  - Using tq_get with character vectors or tibbles of stocks
  - Using tq_mutate / tq_transform with dplyr::group_by
With these updates, we can really do full financial analyses without ever leaving the tidyverse!
Recap
We went over a few examples to illustrate the main updates to tidyquant:
- The first example showed an implementation of several new tidyquant geoms that work with ggplot2: geom_candlestick / geom_barchart, geom_ma, and geom_bbands.
- The second example showed use of the new tq_get get option, get = "key.stats". The key stats provide real-time data from Yahoo Finance, and are a handy complement to the historical data provided by the get options "stock.prices", "key.ratios", and "financials".
- The third and final example showed some of the improvements in scaling analysis with the tidyverse. You can now pipe multiple symbols into tq_get to scale any of the get options, and you can use tq_mutate and tq_transform with dplyr::group_by.
I hope you enjoy the new features as much as I enjoyed creating them. As always, there’s more to come! :)
Further Reading
- r-statistics.co: You need to check out this website, which contains a wealth of quality, up-to-date R information. The Top 50 ggplot2 visualizations is amazing. This is now my go-to reference on ggplot2.
- Tidyquant Vignettes: This tutorial just scratches the surface of tidyquant. The vignettes explain much, much more!
- R for Data Science: A free book that thoroughly covers the tidyverse packages.
- Quantmod Website: Covers many of the quantmod functions. Also, see the quantmod CRAN site.
- Extensible Time-Series Website: Covers many of the xts functions. Also, see the xts vignette.
- TTR on CRAN: The reference manual covers each of the TTR functions.
- Zoo Vignettes: Covers the zoo rollapply functions as well as other usage.