Data Trends for Investment Professionals


Using Quandl in R


Our mantra here at Quandl is making data easy to find and easy to use. In pursuit of that goal, we (and later the community) have created packages that integrate Quandl’s API into a number of software platforms. Today we’ll take a look at R.

R is a free statistical computing language, first released in 1993 as an implementation of the S language. Its community has written many packages that keep its methods on the cutting edge of statistical analysis.

Wrapping Quandl’s API in an R package makes the tedious work of getting data into your console trivial, so you can get to the real work faster. I’ll give a few examples of how Quandl’s data can be used in R. Since Yahoo has just acquired Tumblr, it seems fitting to use Yahoo’s stock price as our base data.

To access this data we need to know its Quandl code. In this case it is “GOOG/NASDAQ_YHOO”. You can find Quandl codes using our website.

Make a complicated plot

R is well known for its graphing capabilities, and it can produce a graph tailored to your precise analytic needs.

With a graphing package such as ggplot2, a few lines of code are enough to turn Quandl data into a fully customized chart.
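As a rough illustration, here is a minimal sketch of a layered ggplot2 chart. To keep it runnable without a download, it uses a synthetic data frame in place of a Quandl result; the column names `Date` and `Close` and the styling choices are my assumptions, not part of the Quandl package.

```r
library(ggplot2)

# Synthetic stand-in for a downloaded price table (hypothetical data).
# A real workflow would start from something like Quandl("GOOG/NASDAQ_YHOO").
set.seed(1)
prices <- data.frame(
  Date  = seq(as.Date("2012-01-01"), by = "day", length.out = 250),
  Close = 20 + cumsum(rnorm(250, sd = 0.3))
)

# Line chart of the closing price with a loess smoother layered on top.
p <- ggplot(prices, aes(x = Date, y = Close)) +
  geom_line(colour = "steelblue") +
  geom_smooth(method = "loess", se = FALSE, colour = "firebrick") +
  labs(title = "Closing price (synthetic data)", x = NULL, y = "Close") +
  theme_minimal()

print(p)
```

Each `+` adds a layer or a setting, so the chart can be adjusted one piece at a time.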

Decompose a time series

Because R has so many contributors, it has several time series formats, each of which works with a different set of packages. The Quandl package can return data in a number of them. In this example I request the data in R’s native time series format (ts) and pass it to stl, which decomposes it into seasonal, trend, and remainder components.

library(Quandl)
# Fetch monthly data as a native R ts object, then decompose the fourth column
data <- Quandl("GOOG/NASDAQ_YHOO", collapse="monthly", type="ts")
plot(stl(data[,4], s.window="periodic"), main = "YHOO Decomposition")
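To see what stl hands back without downloading anything, here is a small sketch on a synthetic monthly series. The series itself (a linear trend plus a 12-month cycle plus noise) is made up for illustration.

```r
# Synthetic monthly series: linear trend + 12-month cycle + noise.
set.seed(42)
n <- 120  # ten years of monthly observations
month_index <- seq_len(n)
values <- 0.05 * month_index + 2 * sin(2 * pi * month_index / 12) +
  rnorm(n, sd = 0.3)
monthly <- ts(values, start = c(2003, 1), frequency = 12)

# stl() needs a univariate ts with frequency > 1 and no missing values.
fit <- stl(monthly, s.window = "periodic")

# The components come back as a multivariate ts with three columns:
# "seasonal", "trend" and "remainder".
components <- fit$time.series
print(colnames(components))
print(head(components, 3))
```

The decomposition is additive, so the three columns sum back to the original series at every date.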

Calculate Trends

The zoo time series format handles irregularly spaced time series, like daily stock prices. Returning data in this format makes it easy to calculate things that depend on the dates, like a 200-day moving average or volatility.

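library(Quandl)
library(zoo)  # provides rollapply()
data <- Quandl("GOOG/NASDAQ_YHOO",type="zoo")
# 200-observation rolling mean and standard deviation of the fourth column
rolling_average <- rollapply(data[,4],200,mean)
rolling_volatility <- rollapply(data[,4],200,sd)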
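To make the windowing concrete, here is a small sketch on ten made-up closing prices with a width of 5 instead of 200. One choice here is mine: rollapply centers its window by default, and I pass align = "right" so that each value is a trailing average ending on that date, which is how a moving average is usually read on a price chart.

```r
library(zoo)

# Ten business days of made-up closing prices (weekends are skipped,
# so the dates are irregularly spaced).
days <- seq(as.Date("2013-05-06"), by = "day", length.out = 14)
days <- days[!format(days, "%u") %in% c("6", "7")]
closing <- zoo(c(26.0, 26.4, 26.1, 26.8, 27.0, 26.6, 26.9, 27.3, 27.1, 27.5),
               order.by = days)

# width counts observations, not calendar days.
# align = "right" labels each window with its last date.
trailing_mean <- rollapply(closing, width = 5, FUN = mean, align = "right")
trailing_sd   <- rollapply(closing, width = 5, FUN = sd,   align = "right")

print(trailing_mean)
print(trailing_sd)
```

With ten observations and a width of 5 there are six complete windows, and the first one ends on the fifth trading day.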

Calculate Financial Indicators

Using a format like zoo also makes it easy to match two time series along their dates. This makes calculations involving multiple prices over different time periods much easier. We can use this to find the beta of gold prices (from the Bundesbank) with respect to Brent crude oil prices (from the U.S. Department of Energy).

The calculation of beta takes a couple of steps.

  • Load gold prices and Brent crude oil prices
  • Convert the values to daily returns
  • Match price returns along the dates
  • Perform Regression

Quandl takes care of the first two steps for you. First I’ll load the datasets into R and apply the “rdiff” transformation, which converts prices to percent returns, in the same step.

gold <- Quandl("BUNDESBANK/BBK01_WT5511", type="zoo", transformation="rdiff")
oil <- Quandl("DOE/RBRTE", type="zoo", transformation="rdiff")
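As I understand it, rdiff is the row-on-row percent change, (x[t] - x[t-1]) / x[t-1]. A minimal base R sketch of that formula on made-up prices:

```r
# Made-up prices for three consecutive periods.
prices <- c(100, 110, 99)

# Row-on-row percent change: (x[t] - x[t-1]) / x[t-1].
# The result has one fewer element than the input.
returns <- diff(prices) / head(prices, -1)
print(returns)
```

Here a 10% gain is followed by a 10% loss, and the first period has no return because there is nothing before it to compare against.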

Then I can use the zoo function “merge” to match up the two series along their dates.

beta_data <- merge(gold, oil)
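A small sketch of what merge does when the dates do not line up, using two made-up return series. By default it keeps the union of the dates and fills the gaps with NA; all = FALSE keeps only the dates both series share.

```r
library(zoo)

# Two made-up return series with only one date in common.
a <- zoo(c(0.01, -0.02, 0.03),
         order.by = as.Date(c("2013-05-01", "2013-05-02", "2013-05-03")))
b <- zoo(c(0.02, 0.01),
         order.by = as.Date(c("2013-05-02", "2013-05-06")))

outer_join <- merge(a, b)               # union of dates, NA where a value is missing
inner_join <- merge(a, b, all = FALSE)  # only dates present in both series

print(outer_join)
print(inner_join)
```

The NA values in the default result are why the regression needs an na.action such as na.omit.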

Now all that’s left is to regress gold against oil.

lm(coredata(beta_data[,1]) ~ coredata(beta_data[,2]), na.action=na.omit)$coef[2]

> 0.0515
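To check that the regression slope really is the beta, here is a sketch on synthetic returns where the true relationship is known. The data, the seed, and the column names gold and oil are all made up for illustration; the slope from lm should match cov(gold, oil) / var(oil).

```r
library(zoo)

set.seed(7)
dates <- seq(as.Date("2013-01-01"), by = "day", length.out = 100)
oil_ret <- zoo(rnorm(100, sd = 0.02), order.by = dates)
# Gold is built to move 0.05 for each unit move in oil, plus independent noise.
gold_ret <- zoo(0.05 * coredata(oil_ret) + rnorm(100, sd = 0.01),
                order.by = dates)

# Drop some dates from each series so they no longer line up exactly,
# then keep only the dates present in both.
both <- merge(gold = gold_ret[-(1:5)], oil = oil_ret[-(96:100)], all = FALSE)
df <- as.data.frame(both)

# Beta two ways: regression slope and covariance over variance.
beta_lm  <- unname(coef(lm(gold ~ oil, data = df))["oil"])
beta_cov <- cov(df$gold, df$oil) / var(df$oil)

print(c(lm = beta_lm, cov_over_var = beta_cov))
```

The two numbers agree because the ordinary least squares slope in a one-variable regression is exactly the covariance divided by the variance of the regressor.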

These are just some very basic things I did in R with one dataset. What can you do with over 6 million?
