Quantopian's community platform is shutting down. Please read this post for more information and download your code.
Back to Community
Ernie Chan's "Gold vs. gold-miners" stat arb

UPDATE: This algorithm is now out-of-date. Please find the new version at https://www.quantopian.com/posts/fixed-version-of-ernie-chans-gold-vs-gold-miners-stat-arb.

Due Credit

Ernie Chan is the author of Quantitative Trading, which is a terrific and accessible intro to algorithmic trading concepts. Ernie also maintains a blog, where he generously explains concepts and provides examples - one of my favorites is his post about exploring an arbitrage between Gold and gold-mining companies. Exchange Traded Funds (ETFs) exist for both gold the commodity and gold-mining companies, so it is pretty easy to explore Ernie's idea in Quantopian. Ernie also provides code with his book that demonstrates how to research this arbitrage opportunity, which helped me understand the statistical relationship the strategy exploits.

Lots of fundamental investors will take a view on commodity/company arbitrage trades like this one, so I was really interested in how an algorithmic investing approach would fare.

Ernie's book suggested GLD for the commodity, and GLX for mining companies. Both are ETFs in the Quantopian history, so I used them here.

How it works

The relationships used in the strategy are based on the spread between the stock prices. First an Ordinary Least Squares (OLS) calculation is done between the pricing histories. I think of OLS as a tool to fit a linear model to a scatter plot of the prices of GLD vs. the prices of GLX. To create the OLS required a bit of advanced coding in Quantopian - I used our (as of yet private, but soon to be opensource) EventWindow classes to create a rolling OLS.

Once OLS is calculated, you have a beta factor that can be used to normalize the spread between the two stocks. Then we calculated a z score, which is basically the number of standard deviations the current spread is from the average spread for the trailing window. If the spread is wide, the z score will be a positive number representing the number of standard deviations above the average. If the spread is narrow, it will be a negative number. When the spread is wide we sell GLD and buy GLX, with the expectation that the spread will return to normal. When the spread is narrow, we instead buy GLD and sell GLX. You can see the symmetry in the $ value of the bets over time if you clone the algo, and run it (be sure to start after June 2006).

I did a number of backtests - nearly 30 - to debug and then explore parameter values for the bet size (in # of shares) and length of the trailing window (I settled on 6 months). I also played around with different lengths for the average and stddev vs. the OLS.

Clone me!

I found it extremely fun to follow Ernie's lead, so in addition to sharing the backtest below, I wanted to throw out a few ideas for variations on this strategy that you can try by cloning my code and making small edits:

  • Ernie has another post about the same kind of arbitrage between energy stocks and energy commodity futures. It would be cool to try this strategy with ETFs from those areas.
  • Alcoa sells aluminum, but they are tightly coupled to natural gas prices because they use so much in the smelting of aluminum ore. I've always wondered if there is a statistical relationship that echoes the fundamental one.
  • It would be fun to try this statarb approach on a merger arbitrage. In that case, you might want to add a bias for a merger to either happen or not, but it would be neat to trade the spread minute to minute on a deal.
  • The airlines are always on watch when oil goes up. Same for Fedex and UPS. Maybe there is a relationship there.
  • GLD and GLX seem to have maintained a high level of cointegration, but there are periods like Q4 2008 when the relationship changes suddenly. It would be challenging to modify this algo to detect a break down in the cointegration of the two stocks. Maybe something as simple as tracking the size of a standard deviation might be interesting.

The other piece of this algorithm that may be fun to tweak is the order code. I didn't do anything fancy, but I did add a condition to reduce the leverage if the spread falls within one standard deviation of the average. I think it would be really interesting to adjust the amount of stock bought/sold based on the width of the spread. So instead of just making the same bet when the z factor is -2 vs. -4, maybe doubling your orders at -4 would be interesting!

Have fun,
fawce

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

15 responses

The OLS implementation is from statsmodels.

Very interesting, thanks for sharing!

Here is an interesting blog post on detecting when this strategy ceases to work.

A very easy adaptation that we could do here because we are constantly recomputing the cointegration is to only trade if the cointegration is above a certain threshold. So if those two ETFs stop tracking each other we stop investing.

The 3-way cointegration analysis also looks very interesting.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

fawce pointed out you using statsmodels. But I still feel lost.
What is zipline package? Why you can import it?
And what else packages I can use?

@Xingzhong

Thanks for the question - this example is a bit sneaky. I am using zipline, which is a set of tools for developing your own trading signals and creating backtests. Quantopian developed zipline, and we will be open sourcing it very soon. In fact, if you'd like to check out the code now, you can email feedback at quantopian with your github user name, and we will add you to the preview.

We regularly update the available modules. As of right now, algorithms can only import the following modules:

  • bisect
  • cmath
  • collections
  • datetime
  • functools
  • heapq
  • itertools
  • math
  • numpy
  • pandas
  • Queue
  • random
  • scipy
  • statsmodels
  • re
  • time
  • zipline

We update the module list very regularly based on user requests. To try an import of module, just add import xzy to your algorithm code. If we don't support it, you'll get a message with a link to our current whitelist (we update regularly). If there is something you'd like added to the whitelist, just let us know. If we think the module is safe, we'll add it and you can start using it.

Thanks fawce. I figure it out.
numpy, scipy, and pandas those support are really out of my Imagination.
It will definitely make lots of life easy.

We live off numpy/scipy/pandas and sklearn(!), please add the latter to your namespace.

wow! Are you guys really doing the algorithmic backtesting. This is the best place to learn the machine learning.

@Vishal, we'll have it in before the end of today.

Seems like this code does not build anymore, I get 'Algorithm compile error' on line 1.
Any idea how to fix it?

when I type into my own python console "from zipline.gens. " and press tab, there is no module named transform...what's the deal. How do I learn more about what ziplien has to offer

ah, answered my own question: " email feedback at quantopian with your github user name, and we will add you to the preview."

@Taylor - You're running into a problem with the example code, I will post an updated version of the code here shortly.

In the meantime, we open sourced the zipline code a little while back - you can get it all here: https://github.com/quantopian/zipline
One of my teammates, Thomas, presented the whole framework at PyData, and all of his materials are online and linked from here: https://www.quantopian.com/posts/hello-from-pydata

thanks,
fawce

thanks, john, for keeping me abreast. i'll check it out this weekend

I cloned this and tried to run it but got some errors. I adapted it to what I think has changed in the API but still it complained that handle_data method was missing even though it's there.

The original version of this algorithm ceased working due to changes in our API. Here is a fixed version (with a separate thread here).