Fixed version of Ernie Chan's "Gold vs. gold-miners" stat arb

This is a fixed version of the Gold vs. gold-miners algorithm. Do not use this version; see the later post in this thread for a fixed version that uses batch_transform to compute the regression.

21 responses

Hello Thomas and others,

Any idea why the algorithm "stopped working" around mid-2012?

Most probably it was the uncertainty over the fiscal cliff at the end of the year: money flowed into gold as a 'safe haven', but GDX, the basket of gold-mining stocks, did not follow, because the market was pricing the increased capital gains tax into the stocks.

Thanks Lyle,

Here's a chart of the GLD & GDX prices:

Just eyeballing the charts, it looks like GDX just flattens out starting in early 2011, while GLD continues to rise (seems to be on the same general upward trend since early 2009). Kinda counter-intuitive...

Interesting analysis! I would be curious if a cointegration statistical test as described here would be sensitive to this and signal to e.g. stop trading this pair.

Thomas,

I looked over the paper a few days ago...not the sort of thing I'll be able to figure out without a significant time investment. Do you understand it? Do you think it could be implemented in Quantopian? I could contact one of the authors for guidance, but wouldn't even know what to ask at this point.

@Grant:

I actually meant to get back to you in the other thread but since this is more recent I'll just reply here.

The paper requires some background in Bayesian statistics and would definitely be a medium-scale project to implement in Quantopian. Luckily, someone else I sent the paper to is currently working on this. While he is using Mathematica for the first iteration, it should be possible to port this to Python (and by extension Quantopian) without too much effort. So hopefully within a few weeks we'll have some good code to explore this further :).

@Grant:

I'm in the process of implementing the algorithms described in the Barber paper on Bayesian co-integration because I feel that they could be useful for various types of relative-value trading (e.g. simple pairs). This paper proposes a Bayesian approach to estimating whether or not two signals are co-integrated that avoids some of the pitfalls involved in using Dickey-Fuller methodology.

Initially I want to test the algorithm as described in the paper within Mathematica on simulated stock signals to verify both correctness of the implementation and to study how well it performs under the metrics of most interest to a trader. This investigation will likely suggest other ways to extend the algorithm after which it would make sense to translate the code into Python in order to test it against historical data on the Quantopian platform.

As @Thomas points out above, it will likely take us at least another few weeks to reach that point since there are more than a few steps involved here. You mention the possibility of contacting the authors for guidance. Perhaps they have some Matlab code that they are willing to share. If so, I wouldn't be surprised if it uses some of the routines from the toolbox that accompanies Barber's book on machine learning, which incidentally I recommend downloading from his website for additional background on this subject area.

Thanks rxs,

Here's a talk by the first author of the paper: http://techtalks.tv/talks/bayesian-conditional-cointegration/57453/. I listened to it...the approach is still kinda murky to me, but I'm getting a general sense.

Thomas & rxs,

Intuitively, one might expect an underlying causal relationship that would explain the common trending of GLD and GDX. Simplistically, I'd think that demand for ownership of physical gold (GLD) would result in interest in investing in companies that extract it from the earth (GDX). If so, might there be a detectable time lag between price changes in GLD and GDX? In other words, as a spread develops between GLD and GDX, is GLD driving GDX, or is there no detectable lag down to the minute level? Seems like a Quantopian algorithm could be concocted to have a look at the dynamics, right?
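
One rough way to look for such a lag is to correlate one series' returns with the other's returns shifted by a few bars and see where the correlation peaks. Here is a minimal sketch in plain Python, assuming two equal-length lists of per-bar returns. The names `pearson` and `lagged_correlations` are hypothetical helpers, not Quantopian API, and the data is synthetic rather than real GLD/GDX prices:

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_correlations(leader, follower, max_lag):
    """Correlate leader[t] with follower[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        result[lag] = pearson(leader[:len(leader) - lag], follower[lag:])
    return result


# Synthetic check: the follower repeats the leader's returns 2 bars later.
rng = random.Random(0)
leader_returns = [rng.gauss(0.0, 0.01) for _ in range(500)]
follower_returns = [0.0, 0.0] + leader_returns[:-2]

correlations = lagged_correlations(leader_returns, follower_returns, max_lag=5)
best_lag = max(correlations, key=correlations.get)
print(best_lag)
```

On real minute bars the peak, if there is one, would be far less sharp than in this toy case, so a flat profile across lags would suggest no detectable lead at that resolution.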

Grant

Hi guys, I have played with cointegration in financial time series, and it is a very interesting approach when looking for mean-reversion strategies...
A few comments came to mind while reading this and the previous thread:

  • I read the code and did not find any unit-root test (such as ADF or PP) or stationarity test (such as KPSS) on the residual of the OLS regression, so you do not actually know whether these time series are cointegrated. I saw that you are proposing a Bayesian approach to look for a cointegration relationship, but a frequentist approach may be easier to achieve with statsmodels because it is already implemented (the last time I used it, it supported the ADF test). The ADF test could then serve as a filter that tells you when the pair is no longer cointegrated and should not be traded.
  • Another idea would be to use a Johansen test to search for the cointegrating vectors. It has many improvements over the ADF test (and some problems of its own), and it comes with a VECM that you can actually use to make forecasts. I think there is some work in statsmodels to support Johansen, but it is very preliminary.
  • There are some recent papers that treat cointegration vectors as functions of time. The cointegration vector is then no longer fixed, and you could capture/fit the dynamics of the stochastic trend underlying both time series, which would be very interesting from a forecasting point of view.
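
The filter in the first bullet can be sketched as a two-step Engle-Granger check: regress one price on the other with OLS, then run a Dickey-Fuller regression on the residual. Below is a minimal plain-Python sketch under stated assumptions, not the algorithm's actual code. `ols` and `df_tstat` are hypothetical helpers, the data is simulated, and the Dickey-Fuller regression has no lag terms, which simplifies the augmented test. In practice, `adfuller` and `coint` in `statsmodels.tsa.stattools` handle lag selection and critical values.

```python
import math
import random


def ols(x, y):
    """Fit y = intercept + slope * x by least squares; return (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def df_tstat(resid):
    """t-statistic of gamma in: resid[t] - resid[t-1] = gamma * resid[t-1] + error."""
    lagged = resid[:-1]
    delta = [b - a for a, b in zip(resid[:-1], resid[1:])]
    sxx = sum(v * v for v in lagged)
    gamma = sum(v * d for v, d in zip(lagged, delta)) / sxx
    errors = [d - gamma * v for v, d in zip(lagged, delta)]
    sigma2 = sum(e * e for e in errors) / (len(delta) - 1)
    return gamma / math.sqrt(sigma2 / sxx)


# Simulated pair: x is a random walk, y tracks 2*x + 5 plus stationary noise.
rng = random.Random(42)
x = [100.0]
for _ in range(499):
    x.append(x[-1] + rng.gauss(0.0, 1.0))
y = [5.0 + 2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]

intercept, slope = ols(x, y)
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
t_stat = df_tstat(resid)
print(round(slope, 3), round(t_stat, 2))
```

A strongly negative statistic suggests a stationary residual. Because the residual comes from a fitted regression, the statistic has to be compared against Engle-Granger critical values (approximately -3.3 at the 5% level for two series) rather than ordinary t tables. Recomputing it on a rolling window is one possible way to flag when the pair stops being tradable.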

OK, just some ideas to discuss... I will try to get some improvements over this strategy with some additional code maybe implementing some of these ideas.
Cheers.

Damián.

Hi Damián,

Good to see you here :).

I think those are great ideas. Certainly using a prewritten test in statsmodels will be easier than the Bayesian cointegration test (not sure what the current status on that is; @rx?).
Can you post a paper on the cointegration of vectors as a function of time? Is the idea to look for cointegration between more than 2 stocks?

Thomas

Some interesting papers about time-varying cointegration:

This is the first reference and is very old now:
http://www.jstor.org/discover/10.2307/3532993?uid=3737512&uid=2&uid=4&sid=21102185134967

Then, there was a "time of silence" and now you have some papers here:

There are more interesting papers, but I read them some years ago... I have to look for them again...

Damián.

Attached is a fixed version that also uses a batch_transform, which makes the code a little nicer.

Damian: Those are interesting references, I'll check them out as soon as I find some time.

The code looks very clean... as soon as I come back home from SciPyCon Argentina Conference, I will play with this for sure...

Thanks for posting it!

Damián.

Ernie also mentioned the half-life in his book. I am trying to incorporate it as an exit strategy instead of the zscore <= 1.0 and zscore >= -1.0 signal.
I added the calculation of the half-life to the algorithm, but everything else is left intact.
The half-life values look off and I'm still working on how to apply it. Perhaps someone can have a go at it.
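
For reference, the half-life as I understand it from the book comes from regressing the one-bar change of the spread on its lagged level, then taking -ln(2) divided by the slope. Here is a minimal plain-Python sketch of that calculation, not the attached algorithm's code. `half_life` is a hypothetical helper name and the spread is a toy series:

```python
import math


def half_life(spread):
    """Half-life of mean reversion from the regression
    spread[t] - spread[t-1] = intercept + lam * spread[t-1] + error."""
    lagged = spread[:-1]
    delta = [b - a for a, b in zip(spread[:-1], spread[1:])]
    n = len(lagged)
    mean_lag = sum(lagged) / n
    mean_delta = sum(delta) / n
    sxx = sum((v - mean_lag) ** 2 for v in lagged)
    sxy = sum((v - mean_lag) * (d - mean_delta) for v, d in zip(lagged, delta))
    lam = sxy / sxx
    if lam >= 0:
        return float("inf")  # no mean reversion detected in this sample
    return -math.log(2) / lam


# Toy spread that closes 10% of its gap to zero on every bar.
toy_spread = [100.0 * 0.9 ** t for t in range(50)]
print(round(half_life(toy_spread), 2))
```

The toy series gives a half-life of about 6.9 bars. When the estimated slope is close to zero or positive, the half-life becomes huge or undefined, which may be one reason the values look off on a short window.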

This version closes your previous position exactly. One question: should we close our position at the end of the backtesting period in any case?

Hacked another version with NUGT as the long...

The same idea is simplified, with negatively correlated constituents.

import quantopian.optimize as opt
import statsmodels.api as sm
import numpy as np
# --------------------------------------------------------------------------------
# Pair, lookback in days, z-score band, gross leverage, current target weights
BULL = symbol('QQQ'); BEAR = symbol('TLT'); LEN = 21; UB = 1.6; LEV = 1.0; wt = {}
# --------------------------------------------------------------------------------
def initialize(context):
    # Run trade() every day, 150 minutes after the market opens
    schedule_function(trade, date_rules.every_day(), time_rules.market_open(minutes = 150))

def trade(context, data):
    # Daily prices over twice the lookback window
    prices = data.history([BULL, BEAR], 'price', LEN*2, '1d')
    # Slope of a no-intercept OLS regression of BULL prices on BEAR prices
    slope = sm.OLS(prices[BULL].values, prices[BEAR].values).fit().params[0]
    spread = prices[BULL] + slope*prices[BEAR]
    # z-score of the latest spread value against the last LEN days
    zscore = (spread[-1] - np.mean(spread[-LEN:])) / np.std(spread[-LEN:])

    # Tilt 90/10 toward one leg outside the band; inside it, keep the previous weights
    if zscore >= UB:    wt[BULL], wt[BEAR] = 0.9*LEV, 0.1*LEV
    elif zscore <= -UB: wt[BULL], wt[BEAR] = 0.1*LEV, 0.9*LEV

    order_optimal_portfolio(opt.TargetWeights(wt), constraints = [opt.MaxGrossExposure(LEV)])
    record(zscore = zscore, ub = UB, lb = -UB)  
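
Outside Quantopian, the signal logic above reduces to a z-score of the latest spread value against its trailing window, mapped to two target weights. Here is a minimal standalone sketch, assuming a plain list of spread values. `zscore_last` and `target_weights` are hypothetical names, and the population standard deviation is used to mirror the default of `np.std`:

```python
import statistics

UB = 1.6   # z-score band, as in the snippet above
LEV = 1.0  # gross leverage


def zscore_last(spread, window):
    """z-score of the last spread value against the trailing window."""
    recent = spread[-window:]
    mean = statistics.fmean(recent)
    std = statistics.pstdev(recent)  # population std, like np.std's default
    return (spread[-1] - mean) / std


def target_weights(zscore, previous):
    """Map the z-score to (bull, bear) weights; keep the previous ones inside the band."""
    if zscore >= UB:
        return (0.9 * LEV, 0.1 * LEV)
    if zscore <= -UB:
        return (0.1 * LEV, 0.9 * LEV)
    return previous


spread = [1.0, 2.0, 3.0, 4.0, 5.0]
z = zscore_last(spread, window=5)
print(round(z, 4), target_weights(z, previous=(0.5, 0.5)))
```

The price history and order placement would have to come from whatever data source and broker API replaces the Quantopian calls; this sketch covers only the signal arithmetic.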

@Vladimir, nice work. Like your code. Thought I could change some numbers and push performance higher.

@Vladimir, @Guy, the old code doesn't run anymore. Do you mind sharing a new version that runs? Thanks!

@Helen Irun

The full code snippet is in my post above.