Fixed version of Ernie Chan's "Gold vs. gold-miners" stat arb

edited May 14, 2013

This is a fixed version of the Gold vs. gold-miners algorithm. Do not use this version; see below for a newer fixed version that uses batch_transform to compute the regression.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

21 responses

Grant Kiehne

Jan 14, 2013

Hello Thomas and others,

Any idea why the algorithm "stopped working" around mid-2012?

Lyle Dean

Jan 14, 2013

Most probably the uncertainty over the Fiscal Cliff at the end of the year: money flowed into gold as a 'safe haven', but GDX, the basket of gold-mining stocks, did not follow the same relationship because the market was pricing the expected capital gains tax increase into the stocks.

Grant Kiehne

Jan 15, 2013

Thanks Lyle,

Here's a chart of the GLD & GDX prices:

Just eyeballing the charts, it looks like GDX just flattens out starting in early 2011, while GLD continues to rise (seems to be on the same general upward trend since early 2009). Kinda counter-intuitive...

Thomas Wiecki

Jan 17, 2013

Interesting analysis! I would be curious if a cointegration statistical test as described here would be sensitive to this and signal to e.g. stop trading this pair.

Grant Kiehne

Jan 17, 2013

Thomas,

I looked over the paper a few days ago...not the sort of thing I'll be able to figure out without a significant time investment. Do you understand it? Do you think it could be implemented in Quantopian? I could contact one of the authors for guidance, but wouldn't even know what to ask at this point.

Thomas Wiecki

Jan 17, 2013

@Grant:

I actually meant to get back to you in the other thread but since this is more recent I'll just reply here.

The paper requires some background in Bayesian statistics and would definitely be a medium-scale project to implement in Quantopian. Luckily, someone else I sent the paper to is currently working on this. While he is using Mathematica for the first iteration, it should be possible to port this to Python (and, by extension, Quantopian) without too much effort. So hopefully within a few weeks we'll have some good code to explore this further :).

rxs

Jan 18, 2013

@Grant:

I'm in the process of implementing the algorithms described in the Barber paper on Bayesian co-integration because I feel that they could be useful for various types of relative-value trading (e.g. simple pairs). This paper proposes a Bayesian approach to estimating whether or not two signals are co-integrated that avoids some of the pitfalls involved in using Dickey-Fuller methodology.

Initially I want to test the algorithm as described in the paper within Mathematica on simulated stock signals to verify both correctness of the implementation and to study how well it performs under the metrics of most interest to a trader. This investigation will likely suggest other ways to extend the algorithm after which it would make sense to translate the code into Python in order to test it against historical data on the Quantopian platform.

As @Thomas points out above, it will likely take us at least another few weeks to reach that point since there are more than a few steps involved here. You mention the possibility of contacting the authors for guidance. Perhaps they have some Matlab code that they are willing to share. If so, I wouldn't be surprised if it uses some of the routines from the toolbox that accompanies Barber's book on machine learning, which incidentally I recommend downloading from his website for additional background on this subject area.

Grant Kiehne

Jan 18, 2013

Thanks rxs,

Here's a talk by the first author of the paper: http://techtalks.tv/talks/bayesian-conditional-cointegration/57453/. I listened to it...the approach is still kinda murky to me, but I'm getting a general sense.

Grant Kiehne

Jan 19, 2013

Thomas & rxs,

Intuitively, one might expect an underlying causal relationship that would explain the common trending of GLD and GDX. Simplistically, I'd think that demand for ownership of physical gold (GLD) would result in interest in investing in companies that extract it from the earth (GDX). If so, might there be a detectable time lag between price changes in GLD and GDX? In other words, as a spread develops between GLD and GDX, is GLD driving GDX, or is there no detectable lag down to the minute level? Seems like a Quantopian algorithm could be concocted to have a look at the dynamics, right?

Grant
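One rough way to look for such a lag is to correlate one series' returns with the other's returns shifted by k bars, and see which k gives the strongest correlation. The sketch below is only an illustration in plain Python: the function names `pearson`, `lagged_corr` and `best_lag` are hypothetical, and the data is synthetic (the follower is built to trail the leader by two bars), not GLD/GDX prices.

```python
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def lagged_corr(leader, follower, lag):
    """Correlation between leader[t] and follower[t + lag], for lag >= 0."""
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])

def best_lag(leader, follower, max_lag):
    """Lag in 0..max_lag with the largest correlation."""
    return max(range(max_lag + 1),
               key=lambda k: lagged_corr(leader, follower, k))

# Synthetic returns: the follower repeats the leader's move two bars later.
rng = random.Random(0)
leader = [rng.gauss(0.0, 1.0) for _ in range(300)]
follower = [0.0, 0.0] + leader[:-2]

print(best_lag(leader, follower, max_lag=5))
```

On real minute bars the peak would be far less clean, and a flat profile across lags would suggest no detectable lead at that resolution.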

Damián Avila

Apr 23, 2013

Hi guys, I have played with cointegration in financial time series, and it is a very interesting approach when looking for mean-reversion strategies...
Some comments come to mind reading this and the previous thread:

I read the code and did not find any unit-root test such as ADF or PP, or any stationarity test such as KPSS, on the residuals of the OLS regression. So you do not actually know whether these time series are cointegrated. I saw that you are proposing a Bayesian approach to look for a cointegration relationship, but a frequentist approach may be easier to achieve with statsmodels, because it is already implemented there (the last time I used it, it supported the ADF test). The ADF test could then serve as a filter to detect when the pair is no longer cointegrated and should not be traded.
Another idea would be to use a Johansen test to search for the cointegrating vectors... it has a lot of improvements over the ADF test (and also some problems), and it gives you a VECM that you can actually use to make forecasts (I think there is some work in statsmodels to add Johansen, but it is very preliminary).
There are some recent papers that treat the cointegration vectors as functions of time. In this way, the cointegration vector is no longer fixed, and you could capture/fit the dynamics of the stochastic trend underlying both time series, which would be very interesting from a forecasting point of view.

OK, just some ideas to discuss... I will try to get some improvements over this strategy with some additional code maybe implementing some of these ideas.
Cheers.

Damián.
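The residual check described in this post can be sketched in two steps: regress one price series on the other, then compute a Dickey-Fuller t-statistic on the residuals. The version below is a hand-rolled, dependency-free illustration, not the thread's algorithm. It uses no lag terms (so it is a plain DF statistic rather than an augmented one), the data is synthetic, and the 5% critical value of about -3.34 is an assumed approximation for the two-series case with a constant. In practice the `adfuller` and `coint` functions in `statsmodels.tsa.stattools` would be the natural choice.

```python
import random

def ols(x, y):
    """Least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def df_tstat(series):
    """Dickey-Fuller t-statistic: regress diff(e)[t] on e[t-1], no constant, no lags."""
    lagged = series[:-1]
    delta = [b - a for a, b in zip(series[:-1], series[1:])]
    sxx = sum(v * v for v in lagged)
    rho = sum(l * d for l, d in zip(lagged, delta)) / sxx
    resid = [d - rho * l for l, d in zip(lagged, delta)]
    s2 = sum(r * r for r in resid) / (len(delta) - 1)
    return rho / (s2 / sxx) ** 0.5

def residual_stat(x, y):
    """Step 1: OLS of y on x. Step 2: DF t-statistic on the regression residuals."""
    a, b = ols(x, y)
    return df_tstat([yi - a - b * xi for xi, yi in zip(x, y)])

# Assumed approximate 5% critical value (two series, regression with a constant).
CRIT_5PCT = -3.34

# Synthetic pair: x is a random walk, y tracks 1.5*x plus stationary noise.
rng = random.Random(42)
x = [100.0]
for _ in range(499):
    x.append(x[-1] + rng.gauss(0.0, 1.0))
y = [1.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

stat = residual_stat(x, y)
print(stat, stat < CRIT_5PCT)
```

Used as a filter, the statistic would be recomputed on a trailing window, and trading paused whenever it rises above the critical value.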

Thomas Wiecki

Apr 23, 2013

Hi Damián,

Good to see you here :).

I think those are great ideas. Certainly using a prewritten test in statsmodels will be easier than the Bayesian cointegration test (not sure what the current status on that is; @rxs?).
Can you post a paper on the cointegration of vectors as a function of time? Is the idea to look for cointegration between more than 2 stocks?

Thomas

Damián Avila

Apr 23, 2013

Some interesting papers about time-varying cointegration:

This is the first reference and is very old now:
http://www.jstor.org/discover/10.2307/3532993?uid=3737512&uid=2&uid=4&sid=21102185134967

Then, there was a "time of silence" and now you have some papers here:

http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=7881771 by Bierens and Martin (mathematical, theoretical stuff, a little hard for a biochemist like me... hehe).
http://www.sciencedirect.com/science/article/pii/S0140988309001856 by Park and Zhao (Another approach and more applied).
http://www.sciencedirect.com/science/article/pii/S0304407611001588 (also some Bayesian approach to the problem).

There are more interesting papers, but I read them some years ago... I have to look for them again...

Damián.
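The papers above model the cointegrating vector itself as a function of time. A much cruder stand-in, and not what those papers do, is to re-estimate the hedge ratio on a trailing window and watch how it drifts. The sketch below assumes that simplification. The names `ols_slope` and `rolling_hedge_ratio` are hypothetical, and the data is synthetic, with the true ratio switching from 2.0 to 3.0 halfway through.

```python
def ols_slope(x, y):
    """Slope of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx

def rolling_hedge_ratio(x, y, window):
    """Slope re-estimated on each window; entry i covers bars i..i+window-1."""
    return [ols_slope(x[i:i + window], y[i:i + window])
            for i in range(len(x) - window + 1)]

# Synthetic data: the true ratio is 2.0 for the first 50 bars and 3.0 afterwards.
x = [float(10 + (i * 7) % 13) for i in range(100)]
y = [(2.0 if i < 50 else 3.0) * xi for i, xi in enumerate(x)]

ratios = rolling_hedge_ratio(x, y, window=20)
print(round(ratios[0], 6), round(ratios[-1], 6))
```

Windows that straddle the switch give intermediate values, which is the main weakness of a rolling estimate compared with an explicit time-varying model.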

Thomas Wiecki

May 14, 2013

Attached is a fixed version that also uses a batch_transform, which makes the code a little nicer.

Damian: Those are interesting references, I'll check them out as soon as I find some time.

Damián Avila

May 14, 2013

The code looks very clean... as soon as I come back home from SciPyCon Argentina Conference, I will play with this for sure...

Thanks for posting it!

Damián.

ben v

May 27, 2013

Ernie also mentioned half-life in his book. I am trying to incorporate it as an exit strategy instead of the zscore <= 1.0 and zscore >= -1.0 signal.
I added the half-life calculation to the algorithm, but everything else is left intact.
The half-life values look off and I'm still working on how to apply it. Perhaps someone can have a go at it.
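For reference, a common way to estimate the half-life of mean reversion (assumed here to be the one the book refers to) is to regress the change in the spread on its lagged level and take -ln(2)/lambda, where lambda is the fitted slope. The sketch below is a standalone illustration with a hypothetical `half_life` helper and a noise-free synthetic spread. It is not the code attached to this post.

```python
import math

def half_life(spread):
    """Half-life from regressing diff(spread)[t] on spread[t-1] with an intercept."""
    lagged = spread[:-1]
    delta = [b - a for a, b in zip(spread[:-1], spread[1:])]
    n = len(lagged)
    ml, md = sum(lagged) / n, sum(delta) / n
    lam = (sum((l - ml) * (d - md) for l, d in zip(lagged, delta))
           / sum((l - ml) ** 2 for l in lagged))
    if lam >= 0:
        return float("inf")  # no mean reversion detected
    return -math.log(2) / lam

# Noise-free example: the spread decays by 10% per bar, so lambda = -0.1.
spread = [0.9 ** t for t in range(60)]
print(half_life(spread))
```

A sanity check like this, on a series with a known decay rate, can help tell whether odd-looking values come from the formula or from the data fed into it (for example, regressing on prices instead of the spread).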

Zack Sun

May 2, 2014

This is a version that closes your previous position exactly. One question: should we always close our position at the end of the backtesting period?

Peter Bakker

Jan 6, 2015

hacked another version with NUGT as the long ....

Vladimir

Nov 20, 2019

The same idea is simplified, with negatively correlated constituents.

import quantopian.optimize as opt
import statsmodels.api as sm
import numpy as np
# --------------------------------------------------------------------------------
BULL = symbol('QQQ'); BEAR = symbol('TLT'); LEN = 21; UB = 1.6; LEV = 1.0; wt = {}
# --------------------------------------------------------------------------------
def initialize(context):
    # Run trade() once per day, 150 minutes after the market opens.
    schedule_function(trade, date_rules.every_day(), time_rules.market_open(minutes = 150))

def trade(context, data):
    prices = data.history([BULL, BEAR], 'price', LEN*2, '1d')
    # No-intercept OLS of BULL prices on BEAR prices; params[0] is the slope.
    slope = sm.OLS(prices[BULL].values, prices[BEAR].values).fit().params[0]
    # Spread as a sum, not a difference: the two constituents are negatively correlated.
    spread = prices[BULL] + slope*prices[BEAR]
    # z-score of the latest spread against the last LEN days.
    zscore = (spread[-1] - np.mean(spread[-LEN:])) / np.std(spread[-LEN:])

    # Change the weights only when the z-score crosses a band; otherwise keep the previous ones.
    if zscore >= UB:    wt[BULL], wt[BEAR] = 0.9*LEV, 0.1*LEV
    elif zscore <= -UB: wt[BULL], wt[BEAR] = 0.1*LEV, 0.9*LEV

    order_optimal_portfolio(opt.TargetWeights(wt), constraints = [opt.MaxGrossExposure(LEV)])
    record(zscore = zscore, ub = UB, lb = -UB)

Guy Fleury

Nov 20, 2019

@Vladimir, nice work. Like your code. Thought I could change some numbers and push performance higher.

Helen Irun

Nov 28, 2019

@Vladimir, @Guy, the old code doesn't run anymore. Do you mind sharing a new version that runs? Thanks!

Vladimir

Nov 28, 2019

@Helen Irun

The full code snippet is in my post above.
