Epic fail for stat-arb

Kaboom!

Looking at this fantastic blow up, it seems like I should be able to just invert all my orders and have a great winner of an algorithm. Yet, I'm finding it surprisingly difficult to do that! Still, I keep thinking that reliably bad is just an inverted winner, and there must be a way to take this behavior and profit from it, especially given the steady decline. Any advice?

What I did to create this

After I tried the arbitrage between the gold commodity ETF and the gold mining ETF in this shared algo, I did more research on the underlying fundamental investment. I read this article on Motley Fool, which described how miners will hedge their commodity exposure, so their earnings do not exactly follow the fluctuations in gold. I thought this could cause the GDX ETF (miners) to be less correlated with gold than a single gold miner would be. I settled on Royal Gold, primarily due to its relatively high trading volume.

I also modified the algorithm to reduce its positions whenever the R squared fit of the model fell below 0.9.
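A minimal sketch of that guard in plain Python (`target_positions` is a hypothetical helper invented for illustration, not the original algorithm's code):

```python
# Hedged sketch of the position-sizing guard described above.
# `target_positions` is a hypothetical helper, not Quantopian API.
R2_MIN = 0.9   # minimum acceptable model fit before trading the spread

def target_positions(rsquared, zscore, base_size=100):
    """Return target (shares_stock1, shares_stock2) for one bar."""
    if rsquared < R2_MIN:
        return 0, 0                    # fit too weak: reduce / stand aside
    if zscore > 2.0:                   # spread unusually wide: short it
        return -base_size, base_size
    if zscore < -2.0:                  # spread unusually narrow: buy it
        return base_size, -base_size
    return 0, 0                        # inside the band: no new bets
```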

backtest results

(the original backtest has been deprecated and replaced with an image of the performance)

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

23 responses

I tried to just naively switch buy/sell orders but it did just as badly.


@Thomas, that was my first guess too. That amounts to betting that after the spread is more than 2 standard deviations wide, it will get wider still, so I would intuitively expect that to fare poorly too.

@fawce, I wonder how much of your losses are derived from depreciation of your portfolio and how much are simply from frictional costs associated with opening and closing your positions. If I understand your algo correctly, I think if the spread fluctuated around 2.0 or -2.0 standard deviations, you would end up buying and re-selling the same chunks of stock over and over. That would also explain why inverting the bets doesn't change your performance, since you're doing the same fluctuation except inverted.


@Scott
I was worried about the same thing; it seems like I could test it by varying the standard deviation range for buying/selling. My worry is that there is a trend for the spread to continuously widen, which would tend to put the spread at a large delta from the trailing mean. If that is the case, I'm not sure how to make this profitable without just betting that the spread will widen.

@fawce
.9 also seems like a really high minimum on your r^2 value for reducing your position. I wouldn't be surprised if you're reducing your position almost every time you buy or sell the spread (which would have the same effect as the price fluctuating around the stddev window).

Also, if your r^2 is less than .9 but the absolute value of your zscore is greater than 2.0, I think you'll end up placing opposite orders within the same call to handle_data, which will further your losses to frictional costs.
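The two behaviors can be reproduced in a few lines (hypothetical function names, not the actual algo):

```python
# Minimal reproduction of the if/if vs if/elif bug: with two independent
# `if`s, a bar with low r^2 AND |zscore| > 2 fires both the unwind and a
# new spread trade in the same handle_data call.
def orders_buggy(rsquared, zscore):
    orders = []
    if rsquared < 0.9:
        orders.append("reduce")
    if abs(zscore) > 2.0:
        orders.append("trade_spread")   # fires even when the fit is poor
    return orders

def orders_fixed(rsquared, zscore):
    orders = []
    if rsquared < 0.9:
        orders.append("reduce")
    elif abs(zscore) > 2.0:
        orders.append("trade_spread")   # only when the fit is acceptable
    return orders
```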

@Scott, great catch, that should have been an elif - I just fixed it and I'm re-running...

I threw some print statements into your algo, and it looks like your r^2 values are actually coming out negative. While applied statistics is not my area of expertise, I'm pretty sure this is bad news for the predictive power of your model. Quoting from Wikipedia:

"Important cases where the computational definition of R2 can yield negative values, depending on the definition used, arise where the predictions which are being compared to the corresponding outcomes have not been derived from a model-fitting procedure using those data, and where linear regression is conducted without including an intercept. Additionally, negative values of R2 may occur when fitting non-linear trends to data. In these instances, the mean of the data provides a fit to the data that is superior to that of the trend under this goodness of fit analysis."

@Scott
Ah, now this makes a lot of sense. I fixed the if statement on line 170 to be elif, and the algorithm never trades! The r^2 guard in the algorithm was meant to protect against the condition you're describing above - where the assumed relationship between the two instruments proves to be too weak to trade. But with the bug, it was trading both sides of the guard and steadily draining capital in transaction costs.

Oh well, I guess that's what backtesting is for...

@fawce
I think the issue was (at least in part) with your use of statsmodels to build the linear regression. The OLS model is really meant for more generalized (i.e., multidimensional) correlations, and I think it assumes you've done some amount of normalization on your data so that it doesn't have to account for constant factors. (This is all a bit speculative. I went and hunted through the statsmodels API reference a bit and found it pretty impenetrable. I think I'd have to sit down with an applied stats book for a while to properly figure out what's going on.) At any rate, since all we're doing is regressing two different series, a much simpler tool is scipy.stats.linregress. I ran that against the same data and got much more sensible results:

2012-01-17 14:31:00 PRINT rsquared 0.018384, beta -0.095647

2012-01-30 14:31:00 PRINT rsquared 0.307240, beta 0.339751

hmmm...so I just ran a long backtest with the same strategy, but using the following to calculate the relevant stats metrics:

from scipy import stats

# regress the two price series in the rolling window against each other
p1 = [x.price1 for x in self.ticks]
p2 = [x.price2 for x in self.ticks]
gradient, intercept, r_value, p_value, std_err = stats.linregress(p1, p2)
self.rsquared = r_value ** 2
self.beta = gradient

This should calculate a linear regression between the price of p1 and p2, such that in the model:

p2 = gradient * p1 + intercept

It seems like the rsquared of the model fluctuates wildly as you recalculate. There are periods in 2006 where you have decent correlation (r^2 hanging out around .6 or higher) and other periods where it's less than .01, which would mean that the two sids are moving essentially at random relative to one another. I think improving on this algo would require figuring out a more reliable way to determine when the cointegration between the sids breaks down.

Instead of looking at the R^2 you can also look at the p-value, which tells you how likely it would be to get this coefficient if the two series were totally uncorrelated. Often p < .05 is considered significant (i.e., the probability of obtaining these results as a fluke is less than 5%).
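For example, with scipy.stats.linregress on made-up data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=500)

# Genuinely related series: tiny p-value, substantial r^2.
y_related = 2.0 * x + 0.1 * rng.normal(size=500)
rel = stats.linregress(x, y_related)

# Independent series: r^2 near zero, p-value typically large.
y_noise = rng.normal(size=500)
indep = stats.linregress(x, y_noise)
```

Here `rel.pvalue` is effectively zero while `indep.rvalue ** 2` is near zero, which is the kind of separation the significance test gives you.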

@Thomas you wouldn't happen to know how to coax the p-value out of statsmodels, would you?

@Scott
I noticed another difference between this example and the gld/gdx version - here I am using a bet size of 100 shares, and in the gld/gdx version I am using 5 shares. I switched to 5 shares here and got very different results - still not good, but not the grinding pit of despair pictured above. I think the larger bet size increases the slippage and transaction costs to a level that overwhelms the value of the arbitrage.

@fawce: I think you might want to add

p2 = sm.add_constant(p2)  

to add a constant factor of 1, otherwise you are not estimating an intercept (which is why @Scott's code should have produced something else, I think scipy is estimating an intercept by default). The reason is that statsmodels is estimating the general Y = X*beta + eps model. You want to estimate Y = X*beta + intercept + eps. Thus if you have a column of 1's in X it ends up estimating what you want.
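The column-of-ones trick can be shown with plain numpy least squares on made-up data; this mirrors what sm.add_constant sets up for statsmodels:

```python
import numpy as np

rng = np.random.default_rng(1)
p1 = rng.normal(size=200)
p2 = 3.0 + 2.0 * p1 + 0.01 * rng.normal(size=200)   # true alpha=3, beta=2

# Without a constant column the fit is forced through the origin:
slope_no_const = np.linalg.lstsq(p1[:, None], p2, rcond=None)[0][0]

# Appending a column of ones (the add_constant idea) also estimates
# the intercept, recovering alpha ~ 3 and beta ~ 2:
X = np.column_stack([np.ones_like(p1), p1])
alpha, beta = np.linalg.lstsq(X, p2, rcond=None)[0]
```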

To get the p-value you have to compute a Student t-test. After line 103 you can add:

pvalue = results.t_test([0,1]).pvalue  
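In the two-series case the same two-sided Student t-test can be reproduced by hand with scipy; as far as I know, this is exactly how linregress computes its p-value (t = slope / stderr, with n - 2 degrees of freedom):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)

res = stats.linregress(x, y)

# Two-sided t-test on the slope: t = beta / SE(beta), df = n - 2
t_stat = res.slope / res.stderr
p_manual = 2.0 * stats.t.sf(abs(t_stat), df=len(x) - 2)
```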

@fawce: that could also explain why you sometimes get negative R squared values.

@fawce there are two obvious improvements to the OLS you're using: (i) account for the intercept in the spread, (ii) use a symmetric version of the OLS known as TLS. It's detailed here in R - http://quanttrader.info/public/testForCoint.html
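For (ii), a minimal TLS (orthogonal regression) sketch using the SVD; `tls_slope` is an assumed helper name, but the technique is standard: the normal to the best-fit line is the right singular vector for the smallest singular value of the centered data. Unlike OLS, the fit is symmetric in the two series:

```python
import numpy as np

def tls_slope(x, y):
    """Total least squares slope: minimizes perpendicular distances.

    Assumes the best-fit line is not vertical.
    """
    M = np.column_stack([x - np.mean(x), y - np.mean(y)])
    # Right singular vector for the smallest singular value is the
    # normal (n0, n1) to the best-fit line through the centered cloud,
    # so the line is n0*x' + n1*y' = 0, i.e. slope = -n0 / n1.
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    n0, n1 = vt[-1]
    return -n0 / n1
```

For points on y = 2x + 1 this returns 2, and swapping the arguments returns 1/2 — the reciprocal symmetry that plain OLS lacks.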

@J.J. Thanks! The issue of the intercept has been raised again in this thread. I can see mechanically how to account for the intercept in the spread, but I'm not grasping intuitively the meaning. Any chance you could take a moment and try to provide an explanation? Thanks for the link.

Also, just wanted to provide a link to the ever-reliable Wes McKinney's statsmodels demo. Wes built an IPython notebook filled with examples, starting with none other than OLS with its intercept.

@J.J. looks like @Thomas took your advice and added an intercept over here.

@fawce
Would say that the intuition is that if you omit the intercept you're systematically over- or underestimating the spread. To illustrate, let's say the actual model is:
spread = p1 - beta*p2 - alpha
By assuming alpha = 0, you overestimate the spread when alpha > 0 and underestimate it when alpha < 0.
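A tiny numeric illustration of that bias (made-up prices):

```python
import numpy as np

# If the true relation is p1 = alpha + beta * p2 with alpha > 0,
# dropping alpha shifts every spread observation up by exactly alpha.
alpha, beta = 5.0, 1.5
p2 = np.array([40.0, 41.0, 39.0, 42.0])
p1 = alpha + beta * p2

spread_correct = p1 - beta * p2 - alpha   # zeros: model fully explains p1
spread_no_alpha = p1 - beta * p2          # constant bias of +alpha
```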

There are other things you can do to optimise the spread / zscore:
- using logarithms
- using different window sizes
- using volatility decay - weigh recent history more heavily

I tried logs by cloning your code but somehow it didn't work very well :) It works in my own implementation though...

Had another question for you - what's the meaning / difference here:
42. spread = price1 - price2
60. self.spread = data[self.stock1].price - self.beta * data[self.stock2].price

@JJ the spread on line 42 is actually not used. Originally, I was calculating the moving average and stddev inside handle_add, and I reorganized the code to do it in the OLSWindow's handle_data method instead.

I believe the discrepancy in calculation of the spread is a bug - line 60 is the correct calculation, where the spread is being adjusted by the beta from the OLS calculated on the time series from the prices.

For the log, are you taking the log of the prices?

yes, spread = ln(p1) - ln(p2) = ln(p1/p2) is one way of doing it
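A tiny illustration with made-up prices, showing the two log-spread formulations are identical:

```python
import numpy as np

p1 = np.array([100.0, 102.0, 101.0, 105.0])
p2 = np.array([50.0, 51.0, 50.5, 53.0])

# ln(p1) - ln(p2) == ln(p1 / p2)
spread = np.log(p1) - np.log(p2)

# z-score of the log spread over the window
z = (spread - spread.mean()) / spread.std()
```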