Pair Trade using a Risk Factor Model

Hi all, this is a stab at a pairs trading algo that uses a risk factor model to estimate the hedge ratio between stocks. I translated the math from Pairs Trading: Quantitative Methods and Analysis (Wiley Finance series) into code.

This sort of thing might be useful for estimating risk factor exposures. Since Q scoring now includes a beta metric, I thought I'd share it and get some more eyes on the math. It looks pretty impressive here, but it's volatile and the beta estimates seem brittle. It also needs some work to control the leverage/exposure. My initial feeling is that old-fashioned OLS regression is tough to beat for pairs trading, but this could be useful for trading a basket of stocks with common risk factors.

Cheers,
David

10 responses

Thanks for sharing David. The code is very clean and readable.
You have basically coded all of chapters 3 (Factor Models) and 6 of Vidyamurthy's book.
Arbitrage Pricing Theory has many uses, and Vidyamurthy attempts to exploit it for pairs trading.
I couldn't get my head around his use of the hedge ratio as the beta (slope) relating price x to price y.
So I compared the hedge ratio with the beta calculated with OLS (shown below).
They are not the same... hopefully someone can show me that Vidyamurthy is right and I'm wrong!

I've also attached my revision of the code, which attempts to adjust the leverage and tests whether the spread is stationary prior to trading.
Thanks again for sharing.

from datetime import datetime  
import numpy as np  
import pandas as pd  
import statsmodels.api as sm  
import statsmodels.tsa.stattools as ts  
%matplotlib inline
import zipline as zp  
import pytz

# get data  
start = datetime(2003, 4, 1, 0, 0, 0, 0, pytz.utc)  
end = datetime(2013, 4, 19, 0, 0, 0, 0, pytz.utc)  
tickers = ['SPY','SHY','HON','DHR']  
data = zp.utils.factory.load_bars_from_yahoo(stocks=tickers, indexes={},start=start, end=end)  
prices = np.log(data.minor_xs('price'))

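# get hedge ratio as beta using the Risk Factor Model (Vidyamurthy, 2004)
# note: RiskFactorModel is not defined in this snippet; it is presumably the class from David's algorithm
factors = prices[tickers[0:2]][-50:]
assets = prices[tickers[2:]][-50:]
model = RiskFactorModel(assets, factors)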
x = 'HON'  
y = 'DHR'  
hedge = model.hedge_ratios[x][y]  
print(model.hedge_ratios)  
print(' ')  
print('hedge ratio:',hedge)

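# get beta with OLS
# (pd.ols has been removed from recent pandas versions, so this uses statsmodels with an explicit intercept)
md = sm.OLS(prices[x][-50:], sm.add_constant(prices[y][-50:])).fit()
print('beta:', md.params[y])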

https://scholar.google.com/scholar?hl=en&q=pair+trading+arbitrage+pricing+theory+Vidyamurthy&btnG=&as_sdt=1%2C5&as_sdtp=
Markus Harlacher, Cointegration based statistical arbitrage, Master thesis 2012 https://stat.ethz.ch/research/mas_theses/2012/harlacher.pdf
http://ro.uow.edu.au/cgi/viewcontent.cgi?article=4452&context=theses

If a pair doesn't work with the OLS, it's probably not a pair.

Hey Ted, I don't think you'd expect the two hedge ratios to be the same, because one is a simple linear regression of two series, while the risk model assumes there are common factors between the two and incorporates the covariance of the factors as well. For pairs I tend to agree with Bharath that an OLS model should capture the effects of any common risk factors pretty effectively. Factor models are useful for estimating risk exposures across a basket, but I'm not sure how to go about using that information effectively. I'm still in the learning phase.
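
To make the difference concrete, here is a small numeric sketch. It assumes my reading of the factor-model hedge ratio, namely the covariance of the two assets' common factor returns divided by the common factor variance of the second asset, and it works on returns with made-up exposures and covariances rather than on the log prices used in the snippet above. All numbers and names (e_a, e_b, F, spec_var_b) are hypothetical.

```python
import numpy as np

# Hypothetical factor exposures of two assets to two common factors.
e_a = np.array([1.0, 0.5])
e_b = np.array([0.8, 0.2])

# Hypothetical factor covariance matrix and asset B's specific (idiosyncratic) variance.
F = np.array([[0.04, 0.01],
              [0.01, 0.02]])
spec_var_b = 0.01

# Covariance and variance of the common factor returns only.
common_cov_ab = e_a @ F @ e_b
common_var_b = e_b @ F @ e_b

# Factor-model hedge ratio: regress A's common return on B's common return.
factor_hedge = common_cov_ab / common_var_b

# Population OLS slope of A's total return on B's total return. With uncorrelated
# specific returns the covariance is unchanged, but the variance of B now
# includes its specific variance.
ols_slope = common_cov_ab / (common_var_b + spec_var_b)

print("factor-model hedge ratio:", round(factor_hedge, 4))
print("OLS slope:", round(ols_slope, 4))
```

Under these assumptions the two only agree when asset B has no specific variance, so a gap between the numbers is expected rather than a sign of a bug.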

Below are just some thoughts, and a question regarding this strategy after reading Vidyamurthy's book.

One key component of building a 'factor model' is obviously the selection of factors, which I believe Vidyamurthy intentionally left out of the book.

Ideally the factors should be able to explain the price/return of the asset. For pairs trading, we will be looking at the price of the asset. One way to determine how well the factors explain the asset price is the goodness of fit from a linear regression. With linear regression we can obtain the r^2, where a high r^2 indicates that the factors explain the price of the asset well within the given time frame. However, a high r^2 within one time frame does not guarantee that the r^2 value will persist in the following time frame, as indicated in the figure below.
[Figure: r^2 from regression between asset price and factors]
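
A rough way to reproduce this kind of check is sketched below. It is not the code behind the figure: the data is synthetic, the window length is arbitrary, and r_squared is a hypothetical helper. It fits the asset on the factors in two consecutive windows and reports the r^2 of each fit.

```python
import numpy as np

def r_squared(y, X):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

# Synthetic data (assumption): two random-walk factors and one asset whose
# loading on the first factor fades in the second window.
rng = np.random.default_rng(42)
n, window = 500, 250
factors = rng.normal(size=(n, 2)).cumsum(axis=0)
loading = np.where(np.arange(n) < window, 1.0, 0.1)
asset = loading * factors[:, 0] + rng.normal(size=n).cumsum()

r2_older = r_squared(asset[:window], factors[:window])
r2_newer = r_squared(asset[window:], factors[window:])

print("r^2 in older window:", round(r2_older, 3))
print("r^2 in newer window:", round(r2_newer, 3))
```

Repeating this for every asset and plotting the older r^2 against the newer one gives a scatter plot of how well the fit persists.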

Similarly, for picking pairs, Vidyamurthy proposes picking pairs with a high 'distance measure'. This distance measure captures the similarity of the two assets' factor exposures. Again, a high distance measure obtained in one time frame may not persist through the following time frame, as indicated in the figure below.
[Figure: distance measure between two time frames. Red indicates pairs with r^2 > 0.9 in the earlier time frame.]
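
For reference, here is how I understand the distance measure: the absolute value of the correlation between the two assets' common factor returns. The sketch below assumes that definition, and the exposures and factor covariance matrix are made-up numbers, so treat it as an illustration rather than the book's exact procedure.

```python
import numpy as np

def distance_measure(e_a, e_b, F):
    """Absolute correlation of the common factor returns of two assets.

    e_a, e_b: factor exposure vectors; F: factor covariance matrix.
    """
    cov_ab = e_a @ F @ e_b
    var_a = e_a @ F @ e_a
    var_b = e_b @ F @ e_b
    return abs(cov_ab) / np.sqrt(var_a * var_b)

# Hypothetical exposures and factor covariance matrix.
F = np.array([[0.04, 0.01],
              [0.01, 0.02]])
e_a = np.array([1.0, 0.5])
e_b = np.array([0.8, 0.2])

d = distance_measure(e_a, e_b, F)
print("distance measure:", round(d, 4))
```

A value close to 1 means the two exposure vectors point in nearly the same direction once the factor covariances are taken into account. Computing it on two consecutive windows of estimated exposures is one way to check whether it persists.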

The optimal pairs are likely those with a high distance measure and a high r^2 (from the linear regression of the asset on the factors) in both time frames...

Any frameworks/ideas/hints for picking a good set of assets/factors that would persist through time?

Factors used: YAHOO/INDEX_HUI,YAHOO/INDEX_VIX,YAHOO/INDEX_OSX,YAHOO/INDEX_XAU, YAHOO/TSX_RTRE_TO,YAHOO/INDEX_WILREIT,YAHOO/INDEX_SML,YAHOO/INDEX_N225,YAHOO/INDEX_W5KLCV,YAHOO/INDEX_HCX
Assets used: components of Dow Jones

Thanks for sharing Ted, interesting stuff. I would think the factors that are most likely to persist are the ones most closely related to each company's underlying business model.

For example, I'd expect the price of oil to be a fairly persistent factor for XOM because their business is so heavily dependent on it. Most companies dealing in commodities are likely to have persistent risk exposure to the commodity's price. Maybe the same could be said for financial companies and interest rates, or companies with a lot of business overseas and the dollar index. Some tech companies are probably harder to nail down because they're more or less just dependent on their ability to innovate within their space.

@Ted can you post a clonable scatter plot of the r^2 relationship? Was that done in research?

I was not able to post my research page here, so please look at the code in the below link.
Building upon David's code, I have added methods for calculating correlation and SNR, and then looked at whether consistent correlation / SNR can actually lead to some edge via pairs trading. Cheers.
https://www.quantopian.com/posts/re-pair-trade-using-a-risk-factor-model-analysis-of-260k-pairs

@Ted I didn't see the illustration of the r^2 relationship that you mention in your notes, along this line: "With linear regression, we can obtain the r^2, where a high r^2 would indicate that the factors explain the price of the asset well within the given time frame. However, high r^2 within one time frame does not guarantee the r^2 value would persist in the following time frame as indicated in the figure below."

john, try playing with the below. have fun!

s=11
plt.scatter(info[s]['fm'].fm_rolling[-2].cor.unstack(),info[s]['fm'].fm_rolling[-1].cor.unstack())
plt.xlabel('older correlation')
plt.ylabel('newer correlation')


NameError Traceback (most recent call last)
in ()
1 s=11
----> 2 plt.scatter(info[s]['fm'].fm_rolling[-2].cor.unstack(),info[s]['fm'].fm_rolling[-1].cor.unstack())
3 plt.xlabel('older correlation')
4 plt.ylabel('newer correlation')

NameError: name 'plt' is not defined
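
The NameError just means matplotlib.pyplot was never imported as plt in that session. Here is a self-contained sketch of the same plot. The `info` object from the notebook is not available here, so two synthetic correlation matrices (older_cor and newer_cor, both hypothetical) stand in for `fm_rolling[-2].cor` and `fm_rolling[-1].cor`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; in a notebook use %matplotlib inline instead
import matplotlib.pyplot as plt  # this import is what the failing cell was missing
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
tickers = ["A", "B", "C", "D"]  # placeholder symbols

def window_corr(n_obs=60):
    """Correlation matrix of synthetic returns for one rolling window."""
    returns = pd.DataFrame(rng.normal(size=(n_obs, len(tickers))), columns=tickers)
    return returns.corr()

older_cor = window_corr()
newer_cor = window_corr()

# unstack() flattens each square matrix into a Series indexed by ticker pair,
# so the two Series line up pair by pair.
older = older_cor.unstack()
newer = newer_cor.unstack()

fig, ax = plt.subplots()
ax.scatter(older, newer)
ax.set_xlabel("older correlation")
ax.set_ylabel("newer correlation")
```

In the original notebook, adding `import matplotlib.pyplot as plt` before the cell should be enough, provided `info` has already been built.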