Coke vs Pepsi: An Integration Trade

I recently did some work in R with the quantmod package (see link below) to find cointegrated stock pairs. The idea is that some stocks not only move together (are correlated) but also have a spread that is stationary and mean-reverting.

For instance, say Pepsi and Coke are priced at $25 and $50, respectively. The relative spread is then 0.5 (25/50). Although the price of each individual stock will vary greatly over time, the spread should remain fairly constant.

My strategy buys Pepsi and sells Coke when the spread narrows to more than a set number of standard deviations below its medium-run average. It then waits for the spread to widen back toward that average and closes the position. It does the opposite when the spread widens too far.

I experimented a bit with the parameters and used 2.0 standard deviations away from the medium-run average as the trigger to open a position and 0.5 standard deviations away from that same average as the trigger to close it. I use a 20-day window to calculate the mean and standard deviation of the spread. My bet size was 200k, with the requirement that my position value + cash be greater than 0 to open new trades. I experimented with smaller bets but found that placing larger bets and waiting for reversion was more profitable.
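As a rough illustration of the rules above, here is a minimal sketch of the entry and exit logic on the Pepsi/Coke price ratio. It is not the attached algorithm: the function names are made up for this example, and I am assuming the 20-day window includes the current day and uses the sample standard deviation.

```python
from statistics import mean, stdev

WINDOW = 20    # lookback in trading days, as in the post
ENTRY_Z = 2.0  # open when the spread is this many std devs from its mean
EXIT_Z = 0.5   # close once it is back within this many std devs


def zscore(spreads, window=WINDOW):
    """Z-score of the latest spread against the trailing window (latest included)."""
    recent = spreads[-window:]
    if len(recent) < window:
        return None  # not enough history yet
    sd = stdev(recent)
    if sd == 0:
        return None  # flat spread, no signal
    return (recent[-1] - mean(recent)) / sd


def signal(z, position):
    """position: 0 flat, +1 long PEP / short KO, -1 short PEP / long KO."""
    if z is None:
        return "hold"
    if position == 0:
        if z <= -ENTRY_Z:
            return "buy PEP, sell KO"   # spread unusually narrow
        if z >= ENTRY_Z:
            return "sell PEP, buy KO"   # spread unusually wide
    elif abs(z) <= EXIT_Z:
        return "close"                  # spread back near its average
    return "hold"


# Toy prices only: PEP drops from $25 to $23 while KO stays at $50.
pep = [25.0] * 19 + [23.0]
ko = [50.0] * 20
spreads = [p / k for p, k in zip(pep, ko)]
z = zscore(spreads)
action = signal(z, position=0)
```

With these toy prices the last ratio falls from 0.5 to 0.46, which is far enough below the window mean to trigger the "buy PEP, sell KO" branch.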

Obviously these parameters need to be optimized.

Finding cointegrated pairs is simple enough. Does anyone else have any strategies for trading on cointegration?

R package for financial analysis:
http://www.quantmod.com/
Quick tutorial on finding cointegrated pairs using R:
http://quanttrader.info/public/testForCoint.html
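
For anyone who would rather stay in Python, the two-step Engle-Granger idea can be sketched with the standard library alone. This is a simplified illustration and not what the R tutorial does line for line: it regresses one price series on the other to get a hedge ratio, then computes a Dickey-Fuller style t-statistic on the residuals with no lag terms. The function names and the synthetic data are assumptions made for this example. The statistic has to be compared against published Engle-Granger critical values (not ordinary t-tables); the more negative it is, the stronger the evidence of cointegration.

```python
import random
from math import sqrt


def ols(x, y):
    """Least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def engle_granger_stat(x, y):
    """Step 1: regress y on x. Step 2: Dickey-Fuller t-statistic on the residuals."""
    a, b = ols(x, y)
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    lagged = resid[:-1]
    diffs = [resid[i + 1] - resid[i] for i in range(len(resid) - 1)]
    # Regress diffs on lagged residuals without an intercept:
    # diff = gamma * lagged + error
    sll = sum(l * l for l in lagged)
    gamma = sum(l * d for l, d in zip(lagged, diffs)) / sll
    errors = [d - gamma * l for l, d in zip(lagged, diffs)]
    sigma2 = sum(e * e for e in errors) / (len(errors) - 1)
    t_stat = gamma / sqrt(sigma2 / sll)
    return b, t_stat


# Synthetic data, not market prices: x is a random walk and y tracks 2*x plus noise,
# so the pair is cointegrated by construction.
rng = random.Random(0)
x = [50.0]
for _ in range(499):
    x.append(x[-1] + rng.gauss(0, 1))
y = [2.0 * xi + rng.gauss(0, 0.5) for xi in x]

hedge_ratio, t_stat = engle_granger_stat(x, y)
```

On this constructed pair the hedge ratio comes out close to 2 and the statistic is strongly negative. Real price series usually need lag terms and a proper critical-value lookup, which a statistics package handles better than this sketch.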

4 responses

@Branko, thanks for sharing. This is an interesting idea.

I noticed a problem in the algorithm that results in overbuying your positions.

The way portfolio.positions_value and portfolio.cash are calculated by the backtester is not very useful for hedged positions. The short position is subtracted from the long position. As a result, the overall position can run wild while the two variables that are supposed to measure it remain neutral (close to zero).
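
To make the netting effect concrete, here is a toy model in plain Python. It is not the backtester's accounting code: the dictionary fields, the 100k starting cash, and the assumption that prices do not move are all made up for illustration.

```python
def open_pair(portfolio, bet):
    """Go long `bet` dollars of PEP and short `bet` dollars of KO (toy model)."""
    portfolio["long_value"] += bet
    portfolio["short_value"] += bet
    # Buying costs `bet` in cash and the short sale returns `bet`, so cash is unchanged.
    portfolio["cash"] += -bet + bet


def net_check(portfolio):
    """The netted measure: positions value (long minus short) plus cash."""
    positions_value = portfolio["long_value"] - portfolio["short_value"]
    return positions_value + portfolio["cash"]


def gross_exposure(portfolio):
    """Total dollars at risk across both legs."""
    return portfolio["long_value"] + portfolio["short_value"]


# Hypothetical starting cash; prices are assumed not to move between trades.
portfolio = {"cash": 100_000.0, "long_value": 0.0, "short_value": 0.0}
for _ in range(5):
    open_pair(portfolio, 200_000.0)

netted = net_check(portfolio)        # still equal to the starting cash
gross = gross_exposure(portfolio)    # has grown with every trade
```

After five 200k pair trades the netted figure has not moved from the starting cash, while gross exposure has reached 2 million, so a "greater than 0" test on the netted figure never blocks a new trade.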

So in calculate_direction() where you test for "cash" the result is not going to be what you expect. I suggest directly checking to see if you have a position and not using the aggregate value:

    # this boolean test reveals if a position is already established  
    if context.portfolio.positions[context.stocks[0]].amount == 0:

    # this boolean test is highly misleading because the variables are miscalculated by the backtester  
    if (context.portfolio.positions_value + context.portfolio.cash) > 0:  

In addition, you had a short-circuit test that would exit from handle_data() if a_order and b_order both equal zero. I didn't trace all the logic to find out whether this is a problem, but in general I wouldn't want logic that halts before calling calculate_close_trades():

    if a_order == 0 and b_order == 0: return  

So I commented out that test. As a result calculate_close_trades() will likely be called even if no position is established. I added a quick test inside the function to prevent an error caused by trying to access variable keys that haven't been set yet.

    if signal not in context.close_trades: return

The modified backtest is attached.

Thanks for the feedback Dennis C.

You're right about the short-circuit. I originally had the calculate_direction function yield a close position signal but had changed that and forgot to remove the short circuit.

With regard to the position limit, I want to be able to put down more bets when the spread continues to widen/collapse, as the reversion would be expected to be even more profitable. Do you know of a good way to control hedged positions that would somewhat resemble real-life trading, apart from testing whether there are already any holdings? Without any check, the swings would just be wild, but with the check you suggested no activity could take place until the spread converges.

Also, any idea how I would be able to optimize the variables? I imagine I would want to optimize them on one data set (in the past) and test them on another (in the more recent past). Hopefully I can do this in the Quantopian platform.

For better position metrics I suggest the home-brew calculation I posted in another thread on the topic. The values abs_cash and abs_capital_used should be calculated once per handle_data(). If you find any errors in this approach please let me know.

    # calculate capital_used using absolute amount: abs(amount)*cost_basis
    pos = context.portfolio.positions
    abs_capital_used = sum(abs(pos[s].amount) * pos[s].cost_basis for s in pos)
    record(abs_capital_used = abs_capital_used)
    # calculate free cash: starting_cash + pnl - capital_used
    port = context.portfolio
    abs_cash = port.starting_cash + port.pnl - abs_capital_used
    record(abs_cash = abs_cash)
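
One possible way to use a gross figure like abs_capital_used for the "add more bets as the spread widens" case is to cap total capital committed instead of requiring a flat position. The sketch below runs outside the backtester: Position is a stand-in namedtuple, and MAX_GROSS and the share counts are hypothetical numbers chosen for this example.

```python
from collections import namedtuple

# Hypothetical stand-in for a backtester position object.
Position = namedtuple("Position", ["amount", "cost_basis"])

BET_SIZE = 200_000.0     # dollars per leg, as in the original post
MAX_GROSS = 1_000_000.0  # hypothetical cap on capital across both legs


def abs_capital_used(positions):
    """Capital committed, counting shorts as positive: abs(amount) * cost_basis."""
    return sum(abs(p.amount) * p.cost_basis for p in positions.values())


def can_add_bet(capital_used, bet_size=BET_SIZE, max_gross=MAX_GROSS):
    """Allow another pair trade only if gross capital stays under the cap."""
    # Each pair trade commits bet_size on the long leg and bet_size on the short leg.
    return capital_used + 2 * bet_size <= max_gross


# One pair trade already open: long 8000 PEP at $25, short 4000 KO at $50.
positions = {"PEP": Position(8000, 25.0), "KO": Position(-4000, 50.0)}
used = abs_capital_used(positions)        # 200k long + 200k short
second_ok = can_add_bet(used)             # a second bet fits under the cap
third_ok = can_add_bet(used + 2 * BET_SIZE)  # a third would exceed it
```

With these numbers a second bet is allowed and a third is refused, so the position can scale in as the spread widens but cannot grow without bound.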

You can specify any start and end date (from 2002 to 2013) in your backtests, but Quantopian isn't really set up for parameter optimization. I would caution you that overfitting can easily occur, so be very conservative in your optimizations. Much better would be to find natural ratios and averages to use in your formulas. That way the strategy has a much better chance of adapting to out-of-sample data.
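
If you do want to try the "optimize on one period, test on another" idea offline, a minimal sketch could look like the following. Everything here is an assumption made for illustration: the simplified backtest trades one unit of the spread with no costs, the parameter grid is arbitrary, and the spread series is synthetic, not KO/PEP data.

```python
import random
from itertools import product
from statistics import mean, stdev


def backtest(spreads, window, entry_z, exit_z):
    """Total P&L in spread units from trading one unit of the spread."""
    position, entry_price, pnl = 0, 0.0, 0.0
    for t in range(window - 1, len(spreads)):
        hist = spreads[t - window + 1:t + 1]  # trailing window, today included
        sd = stdev(hist)
        if sd == 0:
            continue
        z = (spreads[t] - mean(hist)) / sd
        if position == 0 and abs(z) >= entry_z:
            position = -1 if z > 0 else 1     # short a wide spread, long a narrow one
            entry_price = spreads[t]
        elif position != 0 and abs(z) <= exit_z:
            pnl += position * (spreads[t] - entry_price)
            position = 0
    return pnl  # a trade still open at the end is ignored


def pick_params(train, grid):
    """Return the (window, entry_z, exit_z) with the best in-sample P&L."""
    return max(grid, key=lambda p: backtest(train, *p))


# Synthetic mean-reverting spread around 0.5; illustrative only.
rng = random.Random(1)
spreads = [0.5]
for _ in range(999):
    spreads.append(spreads[-1] + 0.2 * (0.5 - spreads[-1]) + rng.gauss(0, 0.01))

split = int(len(spreads) * 0.7)
train, test = spreads[:split], spreads[split:]
grid = list(product([10, 20, 40], [1.5, 2.0, 2.5], [0.25, 0.5]))

best = pick_params(train, grid)
in_sample = backtest(train, *best)
out_of_sample = backtest(test, *best)
```

The number to look at is out_of_sample, since in_sample is the best of 18 tries by construction. A large gap between the two is the overfitting warning sign mentioned above.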

Branko,
Thanks for sharing the code, insights, and links.

I recently did a casual survey of Coke and Pepsi historical data. I have a hypothesis that is a twist on cointegrated pairs.
The hypothesis asserts:

  • Coke and Pepsi have movement around dividend time
  • Maybe a lot of institutional and retail money moves from KO to PEP and back around the dividend cycle
  • The magnitude of movement might be greater than the dividend as a percent of share price
  • Buy PEP on KO ex-Dividend day, and vice versa

I suppose that is more of a calendar feed and scheduling job than a technical cointegration strategy.
I do want to emphasize "casual survey". I merely compared decades of KO and PEP visually on a Yahoo Interactive chart, with Dividend events turned on.
The hypothesis appeared truthy, but was not measured or tested. Is anyone in the community doing that sort of testing?
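
For what it's worth, one simple way to measure this would be a small event study: compare the average forward return of one stock after the other's ex-dividend days with its average forward return over all days. The sketch below is only the skeleton of that idea. The function names are made up, the event days are list indices that would have to come from a dividend calendar, and the prices are placeholder numbers, not KO or PEP data.

```python
from statistics import mean


def forward_returns(prices, horizon):
    """Simple return from day t to day t+horizon for every t where it is defined."""
    return [prices[t + horizon] / prices[t] - 1 for t in range(len(prices) - horizon)]


def event_study(prices, event_days, horizon=5):
    """Mean forward return after event days versus the mean over all days."""
    fwd = forward_returns(prices, horizon)
    after_events = [fwd[t] for t in event_days if t < len(fwd)]
    return mean(after_events), mean(fwd)


# Placeholder prices and a single placeholder event day (index 0).
prices = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]
after_event, baseline = event_study(prices, event_days=[0], horizon=1)
```

In a real test, prices would be PEP closes, event_days the KO ex-dividend dates (and vice versa), and the difference between the two averages would need a significance check across many dividend cycles.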

I posted an image here:
Cointegrated Pair with Ex-Dividend Timing?

Cheers,
Monte
nFol.io