Using PCA to trade a Long/Short basket

Hello all,
I wanted to share an algo based on principal component analysis (PCA). PCA is a transformation that converts a set of (possibly) correlated variables into a set of orthogonal "principal components." Each component captures as much of the remaining variance as possible under the constraint that it is uncorrelated with all previous components.

While the concept sounds useful and I've heard of PCA being used a lot in financial applications, I have not seen many concrete examples applying the idea to a trading strategy. This is an initial stab at applying PCA to a trading strategy.

The strategy is pretty straightforward:
1. Estimate the principal components.
2. Regress each security against the first N most significant components.
3. Use the residuals from 2 to get a z-score for each security.
4. Throw out all securities where abs(z-score) is below some threshold.
5. Invest in the remaining securities with weights proportional to the negative of their z-scores.
6. Hope for the best, and repeat every X days (weekly here).
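For readers who want to see the steps above in code, here is a minimal sketch using only NumPy. It is not the attached algo: the function name `pca_residual_weights`, its default parameters, the use of log prices as the PCA input, and the normalization to a gross exposure of 1 are all assumptions made for illustration.

```python
import numpy as np

def pca_residual_weights(prices, n_components=3, z_threshold=1.0):
    """Hypothetical helper: map a (T, N) array of positive prices to N weights."""
    # Assumption: PCA is run on centered log prices.
    x = np.log(prices)
    x = x - x.mean(axis=0)

    # Step 1: principal components via SVD of the centered data.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    factors = u[:, :n_components] * s[:n_components]  # (T, k) component series

    # Step 2: regress every security on the first k components (with intercept).
    design = np.column_stack([np.ones(len(x)), factors])
    beta, *_ = np.linalg.lstsq(design, x, rcond=None)
    resid = x - design @ beta

    # Step 3: z-score of the most recent residual for each security.
    z = resid[-1] / resid.std(axis=0, ddof=1)

    # Step 4: drop securities whose |z| is below the threshold.
    z = np.where(np.abs(z) >= z_threshold, z, 0.0)

    # Step 5: weights proportional to -z, scaled to gross exposure 1 (assumption).
    gross = np.abs(z).sum()
    return -z / gross if gross > 0 else np.zeros_like(z)
```

Calling this once per rebalance date on a trailing window of prices would correspond to step 6. Securities with a large positive residual get a short weight and those with a large negative residual get a long weight.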

Takeaways:
1. Transactions and slippage seem to kill any edge for this strategy when the universe is large. (Costs are ignored in this backtest.)
2. A lot of the success seems to dwindle over recent years. I'm guessing this sort of strategy has gotten crowded.
3. The first few principal components do seem to be a decent way to reduce dimensionality.

Does anybody have any insightful uses for PCA within trading strategies? Any academic papers, references, or wisdom that you've found useful would be much appreciated.

Best,
David Edwards

4 responses

It does really well during the crash. Maybe it could be used by other algorithms when they detect a crash?

Hi David,

That seems to be a really nice way of doing mean reversion on an underlying market trend. Any reason you used (log) prices as input to the PCA rather than returns? I think this might impair the principal components you are able to estimate. For example, PCA is very sensitive to scale, so the highest-priced stocks should always end up in the largest components. You could z-score the prices before running the PCA or use returns directly.

Thomas
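To illustrate the scale point, here is a small sketch on synthetic data. Everything in it is assumed for the example: the three made-up price levels, the random returns, and the helper names `first_pc_loadings` and `standardized_log_returns`. It compares the first component's loadings on raw prices with those on z-scored log returns.

```python
import numpy as np

def first_pc_loadings(data):
    """Absolute loadings of the first principal component of a (T, N) array."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return np.abs(vt[0])

def standardized_log_returns(prices):
    """Log returns, z-scored column by column so every security has unit variance."""
    r = np.diff(np.log(prices), axis=0)
    return (r - r.mean(axis=0)) / r.std(axis=0, ddof=1)

# Synthetic example: three independent securities with equal volatility
# but very different price levels (all values are made up).
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=(500, 3))
prices = np.array([5.0, 20.0, 1000.0]) * np.exp(np.cumsum(returns, axis=0))

raw_loadings = first_pc_loadings(prices)
std_loadings = first_pc_loadings(standardized_log_returns(prices))
print("raw prices:          ", raw_loadings.round(3))
print("standardized returns:", std_loadings.round(3))
```

On raw prices the first component should load almost entirely on the highest-priced security, simply because its price path has the largest variance in dollar terms. After standardizing, no security is favored by its price level alone.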


This is classic statistical arbitrage. Here is a paper: https://www.math.nyu.edu/faculty/avellane/AvellanedaLeeStatArb071108.pdf

Here is a notebook: https://www.quantopian.com/posts/update-statarb-using-pca-now-with-cloning

I have been reading a lot on this topic, and a few points worth mentioning are:

  1. Run PCA on returns.
  2. Run PCA on the correlation matrix (not the covariance matrix).
  3. Check for beta stability when you regress returns on the PCA factors.
  4. Get z-scores from an OU (Ornstein-Uhlenbeck) process.
  5. PCA factors are not stable. I am still researching how to address this. I found this paper on the topic and am still reading it: http://www.utdallas.edu/~yexiaoxu/MEC.PDF
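As a rough illustration of point 4, here is a sketch of one common way to get a z-score from an OU process: treat the cumulative residual as a discretely sampled OU process, fit it as an AR(1) regression, and measure the latest value's distance from the estimated long-run mean in units of the equilibrium standard deviation. The function name `ou_s_score` and the choice to return NaN when no mean reversion is detected are assumptions for this example, not taken from the notebook.

```python
import numpy as np

def ou_s_score(residual_returns):
    """Hypothetical helper: OU-style z-score from a 1-D series of residual returns."""
    # Cumulative residual, treated as a sampled OU process.
    x = np.cumsum(residual_returns)
    x_lag, x_next = x[:-1], x[1:]

    # AR(1) fit: x[t+1] = a + b * x[t] + eps (polyfit returns slope first).
    b, a = np.polyfit(x_lag, x_next, 1)
    if not 0.0 < b < 1.0:
        return np.nan  # assumption: no usable mean reversion, so no score

    eps = x_next - (a + b * x_lag)
    m = a / (1.0 - b)                                    # long-run mean
    sigma_eq = np.sqrt(eps.var(ddof=2) / (1.0 - b * b))  # equilibrium std dev
    return (x[-1] - m) / sigma_eq
```

A slope `b` close to 1 means slow mean reversion, so in practice one might also want to filter on the implied reversion speed before trusting the score.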