Using PCA for Statistical Factors Regression

I read Ernest Chan's "Machine Trading". In his chapter on factor analysis, he introduces the idea of using Principal Component Analysis (PCA) to extract statistical factors and then regressing the next day's returns on them to get buy/sell signals.
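
For readers following along, here is a minimal NumPy sketch of the idea as described above. It is not the book's MATLAB code and not the algo posted in this thread; the function name `pca_factor_forecast`, the default of 5 factors, and the use of a single lookback window are all assumptions made for illustration.

```python
import numpy as np

def pca_factor_forecast(returns, n_factors=5):
    """Forecast next-period returns from PCA statistical factors.

    returns: (T, N) array of periodic returns, oldest row first.
    Gives back a length-N array of predicted next-period returns.
    """
    returns = np.asarray(returns, dtype=float)
    demeaned = returns - returns.mean(axis=0)

    # Eigen-decompose the N x N sample covariance matrix.
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    cov = np.cov(demeaned, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    loadings = eigvecs[:, ::-1][:, :n_factors]   # (N, k) top-k components

    # Factor returns: project each day's returns onto the components.
    factors = demeaned @ loadings                # (T, k)

    # Regress each stock's return at t+1 on the factor returns at t (OLS
    # with an intercept), for all stocks at once.
    X = np.column_stack([np.ones(len(factors) - 1), factors[:-1]])
    Y = returns[1:]
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)  # (k+1, N)

    # Plug in the most recent factor returns to get the forecast.
    latest = np.concatenate([[1.0], factors[-1]])
    return latest @ beta
```

The forecasts can then be ranked to choose longs and shorts. Because both the loadings and the regression are re-estimated from a rolling window, small changes in the data can reshuffle the ranking from one day to the next.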

I translated his MATLAB code into Python as best I could, but the backtest results so far have been dismal compared to the results in his book.

Was wondering if anyone has tried something similar? Or is there something clearly wrong with my code?

Appreciate any comments or help from you guys.

Thanks!
Yi Peng

10 responses

I would suggest initially running your backtests with commissions and slippage set to 0, just to see whether the strategy has any alpha. With the new slippage model I have seen algorithms that used to be good perform poorly, so watch out for that.

Here's the algo (Backtest ID: 5ad9230ba7eb4e43d5833bac) with:

    set_commission(commission.PerShare(cost=0, min_trade_cost=0))  
    set_slippage(slippage.FixedSlippage(spread=0))  

Would you be willing to share the new code? What did you fix?

This is my implementation of Ernest Chan's statistical factor loadings algo. In addition, I have added a couple of ways to trade the OLS results; they are commented out in the code. I have also implemented a reduction of features from 10 to 5 using scikit-learn's RFE (recursive feature elimination). This seems to work pretty well.

Now, if someone wants to contribute, please try to fix the high turnover rate.

Here is pretty much the same algo as above, but without the RFE feature selection. From the resulting OLS, we rank the stocks by OLS score and split them into top and bottom groups to create the long-short portfolio.
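
A rough sketch of that top/bottom split, for anyone who wants to experiment outside the backtester. The function name `long_short_weights`, the equal weighting, and the 50/50 gross split are assumptions for illustration and are not taken from the attached algo.

```python
def long_short_weights(scores, n_each):
    """Equal-weight long the n_each highest scores, short the n_each lowest.

    scores: dict mapping symbol -> forecast score.
    Gives back a dict of weights with +0.5 gross long and -0.5 gross short.
    """
    if n_each < 1 or 2 * n_each > len(scores):
        raise ValueError("n_each must be >= 1 and at most half the universe")

    ranked = sorted(scores, key=scores.get)   # ascending by score
    shorts = ranked[:n_each]
    longs = ranked[-n_each:]

    weights = {symbol: 0.0 for symbol in scores}
    for symbol in longs:
        weights[symbol] = 0.5 / n_each
    for symbol in shorts:
        weights[symbol] = -0.5 / n_each
    return weights
```

With a hard cutoff like this, a stock that moves from just inside to just outside the top group is fully sold, which is one way a ranking-based book can end up with high turnover.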

This is not meant to address turnover, although it might help there; I didn't check.
I'm just offering some extras and options. Occasionally weighting by score can be informative.
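
One common way to weight by score is to demean the scores and scale them to unit gross exposure. The sketch below shows that approach only; it is an assumption about what score weighting could look like, not a copy of the code attached to this reply, and `score_weights` is a made-up name.

```python
def score_weights(scores):
    """Dollar-neutral weights proportional to demeaned scores.

    scores: dict mapping symbol -> score.
    Gives back weights that sum to 0 and whose absolute values sum to 1.
    """
    mean = sum(scores.values()) / len(scores)
    centered = {symbol: value - mean for symbol, value in scores.items()}
    gross = sum(abs(value) for value in centered.values())
    if gross == 0:
        # All scores identical: there is no signal, so hold nothing.
        return {symbol: 0.0 for symbol in scores}
    return {symbol: value / gross for symbol, value in centered.items()}
```

Unlike an equal-weight top/bottom split, positions here change size gradually as scores change, so a small change in a score produces a small trade rather than a full entry or exit.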

It does not matter how much alpha you get in a backtest if a trading strategy cannot at least survive its frictional costs.

The first acid test for any trading strategy is whether it can survive these frictional costs (commissions, slippage, and other fees). The second might be whether it breaks down going forward (give it more time). A third is whether it is scalable (give it more money to manage).

All 3 tests were done simultaneously in the attached algo using Luc's version (Backtest ID: 5c62b8dee310ed49b6a0c97e).

I have not read the program; however, I have no motivation to go any further.

@Guy, thank you for your contribution. Yes, it was obvious to me that an algo with a turnover of 100%+ will fail given the standard slippage of 5 bps. That is why I posted my results and asked the community for ideas on how to reduce the turnover, if that is possible at all.
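
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the 100% turnover figure is per trading day and that the 5 bps applies to every dollar traded; the thread states neither, and turnover conventions differ (some halve the two-sided figure used here). Both function names are hypothetical.

```python
def turnover(old_weights, new_weights):
    """Two-sided turnover: sum of absolute weight changes across all symbols."""
    symbols = set(old_weights) | set(new_weights)
    return sum(
        abs(new_weights.get(s, 0.0) - old_weights.get(s, 0.0)) for s in symbols
    )

def annual_cost_drag(daily_turnover, cost_bps, trading_days=252):
    """Approximate yearly return lost to a per-dollar-traded cost.

    daily_turnover: fraction of portfolio value traded per day (1.0 = 100%).
    cost_bps: cost per dollar traded, in basis points.
    """
    return daily_turnover * cost_bps / 10_000 * trading_days
```

Under those assumptions, `annual_cost_drag(1.0, 5)` comes to about 0.126, so the strategy would need roughly 12.6% of gross return per year just to break even on slippage.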

Thanks @Blue for posting your code. Your "norm()" function has reduced turnover (TO) to 18%. I have yet to understand why. I will check your code carefully.

Is there any filter that could be applied to shrink the universe and thereby reduce the TO?

/Luc

You might try lengthening the lookback window, using exponentially weighted scores, and shrinking the covariance matrix used to build the PCA.
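
A minimal sketch of two of these suggestions, under stated assumptions: the shrinkage target (the diagonal of the sample covariance), the fixed `shrink` intensity, and the `halflife` default are illustrative choices, and both function names are made up. Estimators such as Ledoit-Wolf pick the intensity from the data instead. Whether any of this lowers turnover for this algo is untested.

```python
import numpy as np

def shrink_covariance(returns, shrink=0.2):
    """Blend the sample covariance with its own diagonal.

    returns: (T, N) array of returns. shrink is in [0, 1]; 0 keeps the
    sample covariance, 1 keeps only the variances.
    """
    cov = np.cov(np.asarray(returns, dtype=float), rowvar=False)
    target = np.diag(np.diag(cov))
    return (1.0 - shrink) * cov + shrink * target

def ewm_scores(score_history, halflife=5):
    """Exponentially weighted average of past scores, newest weighted most.

    score_history: (T, N) array of daily scores, oldest row first.
    """
    score_history = np.asarray(score_history, dtype=float)
    ages = np.arange(len(score_history) - 1, -1, -1)  # newest row has age 0
    weights = 0.5 ** (ages / halflife)
    return (weights / weights.sum()) @ score_history
```

The shrunk matrix can be passed to `np.linalg.eigh` in place of the raw sample covariance when extracting the components. Smoothing the scores over several days trades some responsiveness for a steadier ranking.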

Your algorithm is better used as a short-term signal for long-term trading. See the paper "To Trade or Not to Trade? Informed Trading with Short-Term Signals for Long-Term Investors".