Dear all,
I am trying to reproduce the simple example of a 'statistical factors' model for predicting S&P 500 stock returns from E. Chan's book 'Machine Trading: Deploying Computer Algorithms to Conquer the Markets'. The model is essentially a linear regression on factors extracted by PCA from the returns of the S&P 500 stocks. The code attached to the book uses 2007-2013 daily data from www.crsp.com, and the author achieves reasonable model quality (in-sample R2 of 0.05-0.10 or more) and a CAGR of 15% when trading a long-short portfolio based on the predicted returns. I'm getting in-sample R2 scores around 0.01 and a negative CAGR.
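To make the setup concrete, here is a minimal sketch of the kind of model described above, run on synthetic data. It is not the book's code: it assumes the factors are the top principal-component scores of the demeaned daily return matrix and that each stock's next-day return is regressed on today's factor values. The names `pca_factor_r2`, `n_factors` and `fake_returns` are made up for illustration.

```python
import numpy as np

def pca_factor_r2(returns, n_factors=5):
    """Per-stock in-sample R2 of next-day returns regressed on PCA factors.

    returns: (T, N) array of daily returns, rows = days, columns = stocks.
    Assumption: factors are the top principal-component scores of the
    demeaned return matrix (an illustration, not the book's exact recipe).
    """
    X = returns - returns.mean(axis=0)               # demean each stock
    # Rows of vt are the principal directions of the return matrix
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    factors = X @ vt[:n_factors].T                   # (T, k) factor values
    # Today's factors (plus an intercept) explain tomorrow's returns
    F = np.column_stack([np.ones(len(factors) - 1), factors[:-1]])
    Y = returns[1:]
    beta, *_ = np.linalg.lstsq(F, Y, rcond=None)     # one OLS fit per stock
    resid = Y - F @ beta
    ss_res = (resid ** 2).sum(axis=0)
    ss_tot = ((Y - Y.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

# Synthetic i.i.d. returns, only to show the call; real data goes here
rng = np.random.default_rng(0)
fake_returns = rng.normal(0.0, 0.01, size=(500, 50))
r2 = pca_factor_r2(fake_returns, n_factors=5)
print("mean in-sample R2:", r2.mean())
```

On pure noise like this the R2 comes only from overfitting, which gives a rough baseline to compare the real-data scores against.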
One possible reason is that the prices in the book's dataset (coming, as mentioned, from www.crsp.com) are quite different from the prices on Quantopian. I'm not sure whether one of them is adjusted for splits/dividends and the other is not, or whether something else is going on.
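One way to check the adjustment hypothesis might be to look at the ratio of the two price series for the same stock: if they differ only by split/dividend adjustment, the ratio should be flat between corporate actions and jump on the event dates. A rough sketch on synthetic data, assuming both series are already aligned on the same dates (the helper `ratio_jumps` and the tolerance `tol` are hypothetical, and no Quantopian or CRSP API is used):

```python
import numpy as np

def ratio_jumps(prices_a, prices_b, tol=0.005):
    """Indices of days where the ratio prices_a / prices_b jumps.

    A flat ratio with a few isolated jumps suggests the two sources
    differ only in split/dividend adjustment.
    """
    ratio = np.asarray(prices_a, dtype=float) / np.asarray(prices_b, dtype=float)
    rel_change = np.abs(ratio[1:] / ratio[:-1] - 1.0)
    return np.flatnonzero(rel_change > tol) + 1

# Synthetic example: an adjusted series and a raw one with a 2-for-1 split
rng = np.random.default_rng(1)
adjusted = 100.0 * np.cumprod(1.0 + rng.normal(0.0, 0.01, size=120))
raw = adjusted.copy()
raw[:60] *= 2.0   # before the split the raw price is twice the adjusted one

jumps = ratio_jumps(raw, adjusted)
print(jumps)      # a single jump at the split day (index 60)
```

If the ratio instead drifts day by day, the difference is probably not just an adjustment factor.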
Has anybody tried to implement that example on Quantopian? Or does anybody know what the prices from crsp.com actually represent?