I have built a grid-search routine in MATLAB based on Ernie Chan's basic structure. It searches for cointegrating portfolios among all combinations of a set of ETFs/stocks, tests each candidate with the Johansen method, ADF, the Hurst exponent, the variance ratio test and so on, runs a simple Bollinger band backtest, and then checks geometric return, maximum drawdown, etc.
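For anyone following along, here is a minimal sketch of the Bollinger band step in Python. It is not my MATLAB routine, just an illustration of a Chan-style z-score band rule applied to an already-formed spread series. The function names, the default lookback/entry/exit values, and the choice to measure P&L in spread units (rather than as portfolio returns) are all assumptions made for the example.

```python
from itertools import accumulate
from statistics import mean, stdev


def bollinger_positions(spread, lookback=20, entry_z=1.0, exit_z=0.0):
    """Position per bar: +1 = long the spread, -1 = short, 0 = flat.

    The z-score uses a trailing window that includes the current bar.
    Enter when |z| exceeds entry_z; exit a long once z >= -exit_z and a
    short once z <= exit_z (no direct long-to-short flip in the same bar).
    """
    if lookback < 2:
        raise ValueError("lookback must be >= 2 for a sample std")
    positions, pos = [], 0
    for t in range(len(spread)):
        if t + 1 < lookback:          # not enough history yet
            positions.append(0)
            continue
        window = spread[t + 1 - lookback:t + 1]
        sd = stdev(window)
        if sd > 0:                    # flat window: keep current position
            z = (spread[t] - mean(window)) / sd
            if pos == 0 and z < -entry_z:
                pos = 1
            elif pos == 0 and z > entry_z:
                pos = -1
            elif pos == 1 and z >= -exit_z:
                pos = 0
            elif pos == -1 and z <= exit_z:
                pos = 0
        positions.append(pos)
    return positions


def spread_pnl(spread, positions):
    """P&L per bar in spread units, using the previous bar's position."""
    return [0] + [positions[t - 1] * (spread[t] - spread[t - 1])
                  for t in range(1, len(spread))]


def max_drawdown(equity):
    """Largest peak-to-trough drop of a cumulative P&L curve."""
    peak, worst = float("-inf"), 0
    for x in equity:
        peak = max(peak, x)
        worst = max(worst, peak - x)
    return worst


if __name__ == "__main__":
    s = [1, 1, 1, -5, 1, 1, 1]
    pos = bollinger_positions(s, lookback=3)
    pnl = spread_pnl(s, pos)
    print(pos, pnl, max_drawdown(list(accumulate(pnl))))
```

Positions are applied with a one-bar lag in `spread_pnl` so the backtest doesn't trade on the same bar's close it used to compute the signal.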
Is anyone else doing similar work? Once the portfolio size goes above 4 with a universe of 550 instruments, the number of combinations gets out of hand (there are already about 3.8 billion 4-asset baskets and about 4.1 x 10^11 5-asset ones). So I'm looking into a clustering analysis to narrow the search beforehand, but at first glance it's not obvious which features to cluster on. I'm wondering whether clustering on some measure of mutual pairwise cointegration, tested at a looser significance threshold, might help.
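Here is a rough sketch of that pre-clustering idea in Python. It assumes you already have a symmetric matrix of pairwise cointegration p-values, computed elsewhere (for example with an Engle-Granger test such as statsmodels' `coint` or MATLAB's `egcitest`). It links any two instruments whose p-value falls below a deliberately loose threshold and takes connected components, which amounts to single-linkage clustering. The search then only enumerates baskets drawn from within a single cluster. The function names and the 0.10 threshold are hypothetical.

One caveat: pairwise cointegration is neither necessary nor sufficient for a larger basket to cointegrate, so this is only a heuristic filter. Single linkage can also chain many instruments into one big cluster.

```python
import math
from itertools import combinations


def cluster_by_pairwise_pvalues(pvals, threshold=0.10):
    """Single-linkage clusters: connected components of the graph whose
    edges are pairs (i, j) with pairwise cointegration p-value < threshold.
    `pvals` is a symmetric n x n list of lists; the diagonal is ignored."""
    n = len(pvals)
    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if pvals[i][j] < threshold:
            parent[find(i)] = find(j)  # merge the two components

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def candidate_portfolios(clusters, k):
    """Yield only k-asset baskets whose members all share one cluster."""
    for members in clusters:
        yield from combinations(members, k)


def count_candidates(clusters, k):
    """Size of the reduced search space (vs math.comb(n, k) for all baskets)."""
    return sum(math.comb(len(members), k) for members in clusters)


if __name__ == "__main__":
    print("full 5-asset search over 550 names:", math.comb(550, 5))
```

The payoff comes from replacing one `math.comb(550, k)` with a sum of `math.comb` over much smaller cluster sizes. How much that saves depends entirely on how the threshold splits the universe.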
In the meantime, I'm going to move on to the execution/OMS side of things and make some progress there.
Maybe I should start a blog lol.