pattern recognition based on zlib

Here is a simple example of a pattern-recognition algorithm based on zlib, a general-purpose data-compression library. Thanks to James Jack for useful discussions and for coding the NCD (normalized compression distance) and CDM (compression-based dissimilarity measure) functions, and to Quantopian for enabling zlib.
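For readers who want to experiment, here is a minimal sketch of the two measures using Python's `zlib`. It assumes the standard textbook definitions (NCD as in Li et al., cited below, and CDM as C(xy) / (C(x) + C(y))). It is not James Jack's original code, and the compression level 9 is an arbitrary choice.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data: the compressor C(.) in both formulas."""
    # Level 9 is an assumption; the original functions may use the default level.
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: near 0 for similar strings, near 1 for unrelated ones."""
    cx = compressed_size(x.encode())
    cy = compressed_size(y.encode())
    cxy = compressed_size((x + y).encode())
    return (cxy - min(cx, cy)) / max(cx, cy)


def cdm(x: str, y: str) -> float:
    """Compression-based dissimilarity measure: about 0.5 for near-identical strings, near 1 for unrelated ones."""
    cx = compressed_size(x.encode())
    cy = compressed_size(y.encode())
    cxy = compressed_size((x + y).encode())
    return cxy / (cx + cy)


if __name__ == "__main__":
    # The two coded strings from the sample output further down the thread.
    x = "000000000000000011111111111111"
    y = "111111111111111100000000000000"
    print("NCD:", ncd(x, y))
    print("CDM:", cdm(x, y))
```

The exact numbers depend on the compressor and its settings, so this sketch need not reproduce the logged values digit for digit.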

For those interested in doing an intellectual "deep dive" into the topic, a starting point is:

M. Li, X. Chen, X. Li, B. Ma, and P. M. B. Vitanyi, "The similarity metric," IEEE Transactions on Information Theory, vol. 50, no. 12, pp. 3250-3264, Dec. 2004.
http://homepages.cwi.nl/~paulv/papers/similarity.pdf


I ran the algorithm on SPY and SH (an S&P 500 ETF and a short, i.e. inverse, S&P 500 ETF, respectively). Below is sample output, where X and Y are the coded prices (each price coded relative to its own moving average) over a 30-day trailing window:
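The post does not spell out the coding rule, so here is a hypothetical sketch of one way to turn prices into such a 0/1 string. It assumes '1' means the price is above its simple moving average and '0' means at or below it; the function name `code_vs_moving_average` and the 10-day moving-average length are illustrative assumptions, not the original algorithm's parameters.

```python
from typing import Sequence


def code_vs_moving_average(prices: Sequence[float], ma_window: int = 10, trailing: int = 30) -> str:
    """Code the last `trailing` prices as a string of '0'/'1' characters.

    Hypothetical rule (an assumption, not necessarily the original algorithm's):
    '1' if the price is above the simple moving average of the last `ma_window`
    prices (current price included), otherwise '0'.
    """
    if ma_window < 1 or trailing < 1:
        raise ValueError("ma_window and trailing must be positive")
    needed = ma_window + trailing - 1
    if len(prices) < needed:
        raise ValueError(f"need at least {needed} prices, got {len(prices)}")
    bits = []
    for i in range(len(prices) - trailing, len(prices)):
        # Moving average over the ma_window prices ending at index i.
        window = prices[i - ma_window + 1 : i + 1]
        moving_average = sum(window) / ma_window
        bits.append("1" if prices[i] > moving_average else "0")
    return "".join(bits)
```

With a rule like this, an inverse ETF tends to produce roughly the bitwise complement of its underlying's string, which is the pattern visible in the X and Y lines below.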

2012-02-23 handle_data:53 INFO ----------------------------------
2012-02-23 handle_data:54 INFO X: 000000000000000011111111111111
2012-02-23 handle_data:55 INFO Y: 111111111111111100000000000000
2012-02-23 handle_data:56 INFO ----------------------------------
2012-02-23 handle_data:57 INFO NCD: 0.142857142857
2012-02-23 handle_data:58 INFO CDM: 0.571428571429
2012-02-23 handle_data:59 INFO ----------------------------------

One might have expected NCD and CDM both to be close to 1 (indicating a high degree of dissimilarity). Instead, both indicate relatively high similarity between SPY and SH: NCD << 1, and CDM ≈ 0.57 is close to its lower limit of about 0.5 for near-identical strings. The interpretation, I think, is that SPY and SH (as coded) carry the same information; Y is simply the bitwise complement of X. For example, given the SPY time series, I can predict the SH time series, provided I know that SH moves in the opposite direction. I obtain a similar result for the pair SPY and IVV, which move in the same direction.
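The intuition can be made explicit with the standard definitions of the two measures (assumed here to match the functions used in the algorithm), writing C(.) for compressed length in bytes. If y is almost fully predictable from x, as when Y is the complement of X, compressing the concatenation costs little more than compressing either string alone, so C(xy) ≈ C(x) ≈ C(y). If x and y are unrelated, C(xy) ≈ C(x) + C(y). Substituting gives:

```latex
\begin{aligned}
\text{predictable pair: } &
\mathrm{NCD}(x,y) = \frac{C(xy) - \min\{C(x),C(y)\}}{\max\{C(x),C(y)\}} \approx \frac{C(x) - C(x)}{C(x)} = 0,
&&
\mathrm{CDM}(x,y) = \frac{C(xy)}{C(x) + C(y)} \approx \frac{C(x)}{2\,C(x)} = \frac{1}{2},\\
\text{unrelated pair: } &
\mathrm{NCD}(x,y) \approx \frac{C(x) + C(y) - \min\{C(x),C(y)\}}{\max\{C(x),C(y)\}} = 1,
&&
\mathrm{CDM}(x,y) \approx \frac{C(x) + C(y)}{C(x) + C(y)} = 1.
\end{aligned}
```

So NCD well below 1 together with CDM near 0.5 is what one expects for two strings that are mirror images of each other.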

Hi Grant, I think I had better start with R-squared, since it is something I should be able to understand easily and is fairly straightforward. For example, SH vs. SPY has an R-squared above 0.9, which is consistent with the two being mirror images of each other (R-squared ignores the sign of the relationship, so a strongly inverse pair still scores close to 1).
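For comparison with the compression measures, here is a minimal sketch of R-squared for two series. For a least-squares fit with a single regressor, R-squared equals the squared Pearson correlation, which is why an inverse pair like SH and SPY can still score near 1. Computing it on daily simple returns rather than raw prices is an assumption; the reply does not say which was used.

```python
from typing import List, Sequence


def daily_returns(prices: Sequence[float]) -> List[float]:
    """Simple returns p[t] / p[t-1] - 1 (using returns is an assumption)."""
    return [b / a - 1.0 for a, b in zip(prices[:-1], prices[1:])]


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R-squared of a least-squares fit of y on x, i.e. the squared Pearson correlation."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has undefined correlation")
    # Squaring discards the sign, so perfectly inverse series give 1.0.
    return (sxy * sxy) / (sxx * syy)
```

Usage: `r_squared(daily_returns(spy_prices), daily_returns(sh_prices))`, where `spy_prices` and `sh_prices` are hypothetical lists of aligned closing prices.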