All,
I'm working on an algorithm to sort a set of securities (selected using set_universe) by comparing their price (or volume, etc.) histories. The basic outline is:
1. Get a trailing window of prices for all securities.
2. Code the prices with z-scores (see Ref. 2 in algorithm).
3. Convert the coded prices into text strings.
4. Rank the securities (sids), based on a similarity comparison of the text strings.
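The coding and string-conversion steps above could be sketched roughly as follows. This is only an illustration in plain Python: the function name code_prices, the ten digit bins, and the clipping range of two standard deviations are assumptions made for the example, not necessarily the coding scheme of Ref. 2.

```python
import math
import statistics


def code_prices(prices, n_bins=10, z_clip=2.0):
    """Map one price series to a string of digits via clipped z-scores.

    Assumptions (illustrative only): z-scores are clipped to
    [-z_clip, z_clip] and split into n_bins equal-width bins, one digit each.
    Returns None if the series is empty, contains NaN, or is flat.
    """
    if not prices or any(math.isnan(p) for p in prices):
        return None
    mean = statistics.fmean(prices)
    std = statistics.pstdev(prices)
    if std == 0:
        return None  # a flat series has no defined z-score
    digits = []
    for p in prices:
        z = (p - mean) / std
        z = max(-z_clip, min(z_clip, z))      # clip extreme moves
        frac = (z + z_clip) / (2 * z_clip)    # rescale to 0..1
        digits.append(str(min(n_bins - 1, int(frac * n_bins))))
    # Build the string once with join instead of repeated concatenation.
    return "".join(digits)


# Hypothetical sids and prices, keyed by sid so the mapping is never lost.
window = {24: [10.0, 10.5, 11.0, 10.8, 11.2], 5061: [30.0, 29.5, 29.0, 29.4, 28.9]}
coded = {sid: code_prices(p) for sid, p in window.items()}
```

Keeping the strings in a dict keyed by sid means any later sorting can be done on the sids themselves, which preserves the link back to the original ordering.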
One application would be to identify outliers that are not following the overall market trend.
I'm looking for advice on step 4 above. The ranking will use a compression-based comparison function (normalized compression distance, NCD, per Ref. 1 in the algorithm). Perhaps a Python expert can tell me the best way to do this. Note that I need to know the new ordering relative to the original, by sid (i.e., rank the sids based on a similarity criterion).
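One possible way to do the ranking while keeping track of the sids is sketched below. It uses the usual NCD definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The choice of zlib as the compressor, the helper names clen and rank_by_ncd, and the use of the mean NCD to all other securities as the ranking score are assumptions made for this example.

```python
import zlib


def clen(s):
    """Compressed length of a digit string, in bytes (zlib, level 9)."""
    return len(zlib.compress(s.encode("ascii"), 9))


def rank_by_ncd(coded):
    """Rank sids by their mean NCD to every other sid.

    coded: dict mapping sid -> coded text string.
    Returns (ranked_sids, score), where ranked_sids runs from the most
    typical security (lowest mean NCD) to the strongest outlier.
    """
    sids = list(coded)
    if len(sids) < 2:
        raise ValueError("need at least two securities to compare")
    # Compress each string once and reuse the result in every pair.
    single = {sid: clen(coded[sid]) for sid in sids}
    score = {}
    for a in sids:
        total = 0.0
        for b in sids:
            if a == b:
                continue
            pair = clen(coded[a] + coded[b])
            lo, hi = sorted((single[a], single[b]))
            total += (pair - lo) / hi
        score[a] = total / (len(sids) - 1)
    # Sorting the sids (not the strings) keeps the mapping to the original order.
    return sorted(sids, key=score.get), score
```

With real data this would be called as ranked, score = rank_by_ncd(coded); the last entries of ranked are the candidates for outliers. Short windows give very short strings, and compressor overhead then dominates the NCD values, so longer windows are likely to give more meaningful rankings.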
More generally, I suspect the algorithm could be sped up with better coding. Any ideas?
Also, sometimes I get the error:
There was a runtime error.
ValueError: cannot convert float NaN to integer
USER ALGORITHM:47, in handle_data
X[j] = X[j] + str(int(coded_d[i,j]))
Any idea if this is due to set_universe changing the list of sids, or to the batch transform's fill function not working (as I understand it, it should clean out all of the NaNs due to missing trades)?
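For what it's worth, Python raises exactly this message when int() is given a NaN float, so at least one coded value is NaN by the time line 47 runs. The error alone does not say where the NaN came from. A small check like the one below, run on the price window before coding, might help narrow it down. It assumes the window is available as a dict of sid -> list of prices, and the helper names are made up for this example.

```python
import math


def find_nan_sids(window):
    """Return the sids whose price history contains at least one NaN."""
    return [sid for sid, prices in window.items()
            if any(math.isnan(p) for p in prices)]


def drop_nan_sids(window):
    """Return a copy of the window without the sids that contain NaN."""
    bad = set(find_nan_sids(window))
    return {sid: prices for sid, prices in window.items() if sid not in bad}


# Hypothetical window: sid 5061 has a gap in its history.
window = {24: [10.0, 10.5, 11.0], 5061: [30.0, float("nan"), 29.0]}
suspects = find_nan_sids(window)
clean = drop_nan_sids(window)
```

If the check finds nothing in the raw prices, the NaN is more likely introduced during coding, for example by dividing by a zero standard deviation when a security's price is flat over the whole window.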
Grant