Hi guys,
I am trying to group similar companies using k-means clustering. However, some of my variables, such as industry code and sector code, are categorical, and k-means is not well suited to categorical data, as indicated by this post. The reason given there is: "The sample space for categorical data is discrete, and doesn't have a natural origin. A Euclidean distance function on such a space isn't really meaningful."
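To make the problem concrete, here is a small sketch with made-up codes (the industry and sector values below are hypothetical, not real classifications). Euclidean distance treats the code numbers as magnitudes, while the simple matching dissimilarity that k-modes relies on only counts how many attributes differ:

```python
import numpy as np

# Hypothetical (industry_code, sector_code) pairs for three companies.
a = np.array([101, 3])
b = np.array([102, 3])
c = np.array([950, 3])

def euclidean(x, y):
    # Treats category codes as if they were numeric magnitudes.
    return float(np.sqrt(np.sum((x - y) ** 2)))

def matching_dissimilarity(x, y):
    # Counts the attributes on which the two records disagree.
    return int(np.sum(x != y))

print(euclidean(a, b), euclidean(a, c))
print(matching_dissimilarity(a, b), matching_dissimilarity(a, c))
```

Under Euclidean distance, company c looks far more different from a than b does, purely because of how the codes happen to be numbered. Under matching dissimilarity, both differ from a in exactly one attribute.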
The post also suggests that instead of k-means I should use k-modes, which was introduced in this paper by Zhexue Huang. I also found a k-modes library at https://github.com/nicodv/kmodes. However, it seems that I cannot import it directly into Quantopian's research environment. The source code is available as well, but when I ran it in Quantopian research, I received this error: "InputRejected:
Importing Parallel from sklearn.externals.joblib raised an ImportError. No modules or attributes with a similar name were found."
Is there any way that I can use k-modes on Quantopian?
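One fallback I am considering is writing the algorithm myself with only NumPy, which would avoid the joblib import. Below is a rough sketch of the k-modes idea as I understand it: random initial modes, simple matching dissimilarity, and a per-column mode update. The names `kmodes` and `column_mode` are my own and not from the kmodes library, the input is assumed to be integer-coded categories, and I have not tested this on Quantopian:

```python
import numpy as np

def column_mode(col):
    # Most frequent value in a 1-D array (ties go to the smallest value).
    values, counts = np.unique(col, return_counts=True)
    return values[np.argmax(counts)]

def kmodes(X, k, n_iter=20, seed=0):
    """Simplified k-modes. X: integer category codes, shape (n_samples, n_features)."""
    X = np.asarray(X)
    rng = np.random.RandomState(seed)
    # Initialise the modes with k rows drawn at random without replacement.
    modes = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(n_iter):
        # Dissimilarity = number of mismatching attributes against each mode.
        dist = (X[:, None, :] != modes[None, :, :]).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave an empty cluster's mode unchanged
                modes[j] = [column_mode(members[:, f]) for f in range(X.shape[1])]
    return labels, modes
```

It would be called as `labels, modes = kmodes(X, k=5)`, where `X` holds one row per company. Because the starting modes are random, the result depends on `seed`, so I would probably run it several times and compare.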
Thanks very much for your help!
Thanh Duong