Help on using K-modes clustering

Hi guys,

I am trying to group similar companies using k-means clustering. However, some of my variables, such as industry code and sector code, are categorical, and k-means is not suitable for categorical data, as indicated by this post. The reason given there is: "The sample space for categorical data is discrete, and doesn't have a natural origin. A Euclidean distance function on such a space isn't really meaningful."
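
To make the quoted point concrete, here is a small sketch comparing Euclidean distance with the simple matching dissimilarity (a count of mismatching attributes) that k-modes relies on. The sector and industry codes are made-up values for illustration, and `matching_dissimilarity` is a hypothetical helper name, not part of any library.

```python
import numpy as np

# Hypothetical sector codes for three companies (made-up values).
sector = np.array([101, 102, 309])

# Euclidean distance treats the codes as magnitudes, so 101 looks much
# "closer" to 102 than to 309, although all three are just different labels.
euclid_ab = abs(sector[0] - sector[1])
euclid_ac = abs(sector[0] - sector[2])


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    a, b = np.asarray(a), np.asarray(b)
    return int(np.sum(a != b))


# Each record is (sector code, industry code); values are hypothetical.
company_a = [101, 10101]
company_b = [102, 10209]
company_c = [101, 10105]

d_ab = matching_dissimilarity(company_a, company_b)  # differ in both attributes
d_ac = matching_dissimilarity(company_a, company_c)  # differ in industry only
```

With matching dissimilarity, two companies are "far apart" only to the extent that their labels differ, regardless of how the codes happen to be numbered.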

The post also suggests that instead of k-means, I should use k-modes, which was introduced in this paper by Zhexue Huang. I found a k-modes library at https://github.com/nicodv/kmodes, but I could not import it directly into Quantopian's research environment. I also tried running the library's source code in Quantopian research, but I received this error:
"InputRejected: Importing Parallel from sklearn.externals.joblib raised an ImportError. No modules or attributes with a similar name were found."
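
For reference, the core of k-modes does not itself need joblib. Below is a minimal NumPy-only sketch of the basic idea (assign each row to the nearest centroid by matching dissimilarity, then update each centroid to the per-column mode). It is not the kmodes library's implementation: `simple_kmodes` and `column_modes` are hypothetical names, the random initialisation is a simplification, the data are made up, and it assumes categories are already integer-coded and that NumPy is importable in the research environment.

```python
import numpy as np


def column_modes(rows):
    """Return the most frequent value in each column of an integer-coded array."""
    modes = np.empty(rows.shape[1], dtype=rows.dtype)
    for j in range(rows.shape[1]):
        values, counts = np.unique(rows[:, j], return_counts=True)
        modes[j] = values[np.argmax(counts)]
    return modes


def simple_kmodes(X, k, n_iter=20, seed=0):
    """Toy k-modes: assign by matching dissimilarity, update centroids to modes."""
    X = np.asarray(X)
    rng = np.random.RandomState(seed)
    # Initialise centroids from k randomly chosen rows (a simplification).
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Number of mismatching attributes, shape (n_samples, k).
        dist = (X[:, None, :] != centroids[None, :, :]).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = X[labels == c]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[c] = column_modes(members)
    return labels, centroids


# Hypothetical (sector code, industry code) rows for six companies.
X = np.array([
    [101, 10101],
    [101, 10101],
    [101, 10105],
    [309, 30910],
    [309, 30910],
    [309, 30947],
])
labels, centroids = simple_kmodes(X, k=2)
```

The result depends on the random initialisation, so in practice one would run it with several seeds and keep the assignment with the lowest total dissimilarity.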

Is there any way that I can use k-modes on Quantopian?

Thanks very much for your help!

Thanh Duong