Hi guys,
I am completely new to Quantopian! Recently, I have been trying to see if I can group comparable companies together using k-means clustering. I decided to test it on the financial industry first. I used variables like enterprise_value, market_cap, sustainable growth rate, ROA, ROE, and ROIC as the factors for grouping firms together. Then, within each cluster, I would go long the firms with the lowest EV/EBITDA and short those with the highest (i.e. the undervalued and overvalued firms, respectively).
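To make the long/short step concrete, here is a rough sketch of what I have in mind, assuming a lower EV/EBITDA means a cheaper firm. The tickers, the `cluster` and `ev_to_ebitda` column names, and the `pick_long_short` helper are all made up for illustration; this is plain pandas, not Quantopian-specific code.

```python
import pandas as pd

# Made-up example data: tickers, cluster labels and ratios are hypothetical.
df = pd.DataFrame(
    {
        "cluster": [0, 0, 0, 1, 1, 1],
        "ev_to_ebitda": [6.0, 9.0, 14.0, 5.0, 8.0, 11.0],
    },
    index=["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"],
)


def pick_long_short(frame, n=1):
    """Return (longs, shorts): the n lowest and n highest EV/EBITDA names per cluster."""
    longs, shorts = [], []
    for _, group in frame.groupby("cluster"):
        # Sort ascending so the cheapest firms (lowest multiple) come first.
        ranked = group["ev_to_ebitda"].sort_values()
        longs.extend(ranked.index[:n])    # lowest multiple: treated as undervalued
        shorts.extend(ranked.index[-n:])  # highest multiple: treated as overvalued
    return longs, shorts


longs, shorts = pick_long_short(df)
```

With `n=1` this picks one long and one short per cluster; a larger `n` only makes sense if each cluster has at least `2 * n` firms, otherwise the same firm could land on both sides.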
The problem is that I had to convert the Pipeline DataFrame to an array in order to run k-means clustering from the sklearn library. Unfortunately, after assigning each firm a cluster label, I could not map the labels back onto the original DataFrame, which had the securities as its index and the column titles.
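Here is a minimal sketch of the step I am stuck on, with a small made-up DataFrame standing in for the Pipeline output (the tickers and numbers are hypothetical, and I have not tested this inside a Quantopian algorithm). One thing that might work: as far as I understand, scikit-learn returns one label per input row in the same order, so if no rows are dropped or reordered between the DataFrame and the array, the labels can simply be assigned as a new column.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for Pipeline output: rows are securities, columns are factors.
factors = pd.DataFrame(
    {
        "market_cap": [1.0, 1.2, 0.9, 10.0, 11.0, 9.5],
        "roe": [0.05, 0.06, 0.04, 0.20, 0.22, 0.19],
    },
    index=["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"],
)

# Drop missing values BEFORE taking the array, so rows and labels stay aligned.
clean = factors.dropna()

# fit_predict returns one integer label per row of the input array, in row order.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(clean.values)

# Attach the labels to a copy of the same DataFrame; index and columns are preserved.
clustered = clean.copy()
clustered["cluster"] = labels
```

The key point in this sketch is that the array is taken from `clean` and the labels are written back to a copy of `clean`, never to a differently filtered frame. In practice the factors would probably also need scaling before clustering, since market_cap is on a very different scale from ROE.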
I hope that you guys can help me figure out the next step.
Thank you very much,
Thanh Duong