Equity Valuation: The Comparables Approach Using K-means Clustering

Hi guys, this is a follow-up to my "K-means Clustering Help" post.

One of the simplest ways to find an undervalued company is through the comparables approach. Here is a detailed explanation from Investopedia: https://www.investopedia.com/articles/investing/080913/equity-valuation-comparables-approach.asp

"The basic premise of the comparables approach is that an equity’s value should bear some resemblance to other equities in a similar class. For a stock, this can simply be determined by comparing a firm to its key rivals, or at least those rivals that operate similar businesses. Discrepancies in the value between similar firms could spell opportunity. The hope is that it means the equity being valued is undervalued and can be bought and held until the value increases. The opposite could hold true, which could present opportunity for shorting the stock, or positioning one’s portfolio to profit from a decline in its price.

There are two primary comparable approaches. The first is the most common and looks at market comparables for a firm and its peers. Common market multiples include the following: enterprise value to sales (EV/S), enterprise multiple, price to earnings (P/E), price to book (P/B) and price to free cash flow (P/FCF)..."

However, finding a set of equities in a similar class usually requires many subjective assumptions. So, in this algorithm, I used k-means clustering to try to group similar firms into comparable sets quantitatively (more on k-means: https://en.wikipedia.org/wiki/K-means_clustering). K-means groups the firms based on variables such as market cap, ROE, ROA, etc. We can then compute the EV/EBITDA multiple of each company in each group; according to the theory, the companies in the bottom 10-25% of each group by EV/EBITDA should be undervalued, and those in the top 10-25% should be overvalued.
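To make the idea concrete, here is a minimal, dependency-free sketch of the two steps described above: cluster firms on fundamentals, then rank EV/EBITDA inside each cluster. It is not the attached algorithm. The tickers, the numbers, the choice of ROE and ROA as the only features, k=2, the 25% cutoff, and the helper names `kmeans` and `rank_within_clusters` are all assumptions for illustration. The centroid initialization is deliberately deterministic so the result is reproducible; a library implementation would normally use a randomized scheme instead.

```python
import math

# Hypothetical firms: (ticker, ROE, ROA, EV/EBITDA). All numbers are made up.
FIRMS = [
    ("AAA", 0.05, 0.010, 6.0),
    ("BBB", 0.06, 0.012, 9.0),
    ("CCC", 0.04, 0.009, 12.0),
    ("DDD", 0.05, 0.011, 7.5),
    ("EEE", 0.20, 0.080, 10.0),
    ("FFF", 0.22, 0.085, 15.0),
    ("GGG", 0.19, 0.078, 20.0),
    ("HHH", 0.21, 0.082, 12.5),
]


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm. Returns one cluster label per point."""
    n = len(points)
    # Deterministic init for reproducibility: evenly spaced input points.
    centroids = [points[i * n // k] for i in range(k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels


def rank_within_clusters(firms, labels, fraction=0.25):
    """Return (cheap, rich) ticker sets: lowest/highest EV/EBITDA per cluster."""
    cheap, rich = set(), set()
    for c in set(labels):
        group = sorted(
            (f for f, lab in zip(firms, labels) if lab == c),
            key=lambda f: f[3],
        )
        n_pick = max(1, int(len(group) * fraction))
        cheap.update(f[0] for f in group[:n_pick])
        rich.update(f[0] for f in group[-n_pick:])
    return cheap, rich


features = [(roe, roa) for _, roe, roa, _ in FIRMS]
labels = kmeans(features, k=2)
cheap, rich = rank_within_clusters(FIRMS, labels)
print("cheap candidates:", sorted(cheap))
print("rich candidates:", sorted(rich))
```

Note that EV/EBITDA is kept out of the clustering features here, so the multiple being compared does not also decide who counts as a peer. Very small clusters make the ranking meaningless (a cluster of one is both its own cheapest and richest member), so a minimum cluster size is probably worth enforcing.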

However, k-means relies on distances between numeric values, so it cannot be applied directly to discrete (categorical) variables such as industry code, sector ID, credit rating, etc. So my algorithm only experimented with companies from the financial industry and did not include any discrete variables.
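A small sketch of why raw category codes are a problem: a distance computed on the code itself treats two sectors with adjacent IDs as nearly identical and two sectors with distant IDs as very different, even though the numbers are just labels. One commonly suggested workaround (not something this algorithm uses) is one-hot encoding, where every pair of distinct categories ends up equally far apart. The sector IDs and the `one_hot` helper below are hypothetical.

```python
import math

# Hypothetical sector IDs; the numeric values are arbitrary labels.
sector_id = {"AAA": 101, "BBB": 102, "CCC": 311}


def one_hot(value, categories):
    """Encode a category as a 0/1 vector with a single 1."""
    return [1.0 if value == c else 0.0 for c in categories]


categories = sorted(set(sector_id.values()))
encoded = {t: one_hot(s, categories) for t, s in sector_id.items()}

# Distance on the raw codes: depends entirely on the arbitrary numbering.
raw_ab = abs(sector_id["AAA"] - sector_id["BBB"])
raw_ac = abs(sector_id["AAA"] - sector_id["CCC"])

# Distance on one-hot vectors: every pair of different sectors is equidistant.
hot_ab = math.dist(encoded["AAA"], encoded["BBB"])
hot_ac = math.dist(encoded["AAA"], encoded["CCC"])

print("raw codes:", raw_ab, raw_ac)
print("one-hot:  ", hot_ab, hot_ac)
```

Whether k-means behaves well once such 0/1 columns are mixed with continuous fundamentals is a separate question, since the centroids become fractional and the relative weight of the columns matters. Clustering within one sector at a time, as done here with financials, avoids the issue entirely.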

I have attached the result. I am brand new to Quantopian, so please give me some advice on how to cluster firms more accurately and get around the discrete-variables problem.

Thank you very much,

Thanh

1 response

I'm wondering whether the data needs to be standardized for k-means?
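A quick way to see why this matters: k-means compares firms by Euclidean distance, so a feature measured in dollars (market cap) can swamp a feature measured as a ratio (ROE). The sketch below uses made-up numbers and a hypothetical `zscore_columns` helper to compare distances before and after z-scoring each column. It is only an illustration, not a statement about what the attached algorithm does.

```python
import math
import statistics

# Hypothetical firms: (market cap in USD, ROE). Numbers are made up.
firms = {
    "AAA": (2.0e9, 0.05),
    "BBB": (2.1e9, 0.25),  # similar size to AAA, very different ROE
    "CCC": (9.0e9, 0.05),  # same ROE as AAA, very different size
}


def zscore_columns(rows):
    """Rescale each column to mean 0 and (population) std 1."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [
        tuple((x - m) / s for x, m, s in zip(row, means, stds))
        for row in rows
    ]


tickers = list(firms)
raw = dict(zip(tickers, firms.values()))
scaled = dict(zip(tickers, zscore_columns(list(firms.values()))))

raw_ab = math.dist(raw["AAA"], raw["BBB"])
raw_ac = math.dist(raw["AAA"], raw["CCC"])
scaled_ab = math.dist(scaled["AAA"], scaled["BBB"])
scaled_ac = math.dist(scaled["AAA"], scaled["CCC"])

print("raw:    AAA-BBB", raw_ab, " AAA-CCC", raw_ac)
print("scaled: AAA-BBB", scaled_ab, " AAA-CCC", scaled_ac)
```

On the raw data the ROE gap between AAA and BBB is invisible next to the market-cap differences, so the clustering would effectively sort firms by size. After z-scoring, both differences contribute on a comparable scale. For heavily skewed features such as market cap, taking a log before standardizing may also be worth trying.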