Each company can be represented by a few metrics that describe its health: EPS, revenue, market cap, the last four quarters' earnings, ROE, PE ratio, and so on. There could be other metrics or features as well. If we represent each company by these metrics and cluster the data, would the resulting clusters be meaningful?
I had this question one sleepless night, so I jumped out of bed and jotted down the basic idea and approach.
We know of companies like Bajaj Finance, which has given excellent returns in the past and continues to do so. There must be other companies like it. Would companies like Bajaj Finance end up grouped into one cluster?
The idea is to find the characteristics of each cluster by looking at a few well-known companies mapped to it, and then see which not-so-famous companies fall in the same cluster.
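As a rough sketch of that lookup step, assuming cluster labels have already been assigned to each company, the peers of a known company could be pulled out like this. The company names other than Bajaj Finance, the cluster numbers, and the `peers_of` helper are all made up for illustration.

```python
# Hypothetical cluster assignments, for illustration only (not real results).
cluster_of = {
    "Bajaj Finance": 2,
    "Company A": 2,
    "Company B": 0,
    "Company C": 2,
    "Company D": 1,
}


def peers_of(company, assignments):
    """Return the other companies sharing a cluster with `company`, sorted by name."""
    target = assignments[company]
    return sorted(
        name
        for name, cluster in assignments.items()
        if cluster == target and name != company
    )


print(peers_of("Bajaj Finance", cluster_of))  # ['Company A', 'Company C']
```

The lesser-known names that show up next to a well-known one are the candidates worth a closer look.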
- Which companies to select in data?
We start with the top 500 NSE companies. The list can be obtained from the NSE website.
- Where to extract the data from?
- What features to use for clustering?
Start with a basic set of features that are easy to get, like sector, market cap, EPS, ROE, and PE ratio. I will build on this later. Good now is better than perfect tomorrow.
  - Debt/equity ratio
  - Last 4 quarters' earnings
  - Price info: mean price, standard deviation, 25th-percentile price, median price, 75th-percentile price, 52-week high, 52-week low
  - Volume traded
  - Some way to capture user sentiment
- Start modeling
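To make the modeling step concrete, here is a minimal sketch of one way it could look. It is not the code from the repository: the choice of k-means, the use of pandas and scikit-learn, the column names, the number of clusters, and every number in the table are assumptions made for illustration. The features are scaled first because market cap is many orders of magnitude larger than ROE or PE ratio and would otherwise dominate the distance calculation.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up numbers for illustration only; these are not real company figures.
data = pd.DataFrame(
    {
        "sector": ["Finance", "Finance", "IT", "IT", "Energy", "Energy"],
        "market_cap": [4.5e12, 3.9e12, 1.2e13, 6.0e12, 1.6e13, 2.0e12],
        "eps": [190.0, 170.0, 115.0, 60.0, 95.0, 30.0],
        "roe": [22.0, 20.0, 45.0, 30.0, 9.0, 12.0],
        "pe_ratio": [35.0, 33.0, 30.0, 25.0, 24.0, 8.0],
    },
    index=["FIN_A", "FIN_B", "IT_A", "IT_B", "ENERGY_A", "ENERGY_B"],
)


def cluster_companies(frame, n_clusters=3, random_state=0):
    """Assign each company (row) to a k-means cluster and return the labels."""
    # Sector is categorical, so one-hot encode it before computing distances.
    features = pd.get_dummies(frame, columns=["sector"], dtype=float)
    # Put every feature on a comparable scale (zero mean, unit variance).
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    return pd.Series(model.fit_predict(scaled), index=frame.index, name="cluster")


labels = cluster_companies(data)
print(labels.sort_values())
```

The cluster numbers themselves carry no meaning; what matters is which companies share a label. The number of clusters is a guess here and would need tuning on the real data.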
The complete implementation and code can be found in this GitHub repository.
This is the first version of the project. I plan to come back to it later and add more features to represent a company.