Manish Barnwal

...just another human

Cluster NSE top 500 companies


Each company can be represented by a few metrics that describe its health: EPS, revenue, market cap, earnings over the last four quarters, RoE, PE ratio, and so on. There could be other metrics or features as well. If we represent each company with these metrics and apply clustering to this data, would the resulting clusters be meaningful?

I had this question one sleepless night, so I jumped out of bed and jotted down the basic idea and approach.

We know that companies like Bajaj Finance have given excellent returns in the past and continue to do so. There would be other companies like this. Would companies like Bajaj Finance end up grouped into one cluster?


The idea is to find the characteristics of each cluster by looking at a few well-known companies in it, and then see which not-so-famous companies land in the same cluster.
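Once every company has a cluster label, this lookup is simple. Here is a minimal sketch, assuming the clustering step has already produced a mapping from company to cluster id. The names (`KNOWN_A`, `SMALL_X`, and so on), the labels, and the helper `cluster_peers` are all hypothetical placeholders, not results from the project.

```python
# Hypothetical cluster assignments (company -> cluster id).
# Placeholder names and labels for illustration only, not real results.
cluster_of = {
    "KNOWN_A": 0,
    "SMALL_X": 0,
    "KNOWN_B": 1,
    "SMALL_Y": 1,
    "SMALL_Z": 0,
}


def cluster_peers(cluster_of, known_company):
    """Return the other companies that share a cluster with known_company."""
    target = cluster_of[known_company]
    return sorted(
        name
        for name, label in cluster_of.items()
        if label == target and name != known_company
    )


# Companies that landed in the same cluster as a well-known one.
peers = cluster_peers(cluster_of, "KNOWN_A")
print(peers)
```

The same helper can be called once per well-known company to get a shortlist of lesser-known names worth a closer look.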


  • Which companies to select in data?

We start with the top 500 NSE companies. The list can be obtained from the NSE website.

  • Where to extract the data from?

Use the requests and BeautifulSoup libraries to fetch and parse the pages.

  • What features to use for clustering?

Start with a basic set of features that are easy to get, like sector, market cap, EPS, RoE, and PE ratio. I will build on this later. Good now is better than perfect tomorrow.

Other features

  - debt/equity ratio

  - last 4 quarter earnings

  - Price info: mean price, standard deviation, 25th percentile price, median price, 75th percentile price, 52-week high, 52-week low

  - Volume traded

  - Some way to capture user sentiment

  • Start modeling
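To make the modeling step concrete, here is a rough sketch of what it could look like: standardize the numeric features so that market cap does not dominate the distance, then run k-means. This is not the code from the repository. The rows below are made-up illustrative values, the choice of k-means and of `k=2` is my assumption, and it uses only the standard library (in practice a library such as scikit-learn would be the usual choice). The categorical sector feature is left out here because it would need encoding first.

```python
import math
import random

# Made-up illustrative rows: (name, market cap, EPS, RoE %, PE ratio).
# These are NOT real figures for any company.
companies = [
    ("CO_A", 5000.0, 120.0, 22.0, 35.0),
    ("CO_B", 4800.0, 110.0, 21.0, 33.0),
    ("CO_C", 300.0, 8.0, 9.0, 12.0),
    ("CO_D", 280.0, 7.5, 8.0, 11.0),
    ("CO_E", 5200.0, 125.0, 23.0, 36.0),
    ("CO_F", 310.0, 9.0, 10.0, 13.0),
]


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means: returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


names = [row[0] for row in companies]
features = standardize([row[1:] for row in companies])
labels = kmeans(features, k=2)

for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

With the real data, the number of clusters would have to be chosen by experiment, and the extra features listed above could be added as more columns.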

Code implementation

The complete implementation and code can be found in this GitHub repository.

This is the first version of the project. I plan to come back to it later and add more features to represent a company.
