Yesterday, I was taking a session on Data Science for a few of my colleagues. The aim was to give a brief overview of machine learning. There were two of us taking the session, and we had a rough idea of what we wanted to cover in the two-hour session.
I opened with what machine learning is: the types of learning, supervised and unsupervised, and the examples that fall into each of these.
What is machine learning?
Machine learning helps a machine learn from data: understand the patterns in the data and use them to predict future values. There is a true function that maps the inputs to the output values, and machine learning is the process of estimating that function. We try to find a proxy function that is as close as possible to the true function.
The essence of machine learning is function estimation.
If you are interested in reading more on this, see the post The essence of machine learning is function estimation.
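To make "function estimation" concrete, here is a small sketch (the session itself used R; this is an illustrative Python version, and the true function and data below are made up): we generate data from a known true function and recover a proxy for it with ordinary least squares.

```python
import random

random.seed(42)

# The true function that generates the data: f(x) = 3x + 2.
def true_f(x):
    return 3 * x + 2

# Observed data: the true function plus noise.
xs = [i / 10 for i in range(100)]
ys = [true_f(x) + random.gauss(0, 0.5) for x in xs]

# Estimate a proxy function f_hat(x) = b1*x + b0 by least squares.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
     sum((x - mean_x) ** 2 for x in xs)
b0 = mean_y - b1 * mean_x

print(b1, b0)  # close to the true slope 3 and intercept 2
```

The estimated coefficients land near the true 3 and 2; with more noise or less data, the proxy drifts further from the true function.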
We then moved on to the simplest technique in machine learning, linear regression, and after that to classification problems, where you have to classify an observation into one of a set of pre-defined classes. We talked about logistic regression: in linear regression you regress the response variable on the independent variables, whereas in logistic regression you model the probability that the response variable belongs to class one.
In a binary classification problem, logistic regression gives you the probability of belonging to class 1; if you want the probability for class 0, you simply subtract this probability from 1. So what you have at this point is a probability of falling in class 1, not a class.
What you want, though, is the class to which each observation falls. How do you convert these probabilities into classes? I posed this question to my colleagues. One of them said this should come from business knowledge: how strict or lenient do you want your model to be? Some said they would assign any value greater than 0.5 to class 1, otherwise class 0. A few others were still contemplating.
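The 0.5 rule my colleagues suggested takes one line of code. A minimal sketch in Python (the session used R; the probabilities here are made-up values):

```python
# Predicted probabilities of class 1 for five observations (made-up values).
probs = [0.91, 0.42, 0.67, 0.08, 0.55]

cutoff = 0.5  # the "strict vs. lenient" knob: raise it for a stricter model

# Any probability above the cut-off is labelled class 1, otherwise class 0.
classes = [1 if p > cutoff else 0 for p in probs]
print(classes)  # [1, 0, 1, 0, 1]
```

Changing the cut-off changes the labels, which is exactly why choosing it deserves its own discussion.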
I am assuming you understand what TPR and FPR mean. If not, you may want to visit this post: TPR, FPR, ROC, and AUC.
By this time, I had already explained the ROC curve and the confusion matrix to them. We went back to the ROC curve and discussed how it gives you the true positive rate and false positive rate corresponding to each probability cut-off. The graph looks something like the one below.
How do you generate the above graph?
There are functions in R that can give you this plot in a single line. However, for the sake of doing it from scratch, I wrote the code below, which generated the above plot.
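The idea behind the from-scratch version can be sketched as follows (the original code was in R; this is an illustrative Python stand-in with made-up labels and scores, not the author's code): sweep the cut-off from high to low, and at each cut-off count how many positives and negatives score above it.

```python
# Made-up true classes and predicted probabilities of class 1.
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]

pos = sum(labels)           # number of actual positives
neg = len(labels) - pos     # number of actual negatives

# For each cut-off, count true positives and false positives among
# the observations scored at or above that cut-off.
roc = []
for cutoff in sorted(set(scores), reverse=True):
    tp = sum(1 for l, s in zip(labels, scores) if s >= cutoff and l == 1)
    fp = sum(1 for l, s in zip(labels, scores) if s >= cutoff and l == 0)
    roc.append((fp / neg, tp / pos))  # one (FPR, TPR) point on the curve

for fpr, tpr in roc:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting the (FPR, TPR) pairs, joined in order, gives the ROC curve; the curve always starts near (0, 0) at the highest cut-off and ends at (1, 1) at the lowest.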
Choosing the probability cut-off
Once you have an understanding of the ROC curve, you can use this plot to pick the probability cut-off.
Choose some probability cut-offs, say from 0.5 to 0.9 in increments of 0.05, and calculate the TPR and FPR corresponding to each probability value.
You have to decide how much TPR and FPR you want. There is a trade-off between the two: if you increase TPR, your FPR will also increase. So depending on whether you want to detect all the positives (higher TPR) and are willing to incur some error in terms of FPR, you decide the probability cut-off.
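One simple way to operationalize this trade-off (a sketch with assumed numbers, not a rule from the post): fix the FPR you are willing to tolerate, then among the cut-offs within that budget pick the one with the highest TPR.

```python
# (cutoff, TPR, FPR) at cut-offs from 0.5 to 0.9 (made-up values).
table = [
    (0.50, 0.95, 0.40),
    (0.60, 0.90, 0.30),
    (0.70, 0.80, 0.15),
    (0.80, 0.60, 0.08),
    (0.90, 0.35, 0.02),
]

max_fpr = 0.20  # the error we are willing to incur; a business decision

# Among cut-offs whose FPR is within budget, take the highest TPR.
best = max((row for row in table if row[2] <= max_fpr), key=lambda r: r[1])
print(best)  # (0.7, 0.8, 0.15)
```

With a looser FPR budget, a lower cut-off (and higher TPR) would win, which is the "strict vs. lenient" choice again in numeric form.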
Many a time you may want to choose the probability cut-off that gives you the maximum accuracy. However, care should be taken when the response column is skewed. For instance, suppose a bank wants to predict loan defaulters and only 2 out of every 100 customers default: a model that predicts "no default" for everyone is 98% accurate, yet it catches no defaulter at all.
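The danger with accuracy on a skewed response can be shown in a few lines (a Python sketch with made-up numbers, assuming a 2% default rate):

```python
# A skewed response: 2 defaulters (class 1) among 100 customers.
actual = [1] * 2 + [0] * 98

# A useless model that predicts "no default" (class 0) for everyone.
predicted = [0] * 100

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
tpr = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1) / 2

print(accuracy, tpr)  # 0.98 accuracy, but TPR is 0.0: no defaulter caught
```

This is why, on skewed data, you look at TPR and FPR (or the confusion matrix) rather than accuracy alone when choosing the cut-off.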