Tutorial on dplyr - a package for data manipulation in R

May 15, 2017

R is one of the most widely used tools in data science. It has no dearth of packages for specific use cases. There are three packages that I feel can get most of your work done: ggplot2, dplyr, and data.table.

ggplot2 - Used for visualization. It is built on the grammar of graphics. This package ...

The essence of machine learning is function estimation

May 12, 2017

Machine learning is cool. There is no denying that. In this post we will try to make it a little uncool. Well, it will still be cool, but you may start looking at it differently. Machine learning is not a black box. It is intuitive and this post is ...

Time series and forecasting using R

May 03, 2017

Time series forecasting is a skill that few people claim to know. Machine learning is cool, and a lot of people are interested in becoming machine learning experts. But forecasting is a little more domain specific.

Retailers like Walmart and Target use forecasting systems and tools to ...

Diving into H2O with R

Mar 28, 2017

Do you understand the pain of training advanced machine learning algorithms like Random Forest on huge datasets? When there is a factor column that has far too many levels? When the time taken to train the model is so long that you went to your ...

An illustrated introduction to adversarial validation part 2

Feb 16, 2017

In the last post we talked about the idea of adversarial validation and how it helps when your validation set results don't agree with your test set results. In this post, I will share the R code to implement adversarial validation. The ...

How to use Git and GitHub

Feb 15, 2017

I took the course "How to Use Git and GitHub" some time last year. This post combines the course notes with other tutorials I have worked through while learning git. I will talk about the most frequently used commands. If you are already confident in your git ...

An illustrated introduction to adversarial validation part 1

Feb 15, 2017

You have probably heard of cross-validation, a common technique in the data science process to guard against overfitting and, often, to tune model parameters. Overfitting is when the model does well on training data but performs much worse on test data. The reason could be one of the following:

The ...

The curse of bias and variance [draft]

Feb 08, 2017

Statistics is the field of study where we try to draw conclusions about a population from a sample. Why do we talk about a sample? Why can't we draw conclusions about the population directly from the population? Let me illustrate this with an example.

Let us say we want ...

Visualization in ML is under-rated

Jan 27, 2017

Visualization is one of the most important pillars of data science. Everyone wants to learn machine learning, but if you explain to them the small tasks that make up the overall workflow, it turns them off. Everyone just wants to do the cool stuff. They want to build ...

Random Forest explained intuitively

Oct 18, 2016

The Random Forest algorithm has always fascinated me. I like how this algorithm can be easily explained to anyone without much hassle. One quick example, I use very ...

Manish Barnwal

...just another human