# When an R package is not available across the cluster

When you deploy R code across a cluster, a task often fails because a particular package is not installed on every node. Then we wait for someone to install the package on all the nodes, which may take days. Do we wait for them? Naah!

Here is a temporary workaround that one of my colleagues came up with. I have used this technique and it works smoothly.

The following are the steps:

1. Install the package you require on one of the edge nodes into a local directory
• Create a local directory. Let's say our directory name is rPackages
    mkdir rPackages

• Install the required package, say 'randomForest', into the directory just created (run this from R, in the directory that contains rPackages)
    install.packages('randomForest', repos='repo_name', lib='rPackages/')
Note that repo_name is a placeholder: choose the appropriate repository, the one that your company allows.
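
If you would rather do all of step 1 from R, the two bullets above can be folded into one small function. This is only a sketch: `install_to_local_lib` and its argument names are made up for illustration, and `repo` still has to be the repository your company allows.

```r
# Sketch: install a package into a project-local library directory.
# install_to_local_lib is a hypothetical helper name, not part of base R.
install_to_local_lib <- function(pkg, repo, lib_dir = "rPackages") {
  # Same effect as `mkdir -p rPackages`: no complaint if it already exists.
  dir.create(lib_dir, showWarnings = FALSE, recursive = TRUE)

  # By default install.packages also tries to install missing dependencies.
  install.packages(pkg, repos = repo, lib = lib_dir)

  # TRUE only if the package really landed in lib_dir.
  file.exists(file.path(lib_dir, pkg, "DESCRIPTION"))
}
```

Calling `install_to_local_lib('randomForest', repo = 'repo_name')` then covers both bullets and tells you whether the install succeeded.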


2. Check if you can load the package from this local directory

    library(randomForest, lib.loc='rPackages/')


3. Create a zip file of the rPackages directory using the command

    zip -r rPackages.zip rPackages/
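
Before shipping the archive it is worth a quick look inside, because step 5 only works if unzipping recreates a top-level rPackages directory. Here is a small sketch in R, assuming rPackages.zip sits in the working directory; `zip_has_top_dir` is a hypothetical helper name.

```r
# Sketch: confirm that every entry in the archive lives under one top-level
# directory, so that unzip() on a node recreates rPackages/.
# zip_has_top_dir is a hypothetical helper name, not part of base R.
zip_has_top_dir <- function(zip_path, top_dir = "rPackages") {
  # list = TRUE only lists the archive contents; nothing is extracted.
  entries <- unzip(zip_path, list = TRUE)$Name
  length(entries) > 0 && all(startsWith(entries, paste0(top_dir, "/")))
}
```

If `zip_has_top_dir('rPackages.zip')` returns FALSE, the archive was probably created from inside rPackages rather than from its parent directory.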


4. Add this zip file in your Hive HQL script (or whatever else you use to ship files to the nodes)

    add file rPackages.zip;
Don’t forget the semicolon.


5. Unzip the file inside the R script, so that each reducer now has its own rPackages directory, then load the package from it

    unzip('rPackages.zip', overwrite=TRUE)

    library(randomForest, lib.loc='rPackages/')
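
If the script loads more than one bundled package, repeating lib.loc gets tedious. One option is to unzip once and put the directory at the front of `.libPaths()`, so that plain `library()` calls find the bundled packages first. This is a minimal sketch, not the only way to do it; `use_bundled_lib` is a hypothetical helper name and the file names are the ones used above.

```r
# Sketch: unpack a shipped library archive and make it the first place R looks.
# use_bundled_lib is a hypothetical helper name, not part of base R.
use_bundled_lib <- function(zip_path = "rPackages.zip", lib_dir = "rPackages") {
  if (!file.exists(zip_path)) {
    stop("Archive not found: ", zip_path)
  }
  # Extract into the current working directory.
  unzip(zip_path, overwrite = TRUE)
  if (!dir.exists(lib_dir)) {
    stop("Unzipping did not create the expected directory: ", lib_dir)
  }
  # Prepend the unpacked directory; existing library paths stay as fallbacks.
  .libPaths(c(normalizePath(lib_dir), .libPaths()))
  invisible(.libPaths())
}

# Usage on a node, after `add file rPackages.zip;` has shipped the archive:
# use_bundled_lib()
# library(randomForest)
```

The explicit `stop()` calls are there so that a missing or badly built archive fails with a clear message on the reducer, instead of a generic "there is no package called" error later on.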