I am on an EMR cluster with AMI 3.0.4. Once the cluster is up, I SSH to the master node and do the following manually:
cd /home/hadoop/share/hadoop/common/lib/
rm guava-11.0.2.jar
wget https://repo1.maven.org/maven2/com/google/guava/guava/14.0.1/guava-14.0.1.jar
chmod 777 guava-14.0.1.jar
Is it possible to do the above in a bootstrap action? Thanks!
With EMR 4.0 the Hadoop installation path changed, so the manual update to guava-14.0.1.jar must be changed to:
cd /usr/lib/hadoop/lib
sudo wget https://repo1.maven.org/maven2/com/google/guava/guava/14.0.1/guava-14.0.1.jar
sudo rm guava-11.0.2.jar
The bootstrap action in the answer from Sandesh doesn't work for us.
We now have a solution for EMR 4.0. You have to provide a spark-config.json file in S3 that sets the extra class path for both the Spark executor and the Spark driver. In the "Edit software settings (optional)" section you can specify the location of this config file and load it from S3.
The guava-14.0.1.jar needs to be downloaded via the bootstrap script, guava_download.sh:
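A minimal sketch of what such a spark-config.json might look like, written from the shell. The spark-defaults classification and the spark.driver.extraClassPath / spark.executor.extraClassPath properties are the standard EMR 4.x and Spark names; the jar location assumes the file was downloaded to /home/hadoop/lib/, and the bucket in the upload comment is a placeholder. Setting these properties replaces the cluster's default value, so you will probably need to append your release's default class path entries after the Guava jar; they are left out here because they differ between releases.

```shell
# Write a hypothetical spark-config.json that puts Guava 14.0.1 on the class path.
# ASSUMPTION: real clusters usually need the release's default class path entries
# appended after the jar, separated by ':'; they are omitted in this sketch.
cat > spark-config.json <<'EOF'
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.extraClassPath": "/home/hadoop/lib/guava-14.0.1.jar",
      "spark.executor.extraClassPath": "/home/hadoop/lib/guava-14.0.1.jar"
    }
  }
]
EOF

# Upload it so the EMR console can load it (bucket name is a placeholder):
# aws s3 cp spark-config.json s3://my-bucket/config/spark-config.json
```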
mkdir -p /home/hadoop/lib/
cd /home/hadoop/lib/
wget https://repo1.maven.org/maven2/com/google/guava/guava/14.0.1/guava-14.0.1.jar
Yes, you can add a bootstrap script to do this. Create a shell script, upload it to S3, and then use the script's S3 path in the bootstrap action for EMR.
For example, you can keep guava-14.0.1.jar in an S3 bucket and download it from there:
hadoop fs -copyToLocal s3n://rootbucket/myjars/guava-14.0.1.jar /home/hadoop/share/hadoop/common/lib/
rm -rf /home/hadoop/share/hadoop/common/lib/guava-11.0.2.jar
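Put together, the two commands above could be packaged as a bootstrap script roughly like this. It is only a sketch: the file name replace_guava.sh, the LIB_DIR variable, and the upload path in the final comment are made up for illustration, while the bucket and jar paths are the ones used above.

```shell
# Write a hypothetical bootstrap script (the file name is an assumption).
cat > replace_guava.sh <<'EOF'
#!/bin/bash
set -e  # stop the bootstrap action on the first failed command

LIB_DIR=/home/hadoop/share/hadoop/common/lib

# Fetch the newer jar first, so the old one is removed only if the copy succeeded.
hadoop fs -copyToLocal s3n://rootbucket/myjars/guava-14.0.1.jar "$LIB_DIR/"
rm -f "$LIB_DIR/guava-11.0.2.jar"
EOF

# Check the script for syntax errors locally before uploading it.
bash -n replace_guava.sh

# Upload it and use this S3 path in the bootstrap action (placeholder path):
# aws s3 cp replace_guava.sh s3://rootbucket/bootstrap/replace_guava.sh
```

With set -e, a failed copy makes the bootstrap action exit non-zero before the old jar is deleted, so the node is not left without any Guava jar.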
I assume you are doing this because your MapReduce code depends on Guava 14.0.1. Alternatively, you can build a fat jar that bundles guava-14.0.1.jar and upload it as your custom jar to run your job.