Sometimes my MR job complains that the MyMapper class is not found, and that I have to call job.setJarByClass(MyMapper.class); to tell it to load the class from my jar file.
cloude
Be sure to add any dependencies to both the HADOOP_CLASSPATH and -libjars when submitting a job, as in the following examples:
Use the following to add all the jar dependencies from (for example) the current and lib directories:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:`echo *.jar | sed 's/ /:/g'`:`echo lib/*.jar | sed 's/ /:/g'`
Bear in mind that when starting a job through hadoop jar, you also need to pass it the jars of any dependencies with -libjars. I like to use:
hadoop jar <jar> <class> -libjars `echo ./lib/*.jar | sed 's/ /,/g'` [args...]
NOTE: The two sed commands use different delimiter characters; the HADOOP_CLASSPATH is : separated and the -libjars list needs to be , separated.
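To make the delimiter difference concrete, here is a rough sketch that builds both lists from the same set of jars. The join_with helper, the CP_LIST and LIBJARS_LIST variable names, and the a.jar / b.jar files are all made up for the demo; it only touches a temporary directory and never calls hadoop. Unlike the echo | sed approach, joining the glob results directly should also cope with jar names that contain spaces.

```shell
#!/bin/sh
# Hypothetical helper: join all remaining arguments with the given separator.
# Usage: join_with SEP item1 item2 ...
join_with() {
    sep=$1
    shift
    out=
    for f in "$@"; do
        # Prepend "<previous items><sep>" only when out is non-empty.
        out="${out:+$out$sep}$f"
    done
    printf '%s\n' "$out"
}

# Demo on throwaway files; a.jar and b.jar are placeholder names.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/lib"
touch "$demo_dir/lib/a.jar" "$demo_dir/lib/b.jar"
cd "$demo_dir" || exit 1

CP_LIST=$(join_with : lib/*.jar)       # colon-separated, for HADOOP_CLASSPATH
LIBJARS_LIST=$(join_with , lib/*.jar)  # comma-separated, for -libjars

echo "$CP_LIST"
echo "$LIBJARS_LIST"
```

In a real project directory you would then use the two values along the lines of export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$CP_LIST and hadoop jar <jar> <class> -libjars "$LIBJARS_LIST" [args...].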
Yes, job.setJarByClass is necessary so that Hadoop will copy your jar to the task trackers. If you do not invoke job.setJarByClass, Hadoop assumes your jar is already on the classpath of the task trackers, so it does not copy your jar.