In the Hadoop API documentation it's given that:
setJarByClass
public void setJarByClass(Class<?> cls)
Set the Jar by finding where a given class came from.
job.setJarByClass(WordCount.class);
It helps to identify the Jar which contains the Mapper and Reducer by specifying a class in that Jar.
Please note that the above method on the Job class is called in the driver. Your driver is invoked from a client, typically your desktop or an edge machine which is not part of the cluster, and your classes (in jar files) sit on that machine. For your MapReduce job to run on the cluster, you need to send your Mapper, Reducer, and any other required classes to the cluster from your client machine. Your driver class takes care of sending the jar file containing the required classes to the cluster. You need to specify which jar to send, because the driver does not know which one should be sent amongst the heap of jar files on your driver's classpath. This is done by using the method setJarByClass, or setJar, or another variant of these methods on the Job class.
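For context, here is a minimal driver sketch modeled on the classic WordCount example from the Hadoop tutorial (class names and paths are illustrative); note where setJarByClass is called:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    // Tell the framework which jar to ship to the cluster: the jar
    // that contains WordCount (and here, its nested Mapper/Reducer).
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}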
Obviously, if you don't specify this (meaning you don't call this method, or you comment it out), you will get a ClassNotFoundException on the slave nodes.
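As a quick illustration of the setJar variant mentioned above (the jar path here is made up):

// Variant: name the jar explicitly rather than deriving it from a class.
// The path below is illustrative only.
job.setJar("/path/to/wordcount.jar");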
Hope this clarifies!
This method sets the jar file in which each node will look for the Mapper and Reducer classes.
It does not create a jar from the given class. Rather, it identifies the jar containing the given class. And yes, that jar file is "executed" (really the Mapper and Reducer in that jar file are executed) for the MapReduce job.
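To make the "identifies the jar" part concrete, here is a simplified sketch in plain Java of how the jar containing a class can be located via a classloader resource lookup. This is an approximation for illustration, not Hadoop's actual implementation:

import java.net.URL;
import java.net.URLDecoder;

public class FindJar {

  // Return the path of the jar a class was loaded from, or null if it
  // was loaded from a plain directory (or by the bootstrap loader).
  static String findContainingJar(Class<?> cls) throws Exception {
    ClassLoader loader = cls.getClassLoader();
    if (loader == null) {
      return null; // bootstrap classes such as java.lang.String
    }
    String resource = cls.getName().replace('.', '/') + ".class";
    URL url = loader.getResource(resource);
    if (url == null || !"jar".equals(url.getProtocol())) {
      return null; // not loaded from a jar
    }
    // A jar URL looks like: jar:file:/path/to/app.jar!/pkg/Name.class
    String path = url.getPath();
    path = path.substring(0, path.indexOf('!')); // drop the entry part
    if (path.startsWith("file:")) {
      path = path.substring("file:".length());
    }
    return URLDecoder.decode(path, "UTF-8");
  }

  public static void main(String[] args) throws Exception {
    // Prints the jar path if this class itself was run from a jar,
    // or null if it was loaded from an unpacked classes directory.
    System.out.println(findContainingJar(FindJar.class));
  }
}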
(Also see Stanley Xu's answer to a similar question about why this method is needed even though you already give the jar on the command line.)