Problem with -libjars in hadoop

Asked 2020-12-01 19:07 by 遥遥无期 · 3 answers · 1905 views

I am trying to run a MapReduce job on Hadoop, but I am getting an error and I am not sure what is going wrong. I need to pass library JARs that are required by my mapper.

3 Answers
  • 2020-12-01 19:34

    A subtle but important point is also worth noting: the way you specify additional JARs for the JVMs running the distributed map and reduce tasks is very different from the way you specify them for the JVM running the job client.

    • -libjars makes the JARs available only to the JVMs running the remote map and reduce tasks

    • To make these same JARs available to the client JVM (the JVM created when you run the hadoop jar command), you need to set the HADOOP_CLASSPATH environment variable:

    $ export LIBJARS=/path/jar1,/path/jar2
    $ export HADOOP_CLASSPATH=/path/jar1:/path/jar2
    $ hadoop jar my-example.jar com.example.MyTool -libjars ${LIBJARS} -mytoolopt value
    

    See: http://grepalex.com/2013/02/25/hadoop-libjars/

    Another cause of incorrect -libjars behaviour is a wrong implementation or initialization of your custom job class:

    • The job class must implement the Tool interface
    • The Configuration instance must be obtained by calling getConf() instead of creating a new instance

    See: http://kickstarthadoop.blogspot.ca/2012/05/libjars-not-working-in-custom-mapreduce.html

  • 2020-12-01 19:44

    I found the answer: it was throwing the error because I was missing the "main" class name in the command.

    The correct way to execute it is:

    hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar /home/hadoop/vardtst.jar VardTest -libjars /home/hadoop/clui.jar,/home/hadoop/model.jar gutenberg ou101

    where VardTest is the class containing the main() method.

    Thanks

  • 2020-12-01 19:51

    When you specify -libjars with the hadoop jar command, first make sure that you edit your driver class as shown below:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.conf.Configured;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.util.Tool;
        import org.apache.hadoop.util.ToolRunner;

        public class myDriverClass extends Configured implements Tool {

            public static void main(String[] args) throws Exception {
                int res = ToolRunner.run(new Configuration(), new myDriverClass(), args);
                System.exit(res);
            }

            public int run(String[] args) throws Exception {
                // Configuration processed by ToolRunner (generic options already applied)
                Configuration conf = getConf();
                Job job = Job.getInstance(conf, "My Job");

                ...
                ...

                return job.waitForCompletion(true) ? 0 : 1;
            }
        }
    

    Now edit your "hadoop jar" command as shown below. Note that the generic options, including -libjars, must come before your application's own arguments:

    hadoop jar YourApplication.jar [myDriverClass] -libjars path/to/jar/file args

    Now let's understand what happens underneath. Basically, we handle the new command-line arguments by implementing the Tool interface. ToolRunner is used to run classes that implement Tool; it works in conjunction with GenericOptionsParser to parse the generic Hadoop command-line arguments and modify the Configuration of the Tool.

    Within our main() we call ToolRunner.run(new Configuration(), new myDriverClass(), args). This runs the given Tool via Tool.run(String[]) after parsing the generic arguments. It uses the given Configuration, or builds one if it is null, and then sets the Tool's configuration to the possibly modified version.

    Now, within the run method, calling getConf() returns that modified version of the Configuration. So make sure you have the line below in your code. If you implement everything else but still use Configuration conf = new Configuration(), -libjars will have no effect.

    Configuration conf = getConf();
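    To make the argument handling concrete, here is a simplified plain-Java sketch. This is not Hadoop's actual GenericOptionsParser, and the class name GenericArgsDemo is made up for illustration; it only shows how a generic option such as -libjars is consumed before the remaining arguments reach Tool.run(String[]):

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Simplified illustration (NOT Hadoop's real parser): a
    // GenericOptionsParser-style pass consumes generic options such as
    // -libjars, so Tool.run(String[]) only ever sees the leftover args.
    public class GenericArgsDemo {
        static String libjars = null;

        static String[] parseGenericArgs(String[] args) {
            List<String> remaining = new ArrayList<>();
            for (int i = 0; i < args.length; i++) {
                if ("-libjars".equals(args[i]) && i + 1 < args.length) {
                    libjars = args[++i]; // consumed; in Hadoop this goes into the Configuration
                } else {
                    remaining.add(args[i]);
                }
            }
            return remaining.toArray(new String[0]);
        }

        public static void main(String[] args) {
            String[] toolArgs = parseGenericArgs(new String[] {
                "-libjars", "/home/hadoop/clui.jar,/home/hadoop/model.jar",
                "gutenberg", "ou101"
            });
            System.out.println("libjars -> " + libjars);
            // prints: Tool.run sees -> [gutenberg, ou101]
            System.out.println("Tool.run sees -> " + Arrays.toString(toolArgs));
        }
    }
    ```

    In the real parser, the value of -libjars ends up in the job Configuration rather than in a static field; the point is only that by the time run(String[]) is called, the generic options have already been folded into the Configuration that getConf() returns.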
