How to sumit a mapreduce job to remote cluster configured with yarn?

大兔子大兔子 提交于 2019-12-14 02:22:33

问题


I am trying to execute a simple mapreduce program from eclipse .Following is my program

package wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020");
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.address", "quickstart.cloudera:8032");
        conf.set("yarn.app.mapreduce.am.staging-dir", "/user");
        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCount.class);
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/hadoop-mapreduce-client-app-2.6.0-cdh5.7.0.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/hadoop-yarn-common-2.6.0-cdh5.7.0.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/hadoop-common-2.6.0-cdh5.7.0.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/hadoop-yarn-api-2.6.0-cdh5.7.0.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/hadoop-mapreduce-client-core-2.6.0-cdh5.7.0.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/hadoop-mapreduce-client-common-2.6.0-cdh5.7.0.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/commons-logging-1.2.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/guava-15.0.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/commons-collections-3.2.2.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/protobuf-java-2.5.0.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/commons-configuration-1.7.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/commons-lang-2.6.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/log4j-1.2.16.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/slf4j-api-1.7.5.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/slf4j-log4j12-1.7.5.jar"));
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/user/cloudera/prasad/test.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/user/cloudera/prasad/wordout2"));
        job.waitForCompletion(true);
    }
}

Initially when I run the above program, it was throwing some ClassNotFoundExceptions in container log, so I have added all the corresponding jars as written in the program.Now its not showing any errors in container log but it is the mapreduce job is getting failed.

But resource manager showing the below error

Exception from container-launch with container ID: container_1473338609943_0003_01_000001 and exit code: 1
ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
    at org.apache.hadoop.util.Shell.run(Shell.java:478)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

when I click on application log it is not showing anything just displaying the following message.

 Log Type: stderr

Log Upload Time: Thu Sep 08 05:26:35 -0700 2016

Log Length: 243

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.impl.MetricsSystemImpl).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.


Log Type: stdout

Log Upload Time: Thu Sep 08 05:26:35 -0700 2016

Log Length: 0 

Please let me know what is wrong with my program.

Following are screens of libs I am using

来源:https://stackoverflow.com/questions/39396103/how-to-sumit-a-mapreduce-job-to-remote-cluster-configured-with-yarn

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!