HBase Mapreduce Dependency Issue when using TableMapper

Question

I am using CDH 5.3 and trying to write a MapReduce program that scans a table and does some processing. I have created a mapper that extends TableMapper, and I am getting this exception:

java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/usr/local/hadoop-2.5-cdh-3.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:267)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:388)

As you can see, it is looking for protobuf-java-2.5.0.jar under an HDFS path, but the file actually exists on the local filesystem at /usr/local/hadoop-2.5-cdh-3.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar (I verified this). This does not happen with ordinary MapReduce programs; the error only occurs when I use TableMapper.
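One way to read the trace: a path with no scheme is resolved against the default filesystem, so a local jar path ends up being looked up on HDFS. The sketch below models that behavior with plain java.net.URI. It is a simplified illustration, not Hadoop's actual code, and the class and method names (PathQualifierSketch, qualify) are hypothetical.

```java
import java.net.URI;

// Simplified model of qualifying a path against a default filesystem.
// NOT Hadoop's implementation; all names here are hypothetical.
class PathQualifierSketch {

    // Returns the path unchanged if it already carries a scheme (file:, hdfs:, ...);
    // otherwise prefixes it with the default filesystem URI.
    static String qualify(String defaultFs, String path) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return path;
        }
        return defaultFs + path;
    }

    public static void main(String[] args) {
        // Default filesystem and jar path taken from the stack trace in the question
        String defaultFs = "hdfs://localhost:54310";
        String jar = "/usr/local/hadoop-2.5-cdh-3.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar";

        // Scheme-less path: ends up under the default (HDFS) filesystem
        System.out.println(qualify(defaultFs, jar));
        // Explicit file:// scheme: stays on the local filesystem
        System.out.println(qualify(defaultFs, "file://" + jar));
    }
}
```

Under this model, the scheme-less jar path produces exactly the hdfs:// location named in the exception, while the same path with an explicit file:// scheme stays local.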

My driver code is as follows:

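import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// PropertiesUtil, ESportMapper, ESportReducer and RecordStatusVO are project classes (not shown here).
public class AppDriver {

    public static void main(String[] args) throws Exception {
        Configuration hbaseConfig = HBaseConfiguration.create();
        hbaseConfig.set("hbase.zookeeper.quorum", PropertiesUtil.getZookeperHostName());
        // ZooKeeper client port; the property name is case-sensitive
        hbaseConfig.set("hbase.zookeeper.property.clientPort", PropertiesUtil.getZookeperPortNum());

        Job job = Job.getInstance(hbaseConfig, "hbasemapreducejob");
        job.setJarByClass(AppDriver.class);

        // Create a scan
        Scan scan = new Scan();
        scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
        scan.setCacheBlocks(false);  // don't set to true for MR jobs
        // scan.setStartRow(Bytes.toBytes(PropertiesUtil.getHbaseStartRowkey()));
        // scan.setStopRow(Bytes.toBytes(PropertiesUtil.getHbaseStopRowkey()));

        // Configure the table scan as the job input; the mapper emits Text keys and RecordStatusVO values
        TableMapReduceUtil.initTableMapperJob(PropertiesUtil.getHbaseTableName(), scan,
                ESportMapper.class, Text.class, RecordStatusVO.class, job);
        job.setReducerClass(ESportReducer.class);
        job.setNumReduceTasks(1);

        // Ship the HBase dependency jars with the job
        TableMapReduceUtil.addDependencyJars(job);

        // Write the results to a file in the output directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        boolean b = job.waitForCompletion(true);
        if (!b) {
            throw new IOException("error with job!");
        }
    }
}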

I pass the properties file as args[0].
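A minimal sketch of how the two command-line arguments might be checked and the properties file read, using only java.util.Properties. The PropertiesUtil class from the question is not shown, so the class name DriverArgsSketch, its methods, and the key names zookeeper.host and zookeeper.port are assumptions made purely for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper; not the PropertiesUtil class used in the question.
class DriverArgsSketch {

    // The driver expects args[0] = properties file and args[1] = output directory.
    static boolean hasRequiredArgs(String[] args) {
        return args != null && args.length >= 2;
    }

    // Parses key=value pairs from the reader into a Properties object.
    static Properties loadFrom(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        if (!hasRequiredArgs(args)) {
            System.err.println("Usage: hadoop jar <jar> <properties-file> <output-dir>");
            return;
        }
        try (Reader reader = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            Properties props = loadFrom(reader);
            // Key names below are assumed, not taken from the question
            System.out.println("zookeeper.host = " + props.getProperty("zookeeper.host"));
            System.out.println("zookeeper.port = " + props.getProperty("zookeeper.port"));
            System.out.println("output dir     = " + args[1]);
        }
    }
}
```

Checking the argument count up front avoids an ArrayIndexOutOfBoundsException at args[1] when the output directory is omitted.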

Some more background information:

I am using a standalone CDH 5.3 installation on my local system with HBase 0.98.6. HBase is running on top of HDFS in pseudo-distributed mode.

My build.gradle is as follows:

apply plugin: 'java'
apply plugin: 'eclipse'
apply plugin: 'application'

// Basic Properties
sourceCompatibility = 1.7
targetCompatibility = '1.7'

version = '3.0'
mainClassName = "com.ESport.mapreduce.App.AppDriver"

jar {
    manifest {
        attributes "Main-Class": "$mainClassName"
    }

    // Bundle all compile dependencies into the jar
    from {
        configurations.compile.collect { it.isDirectory() ? it : zipTree(it) }
    }

    zip64 true
}

repositories {
    mavenCentral()
    maven { url "http://clojars.org/repo" }
    maven { url "http://repository.cloudera.com/artifactory/cloudera-repos/" }
}

dependencies {

testCompile group: 'junit', name: 'junit', version: '4.+'

compile group: 'commons-collections', name: 'commons-collections', version: '3.2'
compile 'org.apache.storm:storm-core:0.9.4'
compile 'org.apache.commons:commons-compress:1.5'
compile 'org.elasticsearch:elasticsearch:1.7.1'

compile('org.apache.hadoop:hadoop-client:2.5.0-cdh5.3.0'){
    exclude group: 'org.slf4j'
}
compile('org.apache.hbase:hbase-client:0.98.6-cdh5.3.0') {

    exclude group: 'org.slf4j'
    exclude group: 'org.jruby'
    exclude group: 'jruby-complete'
    exclude group: 'org.codehaus.jackson'

}

compile 'org.apache.hbase:hbase-common:0.98.6-cdh5.3.0'
compile 'org.apache.hbase:hbase-server:0.98.6-cdh5.3.0'
compile 'org.apache.hbase:hbase-protocol:0.98.6-cdh5.3.0'

compile('com.thinkaurelius.titan:titan-core:0.5.2'){
    exclude group: 'org.slf4j'
}
compile('com.thinkaurelius.titan:titan-hbase:0.5.2'){
    exclude group: 'org.apache.hbase'
    exclude group: 'org.slf4j'
}
compile('com.tinkerpop.gremlin:gremlin-java:2.6.0'){
    exclude group: 'org.slf4j'
}
compile 'org.perf4j:perf4j:0.9.16'

compile 'com.fasterxml.jackson.core:jackson-core:2.5.3'
compile 'com.fasterxml.jackson.core:jackson-databind:2.5.3'
compile 'com.fasterxml.jackson.core:jackson-annotations:2.5.3'
compile 'com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.1.2'

}

I am using this command to run the jar:

hadoop jar ESportingMapReduce-3.0.jar config.properties /myoutput

Answer 1:

If you are trying to set up HBase in pseudo-distributed mode, the most probable cause of this error is that the Hadoop home directory has been added to $PATH.
Simply removing the Hadoop home from $PATH lets you start HBase in pseudo-distributed mode.
Some people add the Hadoop home to .bashrc by default.
If you have added it in .bashrc, remove it from there.

Source: https://stackoverflow.com/questions/34349720/hbase-mapreduce-dependency-issue-when-using-tablemapper

Tags

Hadoop

MapReduce

hbase

build.gradle

cloudera-cdh