HBase MapReduce Dependency Issue when using TableMapper

Submitted by 流过昼夜 on 2020-01-23 11:41:22

Question


I am using CDH 5.3 and I am trying to write a MapReduce program to scan a table and do some processing. I have created a mapper which extends TableMapper, and the exception that I am getting is:

java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/usr/local/hadoop-2.5-cdh-3.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:267)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:388)

As you can see, it is searching for protobuf-java-2.5.0.jar on an HDFS path, but the jar is actually present on the local path /usr/local/hadoop-2.5-cdh-3.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar (I verified this). This does not happen with normal MapReduce programs; the error occurs only when I use TableMapper.
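For background on why a local jar can end up being looked up on HDFS: TableMapReduceUtil.addDependencyJars(job) records the dependency jar paths in the job's "tmpjars" property, and any entry without a filesystem scheme is later qualified against fs.defaultFS by the job client (which is exactly where the stack trace above fails). A minimal diagnostic sketch to see what got recorded (the class name here is illustrative, not part of the question's code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public class TmpJarsDebug {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "tmpjars-debug");
        // Records the HBase/Hadoop dependency jars in the "tmpjars" property.
        TableMapReduceUtil.addDependencyJars(job);
        // Entries printed without a scheme (no file: or hdfs: prefix) will be
        // resolved against fs.defaultFS at submit time, i.e. looked up on HDFS.
        System.out.println(job.getConfiguration().get("tmpjars"));
    }
}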

My driver code is as follows:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AppDriver {

    public static void main(String[] args) throws Exception {
        Configuration hbaseConfig = HBaseConfiguration.create();
        hbaseConfig.set("hbase.zookeeper.quorum", PropertiesUtil.getZookeperHostName());
        // Note: this key is case-sensitive; HBase expects
        // "hbase.zookeeper.property.clientPort" (capital P).
        hbaseConfig.set("hbase.zookeeper.property.clientPort", PropertiesUtil.getZookeperPortNum());

        Job job = Job.getInstance(hbaseConfig, "hbasemapreducejob");

        job.setJarByClass(AppDriver.class);

        // Create a scan
        Scan scan = new Scan();

        scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
        scan.setCacheBlocks(false);  // don't set to true for MR jobs
        // scan.setStartRow(Bytes.toBytes(PropertiesUtil.getHbaseStartRowkey()));
        // scan.setStopRow(Bytes.toBytes(PropertiesUtil.getHbaseStopRowkey()));

        TableMapReduceUtil.initTableMapperJob(PropertiesUtil.getHbaseTableName(), scan,
                ESportMapper.class, Text.class, RecordStatusVO.class, job);
        job.setReducerClass(ESportReducer.class);

        job.setNumReduceTasks(1);
        TableMapReduceUtil.addDependencyJars(job);

        // Write the results to a file in the output directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        boolean b = job.waitForCompletion(true);
        if (!b) {
            throw new IOException("error with job!");
        }
    }
}

I am passing the properties file as args[0].
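The PropertiesUtil helper itself is not shown in the question. For completeness, a minimal sketch of what such a class might look like (the property key names and the load method are assumptions, not taken from the original code):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

// Hypothetical stand-in for the asker's PropertiesUtil; key names are guesses.
public final class PropertiesUtil {
    private static final Properties PROPS = new Properties();

    // Would be called once with args[0] before the job is configured.
    public static void load(String path) throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            PROPS.load(in);
        }
    }

    public static String getZookeperHostName() { return PROPS.getProperty("zookeeper.quorum"); }
    public static String getZookeperPortNum()  { return PROPS.getProperty("zookeeper.clientport"); }
    public static String getHbaseTableName()   { return PROPS.getProperty("hbase.table.name"); }
}

The driver would then call PropertiesUtil.load(args[0]) at the start of main, before any of the getters are used.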

Some more background info:

I am using standalone CDH 5.3 on my local system with HBase 0.98.6. My HBase is running on top of HDFS in pseudo-distributed mode.

My build.gradle is as follows:

apply plugin: 'java'
apply plugin: 'eclipse'
apply plugin: 'application'

// Basic Properties
sourceCompatibility = 1.7
targetCompatibility = '1.7'

version = '3.0'
mainClassName = "com.ESport.mapreduce.App.AppDriver"

jar {
    manifest {
        attributes "Main-Class": "$mainClassName"
    }

    from {
        configurations.compile.collect { it.isDirectory() ? it : zipTree(it) }
    }

    zip64 true
}

repositories {
    mavenCentral()
    maven { url "http://clojars.org/repo" }
    maven { url "http://repository.cloudera.com/artifactory/cloudera-repos/" }
}

dependencies {

    testCompile group: 'junit', name: 'junit', version: '4.+'

    compile group: 'commons-collections', name: 'commons-collections', version: '3.2'
    compile 'org.apache.storm:storm-core:0.9.4'
    compile 'org.apache.commons:commons-compress:1.5'
    compile 'org.elasticsearch:elasticsearch:1.7.1'

    compile('org.apache.hadoop:hadoop-client:2.5.0-cdh5.3.0') {
        exclude group: 'org.slf4j'
    }
    compile('org.apache.hbase:hbase-client:0.98.6-cdh5.3.0') {
        exclude group: 'org.slf4j'
        exclude group: 'org.jruby'
        exclude group: 'jruby-complete'
        exclude group: 'org.codehaus.jackson'
    }

    compile 'org.apache.hbase:hbase-common:0.98.6-cdh5.3.0'
    compile 'org.apache.hbase:hbase-server:0.98.6-cdh5.3.0'
    compile 'org.apache.hbase:hbase-protocol:0.98.6-cdh5.3.0'

    compile('com.thinkaurelius.titan:titan-core:0.5.2') {
        exclude group: 'org.slf4j'
    }
    compile('com.thinkaurelius.titan:titan-hbase:0.5.2') {
        exclude group: 'org.apache.hbase'
        exclude group: 'org.slf4j'
    }
    compile('com.tinkerpop.gremlin:gremlin-java:2.6.0') {
        exclude group: 'org.slf4j'
    }
    compile 'org.perf4j:perf4j:0.9.16'

    compile 'com.fasterxml.jackson.core:jackson-core:2.5.3'
    compile 'com.fasterxml.jackson.core:jackson-databind:2.5.3'
    compile 'com.fasterxml.jackson.core:jackson-annotations:2.5.3'
    compile 'com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.1.2'

}

And I am using this command to run the jar:

hadoop jar ESportingMapReduce-3.0.jar config.properties /myoutput
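As an aside that is not part of the original question: when the HBase jars are missing from the job client's classpath, a common way to launch such a job is to prepend the output of the hbase classpath command, for example:

HADOOP_CLASSPATH=$(hbase classpath) hadoop jar ESportingMapReduce-3.0.jar config.properties /myoutput

This only changes the client-side classpath and is independent of the fix suggested in the answer below.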


Answer 1:


If you are trying to set up HBase in pseudo-distributed mode, the most probable reason for this is that the Hadoop home directory has been added to $PATH.
By simply removing the Hadoop home from $PATH, you can start HBase in pseudo-distributed mode.
Some people add the Hadoop home to $PATH in .bashrc by default.
If you have added it in .bashrc, remove it from there, as illustrated below.
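To illustrate what to look for (the install path is assumed from the question; your entries may differ), the .bashrc lines in question typically look like:

# Illustrative entries to remove from .bashrc (path assumed from the question)
export HADOOP_HOME=/usr/local/hadoop-2.5-cdh-3.0
export PATH=$PATH:$HADOOP_HOME/bin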



Source: https://stackoverflow.com/questions/34349720/hbase-mapreduce-dependency-issue-when-using-tablemapper
