One reducer in custom Partitioner makes MapReduce jobs slower

Posted by 好久不见 on 2019-12-08 12:41:59

Question


Hi, I have an application that reads records from HBase and writes them to text files. The application works as expected, but when I tested it with a large amount of data, the job took 1.2 hours to complete. Here are the details of my application:

  1. The data in HBase is about 400 GB, approximately 2 billion records.
  2. I created 400 regions in the HBase table, so there are 400 mappers.
  3. I use a custom Partitioner that puts the records into 194 text files.
  4. I use LZO compression for the map output and gzip for the final output.
  5. I use MD5 hashing for my row key (a sketch follows this list).

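Since the question does not show how the MD5 row key is produced, here is a minimal sketch of the usual approach; the helper name and the digest-as-prefix layout are assumptions, not taken from the question:

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: prefixes the natural key with its MD5 digest so that
// writes spread evenly across the 400 regions instead of hot-spotting.
public static String md5RowKey(String naturalKey) {
    try {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(naturalKey.getBytes(StandardCharsets.UTF_8));
        String hex = String.format("%032x", new BigInteger(1, digest));
        return hex + "|" + naturalKey;  // 32-char hex digest as the key prefix
    } catch (NoSuchAlgorithmException e) {
        throw new IllegalStateException("MD5 not available", e);  // never happens on a standard JVM
    }
}
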
I used a custom Partitioner to segregate the data. There are 194 partitions and 194 reducers, and every reducer completes very quickly except the last two, which receive a very large number of records because of the partitioning condition.

I do not know how to handle this situation.

My condition is such that two partitions will always receive a very large number of records, and I cannot change that.

All the other reducers complete within 3 minutes, but because of those two the overall job takes 30 minutes.
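
(One commonly suggested way to handle this kind of skew is to "salt" the hot partitions: fan each of the two heavy partitions out over several reducers and concatenate their output files after the job. A minimal sketch of the idea, assuming a Text key, a fan-out of 8, and a hypothetical logicalPartition helper that encapsulates the existing rules:

private static final int SALT = 8;  // assumed fan-out per hot partition

@Override
public int getPartition(Text key, Text value, int numPartitions) {
    int logical = logicalPartition(key.toString());  // the existing 0..193 rules
    if (logical <= 1) {
        // hot partitions 0 and 1: spread their records over SALT reducers each
        int salt = (key.hashCode() & Integer.MAX_VALUE) % SALT;
        return logical * SALT + salt;        // occupies partition ids 0 .. 2*SALT-1
    }
    return 2 * SALT + (logical - 2);         // remaining 192 partitions shifted up
}

This would require job.setNumReduceTasks(2 * SALT + 192) and a post-processing step that merges each group of salted part files back into one logical file.)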

Here is my implementation:

// Compress intermediate map output with Snappy.
hbaseConf.set("mapreduce.map.output.compress", "true");
hbaseConf.set("mapreduce.map.output.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec");

// Compress the final reducer output with gzip.
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
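
For context, a minimal sketch of how the partitioner and reducer count would be wired into the job; the job name and the MyPartitioner class name are placeholders, while the calls themselves are standard Hadoop MapReduce API:

Job job = Job.getInstance(hbaseConf, "hbase-to-text");  // hypothetical job name
job.setPartitionerClass(MyPartitioner.class);           // custom partitioner (placeholder name)
job.setNumReduceTasks(194);                             // one reducer per output file
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);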

Here is my Partitioner logic:

if (str.contains("Japan|^|2017|^|" + strFileName + "")) {

    return 0;

} else if (str.contains("Japan|^|2016|^|" + strFileName + "")) {

    return 1;

} else if (str.contains("Japan|^|2015|^|" + strFileName + "")) {

    return 2;

} else if (str.contains("Japan|^|2014|^|" + strFileName + "")) {

    return 3;

} else if (str.contains("Japan|^|2013|^|" + strFileName + "")) {

    return 4;
}
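
Pieced together, the fragment above would sit inside a Partitioner subclass roughly like this; the class name, the Text key/value types, the source of strFileName, and the fallback partition are assumptions the question does not spell out:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class MyPartitioner extends Partitioner<Text, Text> {

    private String strFileName;  // assumed to be initialized elsewhere, e.g. from the job configuration

    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        String str = key.toString();
        if (str.contains("Japan|^|2017|^|" + strFileName)) return 0;
        if (str.contains("Japan|^|2016|^|" + strFileName)) return 1;
        if (str.contains("Japan|^|2015|^|" + strFileName)) return 2;
        if (str.contains("Japan|^|2014|^|" + strFileName)) return 3;
        if (str.contains("Japan|^|2013|^|" + strFileName)) return 4;
        // ... the remaining rules up to partition 193 ...
        return 193;  // assumed fallback; the original fragment has no default branch
    }
}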

Source: https://stackoverflow.com/questions/43112385/one-reducer-in-custom-partitioner-makes-mapreduce-jobs-slower
