How to implement sort in hadoop?

孤街浪徒 2021-02-06 00:07

My problem is sorting the values in a file. Both keys and values are integers, and each value must stay paired with its key after sorting.

key   value
1     24
3     4
4     12
5              


        
1 answer
  •  误落风尘
    2021-02-06 00:33

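    You can probably do this (I'm assuming you are using Java here).

    From your mapper, emit each pair with the value as the output key, like this:

    // Wrap the ints in Writable types; LongWritable is used here, IntWritable works the same way.
    context.write(new LongWritable(24), new LongWritable(1));
    context.write(new LongWritable(4), new LongWritable(3));
    context.write(new LongWritable(12), new LongWritable(4));
    context.write(new LongWritable(23), new LongWritable(5));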
    

    So all the values that need to be sorted should be the keys in your MapReduce job. By default, Hadoop sorts keys in ascending order.
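To make the swap concrete, here is a minimal plain-Java sketch that simulates the idea without any Hadoop dependency: each {key, value} record is turned into {value, key} and the records are then ordered ascending by the new key, which is what the default key sort would give you. The class and method names (SwapAndSortSketch, swapAndSort) are hypothetical and exist only for this illustration; in a real job the sorting is done by the framework, not by your code.

```java
import java.util.Arrays;
import java.util.Comparator;

// Plain-Java simulation (no Hadoop) of "emit the value as the key, then sort by key".
final class SwapAndSortSketch {

    // Each input record is {key, value}. Each output record is {value, key},
    // ordered ascending by the new key, mimicking Hadoop's default key order.
    static int[][] swapAndSort(int[][] records) {
        int[][] swapped = new int[records.length][];
        for (int i = 0; i < records.length; i++) {
            swapped[i] = new int[] {records[i][1], records[i][0]};
        }
        Arrays.sort(swapped, Comparator.comparingInt((int[] r) -> r[0]));
        return swapped;
    }

    public static void main(String[] args) {
        // Sample pairs taken from the context.write(...) calls in this answer.
        int[][] input = {{1, 24}, {3, 4}, {4, 12}, {5, 23}};
        for (int[] r : swapAndSort(input)) {
            System.out.println(r[0] + "\t" + r[1]);
        }
    }
}
```

Because the original key travels along as the new value, the pairing between key and value survives the sort.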

    Hence, to sort in descending order, either do this (for LongWritable keys):

    job.setSortComparatorClass(LongWritable.DecreasingComparator.class);
    

    Or set a custom descending sort comparator on your job, which goes something like this (register it with job.setSortComparatorClass(DescendingKeyComparator.class)):

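    // Requires: org.apache.hadoop.io.LongWritable, WritableComparable, WritableComparator
    public static class DescendingKeyComparator extends WritableComparator {
        protected DescendingKeyComparator() {
            // Must name the actual key class (LongWritable, not Text);
            // 'true' asks the base class to create key instances for deserialization.
            super(LongWritable.class, true);
        }

        @SuppressWarnings("rawtypes")
        @Override
        public int compare(WritableComparable w1, WritableComparable w2) {
            LongWritable key1 = (LongWritable) w1;
            LongWritable key2 = (LongWritable) w2;
            // Swapping the operands reverses the natural (ascending) order.
            return key2.compareTo(key1);
        }
    }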
    

    The shuffle and sort phase in Hadoop will then take care of sorting your keys in descending order: 24, 23, 12, 4.

    After comment:

    If you need a descending comparator for IntWritable keys, you can create one and use it like this:

    job.setSortComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);
    

    If you are using JobConf (the older mapred API), set it like this:

    jobConfObject.setOutputKeyComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);
    

    Put the following class in your driver, after your main() method:

    public static void main(String[] args) throws Exception {
        // ToolRunner.run declares 'throws Exception', so main must propagate it.
        int exitCode = ToolRunner.run(new YourDriver(), args);
        System.exit(exitCode);
    }

    // This class is defined outside of main(), not inside it.
    public static class DescendingIntWritableComparable extends IntWritable {
        /** A decreasing Comparator optimized for IntWritable. */
        public static class DecreasingComparator extends IntWritable.Comparator {
            @SuppressWarnings("rawtypes")
            @Override
            public int compare(WritableComparable a, WritableComparable b) {
                // Negate the ascending result to reverse the order.
                return -super.compare(a, b);
            }
            @Override
            public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
                // Same reversal, applied directly to the serialized bytes.
                return -super.compare(b1, s1, l1, b2, s2, l2);
            }
        }
    }
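The byte-level compare(...) overload above operates on serialized keys, so it may help to see what that looks like without Hadoop on the classpath. The sketch below assumes the key is stored as a 4-byte big-endian int (the layout produced by DataOutput.writeInt, which IntWritable uses) and reverses an ascending comparison by negating it. RawIntDescendingSketch and its methods are hypothetical names for this illustration only, not part of the Hadoop API.

```java
import java.nio.ByteBuffer;

// Plain-Java sketch of comparing serialized ints in descending order.
final class RawIntDescendingSketch {

    // Encode an int as 4 big-endian bytes (ByteBuffer is big-endian by default).
    static byte[] encode(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Decode the 4 big-endian bytes starting at 'start'.
    static int readInt(byte[] bytes, int start) {
        return ByteBuffer.wrap(bytes, start, 4).getInt();
    }

    // Ascending comparison on the serialized form; lengths are unused because
    // every key occupies exactly 4 bytes.
    static int compareAscending(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    // Descending order is the negated ascending result. Negation is safe here
    // because Integer.compare only returns -1, 0, or 1.
    static int compareDescending(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return -compareAscending(b1, s1, l1, b2, s2, l2);
    }

    public static void main(String[] args) {
        byte[] a = encode(24);
        byte[] b = encode(4);
        // A negative result means 24 sorts before 4 in descending order.
        System.out.println(compareDescending(a, 0, 4, b, 0, 4));
    }
}
```

Comparing on the raw bytes avoids deserializing every key during the sort, which is why the comparator in this answer overrides both compare methods.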
    
