My problem is sorting the values in a file. Keys and values are integers, and each value needs to keep its key after sorting.
key value
1 24
3 4
4 12
5
You can probably do it like this (I'm assuming you are using Java here).
From your mapper, emit each pair swapped, with the value as the key and the key as the value:
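context.write(new LongWritable(24), new LongWritable(1));
context.write(new LongWritable(4), new LongWritable(3));
context.write(new LongWritable(12), new LongWritable(4));
context.write(new LongWritable(23), new LongWritable(5));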
So, all the values that need to be sorted should be the keys in your MapReduce job. By default, Hadoop sorts by key in ascending order.
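To see why swapping works, here is a minimal plain-Java sketch with no Hadoop involved. A TreeMap stands in for the framework's sort-by-key step; the class and method names (SwapAndSortSketch, swapAndSort) are made up for illustration, and the rows are the four pairs emitted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SwapAndSortSketch {

    /**
     * Takes (key, value) rows, swaps each one to (value, key), and groups the
     * result in a sorted map. The TreeMap is only a stand-in for the sorting
     * Hadoop does between map and reduce.
     */
    static TreeMap<Integer, List<Integer>> swapAndSort(int[][] rows) {
        TreeMap<Integer, List<Integer>> sorted = new TreeMap<>(); // ascending by default
        for (int[] row : rows) {
            int originalKey = row[0];
            int originalValue = row[1];
            // The value becomes the sort key; rows with equal values keep all their keys.
            sorted.computeIfAbsent(originalValue, v -> new ArrayList<>()).add(originalKey);
        }
        return sorted;
    }

    public static void main(String[] args) {
        int[][] rows = { {1, 24}, {3, 4}, {4, 12}, {5, 23} };
        for (Map.Entry<Integer, List<Integer>> e : swapAndSort(rows).entrySet()) {
            for (int originalKey : e.getValue()) {
                // Swap back on output so each line reads "key value" again.
                System.out.println(originalKey + " " + e.getKey());
            }
        }
    }
}
```

Running main prints the rows ordered by value (3 4, 4 12, 5 23, 1 24), with each value still attached to its original key. In a real job the reducer would do the swap back.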
Hence, to sort in descending order, either do this:
job.setSortComparatorClass(LongWritable.DecreasingComparator.class);
Or set a custom descending sort comparator in your job, which goes something like this:
public static class DescendingKeyComparator extends WritableComparator {
    protected DescendingKeyComparator() {
        // Register the key type that compare() casts to; true = create key instances.
        super(LongWritable.class, true);
    }

    @SuppressWarnings("rawtypes")
    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        LongWritable key1 = (LongWritable) w1;
        LongWritable key2 = (LongWritable) w2;
        // Negate the natural (ascending) comparison to get descending order.
        return -1 * key1.compareTo(key2);
    }
}
The shuffle and sort phase in Hadoop will then take care of sorting your keys in descending order: 24, 23, 12, 4.
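If you want to check the ordering without running a job, the sign-flipping trick can be tried in plain Java. This is only a sketch: DescendingOrderSketch and sortDescending are names I made up, and Long stands in for LongWritable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class DescendingOrderSketch {

    /** Same idea as the comparator above: natural comparison with the sign flipped. */
    static final Comparator<Long> DESCENDING = (a, b) -> -1 * a.compareTo(b);

    /** Returns a sorted copy so the caller's list is left untouched. */
    static List<Long> sortDescending(List<Long> keys) {
        List<Long> copy = new ArrayList<>(keys);
        copy.sort(DESCENDING);
        return copy;
    }

    public static void main(String[] args) {
        // The four keys emitted by the mapper, in emit order.
        System.out.println(sortDescending(Arrays.asList(24L, 4L, 12L, 23L)));
        // prints [24, 23, 12, 4]
    }
}
```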
After comment:
If you need a descending IntWritable comparator, you can create one and use it like this:
job.setSortComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);
If you are using JobConf, set it like this instead:
jobConfObject.setOutputKeyComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);
Put the following code below your main() function:
public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new YourDriver(), args);
    System.exit(exitCode);
}
// This class is defined outside main(), not inside it.
public static class DescendingIntWritableComparable extends IntWritable {
    /** A decreasing Comparator optimized for IntWritable. */
    public static class DecreasingComparator extends Comparator {
        @SuppressWarnings("rawtypes")
        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            return -super.compare(a, b);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return -super.compare(b1, s1, l1, b2, s2, l2);
        }
    }
}
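The second compare() overload works on the serialized bytes directly, so keys do not have to be deserialized while sorting. Here is a rough plain-Java sketch of that idea, assuming each int is stored as 4 big-endian bytes (which is how IntWritable writes it). RawIntComparisonSketch and its methods are hypothetical names, not Hadoop API.

```java
import java.nio.ByteBuffer;

class RawIntComparisonSketch {

    /** Serializes an int as 4 big-endian bytes (ByteBuffer's default byte order). */
    static byte[] toBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    /** Ascending comparison of two serialized ints, read at the given offsets. */
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        int left = ByteBuffer.wrap(b1, s1, 4).getInt();
        int right = ByteBuffer.wrap(b2, s2, 4).getInt();
        return Integer.compare(left, right);
    }

    /** Descending version: the same comparison with the sign flipped. */
    static int compareRawDescending(byte[] b1, int s1, byte[] b2, int s2) {
        return -compareRaw(b1, s1, b2, s2);
    }

    public static void main(String[] args) {
        byte[] four = toBytes(4);
        byte[] twentyFour = toBytes(24);
        // A positive result means 4 sorts after 24, i.e. descending order.
        System.out.println(compareRawDescending(four, 0, twentyFour, 0) > 0); // prints true
    }
}
```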