Java 8 provides java.util.Arrays.parallelSort, which sorts arrays in parallel using the fork-join framework. But there's no corresponding Collections.parallelSort.
By combining the existing answers I came up with this code.
This works if you are not interested in creating a custom List class and if you don't mind creating a temporary array (Collections.sort does that anyway).
This sorts the initial list in place and does not create a new one, unlike the parallelStream solution.
// Convert the List to an array so we can use Arrays.parallelSort rather than Collections.sort.
// Note that Collections.sort begins with this very same conversion, so we're not adding overhead
// in comparison with Collections.sort.
Foo[] fooArr = fooLst.toArray(new Foo[0]);
// Sort on multiple threads. Automatically falls back to a single-threaded sort when the size is less than 8192.
// Foo::yourmethod is a placeholder; use comparing, comparingInt, comparingLong, etc. as appropriate.
Arrays.parallelSort(fooArr, Comparator.comparing(Foo::yourmethod));
// Refill the List from the sorted array, the same way Collections.sort does it.
ListIterator<Foo> i = fooLst.listIterator();
for (Foo e : fooArr) {
    i.next();
    i.set(e);
}
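The snippet above can be wrapped in a reusable generic helper. This is only a minimal sketch: the class name ParallelListSort and the method name parallelSort are made up for illustration and are not part of the JDK. It assumes the list supports ListIterator.set, which is the same requirement Collections.sort has.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.ListIterator;

// Hypothetical helper class; the name is not part of the JDK.
final class ParallelListSort {
    private ParallelListSort() {}

    // Sorts the list in place: copy out, Arrays.parallelSort, copy back.
    @SuppressWarnings("unchecked")
    public static <T> void parallelSort(List<T> list, Comparator<? super T> cmp) {
        // toArray() returns an Object[]; the unchecked cast is harmless here
        // because the array never leaves this method and only holds list elements.
        T[] arr = (T[]) list.toArray();
        Arrays.parallelSort(arr, cmp);
        // Write the sorted elements back through a ListIterator, so this also
        // behaves reasonably for lists without fast random access.
        ListIterator<T> it = list.listIterator();
        for (T e : arr) {
            it.next();
            it.set(e);
        }
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("pear", "apple", "fig"));
        parallelSort(words, Comparator.naturalOrder());
        System.out.println(words); // prints [apple, fig, pear]
    }
}
```

Because the helper only takes a List, existing code can keep whatever List implementation it already uses.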
Just speculating here, but I see several good reasons for generic sort algorithms preferring to work on arrays instead of List instances:
RandomAccess, this probably means a lot of overhead compared to plain array accesses, which can be optimized very well.
List instance, on the other hand, can't be easily copied. New lists would have to be allocated, which poses two problems. First, this means allocating some new objects, which is likely more costly than allocating arrays. Second, the algorithm would have to choose which implementation of List should be allocated for this temporary structure. There are two obvious solutions, both bad: either just choose some hard-coded implementation, e.g. ArrayList, but then it could just allocate simple arrays as well (and if we're generating arrays then it's much easier if the source is also an array). Or let the user provide some list factory object, which makes the code much more complicated.
interface offers is addAll() method, but this is probably not efficient for most cases (think of pre-allocating the new list to its target size vs. adding elements one by one, which many implementations do).
So probably the designers thought of CPU efficiency and code simplicity most of all, and this is easily achieved when the API accepts arrays. Some languages, e.g. Scala, have sort methods that work directly on lists, but this comes at a cost and is probably less efficient than sorting arrays in many cases (or sometimes there will probably just be a conversion to and from an array performed behind the scenes).
There doesn't appear to be any straightforward way to sort a List in parallel in Java 8. I don't think this is fundamentally difficult; it looks more like an oversight to me.
The difficulty with a hypothetical Collections.parallelSort(list, cmp) is that the Collections implementation knows nothing about the list's implementation or its internal organization. This can be seen by examining the Java 7 implementation of Collections.sort(list, cmp). As you observed, it has to copy the list elements out to an array, sort them, and then copy them back into the list.
This is the big advantage of the List.sort(cmp) extension method over Collections.sort(list, cmp). It might seem that this is merely a small syntactic advantage, being able to write myList.sort(cmp) instead of Collections.sort(myList, cmp). The real difference is that myList.sort(cmp), being an interface extension method, can be overridden by the specific List implementation. For example, ArrayList.sort(cmp) sorts the list in place using Arrays.sort(), whereas the default implementation uses the old copyout-sort-copyback technique.
It should be possible to add a parallelSort extension method to the List interface that has similar semantics to List.sort but does the sorting in parallel. This would allow ArrayList to do a straightforward in-place sort using Arrays.parallelSort. (It's not entirely clear to me what the default implementation should do. It might still be worth it to do copyout-parallelSort-copyback.) Since this would be an API change, it can't happen until the next major release of Java SE.
As for a Java 8 solution, there are a couple of workarounds, none very pretty (as is typical of workarounds). You could create your own array-based List implementation and override sort() to sort in parallel. Or you could subclass ArrayList, override sort(), grab the elementData array via reflection and call parallelSort() on it. Of course you could just write your own List implementation and provide a parallelSort() method, but the advantage of overriding List.sort() is that this works on the plain List interface and you don't have to modify all the code in your code base to use a different List subclass.
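To make the subclassing workaround more concrete, here is a rough sketch. The class name ParallelSortArrayList is hypothetical. Note that this version deliberately does not touch elementData via reflection; it copies out with toArray() and writes back with set(), so it is a simpler and somewhat slower variant of what is described above, not the same thing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;

// Hypothetical ArrayList subclass whose sort() runs in parallel.
class ParallelSortArrayList<E> extends ArrayList<E> {
    private static final long serialVersionUID = 1L;

    public ParallelSortArrayList() {
        super();
    }

    public ParallelSortArrayList(Collection<? extends E> c) {
        super(c);
    }

    @Override
    @SuppressWarnings("unchecked")
    public void sort(Comparator<? super E> cmp) {
        // Copy out. The unchecked cast is harmless because the array stays local.
        E[] arr = (E[]) toArray();
        // A null comparator means natural ordering, as with List.sort.
        Arrays.parallelSort(arr, cmp);
        // Copy back. set(int, E) is O(1) on ArrayList.
        for (int i = 0; i < arr.length; i++) {
            set(i, arr[i]);
        }
    }
}
```

Any code that holds the list as a plain List and calls list.sort(cmp) then gets the parallel sort without being changed.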
Use the following:
yourCollection.parallelStream().sorted().collect(Collectors.toList());
This will be parallel when sorting, because of parallelStream(). I believe this is what you mean by parallel sort?
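A slightly fuller sketch of the stream approach, with an explicit comparator. The class and method names (StreamSortDemo, sortedCopy) are invented for this example. Keep in mind that this returns a new list and leaves the source collection untouched, so it is not an in-place sort.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; the names are for illustration only.
final class StreamSortDemo {
    private StreamSortDemo() {}

    // Returns a new sorted List; the source collection is not modified.
    public static <T> List<T> sortedCopy(Collection<T> source, Comparator<? super T> cmp) {
        return source.parallelStream()
                     .sorted(cmp)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(3, 1, 2);
        List<Integer> sorted = sortedCopy(numbers, Comparator.naturalOrder());
        System.out.println(sorted);  // prints [1, 2, 3]
        System.out.println(numbers); // prints [3, 1, 2]
    }
}
```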
I think you are doomed to use a custom List implementation augmented with your own parallelSort, or else change all your other code to store the big data in array types.
This is the inherent problem with layers of abstract data types. They're meant to isolate the programmer from details of implementation. But when the details of implementation matter - as in the case of underlying storage model for sort - the otherwise splendid isolation leaves the programmer helpless.
The standard List sort documentation provides an example. After explaining that mergesort is used, it says:
The default implementation obtains an array containing all elements in this list, sorts the array, and iterates over this list resetting each element from the corresponding position in the array. (This avoids the n² log(n) performance that would result from attempting to sort a linked list in place.)
In other words, "since we don't know the underlying storage model for a List and couldn't touch it if we did, we make a copy organized in a known way." The parenthesized expression is based on the fact that the List "i'th element accessor" on a linked list is Omega(n), so the normal array mergesort implemented with it would be a disaster. In fact it's easy to implement mergesort efficiently on linked lists. The List implementer is just prevented from doing it.
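To illustrate the claim that mergesort is easy to do efficiently on linked lists, here is a minimal sketch on a hand-rolled singly linked node type. Node and LinkedMergeSort are hypothetical names, and this is not how java.util.LinkedList works internally. It relinks nodes instead of indexing, so it runs in O(n log n) without any "i'th element" access.

```java
import java.util.Comparator;

// Hypothetical top-down mergesort on a singly linked list.
final class LinkedMergeSort {
    private LinkedMergeSort() {}

    static final class Node<T> {
        final T value;
        Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    // Sorts the chain starting at head and returns the new head.
    public static <T> Node<T> sort(Node<T> head, Comparator<? super T> cmp) {
        if (head == null || head.next == null) {
            return head; // zero or one node is already sorted
        }
        // Find the middle with slow/fast pointers, then cut the chain in two.
        Node<T> slow = head;
        Node<T> fast = head.next;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
        }
        Node<T> second = slow.next;
        slow.next = null;
        return merge(sort(head, cmp), sort(second, cmp), cmp);
    }

    // Merges two sorted chains by relinking nodes; no elements are copied.
    private static <T> Node<T> merge(Node<T> a, Node<T> b, Comparator<? super T> cmp) {
        Node<T> dummy = new Node<>(null, null);
        Node<T> tail = dummy;
        while (a != null && b != null) {
            if (cmp.compare(a.value, b.value) <= 0) { // <= keeps the sort stable
                tail.next = a;
                a = a.next;
            } else {
                tail.next = b;
                b = b.next;
            }
            tail = tail.next;
        }
        tail.next = (a != null) ? a : b;
        return dummy.next;
    }
}
```

The point is that this needs access to the node links, which is exactly what code working through the List interface does not have.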
A parallel sort on List has the same problem. The standard sequential sort fixes it with custom sort methods in the concrete List implementations. The Java folks just haven't chosen to go there yet. Maybe in Java 9.