What is more efficient: sorted stream or sorting a list?

滥情空心 2021-01-03 21:43

Assume we have some items in a collection and want to sort them with a given comparator, expecting the result in a list:

Collection<Item> items = ...;
         


        
3 answers
  •  攒了一身酷
    2021-01-03 22:02

    It is safe to say that the two forms of sort will have the same complexity, even without looking at the code. (If they didn't, one form would be severely broken!)

    Looking at the Java 8 source code for streams (specifically the internal class java.util.stream.SortedOps), the sorted() method adds a component to the stream pipeline that captures all of the stream elements into either an array or an ArrayList.

    • An array is used if and only if the pipeline assembly code can deduce the number of elements in the stream ahead of time.

    • Otherwise, an ArrayList is used to gather the elements to be sorted.

    If an ArrayList is used, you incur the extra overhead of building / growing the list.
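
    A minimal sketch of how to check, from the outside, whether a pipeline still knows its size. It assumes that the public Spliterator.SIZED characteristic mirrors the internal flag that SortedOps consults; the class name SizedCheck and the helper isSized are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;

public class SizedCheck {
    // Hypothetical helper: true if the stream's spliterator reports SIZED,
    // i.e. the number of elements is known before traversal starts.
    // Note: calling spliterator() consumes the stream.
    static boolean isSized(Stream<?> stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>(Arrays.asList(3, 1, 2));

        // Streaming straight from a collection keeps the size information.
        System.out.println(isSized(items.stream()));

        // A filter makes the element count unknowable ahead of time.
        System.out.println(isSized(items.stream().filter(i -> i > 1)));
    }
}
```

    In the question's code, sorted() is applied directly to items.stream(), so for typical collections the size should be known and the array path would be taken.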

    Now we return to the two versions of the code:

    List<Item> sortedItems = new ArrayList<>(items);
    Collections.sort(sortedItems, itemComparator);
    

    In this version, the ArrayList constructor copies the elements of items into an appropriately sized array, and then Collections.sort does an in-place sort of that array. (This happens under the covers.)

    List<Item> sortedItems = items
        .stream()
        .sorted(itemComparator)
        .collect(Collectors.toList());
    

    In this version, as we have seen above, the code associated with sorted() either builds and sorts an array (equivalent to what happens above) or builds the ArrayList the slow way. On top of that, there are the overheads of streaming the data out of items and into the collector.

    Overall (with the Java 8 implementation at least), code examination tells me that the first version of the code cannot be slower than the second version, and in most (if not all) cases it will be faster. But as the list gets larger, the O(N log N) sorting will tend to dominate the O(N) overheads of copying. That means the relative difference between the two versions will get smaller.

    If you really care, you should be able to write a benchmark to test the actual difference with a specific implementation of Java, and a specific input dataset. (Or adapt @Eugene's benchmark!)
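
    As a starting point, here is a rough timing sketch of the two versions. It is not a rigorous benchmark: it uses System.nanoTime with a simple warm-up loop, whereas a harness such as JMH would give more trustworthy numbers. The class name, method names, input size, and random seed are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

public class SortBenchmarkSketch {
    // Version 1: copy into an ArrayList, then sort in place.
    static <T> List<T> copyAndSort(Collection<T> items, Comparator<? super T> cmp) {
        List<T> sorted = new ArrayList<>(items);
        Collections.sort(sorted, cmp);
        return sorted;
    }

    // Version 2: stream, sort, and collect into a new list.
    static <T> List<T> streamSort(Collection<T> items, Comparator<? super T> cmp) {
        return items.stream().sorted(cmp).collect(Collectors.toList());
    }

    // Elapsed wall-clock nanoseconds for a single run of the task.
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Assumed input: one million random integers with a fixed seed.
        Random random = new Random(42);
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            items.add(random.nextInt());
        }
        Comparator<Integer> cmp = Comparator.naturalOrder();

        // Crude warm-up so the JIT has a chance to compile both paths.
        for (int i = 0; i < 5; i++) {
            copyAndSort(items, cmp);
            streamSort(items, cmp);
        }

        long copyNanos = timeNanos(() -> copyAndSort(items, cmp));
        long streamNanos = timeNanos(() -> streamSort(items, cmp));
        System.out.printf("copy + Collections.sort: %d ms%n", copyNanos / 1_000_000);
        System.out.printf("stream().sorted():       %d ms%n", streamNanos / 1_000_000);
    }
}
```

    Single-run timings like these are noisy, so repeat the measurement and vary the input size and element type before drawing conclusions.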
