Java 8 Stream, How to get Top N count? [closed]

穿精又带淫゛_ 提交于 2019-12-03 08:39:55

First of all, let's state that your code is absolutely correct. It does what needs to be done and it's even optimized by using sets. It can be further improved in two ways, though:

  1. Time complexity: you are sorting the whole dataset, which has a time complexity of O(mlogm), with m being the size of your initial list of players. Immediately, you are taking the top N elements of your list, with N << m.

    Below I'm showing a way to improve time complexity of the algorithm to O(mlogN), which means that in your specific case it would become O(m) (this is because N=2, so logN=log2=1).

  2. You are traversing the dataset 3 times: first you're iterating the list of players to create the map of counts, then you are iterating this map to get a set with the top N players, and finally you're iterating the list of players again to check whether each player belongs to the set of top N players.

    This can be improved to perform only 2 passes over the dataset: the first one to create a map of counts (similar to what you've already done) and the other one to create a structure that will keep only the top N elements, sorted by count descending, with the result ready to be returned once the traversal has finished.

Important: the solution below requires that your PlayerStatistics class implements the hashCode and equals methods consistently.

First we have a generic method topN that (not surprisingly) extracts the top N elements from any given map. It does this by comparing its entries by value, descending (in this version, values V must be Comparable<V>, but this algorithm can be easily extended to support values that don't implement Comparable<V> by providing a custom Comparator<V>):

public static 
<K, V extends Comparable<? super V>, T extends Comparable<? super T>>
Collection<K> 
topN(
        Map<K, V> map, 
        int N,
        Function<? super K, ? extends T> tieBreaker) {

    TreeMap<Map.Entry<K, V>, K> topN = new TreeMap<>(
        Map.Entry.<K, V>comparingByValue()      // by value descending, then by key
            .reversed()                         // to allow entries with duplicate values
            .thenComparing(e -> tieBreaker.apply(e.getKey())));

    map.entrySet().forEach(e -> {
      topN.put(e, e.getKey());
      if (topN.size() > N) topN.pollLastEntry();
    });

    return topN.values();
}

Here the topN TreeMap behaves as a priority queue of size N (though we add up to N+1 elements). First we put the entry into the topN map, then, if the map has more than N entries, we immediately invoke the pollLastEntry method on it, which removes the entry with the lowest priority (according to the order of the keys of the TreeMap). This guarantees that upon traversal, the topN map will only contain the top N entries, already sorted.

Note that I'm using a comparator that first sorts the TreeMap<Map.Entry<K, V>, K> by values V in descending order, and then by keys K. This is achieved with the help of the Function<? super K, ? extends T> tieBreaker function, which transforms each key K to a value T that must be Comparable<T>. All this allows the map to contain entries with duplicate values of V, without requiring keys K to also be Comparable<K>.

Finally, you'd use the above method as follows:

Map<PlayerStatistics, Long> counts = yourInitialListOfPlayers.stream()
    .filter(x -> !"".equals(x.getName()) && x.getName() != null)
    .collect(Collectors.groupingBy(x -> x, Collectors.counting()));

Collection<PlayerStatistics> top2 = topN(counts, 2, PlayerStatistics::getName);
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!