First of all, let's state that your code is absolutely correct. It does what needs to be done and it's even optimized by using sets. It can be further improved in two ways, though:
Time complexity: you are sorting the whole dataset, which has a time complexity of O(m log m), with m being the size of your initial list of players. Immediately afterwards, you take only the top N elements of the sorted list, with N << m.
Below I show a way to improve the time complexity of the algorithm to O(m log N), which in your specific case becomes O(m) (because N = 2, so log N = log 2 = 1, taking logarithms in base 2).
You are traversing the dataset 3 times: first you iterate the list of players to create the map of counts, then you iterate this map to get a set with the top N players, and finally you iterate the list of players again to check whether each player belongs to the set of top N players.
This can be improved to perform only 2 passes over the dataset: the first one to create a map of counts (similar to what you've already done) and the second one to create a structure that keeps only the top N elements, sorted by count descending, with the result ready to be returned once the traversal has finished.
Important: the solution below requires that your PlayerStatistics class implements the hashCode and equals methods consistently.
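To illustrate that requirement, here is a minimal sketch of what a consistent equals/hashCode pair could look like. The class below is a hypothetical stand-in: I'm assuming that a player is identified by its name only, which may not match your real PlayerStatistics class, and EqualsHashCodeDemo and countByPlayer are names made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical stand-in for the real class: only a name field is assumed.
final class PlayerStatistics {
    private final String name;

    PlayerStatistics(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Two instances are equal when their names are equal...
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PlayerStatistics)) return false;
        return Objects.equals(name, ((PlayerStatistics) o).name);
    }

    // ...and equal instances always produce the same hash code.
    @Override
    public int hashCode() {
        return Objects.hashCode(name);
    }
}

class EqualsHashCodeDemo {
    // Groups equal players together; this only works because
    // equals and hashCode agree with each other.
    static Map<PlayerStatistics, Long> countByPlayer(List<PlayerStatistics> players) {
        return players.stream()
                .collect(Collectors.groupingBy(p -> p, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<PlayerStatistics> players = Arrays.asList(
                new PlayerStatistics("Ann"),
                new PlayerStatistics("Bob"),
                new PlayerStatistics("Ann"));
        Map<PlayerStatistics, Long> counts = countByPlayer(players);
        System.out.println(counts.size());                           // 2
        System.out.println(counts.get(new PlayerStatistics("Ann"))); // 2
    }
}
```

Without the two overrides, every instance would end up in its own group, because the default implementations compare object identity.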
First we have a generic method topN that (not surprisingly) extracts the top N elements from any given map. It does this by comparing its entries by value, descending (in this version, values V must be Comparable, but this algorithm can be easily extended to support values that don't implement Comparable by providing a custom Comparator):
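    import java.util.Collection;
    import java.util.Map;
    import java.util.TreeMap;
    import java.util.function.Function;

    public static <K, V extends Comparable<? super V>, T extends Comparable<? super T>>
    Collection<K> topN(
            Map<K, V> map,
            int N,
            Function<? super K, ? extends T> tieBreaker) {

        TreeMap<Map.Entry<K, V>, K> topN = new TreeMap<>(
            Map.Entry.<K, V>comparingByValue()  // by value descending, then by key
                .reversed()                     // to allow entries with duplicate values
                .thenComparing(e -> tieBreaker.apply(e.getKey())));

        map.entrySet().forEach(e -> {
            topN.put(e, e.getKey());
            // Evict the lowest-priority entry as soon as we exceed N
            if (topN.size() > N) topN.pollLastEntry();
        });

        return topN.values();
    }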
Here the topN TreeMap behaves as a priority queue of size N (though we add up to N+1 elements). First we put the entry into the topN map, then, if the map has more than N entries, we immediately invoke the pollLastEntry method on it, which removes the entry with the lowest priority (according to the order of the keys of the TreeMap). This guarantees that once the traversal has finished, the topN map contains only the top N entries, already sorted. Since the map never holds more than N+1 entries, each insertion costs O(log N), which is where the O(m log N) bound comes from.
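The put-then-evict pattern can be seen in isolation in the following simplified sketch. It is not the topN method itself: the BoundedTreeMapDemo class and topByScore method are hypothetical names, and I'm assuming that all scores are distinct so that no tie-breaking is needed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BoundedTreeMapDemo {
    // Returns the labels of the n highest scores, best first.
    // Assumption: scores are distinct (equal scores would overwrite each other).
    static List<String> topByScore(Map<String, Integer> scores, int n) {
        // Reverse order, so the last entry is always the lowest score
        TreeMap<Integer, String> best = new TreeMap<>(Comparator.reverseOrder());
        for (Map.Entry<String, Integer> e : scores.entrySet()) {
            best.put(e.getValue(), e.getKey());
            if (best.size() > n) {
                best.pollLastEntry(); // evict the current lowest score
            }
        }
        return new ArrayList<>(best.values());
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new LinkedHashMap<>();
        scores.put("a", 3);
        scores.put("b", 7);
        scores.put("c", 5);
        scores.put("d", 1);
        System.out.println(topByScore(scores, 2)); // [b, c]
    }
}
```

The map never grows beyond n + 1 entries, which is what keeps each insertion at O(log n).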
Note that I'm using a comparator that first sorts the TreeMap<Map.Entry<K, V>, K> by values V in descending order, and then by keys K. This is achieved with the help of the Function<? super K, ? extends T> tieBreaker function, which transforms each key K to a value T that must be Comparable. All this allows the map to contain entries with duplicate values of V, without requiring keys K to also be Comparable.
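To see why the tie-breaker matters, consider this small sketch, which compares a value-only comparator with one that also compares by key. TieBreakerDemo and distinctEntries are hypothetical names used only for this illustration, and keys are plain strings here for simplicity.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class TieBreakerDemo {
    // Returns how many entries survive when used as TreeMap keys.
    static int distinctEntries(Map<String, Long> counts, boolean useTieBreaker) {
        Comparator<Map.Entry<String, Long>> order =
                Map.Entry.<String, Long>comparingByValue().reversed();
        if (useTieBreaker) {
            // Entries with equal counts are now told apart by their key
            order = order.thenComparing(e -> e.getKey());
        }
        TreeMap<Map.Entry<String, Long>, String> sorted = new TreeMap<>(order);
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            sorted.put(e, e.getKey());
        }
        return sorted.size();
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("Ann", 3L);
        counts.put("Bob", 3L); // same count as Ann
        counts.put("Cid", 1L);
        System.out.println(distinctEntries(counts, false)); // 2
        System.out.println(distinctEntries(counts, true));  // 3
    }
}
```

A TreeMap treats two keys as the same key whenever its comparator returns 0, so without the tie-breaker one of the two players with equal counts is silently lost. For the same reason, the tie-breaker should map distinct keys to distinct values.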
Finally, you'd use the above method as follows:
    Map<PlayerStatistics, Long> counts = yourInitialListOfPlayers.stream()
        // skip players without a usable name (null check first)
        .filter(x -> x.getName() != null && !x.getName().isEmpty())
        .collect(Collectors.groupingBy(x -> x, Collectors.counting()));

    Collection<PlayerStatistics> top2 = topN(counts, 2, PlayerStatistics::getName);