Fast sort algorithms for arrays with mostly duplicated elements?

后端 未结 5 842
野的像风
野的像风 2021-01-11 17:03

What are efficient ways to sort arrays that have mostly a small set of duplicated elements? That is, a list like:

{ 10, 10, 55, 10, 999, 8851243, 10, 55, 55, 55, 10

相关标签:
5条回答
  • 2021-01-11 17:18

    Implementation in C++ based on algo as suggested by @Antti Huima

    • Count frequencies and store in hashtable.
    • sort elements in hashtable.
    • overwrite input array with sorted elements depending on frequencies.
    #include <unordered_map>
    #include <map>
    // Modifies input array to a sorted array
    // Complexity: O(n+(k*log(k))) where 'k' = number of unique elements input array
    template <typename Datatype>
    void SortArrayWithDuplicates(std::vector<Datatype>& in_seq) {
      std::unordered_map<Datatype, int> key_counts_map;
      // Count freqs O(n)
      for (const auto& itr: in_seq)
          key_counts_map[itr] += 1;
    
      // Sort elements by inserting into a map O(k*log(k))
      std::map<Datatype, int> key_counts_sorted_map;
      for (auto const& itr: key_counts_map)
          key_counts_sorted_map.insert(std::make_pair(itr.first, itr.second));
    
      auto AlwaysTrue = [](Datatype i)->bool{return true;};
      auto seq_itr = std::begin(in_seq);
      // Update input sequence with new sorted values
      for (auto const& itr: key_counts_sorted_map) {
          std::replace_if(seq_itr, seq_itr+itr.second, AlwaysTrue, itr.first);
          seq_itr += itr.second;
      }
    }
    
    0 讨论(0)
  • 2021-01-11 17:20

    In practice, you can first iterate through the array once and use a hash table the count the number of occurrences of the individual elements (this is O(n) where n = size of the list). Then take all the unique elements and sort them (this is O(k log k) where k = number of unique elements), and then expand this back to a list of n elements in O(n) steps, recovering the counts from the hash table. If k << n you save time.

    0 讨论(0)
  • 2021-01-11 17:23

    IMO Pidgeonhole sort is a good example for such data.

    I will clarify a bit: if you know that quantity of the unique elements in the array is reasonable and you know that there are a lot of duplicates I would think of implementing something like counting sort but make list of "buckets" dynamic. After the first pass you will get rid of duplicates, then sort array without duplicates with some good sorting algorithm and then restore sorted array in a way like counting sort does.

    0 讨论(0)
  • 2021-01-11 17:25

    I would try Counting sort with some mapping function. Ie. you wont use the frequencies array of size equal to the range of elements, instead you would iterate over the array, write down distinct elements and use them in a mapping function to the array of frequencies.

    This way the algorithm has only one extra iteration and a mapping function, which should work in a constant time (using some kind of hash table). The complexity of this approach would be O(n), which should be optimal.

    0 讨论(0)
  • 2021-01-11 17:30

    Not the best algorithm, but simple:
    You can put everything in a trie, and have the leaves be counters. That should take O(n*m) where n is the number of elements and m is the size of the biggest element (typically that would be a constant, but not necessarily). Then pre-order traverse the tie, outputting counter elements of the current key when you hit a leaf. That should take only O(n+p) where p is the size of the trie, which should be tiny compared to n.

    0 讨论(0)
提交回复
热议问题