Question
I have a lot of music and I want to rank it from least favorite to favorite (this will take many days). I would like to compare two music files at a time (2-way comparison). I have seen some questions about algorithms that use the fewest comparisons. The catch is that, since it's a long process, I want to be able to add new music to the collection without starting the sort over (which would create a lot more comparison steps).
Which algorithm uses the fewest comparisons while still allowing new elements, which also need to be compared, to be added?
I'm not interested in the fewest comparisons for just a few items. Let's say 1000 items minimum.
Bonus if the algorithm supports N-way comparison (where N > 2), in case I would like to compare pictures instead.
EDIT: comparing two songs is a manual process done by listening to them (and therefore slow); the sorting algorithm is needed to rank them with the fewest comparisons.
Answer 1:
A non-comparative sorting algorithm, like radix sort, can sort data with 0 comparisons! These are not as generic as comparative sorting algorithms like merge or insertion sort, but can get far better runtime if your data meets the necessary requirements.
Essentially, if you have knowledge about the distribution of your data, you can sort faster than O(n log n). For instance, if you are sorting n numbers, and know that they are integers between 1 and N, you can use counting sort to sort them in O(n + N). You can iteratively add elements for O(1) as well.
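As an illustration of the counting sort mentioned above, here is a minimal sketch. The function names `counting_sort` and `add_value` are made up for this example, and it assumes every value is an integer between 1 and `max_value`.

```python
def counting_sort(values, max_value):
    """Sort integers in the range 1..max_value without comparing any two of them."""
    counts = [0] * (max_value + 1)      # counts[v] = how many times v occurs
    for v in values:                    # O(n): tally each value
        counts[v] += 1
    result = []
    for v in range(1, max_value + 1):   # O(N): emit values in increasing order
        result.extend([v] * counts[v])
    return counts, result


def add_value(counts, v):
    """Add one more element in O(1) by bumping its tally."""
    counts[v] += 1


counts, ordered = counting_sort([3, 1, 2, 3, 1], max_value=3)
add_value(counts, 2)   # a later addition costs no comparisons either
```

The tally array is the whole data structure, which is why adding an element later is a single increment.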
Applying this to your problem of ranking music is more challenging (songs are not integers), but you can do a variation of bucket sort where you first bin your music into, say, 10% "tiers": top 0-10%, 10-20%, 20-30%, ..., 90-100% (i.e., the bottom). Then you can either recursively apply bucket sort to those (top 0-1%, 1-2%, etc.) or apply standard sorting algorithms. Eventually, you'll need to do a standard comparison sort. This approach, compared with only using comparison sort, will reduce the number of comparisons by a factor of log(n)/log(n/B), where B is the number of buckets. For 100 buckets and 10000 songs, this is a factor of 2 reduction.
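As a rough check of the factor quoted above, here is a small sketch. It assumes n * log2(n) as the cost of a comparison sort, and it assumes the tiers are equal-sized and assigned by judgement at no comparison cost; the function names are made up for illustration.

```python
import math


def comparisons_plain(n):
    """Rough comparison count for a comparison sort of n items: n * log2(n)."""
    return n * math.log2(n)


def comparisons_after_bucketing(n, buckets):
    """Rough count when n items are first split evenly into `buckets` tiers
    (without pairwise comparisons) and each tier is then comparison-sorted."""
    per_bucket = n / buckets
    return buckets * comparisons_plain(per_bucket)


n, buckets = 10_000, 100
factor = comparisons_plain(n) / comparisons_after_bucketing(n, buckets)
print(factor)   # algebraically log(n) / log(n / B), i.e. about 2 here
```

The ratio simplifies to log(n)/log(n/B) because the factor n cancels, which is the formula stated in the answer.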
An alternative, comparison-saving approach is to do insertion sort (for both initial sorting and later insertions) with a modified binary search: instead of setting the initial bounds of the binary search at 0 and n, set them to values based on your own intuition of where you are certain it will end up, like 0 and n/10, if it's definitely in your top 10%. The more granularly you can do this, the fewer comparisons you will need.
Caveat: with both bucket sort and the modified binary search, if you are wrong, you will need to do additional comparisons to fix your mistake.
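The modified binary search and its caveat could be sketched roughly as follows. This is one possible way to handle a wrong guess, not the only one: when the search ends on an edge of the guessed window, it spends one extra comparison to verify the guess and, if the guess was wrong, searches the rest of the list. The names `insert_with_hint` and `prefers` are hypothetical; `prefers(a, b)` stands for the manual listening test and returns True when `a` should rank above `b`.

```python
def binary_search(ranked, item, prefers, lo, hi):
    """Return the index within ranked[lo:hi] where item belongs (best first)."""
    while lo < hi:
        mid = (lo + hi) // 2
        if prefers(item, ranked[mid]):
            hi = mid          # item ranks above ranked[mid]
        else:
            lo = mid + 1      # item ranks below ranked[mid]
    return lo


def insert_with_hint(ranked, item, prefers, lo=0, hi=None):
    """Insert item into ranked, searching only the guessed window ranked[lo:hi] first."""
    if hi is None:
        hi = len(ranked)
    pos = binary_search(ranked, item, prefers, lo, hi)
    # Landing on an edge of the window means the guess may have been wrong:
    # verify with one more comparison and widen the search if needed.
    if pos == lo and lo > 0 and prefers(item, ranked[lo - 1]):
        pos = binary_search(ranked, item, prefers, 0, lo - 1)
    if pos == hi and hi < len(ranked) and not prefers(item, ranked[hi]):
        pos = binary_search(ranked, item, prefers, hi + 1, len(ranked))
    ranked.insert(pos, item)
    return pos


# Stand-in for listening: higher score means more liked.
scores = list(range(100, 0, -10))                 # already ranked, best first
insert_with_hint(scores, 95, lambda a, b: a > b, lo=0, hi=3)   # "surely top 3"
```

A correct guess confines the search to the small window; a wrong one costs the verification comparison plus a search over the remaining range.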
And one final word: this question assumes that there exists a correct ranking and that it can be achieved via comparisons. If you have circular preferences, such as a > b, b > c, and c > a, a la rock-paper-scissors, then a ranking cannot be constructed. The algorithms will still complete, but the resulting list will be inconsistent.
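If you keep a log of your answers, circular preferences can be detected after the fact. The sketch below is an assumption about how such a log might look (a list of `(winner, loser)` pairs) and uses a standard topological-sort check; the function name `is_consistent` is made up for this example.

```python
from collections import deque


def is_consistent(preferences):
    """Return True if the (winner, loser) pairs admit at least one ranking,
    i.e. the recorded preferences contain no cycle."""
    beats = {}          # item -> items it was preferred over
    times_beaten = {}   # item -> number of recorded losses not yet accounted for
    for winner, loser in preferences:
        beats.setdefault(winner, []).append(loser)
        beats.setdefault(loser, [])
        times_beaten[loser] = times_beaten.get(loser, 0) + 1
        times_beaten.setdefault(winner, 0)

    # Repeatedly place items that nothing remaining is preferred over.
    queue = deque(item for item, count in times_beaten.items() if count == 0)
    placed = 0
    while queue:
        item = queue.popleft()
        placed += 1
        for loser in beats[item]:
            times_beaten[loser] -= 1
            if times_beaten[loser] == 0:
                queue.append(loser)
    # Items stuck in a cycle never reach a count of zero.
    return placed == len(beats)


print(is_consistent([("a", "b"), ("b", "c")]))
print(is_consistent([("a", "b"), ("b", "c"), ("c", "a")]))
```

The first call describes a plain chain of preferences; the second adds the rock-paper-scissors loop from the paragraph above.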
Answer 2:
There seem to be two stages in your problem. The first stage is to sort all of the songs you already have, and the second stage is to insert new songs, one-by-one, into the already-sorted order.
The first stage is what standard sorting algorithms do. In this stage, the input is an array presumed to be completely unordered, and all of the sorting is done at once. You want to do this using the minimum number of comparisons possible.
There is no perfect answer to this question; no known sorting algorithm uses a provably minimum number of comparisons for all inputs. Information theory gives n log₂ n - 1.443 n + O(log n) as a theoretical lower bound for the average number of comparisons, but this bound has not been achieved.
The currently-known sorting algorithms which get closest to the above bound are merge-insertion sort (also known as the Ford–Johnson algorithm), and variations of it. Merge-insertion sort performs on average approximately n log₂ n - 1.415 n comparisons, which is very close to the theoretical bound. For 1024 items, you'd probably be doing something like ~8,790 comparisons, where the theoretical bound is about 8,770 (log₂ 1024! ≈ 8,769).
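The figures above can be reproduced with a short calculation. This sketch assumes the lower bound is taken as log2(n!), the number of bits needed to distinguish all n! orderings, and uses the n log₂ n - 1.415 n estimate exactly as quoted; the function names are made up for illustration.

```python
import math


def lower_bound_comparisons(n):
    """Information-theoretic lower bound log2(n!), computed via the log-gamma function."""
    return math.lgamma(n + 1) / math.log(2)


def merge_insertion_estimate(n):
    """Average-case estimate quoted above: n*log2(n) - 1.415*n."""
    return n * math.log2(n) - 1.415 * n


n = 1024
print(lower_bound_comparisons(n))    # theoretical bound for 1024 items
print(merge_insertion_estimate(n))   # estimate for merge-insertion sort
```

Using `math.lgamma` avoids building the enormous integer 1024! just to take its logarithm.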
According to this other Stack Overflow answer as of December 2018, none of the algorithms which improve on merge-insertion sort are "freely documented", which I take to mean that these improved algorithms are only presented in academic papers. More public information is available for merge-insertion sort, and there is not much room for the variants to improve on it, so I would suggest going with this algorithm rather than wading through academic literature; unless your n is much larger, there is little to gain from it.
The second stage is a different problem than what sorting algorithms solve. In this stage, you need an "online" algorithm which allows adding new items into the current sorted order.
You cannot do this with fewer than ⌈log₂ (n + 1)⌉ comparisons per insertion in the worst case, because there are n + 1 positions where the new item could belong in the current order, and each comparison gives at most one bit of information.
The binary search algorithm works to find the correct position in a sorted array; or you could use a balanced binary search tree data structure. Either way, each insertion will be achieved using the optimal number of comparisons. The advantage of using a binary search tree is that insertion takes O(log n) time overall; inserting into a sorted array takes O(log n) comparisons but O(n) time to move other elements around in the array.
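A minimal sketch of the sorted-array variant is shown below, counting comparisons so they can be checked against the ⌈log₂ (n + 1)⌉ bound. The names `binary_insert` and `less_than` are hypothetical; `less_than(a, b)` stands for the manual comparison and returns True when `a` ranks before `b`.

```python
def binary_insert(ranked, item, less_than):
    """Insert item into the sorted list `ranked`; return the number of comparisons used."""
    lo, hi = 0, len(ranked)
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if less_than(item, ranked[mid]):
            hi = mid          # item belongs before ranked[mid]
        else:
            lo = mid + 1      # item belongs after ranked[mid]
    ranked.insert(lo, item)   # O(n) shifting of elements, but no further comparisons
    return comparisons


ranked = []
for song_score in [5, 3, 8, 1, 9, 2, 7]:          # stand-in for real songs
    n = len(ranked)
    used = binary_insert(ranked, song_score, lambda a, b: a < b)
    # For integers n >= 0, n.bit_length() equals ceil(log2(n + 1)).
    assert used <= n.bit_length()
```

With manual comparisons taking minutes each, the O(n) shifting inside `list.insert` is negligible at 1000 items, so the plain list is likely good enough here; the tree only matters when machine time dominates.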
Answer 3:
Assuming that your music library has no order to it, merge sort is the best sorting algorithm to use. Adding elements while merge sort is ongoing is not so easy though.
I think your best bet is a depth-limited search tree like the 2-3 tree or the red-black tree. Personally I would suggest the 2-3 tree, as the red-black tree is a variant of it with less per-node complexity but a worse minimum depth bound.
Using this tree you can simply start adding songs to it according to the rules clearly described on Wikipedia and every song you have added will be in sorted order. This has the added benefit that when inserting a song it will be compared multiple times in a row and thus it will be fresh in your memory so you might not need to listen to it for every comparison.
This method sorts your songs one at a time so if a new song comes along that you want to rank right away you can just add it before the rest of the unsorted songs.
You will probably need to make a program to assist you with maintaining the order and the tree structure. The only manual way I can think of is to use nested folders as nodes which makes adding to and rearranging the tree doable. It does make querying a bit of a hassle though, depending on what you want to do.
Source: https://stackoverflow.com/questions/59770437/which-sorting-algorithm-uses-the-fewest-number-of-comparisons-while-elements-are