Question
I want to do an empirical test on the speed of sorting algorithms. Initially I generated the data randomly, but this seems unfair and skews the results for some algorithms. For example, with quicksort the pivot selection is important: one method is to always pick the first element, and another is to pick the median of the first, last, and middle elements. But if the array is already random, it doesn't matter which pivot is selected, so in this sense the test is unfair. How do you resolve this?
Where can I get real-world sample data for testing sorting algorithms? I've heard that in real scenarios data is often partially sorted, but how does a sorting algorithm take advantage of that?
Answer 1:
To test the efficiency of sorting algorithms, several data sets are usually used and timed separately. Completely random, partially sorted, completely sorted, and reverse-sorted data are run through the same algorithms to produce representative averages for each category. This creates as fair a testing environment as possible.
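A minimal sketch of this kind of benchmark, assuming Python and using the built-in `sorted` as a stand-in for the algorithm under test (swap in your own implementation):

```python
import random
import time

def make_datasets(n, seed=0):
    """Build the four standard test inputs: random, partially sorted,
    fully sorted, and reverse sorted."""
    rng = random.Random(seed)
    base = list(range(n))
    partially_sorted = sorted(base)            # start sorted...
    for _ in range(n // 10):                   # ...then perturb ~10% of positions
        i, j = rng.randrange(n), rng.randrange(n)
        partially_sorted[i], partially_sorted[j] = partially_sorted[j], partially_sorted[i]
    return {
        "random": rng.sample(base, n),
        "partially sorted": partially_sorted,
        "sorted": sorted(base),
        "reversed": sorted(base, reverse=True),
    }

def time_sort(sort_fn, data, repeats=5):
    """Average wall-clock time over several runs, copying the input each time."""
    total = 0.0
    for _ in range(repeats):
        copy = list(data)
        start = time.perf_counter()
        sort_fn(copy)
        total += time.perf_counter() - start
    return total / repeats

if __name__ == "__main__":
    datasets = make_datasets(10_000)
    for name, data in datasets.items():
        print(f"{name:18s} {time_sort(sorted, data):.6f} s")
```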
While some algorithms are, on average, much better than others, they each serve their own unique purpose in a solution.
Though it lacks numerical data, the process I am describing can be seen in an interesting animation on sorting-algorithms.com.
Answer 2:
Most of your questions have already been answered, so I will address your last one: how sorting algorithms make use of the fact that the given data is partially sorted. One good example is natural merge sort, where the input is scanned initially to identify the already-sorted runs (sub-arrays), which are then merged together, starting with the shorter ones. This can lead to a substantial speed-up over algorithms that do not exploit the partially sorted structure.
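A simplified sketch of the idea, assuming Python; this version finds ascending runs and merges adjacent pairs rather than prioritising the shortest runs, which production implementations (e.g. Timsort) handle more carefully:

```python
def find_runs(a):
    """Split `a` into maximal non-decreasing runs."""
    runs, start = [], 0
    for i in range(1, len(a)):
        if a[i] < a[i - 1]:
            runs.append(a[start:i])
            start = i
    runs.append(a[start:])
    return runs

def merge(left, right):
    """Standard two-way merge of sorted lists."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out

def natural_merge_sort(a):
    """Merge the pre-existing runs; fewer runs means fewer merge passes,
    so partially sorted input sorts faster."""
    if not a:
        return []
    runs = find_runs(a)
    while len(runs) > 1:
        # Merge adjacent runs pairwise until a single run remains.
        runs = [merge(runs[i], runs[i + 1]) if i + 1 < len(runs) else runs[i]
                for i in range(0, len(runs), 2)]
    return runs[0]
```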
Answer 3:
You want to do an empirical comparison of sorting algorithms. That's good and the results are usually educational.
The way empirical testing works, though, is that you decide on a bunch of things you think are worth measuring, then you run the experiment and measure them.
If you decide that you care about the average-case performance of your sorting algorithm, you generate a bunch of random data and take the average of the running times.
If you decide that you care about the worst-case performance, you have to do rather more work. There are N! permutations of length N, which is far too many to enumerate when N is big. So you have to analyse the algorithms to figure out what kind of input elicits the worst case and write a generator that produces such data.
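For instance, a quicksort that always takes the first element as the pivot hits its worst case on already-sorted input. A minimal sketch of a worst-case generator for that specific variant, assuming Python (the right generator depends entirely on the algorithm you are analysing):

```python
def worst_case_for_first_pivot_quicksort(n):
    """Already-sorted input forces a quicksort that always picks the first
    element as pivot into O(n^2): every partition is maximally unbalanced."""
    return list(range(n))

# A median-of-three quicksort needs a different adversarial input
# (so-called "median-of-three killer" sequences); sorted data no longer
# triggers its worst case, which is exactly why the generator must be
# tailored to the algorithm under test.
```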
Usually you do this sort of thing because you care about how a given sorting algorithm will perform in your particular use case. So you generate lots of data typical for your use case and feed it through various sorting algorithms. Then you crunch the numbers in a manner appropriate for your setting.
For sorting in particular, you can always randomly scramble the data before feeding it to the sorting algorithm. (These days you'll want to scramble it in a manner that's friendly to the cache, but that's not too tricky.) So average running time may be a reasonable thing to measure.
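A small sketch of that pre-scrambling step, assuming Python's `random.shuffle` (an in-place Fisher-Yates shuffle):

```python
import random

def sort_scrambled(data, sort_fn, seed=None):
    """Copy the input, scramble it, then sort, so every algorithm sees
    the same input distribution regardless of the original order."""
    copy = list(data)
    random.Random(seed).shuffle(copy)   # Fisher-Yates shuffle
    return sort_fn(copy)
```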
Source: https://stackoverflow.com/questions/25295265/how-do-you-test-speed-of-sorting-algorithm