Fastest code C/C++ to select the median in a set of 27 floating point values

Asked by 梦谈多话 on 2020-12-07 09:24 · 15 answers · 1032 views

This is the well-known selection algorithm; see http://en.wikipedia.org/wiki/Selection_algorithm.

I need it to find the median value of a set of 3x3x3 voxel values. Since the volume is made of a billion voxels and the algorithm is recursive, it had better be fast. In general, it can be expected that the values are relatively close.

15 answers
  • 2020-12-07 09:43

    The most likely algorithm to use in your first attempt is just nth_element; it pretty much gives you what you want directly. Just ask for the 14th element.
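    To make this concrete, a minimal sketch (index 13 is the 0-based position of the 14th element):

    #include <algorithm>

    // std::nth_element rearranges the array so that the element at the given
    // position is the one a full sort would put there; everything before it
    // is <= and everything after it is >=.
    float median27(float v[27])
    {
        std::nth_element(v, v + 13, v + 27);  // 14th smallest of 27 = median
        return v[13];
    }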

    On your second attempt, the goal is to take advantage of the fixed data size. You do not want to allocate any memory at all during your algorithm. So, copy your voxel values to a pre-allocated array of 27 elements. Pick a pivot, and copy it to the middle of a 53-element array. Copy the remaining values to either side of the pivot. Here you keep two pointers (float* left = base+25, *right=base+27). There are now three possibilities: the left side is larger, the right side is larger, or both have 13 elements. The last case is trivial; your pivot is the median. Otherwise, call nth_element on either the left side or the right side. The exact value of Nth depends on how many values were larger or smaller than the pivot. For instance, if the division is 12/14, you need the smallest element bigger than the pivot, so Nth=0, and if the division was 14/12, you need the biggest element smaller than the pivot, so Nth=13. The worst cases are 26/0 and 0/26, when your pivot was an extreme, but those happen only in 2/27th of all cases.

    The third improvement (or the first, if you have to use C and do not have nth_element) replaces nth_element entirely. You still have the 53 element array, but this time you fill it directly from the voxel values (saving you an interim copy into a float[27]). The pivot in this first iteration is just voxel[0][0][0]. For subsequent iterations, you use a second pre-allocated float[53] (easier if both are the same size) and copy floats between the two. The basic iteration step here is still: copy the pivot to the middle, sort the rest to the left and the right. At the end of each step, you'll know whether the median is smaller or larger than the current pivot, so you can discard the floats bigger or smaller than that pivot. Per iteration, this eliminates between 1 and 12 elements, with an average of 25% of the remaining.

    The final iteration, if you still need more speed, is based on the observation that most of your voxels overlap significantly. You pre-calculate for every 3x3x1 slice the median value. Then, when you need an initial pivot for your 3x3x3 voxel cube, you take the median of the three. You know a priori that there are 9 voxels smaller and 9 voxels larger than that median of medians (4+4+1). So, after the first pivoting step, the worst cases are a 9/17 and a 17/9 split. So, you'd only need to find the 4th or 14th element in a float[17], instead of the 12th or 14th in a float[26].
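    A sketch of how those pieces might be computed (the helper names here are mine, not from the answer):

    #include <algorithm>

    // Median of one 3x3x1 slice: the 5th smallest of its 9 values.
    // Reorders the slice copy in place.
    float slice_median9(float s[9])
    {
        std::nth_element(s, s + 4, s + 9);
        return s[4];
    }

    // Median of the three slice medians, used as the first pivot.
    float median_of_three(float a, float b, float c)
    {
        return std::max(std::min(a, b), std::min(c, std::max(a, b)));
    }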


    Background: The idea of copying first a pivot and then the rest of a float[N] to a float[2N-1], using left and right pointers, is that you fill a float[N] subarray around the pivot, with all elements smaller than the pivot to the left (lower index) and all higher to the right (higher index). Now, if you want the Mth element, you might find yourself lucky and have M-1 elements smaller than the pivot, in which case the pivot is the element you need. If there are more than (M-1) elements smaller than the pivot, the Mth element is amongst them, so you can discard the pivot and anything bigger than it, and search for the Mth element among all the lower values. If there are fewer than (M-1) elements smaller than the pivot, you're looking for a value higher than the pivot, so you discard the pivot and anything smaller than it. Let the number of elements less than the pivot, i.e. to the left of the pivot, be L. In the next iteration, you then want the (M-L-1)th element of the (N-L-1) floats that are bigger than the pivot.

    This kind of nth_element algorithm is fairly efficient because most of the work is spent copying floats between two small arrays, both of which will be in cache, and because your state is most of the time represented by 3 pointers (source pointer, left destination pointer, right destination pointer).

    To show the basic code:

    float in[27], out[53];
    float pivot = out[26] = in[0];     // copy the pivot to the middle
    float* left = out + 25;            // values < pivot grow downward
    float* right = out + 27;           // values >= pivot grow upward
    for (int i = 1; i != 27; ++i)
    {
        if (in[i] < pivot) *left-- = in[i];
        else               *right++ = in[i];
    }
    // Post-condition: the range (left+1, right) is initialized.
    // There are 25-(left-out) floats < pivot and (right-out)-27 floats > pivot.
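
    Extending that single partition step into the full selection loop of the third improvement might look like the following sketch. The function name, the ping-pong buffering, and the variable names are my own illustration of the scheme described above, not code from the answer:

    // Iterative rank selection without nth_element: partition into a spare
    // 53-float buffer each pass, discard the side that cannot contain the
    // wanted element, and repeat until the pivot lands on the wanted rank.
    float select_rank(const float* v, long n, long rank)  // rank is 0-based
    {
        float bufA[53], bufB[53];
        float* cur = bufA;             // partition target of this pass
        float* nxt = bufB;             // target of the next pass
        for (;;)
        {
            float pivot = v[0];
            float* left  = cur + 25;   // values < pivot, growing downward
            float* right = cur + 27;   // values >= pivot, growing upward
            cur[26] = pivot;
            for (long i = 1; i != n; ++i)
            {
                if (v[i] < pivot) *left-- = v[i];
                else              *right++ = v[i];
            }
            long nLeft = 25 - (left - cur);    // count of values < pivot
            if (rank == nLeft) return pivot;   // pivot sits on the wanted rank
            if (rank < nLeft)                  // keep only the smaller side
            {
                v = left + 1;
                n = nLeft;
            }
            else                               // keep only the larger side
            {
                v = cur + 27;
                n -= nLeft + 1;
                rank -= nLeft + 1;
            }
            float* tmp = cur; cur = nxt; nxt = tmp;  // ping-pong the buffers
        }
    }

    // The median of the 27 voxel values is then select_rank(voxels, 27, 13).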
    
  • 2020-12-07 09:43

    I suppose your best bet is to take an existing sorting algorithm and try to figure out whether you can adapt it so that the set does not need to be fully sorted. For determining the median, you need at most half the values sorted, either the lower or higher half would be enough:

    original:              | 5 | 1 | 9 | 3 | 3 |
    sorted:                | 1 | 3 | 3 | 5 | 9 |
    lower half sorted:     | 1 | 3 | 3 | 9 | 5 |
    higher half sorted:    | 3 | 1 | 3 | 5 | 9 |
    

    The other half would be a bucket of unsorted values that merely share the property of being larger/smaller or equal to the largest/smallest sorted value.

    But I have no ready algorithm for that, it's just an idea of how you might take a short-cut in your sorting.
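
    One ready-made shortcut in this spirit is std::partial_sort: sorting only the lower 14 of the 27 values puts the median at index 13. A minimal sketch, not necessarily faster than nth_element here:

    #include <algorithm>

    // Sort only the lower half; the median is the last element of that prefix.
    float median_by_partial_sort(float v[27])
    {
        std::partial_sort(v, v + 14, v + 27);  // the 14 smallest, in order
        return v[13];                          // 14th smallest = median
    }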

  • 2020-12-07 09:45

    If there are 3x3x3=27 possible values (if so why the floats?), can you create an array of 27 elements and count each possibility in one pass through the data?
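
    If that premise held, the one-pass count might look like this sketch (assuming, hypothetically, that the values are integer codes in the range [0, 27)):

    // Histogram the 27 samples, then walk up to the 14th value.
    int median_by_counting(const int v[27])
    {
        int count[27] = {0};
        for (int i = 0; i != 27; ++i)
            ++count[v[i]];                 // one pass through the data
        for (int c = 0, seen = 0; ; ++c)
        {
            seen += count[c];
            if (seen >= 14) return c;      // 14th smallest = median
        }
    }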

  • 2020-12-07 09:50

    The question cannot easily be answered, for the simple reason that the performance of one algorithm relative to another depends as much on the compiler / processor / data structure combination as on the algorithm itself, as you surely know.

    Therefore your approach of trying a couple of them seems good enough. And yes, quicksort should be pretty fast. If you haven't done so, you might want to try insertion sort, which often performs better on small data sets. This said, just settle on a sorting algo that does the job fast enough. You will typically not get 10 times faster just by picking the "right" algo.

    To get substantial speed-ups, the better way frequently is to use more structure. Some ideas that worked for me in the past with large-scale problems:

    • Can you efficiently pre-calculate while creating the voxels and store 28 instead of 27 floats?

    • Is an approximate solution good enough? If so, just look at the median of, say 9 values, since "in general it can be expected that values are relatively close." Or you can replace it with the average as long as the values are relatively close.

    • Do you really need the median for all billions of voxels? Maybe you have an easy test for whether you need the median at all, and can then calculate it only for the relevant sub-set.

    • If nothing else helps: look at the asm code that the compiler generates. You might be able to write asm code that is substantially faster (e.g. by doing all the calcs using registers).

    Edit: For what it's worth, I have attached the (partial) insertionsort code mentioned in the comment below (totally untested). If numbers[] is an array of size N and you want the smallest P floats sorted at the beginning of the array, call partial_insertionsort<N, P, float>(numbers);. Hence, if you call partial_insertionsort<27, 13, float>(numbers);, numbers[13] will contain the median. To gain additional speed, you would have to unroll the while loop, too. As discussed above, to get really fast, you have to use your knowledge about the data (e.g. is the data already partially sorted? Do you know properties of the distribution of the data? I guess you get the drift).

    template <long i> class Tag{};

    template<long i, long N, long P, typename T>
    inline void partial_insertionsort_for(T a[], Tag<N>, Tag<i>)
    {   long j = i <= P+1 ? i : P+1;  // partial sort: insert into the first P+2 slots only
        T temp = a[i];
        a[i] = a[j];       // compiler should optimize this away where possible
        while(j > 0 && temp < a[j - 1])   // test j first so a[-1] is never read
        { a[j] = a[j - 1];
          j--;}
        a[j] = temp;
        partial_insertionsort_for<i+1,N,P,T>(a,Tag<N>(),Tag<i+1>());}

    template<long i, long N, long P, typename T>
    inline void partial_insertionsort_for(T a[], Tag<N>, Tag<N>){}  // end of recursion

    template <long N, long P, typename T>
    inline void partial_insertionsort(T a[])
     {partial_insertionsort_for<0,N,P,T>(a, Tag<N>(), Tag<0>());}
    
  • 2020-12-07 09:52

    Suppose you have, say, a million different values from which you need the median. Is it possible to base your median on a subset of those million, say 10%, so that the result comes close to the n-th element that divides the values into two equal (or almost equal) subsets? Finding the median of the subset then costs only about a tenth of the work, which is far cheaper than fully sorting everything with quicksort in O(n log n).
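
    A minimal sketch of that sampling idea, assuming every 10th value is a representative-enough subset (the function name is illustrative):

    #include <algorithm>
    #include <vector>

    // Approximate median: the median of every 10th value. How close the
    // result gets depends entirely on the data distribution.
    float approx_median(const float* values, long n)
    {
        std::vector<float> sample;
        sample.reserve(n / 10 + 1);
        for (long i = 0; i < n; i += 10)
            sample.push_back(values[i]);
        std::nth_element(sample.begin(),
                         sample.begin() + sample.size() / 2,
                         sample.end());
        return sample[sample.size() / 2];
    }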

  • 2020-12-07 09:52

    If you want to see algorithms look up the books by Donald E. Knuth.

    PS. If you think you have invented something better, then you should be able to show that its complexity is similar to or better than that of the known algorithms. For variations based on bucket and radix sorting that is O(n), whereas quicksort is only O(n·log(n)). A method that is 20% faster is still O(n·log(n)) until you can show the algorithm :-)
