sort array of integers lexicographically C++

野的像风 2021-02-02 13:53

I want to sort a large array of integers (say 1 million elements) lexicographically.

Example:

input [] = { 100, 21 , 22 , 99 , 1  , 927 }
sorted[] = { 1           


        
12 answers
  •  名媛妹妹
    2021-02-02 14:19

    I believe the following works as a sort comparison function for positive integers provided the integer type used is substantially narrower than the double type (e.g., 32-bit int and 64-bit double) and the log10 routine used returns exactly correct results for exact powers of 10 (which a good implementation does):

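    // x and y are the two positive ints being compared; needs <climits> and <cmath>.
    // limit is half the spacing between the base-10 logarithms of the two largest ints.
    static const double limit = .5 * (log10(INT_MAX) - log10(INT_MAX-1));
    
    double lx = log10(x);
    double ly = log10(y);
    double fx = lx - floor(lx);  // Get the mantissa of lx.
    double fy = ly - floor(ly);  // Get the mantissa of ly.
    // Equal mantissas: the shorter number sorts first. Otherwise the mantissas decide.
    return fabs(fx - fy) < limit ? lx < ly : fx < fy;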
    

    It works by comparing the mantissas of the logarithms. The mantissas are the fractional parts of the logarithm, and they indicate the value of the significant digits of a number without the magnitude (e.g., the logarithms of 31, 3.1, and 310 have exactly the same mantissa).
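To make the mantissa idea concrete, here is a minimal sketch. The helper names `log10Mantissa` and `sameSignificantDigits` are hypothetical and not part of the answer's code, and the tolerance of 1e-12 is an arbitrary choice for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: fractional part of log10(v), for v > 0.
inline double log10Mantissa(double v)
{
    assert(v > 0);
    double l = std::log10(v);
    return l - std::floor(l);
}

// True when two positive values share the same significant digits,
// judged by their mantissas agreeing within a small tolerance.
// Assumes neither mantissa sits right at the 0/1 boundary.
inline bool sameSignificantDigits(double a, double b, double tol = 1e-12)
{
    return std::fabs(log10Mantissa(a) - log10Mantissa(b)) < tol;
}
```

With this, `sameSignificantDigits(31, 310)` holds, while 31 and 4 have different mantissas and the smaller mantissa (that of 31) corresponds to the string that sorts first.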

    The purpose of fabs(fx - fy) < limit is to allow for errors in taking the logarithm, which occur both because implementations of log10 are imperfect and because the floating-point format forces some error. (The integer portions of the logarithms of 31 and 310 use different numbers of bits, so there are different numbers of bits left for the significand, so they end up being rounded to slightly different values.) As long as the integer type is substantially narrower than the double type, the calculated limit will be much larger than the error in log10. Thus, the test fabs(fx - fy) < limit essentially tells us whether two calculated mantissas would be equal if calculated exactly.

    If the mantissas differ, they indicate the lexicographic order, so we return fx < fy. If they are equal, then the integer portion of the logarithm tells us the order, so we return lx < ly.

    It is simple to test whether log10 returns correct results for every power of ten, since there are so few of them. If it does not, adjustments can be made easily: insert if (1-fx < limit) fx = 0; if (1-fy < limit) fy = 0;. This allows for when log10 returns something like 4.99999… when it should have returned 5.
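The check on powers of ten could look roughly like the sketch below. Both function names are hypothetical, and the loop assumes a 32-bit int, so only 10^0 through 10^9 need testing. `snapMantissa` is just the adjustment from the paragraph above pulled out into a function.

```cpp
#include <cassert>
#include <cmath>

// Counts the powers of ten 10^0 .. 10^9 (all that fit in a 32-bit int)
// for which log10 does not return the exact integer exponent.
inline int countInexactPowersOfTen()
{
    int bad = 0;
    double p = 1.0;  // powers of ten up to 1e9 are exact in a double
    for (int k = 0; k <= 9; ++k, p *= 10.0)
        if (std::log10(p) != static_cast<double>(k))
            ++bad;
    return bad;
}

// The adjustment described above: a mantissa within `limit` of 1
// is treated as 0, i.e. as belonging to an exact power of ten.
inline double snapMantissa(double f, double limit)
{
    assert(limit > 0);
    return (1 - f < limit) ? 0.0 : f;
}
```

If `countInexactPowersOfTen()` returns 0 on your platform, the adjustment is not needed. Otherwise apply `snapMantissa` to both fx and fy before comparing them.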

    This method has the advantage of not using loops or division (which is time-consuming on many processors).
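Putting the pieces together, a comparator along these lines can be handed to std::sort. This is a sketch under stated assumptions, not a tuned implementation: `lexLess` and `lexSorted` are hypothetical names, the inputs are assumed to be positive 32-bit ints with a 64-bit double, and the optional power-of-ten adjustment is always applied so the result does not depend on log10 being exact.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cmath>
#include <vector>

// Hypothetical name. Returns true if x sorts before y lexicographically.
// Precondition: x > 0 and y > 0.
inline bool lexLess(int x, int y)
{
    assert(x > 0 && y > 0);
    // Half the spacing between the log10 values of the two largest ints.
    static const double limit =
        .5 * (std::log10(double(INT_MAX)) - std::log10(double(INT_MAX) - 1));

    double lx = std::log10(double(x));
    double ly = std::log10(double(y));
    double fx = lx - std::floor(lx);  // mantissa of lx
    double fy = ly - std::floor(ly);  // mantissa of ly

    // Guard for log10 returning e.g. 4.99999... instead of 5.
    if (1 - fx < limit) fx = 0;
    if (1 - fy < limit) fy = 0;

    // Same significant digits: the shorter number goes first.
    // Otherwise the mantissas give the lexicographic order.
    return std::fabs(fx - fy) < limit ? lx < ly : fx < fy;
}

// Hypothetical convenience wrapper: returns a lexicographically sorted copy.
inline std::vector<int> lexSorted(std::vector<int> v)
{
    std::sort(v.begin(), v.end(), lexLess);
    return v;
}
```

For the input from the question, { 100, 21, 22, 99, 1, 927 }, `lexSorted` returns { 1, 100, 21, 22, 927, 99 }. Zero and negative values are outside what this comparison handles and would need separate treatment.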
