I want to sort a large array of integers (say 1 millon elements) lexicographically.
Example:
input [] = { 100, 21 , 22 , 99 , 1 , 927 }
sorted[] = { 1
I believe the following works as a sort comparison function for positive integers provided the integer type used is substantially narrower than the double
type (e.g., 32-bit int
and 64-bit double
) and the log10
routine used returns exactly correct results for exact powers of 10 (which a good implementation does):
static const double limit = .5 * (log(INT_MAX) - log(INT_MAX-1));
double lx = log10(x);
double ly = log10(y);
double fx = lx - floor(lx); // Get the mantissa of lx.
double fy = ly - floor(ly); // Get the mantissa of ly.
return fabs(fx - fy) < limit ? lx < ly : fx < fy;
It works by comparing the mantissas of the logarithms. The mantissas are the fractional parts of the logarithm, and they indicate the value of the significant digits of a number without the magnitude (e.g., the logarithms of 31, 3.1, and 310 have exactly the same mantissa).
The purpose of fabs(fx - fy) < limit
is to allow for errors in taking the logarithm, which occur both because implementations of log10
are imperfect and because the floating-point format forces some error. (The integer portions of the logarithms of 31 and 310 use different numbers of bits, so there are different numbers of bits left for the significand, so they end up being rounded to slightly different values.) As long as the integer type is substantially narrower than the double
type, the calculated limit
will be much larger than the error in log10
. Thus, the test fabs(fx - fy) < limit
essentially tells us whether two calculated mantissas would be equal if calculated exactly.
If the mantissas differ, they indicate the lexicographic order, so we return fx < fy
. If they are equal, then the integer portion of the logarithm tells us the order, so we return lx < ly
.
It is simple to test whether log10
returns correct results for every power of ten, since there are so few of them. If it does not, adjustments can be made easily: Insert if (1-fx < limit) fx = 0; if (1-fu < limit) fy = 0;
. This allows for when log10
returns something like 4.99999… when it should have returned 5.
This method has the advantage of not using loops or division (which is time-consuming on many processors).