ANSI-C: maximum number of characters printing a decimal int

醉梦人生 2021-02-19 10:17

I'd like to know if there is an easy way of determining the maximum number of characters needed to print a decimal int.

I know <limits.h> contains …

  •  北恋 (OP)
    2021-02-19 10:26

    The simplest, canonical, and arguably most portable way is to ask snprintf() how much space would be required:

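    #include <limits.h>
    #include <stdio.h>
    
    char sbuf[2];
    int ndigits;
    
    /* length "%lld" would produce for INT_MIN: sign included, NUL excluded */
    ndigits = snprintf(sbuf, (size_t) 1, "%lld", (long long) INT_MIN);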
    

    Slightly less portable, perhaps, is to use intmax_t (from <stdint.h>) with the %jd conversion:

    ndigits = snprintf(sbuf, (size_t) 1, "%jd", (intmax_t) INT_MIN);
    

    One could consider that too expensive to do at runtime, but it works for any value, not just the MIN/MAX values of an integer type.
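Since C99, snprintf() also accepts a null buffer when the size is 0 and still returns the length the output would have had, so the scratch buffer is not needed at all. A minimal sketch, assuming a conforming C99 library; dec_len() is a hypothetical helper name, not from the answer:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: number of characters (sign included, NUL excluded)
 * that printf("%jd", v) would produce. With size 0 the buffer may be NULL;
 * snprintf() then writes nothing and only reports the would-be length. */
static int dec_len(intmax_t v)
{
    return snprintf(NULL, 0, "%jd", v);
}
```

A caller would typically allocate dec_len(v) + 1 bytes to leave room for the terminating NUL.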

    You could of course also just directly calculate the number of digits that a given integer would require to be expressed in Base 10 notation with a simple recursive function:

    #include <stdint.h>
    
    unsigned int
    numCharsB10(intmax_t n)
    {
            /* -INTMAX_MIN would overflow, but INTMAX_MAX has the same number of digits */
            if (n < 0)
                    return numCharsB10((n == INTMAX_MIN) ? INTMAX_MAX : -n) + 1;
            if (n < 10)
                    return 1;
    
            return 1 + numCharsB10(n / 10);
    }
    

    but that too costs CPU time at runtime, even when inlined, though perhaps a little less than snprintf() does.
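For comparison, here is a loop-based sketch that avoids recursion. numCharsB10Iter is a hypothetical name, not part of the answer. It handles INTMAX_MIN by working on the magnitude in uintmax_t, where negation is well defined:

```c
#include <stdint.h>

/* Loop-based variant of numCharsB10() (hypothetical name).
 * Converting a negative intmax_t to uintmax_t is well defined (modulo 2^N),
 * and negating that unsigned value yields |n|, even for INTMAX_MIN. */
static unsigned int numCharsB10Iter(intmax_t n)
{
    unsigned int len = 1;
    uintmax_t u = (uintmax_t)n;

    if (n < 0) {
        u = -u;   /* magnitude of n in unsigned arithmetic */
        len++;    /* room for the '-' sign */
    }
    while (u >= 10) {
        u /= 10;
        len++;
    }
    return len;
}
```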

    @R.'s answer above is more or less wrong, though on the right track. Here is a derivation of some widely tested and highly portable macros that do the calculation at compile time using sizeof(), starting from a slightly corrected version of @R.'s initial wording:

    First, we can easily see (or show) that sizeof(int) is the log base 2 of UINT_MAX + 1, divided by the number of bits in one unit of sizeof() (8, aka CHAR_BIT):

    sizeof(int) == log2(UINT_MAX + 1) / 8

    because UINT_MAX + 1 is just 2^(sizeof(int) * 8), assuming unsigned int has no padding bits, and log2(x) is the inverse of 2^x.
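The relation only holds when unsigned int has no padding bits, which the C standard permits but mainstream platforms do not use. A small sketch to check it on a given platform; uint_value_bits() is a hypothetical name:

```c
#include <limits.h>

/* Counts the value bits of unsigned int by shifting UINT_MAX (all value
 * bits set) right until it becomes zero. */
static unsigned int uint_value_bits(void)
{
    unsigned int bits = 0;
    unsigned int v = UINT_MAX;

    while (v != 0) {
        v >>= 1;
        bits++;
    }
    return bits;
}
```

If uint_value_bits() equals sizeof(unsigned int) * CHAR_BIT, there are no padding bits and UINT_MAX + 1 is exactly 2 raised to that bit count.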

    We can use the identity "logb(x) = log(x) / log(b)" (where log() is the natural logarithm) to find logarithms of other bases. For example, you could compute the "log base 2" of "x" using:

    log2(x) = log(x) / log(2)

    and also:

    log10(x) = log(x) / log(10)

    So, we can deduce that:

    log10(v) = log2(v) / log2(10)

    What we want in the end is the log base 10 of UINT_MAX. Since log2(10) is about 3.32, i.e. roughly 3, and since we know from above what log2() is in terms of sizeof(), we can say that log10(UINT_MAX) is approximately:

    log10(2^(sizeof(int)*8)) ~= (sizeof(int) * 8) / 3
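To see how rough this is, the sketch below compares the plain bits / 3 estimate with the exact digit count of 2^bits - 1 for a few widths. Both helper names are hypothetical. With these helpers, the estimate is one too small for 8 bits, exact for 16 and 32 bits, and one too large for 64 bits.

```c
#include <stdint.h>

/* Exact number of decimal digits in 2^bits - 1, for 1 <= bits <= 64,
 * computed by repeated division. */
static unsigned int digits_of_max(unsigned int bits)
{
    uint64_t v = (bits == 64) ? UINT64_MAX : (((uint64_t)1 << bits) - 1);
    unsigned int d = 1;

    while (v >= 10) {
        v /= 10;
        d++;
    }
    return d;
}

/* The rough estimate from the text: bits / log2(10), with log2(10) taken as 3. */
static unsigned int rough_estimate(unsigned int bits)
{
    return bits / 3;
}
```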

    That's not exact, especially since what we really want is the ceiling value. With some minor adjustments to compensate for rounding log2(10) down to 3, we get what we need by first adding 1 to the log2 term, then subtracting 1 from the result for types larger than 2 bytes. The result is this "good-enough" expression:

    #if 0
    #define __MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t) \
        ((((sizeof(t) * CHAR_BIT) + 1) / 3) - ((sizeof(t) > 2) ? 1 : 0))
    #endif
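The disabled macro can be checked against exact digit counts. This is a sketch that assumes 8-bit char and 2-, 4- and 8-byte short, int and long long. ROUGH_B10STRLEN_UNSIGNED and udigits() are hypothetical names. The macro is renamed because identifiers beginning with two underscores are reserved in C.

```c
#include <limits.h>
#include <stdint.h>

/* Same arithmetic as the "good-enough" macro above, under a non-reserved name. */
#define ROUGH_B10STRLEN_UNSIGNED(t) \
    ((((sizeof(t) * CHAR_BIT) + 1) / 3) - ((sizeof(t) > 2) ? 1 : 0))

/* Exact decimal digit count of an unsigned value, for comparison. */
static unsigned int udigits(uintmax_t v)
{
    unsigned int d = 1;

    while (v >= 10) {
        v /= 10;
        d++;
    }
    return d;
}
```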
    

    Even better, we can multiply our first log2() term by 1/log2(10) (multiplying by the reciprocal of the divisor is the same as dividing by the divisor), which makes it possible to find a better integer approximation. I most recently (re?)encountered this suggestion while reading Sean Anderson's bithacks: http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10

    To do this with integer math as accurately as possible, we need to find the ideal ratio representing our reciprocal. This can be found by multiplying 1/log2(10) by successive powers of 2, within some reasonable range, and picking the product with the smallest fractional part, for example with the following little AWK script:

        awk 'BEGIN {
                minf=1.0
        }
        END {
                for (i = 1; i <= 31; i++) {
                        a = 1.0 / (log(10) / log(2)) * 2^i
                        if (a > (2^32 / 32))
                                break;
                        n = int(a)
                        f = a - (n * 1.0)
                        if (f < minf) {
                                minf = f
                                minn = n
                                bits = i
                        }
                        # printf("a=%f, n=%d, f=%f, i=%d\n", a, n, f, i)
                }
                printf("%d + %f / %d, bits=%d\n", minn, minf, 2^bits, bits)
        }' < /dev/null
    
        1233 + 0.018862 / 4096, bits=12
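The same search can be written in C. This sketch hard-codes log10(2), which equals 1/log2(10), instead of calling log(), so that no math library is needed. That substitution is an assumption of the sketch. The function name best_log10_2_ratio() is hypothetical.

```c
#include <stdint.h>

/* log10(2) == 1/log2(10), hard-coded (assumption of this sketch). */
#define LOG10_2 0.30102999566398119521

/* Port of the AWK search: for i = 1..31, scale log10(2) by 2^i and keep the
 * integer part whose leftover fraction is smallest. Like the script, it stops
 * once the scaled value exceeds 2^32 / 32. */
static void best_log10_2_ratio(unsigned long *mult, int *shift, double *frac)
{
    double minf = 1.0;
    int i;

    *mult = 0;
    *shift = 0;
    for (i = 1; i <= 31; i++) {
        double a = LOG10_2 * (double)((uint64_t)1 << i);
        unsigned long n;
        double f;

        if (a > 4294967296.0 / 32)
            break;
        n = (unsigned long)a;   /* integer part, like awk's int() */
        f = a - (double)n;      /* fractional part */
        if (f < minf) {
            minf = f;
            *mult = n;
            *shift = i;
        }
    }
    *frac = minf;
}
```

It should report the same result as the AWK script: multiplier 1233 and shift 12.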
    

    So we can get a good integer approximation of multiplying our log2(v) value by 1/log2(10) by multiplying it by 1233 followed by a right-shift of 12 (2^12 is 4096 of course):

    log10(UINT_MAX) ~= (sizeof(int) * 8) * 1233 >> 12

    Adding one to that truncated result gives the equivalent of the ceiling value, which removes the need to fiddle with odd sizes as before:

    #define __MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t) \
        (((((sizeof(t) * CHAR_BIT)) * 1233) >> 12) + 1)
    
    /*
     * for signed types we need room for the sign, except for 64-bit types:
     * INT64_MIN has 19 digits, one fewer than UINT64_MAX, so the sign fits
     */
    #define __MAX_B10STRLEN_FOR_SIGNED_TYPE(t) \
        (__MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t) + ((sizeof(t) == 8) ? 0 : 1))
    
    /*
     * NOTE: this gives a warning (for unsigned types of int and larger) saying
     * "comparison of unsigned expression < 0 is always false", and of course it
     * is, but that's what we want to know (if indeed type 't' is unsigned)!
     */
    #define __MAX_B10STRLEN_FOR_INT_TYPE(t)                     \
        (((t) -1 < 0) ? __MAX_B10STRLEN_FOR_SIGNED_TYPE(t)      \
                      : __MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t))
    

    Unlike the snprintf() and numCharsB10() approaches, the expression that my __MAX_B10STRLEN_FOR_INT_TYPE() macro expands to is normally evaluated by the compiler at compile time. Of course, the macro always calculates the maximum space required by a given integer type, not the exact space required by a particular value, and it does not count the terminating NUL.
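The following sketch puts the pieces together under stated assumptions. It re-declares the macros under hypothetical names without the leading double underscore, because such identifiers are reserved in C. It then sizes an int buffer at compile time, adding 1 for the NUL. Finally, it checks the multiply-and-shift formula against an exact digit count for every width from 1 to 64 bits. format_int(), exact_digits() and first_mismatch() are illustrative names, not part of the answer.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* The macros above under non-reserved (hypothetical) names. */
#define MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t) \
    (((((sizeof(t) * CHAR_BIT)) * 1233) >> 12) + 1)
#define MAX_B10STRLEN_FOR_SIGNED_TYPE(t) \
    (MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t) + ((sizeof(t) == 8) ? 0 : 1))
#define MAX_B10STRLEN_FOR_INT_TYPE(t) \
    (((t) -1 < 0) ? MAX_B10STRLEN_FOR_SIGNED_TYPE(t) \
                  : MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t))

/* The macros count characters only, so add 1 for the terminating NUL. */
enum { INT_BUF_SIZE = MAX_B10STRLEN_FOR_INT_TYPE(int) + 1 };

/* Formats v into a buffer whose size is fixed at compile time. */
static int format_int(int v, char buf[INT_BUF_SIZE])
{
    return snprintf(buf, INT_BUF_SIZE, "%d", v);
}

/* Exact decimal digit count of 2^bits - 1, for 1 <= bits <= 64. */
static unsigned int exact_digits(unsigned int bits)
{
    uint64_t v = (bits == 64) ? UINT64_MAX : (((uint64_t)1 << bits) - 1);
    unsigned int d = 1;

    while (v >= 10) {
        v /= 10;
        d++;
    }
    return d;
}

/* First width in 1..64 where ((bits * 1233) >> 12) + 1, the formula behind
 * the unsigned macro, disagrees with the exact count; 0 if there is none. */
static unsigned int first_mismatch(void)
{
    unsigned int bits;

    for (bits = 1; bits <= 64; bits++)
        if (((bits * 1233) >> 12) + 1 != exact_digits(bits))
            return bits;
    return 0;
}
```

Adding 1 to the bit count before the multiply would make the 16-bit case come out as 6 instead of 5, which is why the macro adds 1 only after the shift.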
