Is there a better way to size a buffer for printing integers?

执笔经年 2021-01-16 01:41

I want to create a buffer for sprintf-ing an integer (in this case an unsigned int). A simple and misguided approach would be:

char b         


        
5 Answers
  • 2021-01-16 02:20

    If the array should work on all real-world computers, then int can either be 2 or 4 bytes. No other alternatives exist (*).

    That means the maximum value an unsigned int can hold is either 65535 or about 4.29*10^9, which in turn means that your array needs to hold either 5 or 10 digits.

    The array could therefore be declared as:

     char buf [sizeof(int)/2 * 5 + 1];
    

    which will either expand to 5+1 or 10+1, which covers all known computers in the world.

    A better and more professional solution is to use the fixed-width types from stdint.h. Then you always know in advance exactly how many digits are needed, portably, and can therefore get rid of the above "magic numbers".
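
    As a minimal sketch of the fixed-width approach (the names U32_DEC_SIZE and u32_to_dec are made up for this illustration and do not come from any library): uint32_t is exactly 32 bits wherever it exists, so its largest value, 4294967295, always has 10 digits, and sizeof applied to that string literal gives the buffer size including the terminator.

```c
#include <assert.h>
#include <inttypes.h>  /* PRIu32 */
#include <stdint.h>    /* uint32_t, UINT32_MAX */
#include <stdio.h>     /* snprintf */

/* Hypothetical name for this sketch: 10 digits plus '\0' gives 11. */
#define U32_DEC_SIZE sizeof "4294967295"

/* Formats v as decimal into buf and returns the number of characters
   written, not counting the terminator. */
static int u32_to_dec(char buf[U32_DEC_SIZE], uint32_t v)
{
    int len = snprintf(buf, U32_DEC_SIZE, "%" PRIu32, v);
    assert(len > 0 && (size_t)len < U32_DEC_SIZE);
    return len;
}
```

    A caller would declare char buf[U32_DEC_SIZE]; and pass it to u32_to_dec(). The same pattern should carry over to the other fixed-width types with their own maximum-value literals.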


    (*) In C language standard theory, an int could be anything 2 bytes or larger. But since no such systems will ever exist in the real world, there is no point in making your code portable to them. The C language has already introduced long and long long for a reason.

    People who are concerned about portability to wildly exotic, completely fictional systems are misguided; they are mostly C language lawyers who like posing. You should not let such theoretical nonsense affect how you write professional programs for real-world computers.


    EDIT

    The "C language-lawyer poser" version would look like this:

    #include <stdio.h>
    #include <limits.h>
    
    #define STRINGIFY(s) #s
    #define GET_SIZE(n) sizeof(STRINGIFY(n))
    #define DIGITS(type) _Generic((type), unsigned int: GET_SIZE(INT_MAX) )
    
    int main(void) 
    {
      unsigned int x;
      char buf [DIGITS(x)];
    
      printf("%zu", sizeof(buf));
    
      return 0;
    }
    

    Note that this assumes that INT_MAX expands to an integer constant and not to an expression. I got really strange results from GCC when using UINT_MAX, because that macro is defined as an expression internally, inside limits.h.

  • 2021-01-16 02:24

    Compiling different relevant comments, most notably:

    • the math question.
    • Martin R's comment that summarizes it well: “n binary digits require ceil(n*ln(2)/ln(10)) ≈ ceil(n * 0.301)”

    You have your answer:

    #include <limits.h>  /* CHAR_BIT */
    #include <stdio.h>   /* sprintf */

    #define MAX_DECIMAL_SIZE(x)  ((size_t)(CHAR_BIT * sizeof(x) * 302 / 1000) + 1)
    
    char buffer[MAX_DECIMAL_SIZE(unsigned int) + 1];
    sprintf(buffer, "%u", x);
    
    /* MAX_DECIMAL_SIZE(uint8_t) => 3
     * MAX_DECIMAL_SIZE(uint16_t) => 5
     * MAX_DECIMAL_SIZE(uint32_t) => 10
     * MAX_DECIMAL_SIZE(uint64_t) => 20
     * MAX_DECIMAL_SIZE(__uint128_t) => 39 */
    

    The 302/1000 comes from ln(2)/ln(10), rounded up. You can take more digits from 0.3010299956639812… for more precision, but that's overkill until you work with 32768-bit systems or so. Continued fractions work too (see Martin R's comment below). Either way, be careful that CHAR_BIT * sizeof(x) * <your chosen numerator> does not overflow, and remember that the result must never be smaller than the actual number of digits.

    And if you really insist on octal representation, just change the multiplier to ln(2)/ln(8) (that's ⅓) and you'll have the number of octal digits required.
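
    To illustrate, here is a small sketch that puts the decimal macro from above next to an octal variant and compares both with the length snprintf() actually reports for UINT_MAX. MAX_OCTAL_SIZE and the two helper functions are names made up for this example, and the octal rounding assumes unsigned int has no padding bits.

```c
#include <limits.h>  /* CHAR_BIT, UINT_MAX */
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* snprintf */

/* Decimal digit bound, as in the answer above. */
#define MAX_DECIMAL_SIZE(x) ((size_t)(CHAR_BIT * sizeof(x) * 302 / 1000) + 1)

/* Hypothetical octal variant: each octal digit covers 3 bits, so the
   bit count is divided by 3 and rounded up. */
#define MAX_OCTAL_SIZE(x) ((size_t)((CHAR_BIT * sizeof(x) + 2) / 3))

/* Number of characters snprintf() needs for UINT_MAX in decimal,
   not counting the '\0'. Returns 0 if snprintf() reports an error. */
static size_t decimal_len_of_uint_max(void)
{
    int len = snprintf(NULL, 0, "%u", UINT_MAX);
    return len < 0 ? 0 : (size_t)len;
}

/* Same measurement for the octal representation. */
static size_t octal_len_of_uint_max(void)
{
    int len = snprintf(NULL, 0, "%o", UINT_MAX);
    return len < 0 ? 0 : (size_t)len;
}
```

    Both macros count digits only, so a buffer still needs one extra byte for the terminator, as in the declaration above.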

  • 2021-01-16 02:33

    If you're OK with dynamically allocated memory, you can use asprintf instead. This function will allocate the proper amount of memory to hold the string.

    char *buf;
    int result = asprintf(&buf, "%u", x);
    if (result == -1) {
        perror("asprintf failed");
    } else {
        ...
        free(buf);
    }
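
    asprintf() is not part of standard C. Where it is unavailable, a similar result can be sketched with two snprintf() calls: one to measure and one to format. The function name alloc_unsigned_string is invented for this illustration.

```c
#include <stdio.h>   /* snprintf */
#include <stdlib.h>  /* malloc, free */

/* Returns a malloc()ed decimal string for x, or NULL on failure.
   The caller is responsible for calling free() on the result. */
static char *alloc_unsigned_string(unsigned int x)
{
    /* First pass: with a NULL buffer and size 0, snprintf() only
       reports how many characters it would have written. */
    int len = snprintf(NULL, 0, "%u", x);
    if (len < 0)
        return NULL;

    char *buf = malloc((size_t)len + 1);
    if (!buf)
        return NULL;

    /* Second pass: format into the exactly sized buffer. */
    if (snprintf(buf, (size_t)len + 1, "%u", x) != len) {
        free(buf);
        return NULL;
    }
    return buf;
}
```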
    
  • 2021-01-16 02:42

    The cases where something like this is needed are rare: perhaps some microcontroller code, transferring a value over some serial protocol. In such cases, using any of the printf() family of functions may increase the size of the final binary.

    (In typical C development environments, the C library is dynamically loaded, and there is absolutely no benefit in trying to avoid standard C library functions. It will not decrease the program size.)

    So, if I needed such code, I might write a header file,

    #if defined(INTTYPE) && defined (UINTTYPE) && defined (FUNCNAME)
    
    #include <limits.h>  /* CHAR_BIT */

    #ifndef DECIMAL_DIGITS_IN
    #define DECIMAL_DIGITS_IN(x) ((CHAR_BIT * sizeof (x) * 28) / 93 + 2)
    #endif
    
    char *FUNCNAME(const INTTYPE value)
    {
        static char buffer[DECIMAL_DIGITS_IN(value) + 1];
        char       *p = buffer + sizeof buffer;
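        /* Negate in unsigned arithmetic: -value overflows for the most negative value. */
        UINTTYPE    left = (value < 0) ? (UINTTYPE)0 - (UINTTYPE)value : (UINTTYPE)value;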
    
        *(--p) = '\0';
        do {
            *(--p) = '0' + (left % 10);
            left /= 10;
        } while (left > 0);
    
        if (value < 0)
            *(--p) = '-';
    
        return p;
    }
    
    #undef FUNCNAME
    #undef INTTYPE
    #undef UINTTYPE
    
    #endif
    

    and for each type I'd need, I'd use

    #define FUNCNAME int2str
    #define INTTYPE  int
    #define UINTTYPE unsigned int
    #include "above.h"
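
    For illustration, here is a self-contained sketch of roughly what the int instantiation looks like once the macros are substituted. The magnitude is computed in unsigned arithmetic, so that INT_MIN is handled without signed overflow.

```c
#include <limits.h>  /* CHAR_BIT, INT_MIN, INT_MAX */

#define DECIMAL_DIGITS_IN(x) ((CHAR_BIT * sizeof (x) * 28) / 93 + 2)

/* Returns a pointer into a static buffer, so the result is overwritten
   by the next call and the function is not thread-safe. */
static char *int2str(const int value)
{
    static char buffer[DECIMAL_DIGITS_IN(value) + 1];
    char       *p = buffer + sizeof buffer;
    /* Negate in unsigned arithmetic: -value would overflow for INT_MIN. */
    unsigned int left = (value < 0) ? 0u - (unsigned int)value
                                    : (unsigned int)value;

    /* Fill the buffer backwards, starting with the terminator. */
    *(--p) = '\0';
    do {
        *(--p) = (char)('0' + (left % 10));
        left /= 10;
    } while (left > 0);

    if (value < 0)
        *(--p) = '-';

    return p;
}
```

    Because the string is built from the end of the buffer, the returned pointer usually points somewhere inside buffer[] rather than at its start.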
    

    In more ordinary code, the best approach is to use snprintf() to avoid buffer overruns, with the buffer size "guesstimated". For example,

    unsigned int x;
    
    char  buffer[256];
    int   len;
    
    len = snprintf(buffer, sizeof buffer, "Message with a number %u", x);
    if (len < 0 || (size_t)len >= sizeof buffer) {
        /* Abort! snprintf() failed, or the buffer was too small and the output was truncated. */
    } else {
        /* Success; we have the string in buffer[]. */
    }
    

    Whether buffer[] is a few dozen or even a few hundred bytes larger than necessary is completely irrelevant in typical programs. Just make it large enough, and in the error case output a message that says which buffer (file and function) was too short, so it'll be easy to fix in the unlikely case it ever is.
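
    The check relies on snprintf() returning the length the complete output would have had, so a return value that is negative or not smaller than the buffer size signals failure. A minimal sketch of wrapping that check in a helper follows; the name format_count and its return convention are invented for this example.

```c
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* snprintf */

/* Formats a message into buf. Returns 0 on success, or -1 if snprintf()
   failed or the output did not fit (buf then holds a truncated string,
   provided size is nonzero). */
static int format_count(char *buf, size_t size, unsigned int x)
{
    int len = snprintf(buf, size, "Message with a number %u", x);

    if (len < 0 || (size_t)len >= size)
        return -1;
    return 0;
}
```

    An output that fits exactly, terminator included, counts as success here; only a strictly too-small buffer is reported as an error.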


    As mentioned by dbush, the asprintf() GNU extension is a viable alternative. It returns a dynamically allocated string.

    Outside of GNU systems -- and this is what I suggest OP considers, too -- one can implement their own asprintf(), using vsnprintf() (available in C99 and later C libraries, and also in POSIX.1 C libraries).

    I prefer the variant that acts like POSIX.1 getline(), i.e. takes pointers to a pointer to a dynamically allocated buffer and the size of that buffer as extra parameters, and resizes that buffer if necessary:

    #include <stdlib.h>
    #include <stdarg.h>
    #include <string.h>
    #include <stdio.h>
    #include <errno.h>
    
    size_t dynamic_printf(char **dataptr, size_t *sizeptr, const char *format, ...)
    {
        va_list args;
        char   *data;
        size_t  size;
        int     len;
    
        if (!dataptr || !sizeptr || !format) {
            errno = EINVAL;
            return 0;
        }
        if (!*sizeptr) {
            *dataptr = NULL;
            *sizeptr = 0;
        }
        data = *dataptr;
        size = *sizeptr;
    
        va_start(args, format);
        len = vsnprintf(data, size, format, args);
        va_end(args);
    
        if (len < 0) {
            errno = EINVAL;
            return 0;
        } else
        if ((size_t)len < size) {
            errno = 0;
            return (size_t)len;
        }
    
        /* Need to reallocate the buffer. */
        size = (size_t)len + 1;
        data = realloc(data, size);
        if (!data) {
            errno = ENOMEM;
            return 0;
        }
        *dataptr = data;
        *sizeptr = size;
    
        va_start(args, format);
        len = vsnprintf(data, size, format, args);
        va_end(args);
    
        if (len != (int)(size - 1)) {
            errno = EINVAL;
            return 0;
        }
    
        errno = 0;
        return (size_t)len;
    }
    

    The idea is that you can reuse the same dynamic buffer across several dynamic_printf() calls:

        char   *data = NULL;
        size_t  size = 0;
        size_t  len;
    
        /* Some kind of loop for example */
    
            len = dynamic_printf(&data, &size, "This is something I need in a buffer");
            if (errno) {
                /* Abort! Reason is strerror(errno) */
            } else {
                /* data is non-NULL, and has len chars in it. */
            }
    
        /* Strings are no longer used, so free the buffer */
        free(data);
        data = NULL;
        size = 0;
    

    Note that it is perfectly safe to run free(data); data = NULL; size = 0; between calls. free(NULL) does nothing, and if the buffer pointer is NULL and size zero, the function will just dynamically allocate a new buffer.

    In the worst case (when the buffer is not long enough), the function does "print" the string twice. This is perfectly acceptable, in my opinion.

  • 2021-01-16 02:44

    OP's solution minimally meets design goals.

    Is there a better way to size a buffer for printing integers?

    Even a short analysis shows that the number of digits needed for an unsigned grows by a factor of log10(2), or about 0.30103..., for each value bit when printing decimal, and by 1/3 when printing octal. OP's code uses a factor of one-third, or 0.33333...

    unsigned x;
    char buf[(CHAR_BIT*sizeof(unsigned)+5)/3];
    sprintf(buf, "%u", x);
    

    Considerations:

    1. If buffer tightness concerns are real, then a buffer for decimal printing deserves separate consideration from one for printing in octal.

    2. Correctness: Unless code uses a strange locale with sprintf(), the conversion of the widest unsigned value, UINT_MAX, works on all platforms.

    3. Clarity: the ...5)/3 is unadorned and does not indicate the rationale for 5 and 3.

    4. Efficiency: The buffer size is modestly excessive. This would not be an issue for a single buffer, but for an array of buffers a tighter value is recommended.

    5. Generality: the code is crafted for only one type.

    6. Potential hazard: With code re-use, someone could extrapolate the same 5 and 3 to int without due consideration. OP's 5/3 works for int too, so this is not an issue.

    7. Corner case: Using 5/3 for signed types and octal is a problem, as (CHAR_BIT*sizeof(unsigned)+5)/3 should be (CHAR_BIT*sizeof(unsigned) + 5)/3 + 1. Example: the problem occurs when converting an int -32768 to base 8 text, "-100000", via some function (not sprintf(... "%o" ...)). The buffer needed is 8, whereas (CHAR_BIT*sizeof(unsigned)+5)/3 could be 7.


    Is there a better way to do this?

    Candidate for base 10:

    28/93 (0.301075...) is a very close approximation of log10(2) that errs on the high side. Of course, code could use a more obvious fraction like 30103/100000.

    Generality: A good macro would also adapt to other types. Below is one for various unsigned types.

    #define LOG10_2_N 28
    #define LOG10_2_D 93
    //                              1 for the ceiling                          1 for \0
    #define UINT_BUFFER10_SIZE(type) (1 + (CHAR_BIT*sizeof(type)*LOG10_2_N)/LOG10_2_D + 1)
    
    
    unsigned x;
    char bufx[UINT_BUFFER10_SIZE(x)];
    sprintf(bufx, "%u", x);
    
    size_t z;
    char bufz[UINT_BUFFER10_SIZE(z)];
    sprintf(bufz, "%zu", z);
    

    The 28/93 fraction gives the same integer results as log10(2) for integer sizes of 1 to 92 bits, and so is space-efficient for arrays of buffers. It is never too small.
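
    As a quick sanity check of the macro, the sketch below compares the computed size with the length snprintf() reports for the largest value of an unsigned type. The helper name fits_exactly is made up for this example, and the comparison assumes the types have no padding bits.

```c
#include <limits.h>  /* CHAR_BIT, UCHAR_MAX, USHRT_MAX, UINT_MAX, ULLONG_MAX */
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* snprintf */

#define LOG10_2_N 28
#define LOG10_2_D 93
#define UINT_BUFFER10_SIZE(type) (1 + (CHAR_BIT*sizeof(type)*LOG10_2_N)/LOG10_2_D + 1)

/* Returns 1 if buffer_size is exactly what is needed to print max_value
   in decimal, including the terminating '\0', and 0 otherwise. */
static int fits_exactly(size_t buffer_size, unsigned long long max_value)
{
    int len = snprintf(NULL, 0, "%llu", max_value);
    return len > 0 && (size_t)len + 1 == buffer_size;
}
```

    For example, fits_exactly(UINT_BUFFER10_SIZE(unsigned), UINT_MAX) is expected to return 1 on common platforms, meaning the macro wastes no bytes for that type.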

    A macro for signed type could use

    #define INT_BUFFER_SIZE(type) (1+1+ ((CHAR_BIT*sizeof(type)-1)*LOG10_2_N)/LOG10_2_D + 1)
    

    Avoid an off-by-one issue: I recommend using SIZE in the macro name to convey the buffer size needed and not the maximum string length.

    Candidate for base 8:

    Once a computed size for a base other than 10 is needed, the applications I've written usually need a buffer that can handle any base from 2 up. Consider that printf() may allow %b someday too. So for a general-purpose buffer that handles integer-to-text conversion in any base and any signedness, I suggest:

    #define INT_STRING_SIZE(x)  (1 /* sign */ + CHAR_BIT*sizeof(x) + 1 /* \0 */)
    
    int x = INT_MIN;
    char buf[INT_STRING_SIZE(x)];
    my_itoa(buf, sizeof buf, x, 2);
    puts(buf);  /* prints "-10000000000000000000000000000000" (34 chars were needed) */
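
    my_itoa() is not a standard function, so its definition is left to the reader. One possible sketch follows, assuming the signature my_itoa(dest, size, value, base), support for bases 2 to 36, and a NULL return when the base is invalid or the buffer is too small. All of these choices are assumptions made for this example.

```c
#include <limits.h>  /* CHAR_BIT, INT_MIN */
#include <stddef.h>  /* size_t */
#include <string.h>  /* memcpy */

#define INT_STRING_SIZE(x)  (1 /* sign */ + CHAR_BIT*sizeof(x) + 1 /* \0 */)

/* Hypothetical helper: converts value to text in the given base (2..36).
   Returns dest, or NULL if the base is invalid or dest is too small. */
static char *my_itoa(char *dest, size_t size, int value, int base)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    char tmp[INT_STRING_SIZE(int)];
    char *p = tmp + sizeof tmp;
    /* Work on the magnitude as unsigned so INT_MIN does not overflow. */
    unsigned int left = (value < 0) ? 0u - (unsigned int)value
                                    : (unsigned int)value;

    if (base < 2 || base > 36)
        return NULL;

    /* Build the digits backwards in a scratch buffer. */
    *(--p) = '\0';
    do {
        *(--p) = digits[left % (unsigned int)base];
        left /= (unsigned int)base;
    } while (left > 0);
    if (value < 0)
        *(--p) = '-';

    size_t needed = (size_t)(tmp + sizeof tmp - p);  /* includes '\0' */
    if (needed > size)
        return NULL;
    memcpy(dest, p, needed);
    return dest;
}
```

    The scratch buffer is sized with the same INT_STRING_SIZE macro, so base 2 with a sign, the longest possible output, still fits.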
    