What precisely does the %g printf specifier mean?

Submitted by 核能气质少年 on 2019-12-17 16:36:18

Question


The %g specifier doesn't seem to behave in the way that most sources document it as behaving.

According to most sources I've found, across multiple languages that use printf specifiers, the %g specifier is supposed to be equivalent to either %f or %e - whichever would produce shorter output for the provided value. For instance, at the time of writing this question, cplusplus.com says that the g specifier means:

Use the shortest representation: %e or %f

And the PHP manual says it means:

g - shorter of %e and %f.

And here's a Stack Overflow answer that claims that

%g uses the shortest representation.

And a Quora answer that claims that:

%g prints the number in the shortest of these two representations

But this behaviour isn't what I see in reality. If I compile and run this program (as C or C++ - it's a valid program with the same behaviour in both):

#include <stdio.h>

int main(void) {
    double x = 123456.0;
    printf("%e\n", x);
    printf("%f\n", x);
    printf("%g\n", x);
    printf("\n");

    double y = 1234567.0;
    printf("%e\n", y);
    printf("%f\n", y);
    printf("%g\n", y);
    return 0;
}

... then I see this output:

1.234560e+05
123456.000000
123456

1.234567e+06
1234567.000000
1.23457e+06

Clearly, the %g output doesn't quite match either the %e or %f output for either x or y above. What's more, it doesn't look like %g is minimising the output length either; y could've been formatted more succinctly if, like x, it had not been printed in scientific notation.

Are all of the sources I've quoted above lying to me?

I see identical or similar behaviour in other languages that support these format specifiers, perhaps because under the hood they call out to the printf family of C functions. For instance, I see this output in Python:

>>> print('%g' % 123456.0)
123456
>>> print('%g' % 1234567.0)
1.23457e+06

In PHP:

php > printf('%g', 123456.0);
123456
php > printf('%g', 1234567.0);
1.23457e+6

In Ruby:

irb(main):024:0* printf("%g\n", 123456.0)
123456
=> nil
irb(main):025:0> printf("%g\n", 1234567.0)
1.23457e+06
=> nil

What's the logic that governs this output?


Answer 1:


This is the full description of the g/G specifier in the C11 standard:

A double argument representing a floating-point number is converted in style f or e (or in style F or E in the case of a G conversion specifier), depending on the value converted and the precision. Let P equal the precision if nonzero, 6 if the precision is omitted, or 1 if the precision is zero. Then, if a conversion with style E would have an exponent of X:

  • if P > X ≥ −4, the conversion is with style f (or F) and precision P − (X + 1);
  • otherwise, the conversion is with style e (or E) and precision P − 1.

Finally, unless the # flag is used, any trailing zeros are removed from the fractional portion of the result and the decimal-point character is removed if there is no fractional portion remaining.

A double argument representing an infinity or NaN is converted in the style of an f or F conversion specifier.
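
The rule quoted above can be sketched in Python, whose %-formatting follows the same printf conventions. This is only an illustration for finite values, not how any C library actually implements %g; the helper name general_format is made up for this example.

```python
def general_format(value, precision=6):
    """Rebuild '%g' output for a finite float from '%e' and '%f' (illustrative helper)."""
    # P: the precision, with 0 treated as 1 (6 is the default when omitted).
    p = precision if precision != 0 else 1
    # X: the exponent that an e-style conversion with precision P - 1 would have.
    mantissa, exponent = ("%.*e" % (p - 1, value)).split("e")
    x = int(exponent)
    if -4 <= x < p:
        # Style f with precision P - (X + 1); there is no exponent part.
        mantissa, exponent = "%.*f" % (p - (x + 1), value), ""
    else:
        # Style e with precision P - 1; keep the exponent exactly as %e printed it.
        exponent = "e" + exponent
    # Without the # flag, trailing zeros and a bare decimal point are removed.
    if "." in mantissa:
        mantissa = mantissa.rstrip("0").rstrip(".")
    return mantissa + exponent


# Print the hand-built result next to the built-in %g for comparison.
for value in (123456.0, 1234567.0, 0.0001, 0.00001):
    print(general_format(value), "%g" % value)
```

For 123456.0 the exponent X is 5, which is less than P = 6, so style f is chosen with precision 0. For 1234567.0 the exponent is 6, so style e is chosen with precision 5.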

This behaviour is somewhat similar to simply using the shortest representation out of %f and %e, but not equivalent. There are two important differences:

  • Trailing zeros (and, potentially, the decimal point) get stripped when using %g, which can cause the output of a %g specifier to not exactly match what either %f or %e would've produced.
  • The decision about whether to use %f-style or %e-style formatting is made based purely upon the size of the exponent that would be needed in %e-style notation, and does not directly depend on which representation would be shorter. There are several scenarios in which this rule results in %g selecting the longer representation, like the one shown in the question where %g uses scientific notation even though this makes the output 4 characters longer than it needs to be.
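
Both differences are easy to see by printing the three styles side by side. A small Python sketch (three_styles is a hypothetical helper written just for this comparison):

```python
def three_styles(value):
    """Return the '%e', '%f' and '%g' renderings of value, in that order."""
    return "%e" % value, "%f" % value, "%g" % value


for value in (123456.0, 1234567.0, 0.00001):
    e, f, g = three_styles(value)
    print(f"%e: {e:<14} %f: {f:<16} %g: {g}")

# For 1234567.0 the %e exponent is 6, which is not below the default
# precision of 6, so %g picks scientific notation even though the plain
# digits would have been shorter.
print(len("%g" % 1234567.0), len("%.0f" % 1234567.0))
```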

In case the C standard's wording is hard to parse, the Python documentation provides another description of the same behaviour:

General format. For a given precision p >= 1, this rounds the number to p significant digits and then formats the result in either fixed-point format or in scientific notation, depending on its magnitude.

The precise rules are as follows: suppose that the result formatted with presentation type 'e' and precision p-1 would have exponent exp. Then if -4 <= exp < p, the number is formatted with presentation type 'f' and precision p-1-exp. Otherwise, the number is formatted with presentation type 'e' and precision p-1. In both cases insignificant trailing zeros are removed from the significand, and the decimal point is also removed if there are no remaining digits following it.

Positive and negative infinity, positive and negative zero, and nans, are formatted as inf, -inf, 0, -0 and nan respectively, regardless of the precision.

A precision of 0 is treated as equivalent to a precision of 1. The default precision is 6.
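
A quick way to check these precision rules is to format one value with several precisions using Python's built-in format(). The sample value 1234.5678 is arbitrary, chosen only for illustration.

```python
value = 1234.5678

# No precision behaves like .6g, and .0g behaves like .1g.
for spec in ("g", ".0g", ".1g", ".3g", ".6g", ".10g"):
    print(f"{spec:>5} -> {format(value, spec)}")

# Infinities, signed zeros and NaN ignore the precision entirely.
for special in (float("inf"), float("-inf"), 0.0, -0.0, float("nan")):
    print(format(special, ".3g"))
```

With precision 3 the exponent (3) is no longer below p, so the same value that prints in fixed-point notation at precision 6 switches to scientific notation.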

The many sources on the internet that claim that %g just picks the shortest out of %e and %f are simply wrong.




回答2:


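My favorite format for doubles is "%.15g". It seems to do the right thing in every case I've tried. I'm pretty sure 15 is also the largest number of significant decimal digits that a double can reliably preserve (it is the value of DBL_DIG in <float.h> on IEEE 754 platforms).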
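
One caveat worth checking for yourself: 15 digits hides most binary rounding noise, but it is not always enough to read back exactly the same double; 17 significant digits are needed for that. A small Python sketch, assuming IEEE 754 doubles (round_trips is a hypothetical helper for this example):

```python
def round_trips(value, digits):
    """True if formatting with %.<digits>g and parsing back gives the same float."""
    return float("%.*g" % (digits, value)) == value


x = 0.1 + 0.2
print("%.15g" % x)         # tidy output; the rounding noise is hidden
print("%.17g" % x)         # enough digits to identify this exact double
print(round_trips(x, 15))  # the tidy form parses back to a different double
print(round_trips(x, 17))
```

So "%.15g" is a reasonable choice for display, while "%.17g" is the safer choice when the text must be parsed back into the identical value.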



Source: https://stackoverflow.com/questions/54162152/what-precisely-does-the-g-printf-specifier-mean
