Question
I would like to know if and how the powers of 10 are related to the printing of scientific notation in the console. I've searched the R docs and haven't found anything relevant, or at least nothing that I really understand.
First off, my scipen and digits settings are:
unlist(options("scipen", "digits"))
# scipen digits
# 0 7
Now, powers of 10 are printed normally up to the 4th power, and then printing switches to scientific notation at the 5th power.
10^(1:4)
# [1] 10 100 1000 10000
10^(1:5)
# [1] 1e+01 1e+02 1e+03 1e+04 1e+05
Interestingly, this does not happen for some other numbers larger than 10.
11^(1:5)
# [1] 11 121 1331 14641 161051
Judging from the following, 5 digits seem significant.
100^(1:2)
# [1] 100 10000
100^(1:3)
# [1] 1e+02 1e+04 1e+06
So my questions then are:
Why is scientific notation activated between the 4th and 5th power for 10 and not for other numbers? Is the number 5 significant? Furthermore, why 5 and not a number closer to the maximum digits option of 22?
Answer 1:
Well, the answer is actually there in the definition of scipen in ?options, although it's pretty hard to understand what it means without playing around with some examples:
‘scipen’: integer. A penalty to be applied when deciding to print numeric values in fixed or exponential notation. Positive values bias towards fixed and negative towards scientific notation: fixed notation will be preferred unless it is more than ‘scipen’ digits wider.
To see what that means, examine the following three pairs of identical numbers, each written first in scientific and then in fixed notation. In the first two cases, the width in characters of the fixed notation is less than or equal to the width of the scientific notation, so fixed notation is preferred.
In the third case, though, the fixed notation is wider (i.e. "more than 0 digits wider"), because the 5 zeros amount to more characters than the 4 characters used to represent the same value using e+nn. As a result, in that case scientific notation is preferred.
1e+03
1000
# [1] 1000
1e+04
10000
# [1] 10000
1e+05
100000 ## <- wider
# [1] 1e+05
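The penalty can be tried out without touching the global option. As a small sketch, this relies on the scientific argument of format(), which accepts an integer that is used as the scipen penalty for that one call; the variable names are only for illustration.

```r
# An integer `scientific` argument acts as a per-call scipen penalty.
p0_1e5  <- format(1e5, scientific = 0)   # fixed is 1 character wider: scientific wins
p1_1e5  <- format(1e5, scientific = 1)   # a penalty of 1 tolerates that extra character
p0_1e4  <- format(1e4, scientific = 0)   # equal widths: fixed wins
pm1_1e4 <- format(1e4, scientific = -1)  # a negative penalty tips the tie to scientific

print(c(p0_1e5, p1_1e5, p0_1e4, pm1_1e4))
# [1] "1e+05"  "100000" "10000"  "1e+04"
```

So the switch between the 4th and 5th power is just the point where the fixed form first becomes wider than the scientific form plus the penalty.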
Next, examine some numbers that also end with lots of zeros, but whose representation in scientific notation requires a decimal point (.). For these numbers, scientific notation will be used once you have 6 or more zeros (i.e. more than the 5 characters taken up by one . and the characters e+nn).
1.1e+06
1100000
# [1] 1100000
1.1e+07
11000000 ## <- wider
# [1] 1.1e+07
Reasoning about the tradeoff gets a bit trickier for most other numbers, for which the values of both options("scipen") and options("digits") come into play, but the general idea is exactly the same.
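The width comparison can be approximated in a few lines. This is only a rough sketch for a single number: prefers_fixed is a hypothetical helper, not part of R, and the real printing code also handles whole vectors and rounding effects that this ignores.

```r
# Hypothetical helper: TRUE when the fixed form is at most `scipen`
# characters wider than the scientific form (both rendered by format()).
prefers_fixed <- function(x, scipen = 0) {
  fixed_width <- nchar(format(x, scientific = FALSE))
  sci_width   <- nchar(format(x, scientific = TRUE))
  fixed_width <= sci_width + scipen
}

prefers_fixed(1e4)       # "10000" vs "1e+04": same width, fixed
prefers_fixed(1e5)       # "100000" vs "1e+05": fixed is wider, scientific
prefers_fixed(1100000)   # "1100000" vs "1.1e+06": same width, fixed
prefers_fixed(11000000)  # "11000000" vs "1.1e+07": fixed is wider, scientific
prefers_fixed(11^5)      # "161051" is far narrower than its scientific form
```

With the default scipen of 0 this returns TRUE, FALSE, TRUE, FALSE, TRUE for the five calls, matching the console output shown above.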
To see some of the slightly surprising complications that come into play, you might want to paste the following into your console (perhaps after first trying to predict where within each series the switch to scientific notation will occur).
100001
1000001
10000001
100000001
1000000001
10000000001
100000000001
1000000000001
111111
1111111
11111111
111111111
1111111111
11111111111
111111111111
1111111111111
Answer 2:
I'm confused as to what exactly your question is; or, more specifically, how you would use an answer to this question to change or control the behavior of R. Are you trying to format numbers a certain way? There are better ways to do that.
When you type values like that, the results are implicitly run through one of the print() commands to be formatted "nicely" to the console. Whenever things have to look "nice" on screen, the code to do that is often ugly. Here most of that code is taken care of by the formatReal function and the helper scientific function. The latter tracks the following information for a number:
/* for a number x , determine
* sgn = 1_{x < 0} {0/1}
* kpower = Exponent of 10;
* nsig = min(R_print.digits, #{significant digits of alpha})
* roundingwidens = 1 if rounding causes x to increase in width, 0 otherwise
*
* where |x| = alpha * 10^kpower and 1 <= alpha < 10
*/
Then the former function uses this information to try to make "nice" looking numbers by balancing values to the left and the right of the decimal point. It's a combination of many things, like the order of magnitude of the number and the number of significant digits, as well as environmental influences from the scipen option, etc.
print() is only meant to make things look "nice." What exactly is nice depends on all the values in a vector. You'll find few hard cutoffs in that code; it's very adaptive. There is no easy way to concisely describe everything it does in the general case (which is what it sounds like you are asking for).
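The dependence on the whole vector is easy to see with format(), which applies the same fixed-versus-scientific decision as printing. In this small illustration the penalty is pinned to 0 through the scientific argument so the result does not depend on the current scipen option; the variable names are only for illustration.

```r
# One notation is chosen for the whole vector, not per element.
up_to_1e4 <- format(c(10, 1e4), scientific = 0)  # widest element fits in fixed
up_to_1e5 <- format(c(10, 1e5), scientific = 0)  # 100000 pushes everything to scientific

print(trimws(up_to_1e4))
# [1] "10"    "10000"
print(up_to_1e5)
# [1] "1e+01" "1e+05"
```

That is why 10 itself shows up as 1e+01 in 10^(1:5), even though it prints as 10 on its own.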
The only thing that is certain is that if you need to have your numbers formatted in a certain way, use a function like sprintf() or formatC() that allows for precise control.
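For instance, here is a minimal sketch of forcing fixed notation for 1e5 regardless of the scipen setting; the format strings are just one reasonable choice, and the results are character strings rather than numbers.

```r
x <- 1e5

s_sprintf <- sprintf("%.0f", x)                    # fixed notation, no decimals
s_formatC <- formatC(x, format = "f", digits = 0)  # same result via formatC()
s_format  <- format(x, scientific = FALSE)         # format() with scientific notation turned off

print(c(s_sprintf, s_formatC, s_format))
# [1] "100000" "100000" "100000"
```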
Of course this behavior is dependent on class(), and I've pointed to the formatReal stuff since that's where most of the tricky things happen. But observe the difference when you use integers:
c(10, 100, 1000, 10000, 100000)
# [1] 1e+01 1e+02 1e+03 1e+04 1e+05
c(10L, 100L, 1000L, 10000L, 100000L)
# [1] 10 100 1000 10000 100000
Source: https://stackoverflow.com/questions/25859609/why-do-powers-of-10-print-in-scientific-notation-at-the-5th-power