R issue with rounding milliseconds

生来就可爱ヽ(ⅴ<●) submitted on 2019-11-29 07:13:21

I don't see that:

> options(digits.secs = 4)
> as.POSIXlt("13:29:56.061", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.061 UTC"
> as.POSIXlt("13:29:56.062", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.062 UTC"
> as.POSIXlt("13:29:56.063", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.063 UTC"
> options(digits.secs = 3)
> as.POSIXlt("13:29:56.061", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.061 UTC"
> as.POSIXlt("13:29:56.062", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.062 UTC"
> as.POSIXlt("13:29:56.063", format = '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.063 UTC"

with

> sessionInfo()
R version 2.15.0 Patched (2012-04-14 r59019)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8    
 [5] LC_MONETARY=en_GB.utf8    LC_MESSAGES=en_GB.utf8   
 [7] LC_PAPER=C                LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base

With the "%OSn" format strings, one forces truncation. If the fractional second cannot be represented exactly in floating point, then the truncation may very well go the wrong way. If you see things going the wrong way, you can also round explicitly to the unit you want, or add half of the unit you wish to operate at (0.0005 in the case shown):

> t1 <- as.POSIXlt("13:29:56.061", format = '%H:%M:%OS', tz='UTC')
> t1
[1] "2012-06-07 13:29:56.061 UTC"
> t1 + 0.0005
[1] "2012-06-07 13:29:56.061 UTC"

(But as I said, I don't see the problem here.)

This latter point was made by Simon Urbanek on the R-Devel mailing list on 30-May-2012.
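
For anyone who does hit the truncation, explicit rounding can be sketched as below. This is only a minimal sketch: `format_ms` is a hypothetical helper name (not part of base R), and it assumes you want the nearest millisecond with the result shown in UTC by default.

```r
# Hypothetical helper (not in base R): format a date-time rounded to the
# nearest millisecond instead of relying on the truncating %OSn conversion.
format_ms <- function(x, tz = "UTC") {
  secs <- as.numeric(as.POSIXct(x))   # seconds since the epoch, as a double
  ms_total <- round(secs * 1000)      # nearest whole millisecond
  whole <- ms_total %/% 1000          # whole seconds
  ms <- ms_total %% 1000              # millisecond remainder, 0..999
  stamp <- format(as.POSIXct(whole, origin = "1970-01-01", tz = tz),
                  "%Y-%m-%d %H:%M:%S", tz = tz)
  paste0(stamp, sprintf(".%03d", as.integer(ms)))
}

t1 <- as.POSIXlt("2012-06-07 13:29:56.061", tz = "UTC")
format_ms(t1)
# [1] "2012-06-07 13:29:56.061"
```

Because the rounding is done on whole milliseconds before any formatting, a value stored as 56.0609999... still comes out as .061.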

Joshua Ulrich

This is the same problem as "Milliseconds puzzle when calling strptime in R".

Your example:

> x <- as.POSIXlt("13:29:56.061", format='%H:%M:%OS', tz='UTC')
> print(as.numeric(x), digits=20)
[1] 1339075796.0610001087

is not representative of the problem. as.numeric(x) converts your POSIXlt object to POSIXct before converting to numeric, so you get different floating-point-precision rounding errors.

That's not how print.POSIXlt (which calls format.POSIXlt) works. format.POSIXlt formats each element of the POSIXlt list construct individually, so you would need to look at:

> print(x$sec, digits=20)
[1] 56.060999999999999943

And that number is truncated at the third decimal place, so you see 56.060. You can see this by calling format directly:

> format(x, "%H:%M:%OS6")
[1] "13:29:56.060999"
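
To compare the two representations side by side on your own machine, something like the following can be used. It is only a sketch: the variable names are mine, and the exact digits printed may differ between platforms and R versions, so no output is shown.

```r
x <- as.POSIXlt("2012-06-07 13:29:56.061", tz = "UTC")

# Seconds field of the POSIXlt list: this is what format.POSIXlt works from.
sec_lt <- x$sec

# Seconds since the epoch after conversion to POSIXct: a much larger double,
# so fewer bits are left over for the fractional part.
sec_ct <- as.numeric(as.POSIXct(x))

print(sec_lt, digits = 20)
print(sec_ct, digits = 20)

# Fractional parts of each; they need not agree in the trailing digits.
print(sec_lt - floor(sec_lt), digits = 20)
print(sec_ct - floor(sec_ct), digits = 20)
```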

In testing I have noted that this issue still exists for 32-bit R 3.0.1, and that it is due to a truncation of floating-point data that is specific to the 32-bit implementation of the print, format and as.character methods for POSIXlt date-times.

The underlying data is not stored in a different type in one case (32-bit) than in the other (64-bit). The difference lies in the "print", "format" and "as.character" functions for the POSIXlt type, which are used to turn the POSIXlt data into a displayable string.

Whilst the documented behaviour is that these functions truncate (ignore) extra digits (as mentioned by @Gavin Simpson), they do not do so in the same way in the 32-bit and 64-bit versions. To demonstrate, we'll generate 1000 different times and perform some comparison operations:

> options(digits.secs=3)
> x = as.POSIXlt("13:29:56.061", format='%H:%M:%OS', tz='UTC')

> for (i in 0:999) {
+     x[i+1] = as.POSIXlt(paste0("13:29:56.",sprintf("%03d",i)),format='%H:%M:%OS',tz='UTC')
+ }

> sum(x[2:1000]>x[1:999])
[1] 999

Under both 32-bit and 64-bit the comparison operators are consistent; however, under 32-bit I see:

> x[1:6]
[1] "2015-10-16 13:29:56.000 UTC" "2015-10-16 13:29:56.000 UTC"
[3] "2015-10-16 13:29:56.002 UTC" "2015-10-16 13:29:56.003 UTC"
[5] "2015-10-16 13:29:56.003 UTC" "2015-10-16 13:29:56.005 UTC"

So it is clearly a display issue. Looking at the actual numbers in the POSIXlt data type, particularly the seconds, we can see what appears to happen:

> y = (x[1:6]$sec) 
> y
[1] 56.000 56.001 56.002 56.003 56.004 56.005
> trunc(y*1000)/1000
[1] 56.000 56.001 56.002 56.003 56.004 56.005
> trunc((y-floor(y))*1000)/1000
[1] 0.000 0.000 0.002 0.003 0.003 0.005
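
The same effect can be reproduced with plain doubles, independent of any date-time class. A small sketch follows (the variable names are mine). Which entries of the truncated version come out one millisecond low depends on the floating-point details, so only the rounded result is annotated.

```r
y <- 56 + (0:5) / 1000           # stand-ins for the $sec values shown above

frac <- y - floor(y)             # fractional seconds, each carrying a tiny error
ms_trunc <- trunc(frac * 1000)   # truncation: may land one millisecond low
ms_round <- round(frac * 1000)   # rounding: nearest millisecond

ms_trunc
ms_round
# [1] 0 1 2 3 4 5
```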

I would suggest that this is a bug that should be fixed in the underlying base library. As a temporary fix, though, you can overwrite the "print", "as.character" and "format" functions to produce your desired output, e.g.

format.POSIXlt = function(posix, ...) {
    # "%06.3f" zero-pads the seconds to two digits and keeps three decimals (sprintf rounds here)
    return(paste0(posix$year+1900,"-",sprintf("%02d",posix$mon+1),"-",sprintf("%02d",posix$mday)," ",
        sprintf("%02d",posix$hour),":",sprintf("%02d",posix$min),":",sprintf("%06.3f",posix$sec)))
    }

print.POSIXlt = function(posix, ...) {
    # Reuse the formatter above, then return the object invisibly as print methods usually do
    print(format.POSIXlt(posix))
    invisible(posix)
    }

as.character.POSIXlt = function(posix, ...) {
    # Same string as format.POSIXlt
    return(format.POSIXlt(posix))
    }

The milliseconds are there:

 unclass(as.POSIXlt("13:29:56.061", '%H:%M:%OS', tz='UTC'))
 $sec
 [1] 56.061
 ...

(There's no need to call format here, it's the name of an argument not the required input from some other function).

Otherwise, I cannot reproduce (on Windows 64-bit R 2.15.0):

options(digits.secs = 3)
as.POSIXlt("13:29:56.061", '%H:%M:%OS', tz='UTC')
[1] "2012-06-07 13:29:56.061 UTC"

sessionInfo()
R version 2.15.0 Patched (2012-05-05 r59321)
Platform: x86_64-pc-mingw32/x64 (64-bit)
...