Accurately converting from character->POSIXct->character with sub-millisecond datetimes

独厮守ぢ 2020-12-03 08:51

I have a character datetime column in a file. I load the file (into a data.table) and do things that require the column to be converted to POSIXct.

4 answers
  • 2020-12-03 09:29

    As the answers to the questions you linked to already say, how a value is printed/formatted is not the same as what the actual value is. This is just a printed representation issue.

    R> as.POSIXct('2011-10-11 07:49:36.3')-as.POSIXlt('2011-10-11 07:49:36.3')
    Time difference of 0 secs
    R> as.POSIXct('2011-10-11 07:49:36.2')-as.POSIXlt('2011-10-11 07:49:36.3')
    Time difference of -0.0999999 secs
    

    Your understanding that POSIXct is less precise than POSIXlt is incorrect. You're also incorrect in saying that you can't include a POSIXlt object as a column in a data.frame.

    R> x <- data.frame(date=Sys.time())
    R> x$date <- as.POSIXlt(x$date)
    R> str(x)
    'data.frame':   1 obs. of  1 variable:
     $ date: POSIXlt, format: "2013-03-13 07:38:48"
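
    To make the distinction between the stored value and its printed form concrete, here is a minimal sketch. The variable names and the UTC time zone are choices made for this illustration only, not something from the question. The exact digits that %OS3 shows can depend on the R version, so no output is listed.

    ```r
    # Illustration only: the names below and tz = "UTC" are assumptions.
    options(digits.sec = NULL)   # R's default: print no fractional seconds
    x <- as.POSIXct("2011-10-11 07:49:36.3", tz = "UTC")

    default_txt <- format(x)                         # fractional part is not shown
    frac_stored <- as.numeric(x) %% 1                # but it is still stored (close to 0.3)
    millis_txt  <- format(x, "%Y-%m-%d %H:%M:%OS3")  # ask for 3 decimals explicitly

    print(default_txt)
    print(frac_stored)
    print(millis_txt)
    ```

    The stored number is the same in all three cases; only the format string (or the digits.sec option) decides how much of it is displayed.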
    
  • 2020-12-03 09:31

    Two things:

    1) @statquant is right, as is @Aaron in his comment (and the otherwise well-known experts @Joshua Ulrich and @Dirk Eddelbuettel are wrong), although that is not important for the main question here:

    POSIXlt is, by design, definitely more accurate in storing times than POSIXct: because its seconds are always in [0, 60), it has a granularity of about 6e-15 s, i.e., 6 femtoseconds, which is tens of millions of times finer than that of POSIXct.

    However, this is not very relevant here (or for current R): almost all operations, notably numeric ones, use the Ops group method (not known to beginners, but well documented). Just look at Ops.POSIXt, which indeed trashes the extra precision by first coercing to POSIXct. In addition, format()/print() uses at most 6 decimals after the ".", and hence also does not distinguish between the internally higher precision of POSIXlt and the "only" 100 nanosecond granularity of POSIXct.
    (For this reason, both Dirk and Joshua were led to their wrong assertion: for all simple practical uses, the precision of *lt and *ct is made the same.)

    2) I do tend to agree that we (R Core) should improve the formatting, and hence the printing, of POSIXt objects with fractional seconds (even after the bug fix mentioned by @Aaron above).
    But then I may be wrong, and "we" have got it right, by some definition of "right" ;-)
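
    The granularity argument in point 1 can be checked with a short sketch. It assumes IEEE 754 double precision; ulp() is a hypothetical helper written for this illustration (it is not a base R function), and the example timestamp and the 1 ns offset are arbitrary.

    ```r
    # Hypothetical helper: spacing between adjacent doubles near a positive x,
    # assuming IEEE 754 doubles with a 52-bit fraction.
    ulp <- function(x) 2^(floor(log2(x)) - 52)

    g_lt  <- ulp(59.5)         # seconds field of a POSIXlt, always in [0, 60)
    g_ct  <- ulp(1357341728)   # seconds since the epoch, as held by a POSIXct
    ratio <- g_ct / g_lt       # how much finer the POSIXlt seconds field is
    print(c(g_lt, g_ct, ratio))

    # The extra precision is lost as soon as an operator coerces to POSIXct:
    lt1 <- as.POSIXlt("2013-01-04 17:22:08", tz = "UTC")
    lt2 <- lt1
    lt2$sec <- lt2$sec + 1e-9       # a 1 ns offset, representable in $sec
    sec_differs  <- lt2$sec != lt1$sec
    same_via_ops <- lt2 == lt1      # comparison goes through Ops.POSIXt
    print(c(sec_differs, same_via_ops))
    ```

    Under these assumptions the two spacings come out at roughly 7e-15 s and 2.4e-7 s, the same orders of magnitude as the figures quoted in point 1.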

  • 2020-12-03 09:39

    When you write

    My understanding is that POSIXct representation is less precise than the POSIXlt representation

    you are plain wrong.

    It is the same representation for both: down to milliseconds on Windows, and down to (almost) microseconds on other operating systems. Did you read help(DateTimeClasses)?

    As for your last question, yes the development version of my RcppBDT package uses Boost Date.Time and can go all the way to nanoseconds if your OS supports it and you turned the proper representation on. But it does replace POSIXct, and does not yet support vectors of time objects.

    Edit: Regarding your follow-up question:

    R> one <- Sys.time(); two <- Sys.time(); two - one
    Time difference of 7.43866e-05 secs
    R>
    R> as.POSIXlt(two) - as.POSIXlt(one)
    Time difference of 7.43866e-05 secs
    R> 
    R> one    # options("digits.sec"=6) on my box
    [1] "2013-03-13 07:30:57.757937 CDT"
    R> 
    

    Edit 2: I think you are simply experiencing that floating point representation on computers is inexact:

    R> print(as.numeric(as.POSIXct("04-Jan-2013 17:22:08.138",
    +                   format="%d-%b-%Y %H:%M:%OS")), digits=18)
    [1] 1357341728.13800001
    R> print(as.numeric(as.POSIXct("04-Jan-2013 17:22:08.139",
    +                   format="%d-%b-%Y %H:%M:%OS")), digits=18)
    [1] 1357341728.13899994
    R> 
    

    The difference is not precisely 1/1000 as you assumed.
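
    A minimal sketch of how to work with this rather than against it: compare after rounding to the unit you care about, or carry the milliseconds as whole numbers. The ISO-style input strings and tz = "UTC" are assumptions made for this illustration (they avoid the locale-dependent %b), and the variable names are arbitrary.

    ```r
    # Illustration only: inputs, names and tz = "UTC" are assumptions.
    a <- as.POSIXct("2013-01-04 17:22:08.138", tz = "UTC")
    b <- as.POSIXct("2013-01-04 17:22:08.139", tz = "UTC")

    d <- as.numeric(b) - as.numeric(a)
    exact_match   <- d == 0.001            # exact comparison is unreliable here
    rounded_match <- round(d * 1000) == 1  # compare in whole milliseconds instead

    # Recover the millisecond field as a whole number
    ms_a <- round(as.numeric(a) * 1000) %% 1000
    ms_b <- round(as.numeric(b) * 1000) %% 1000
    print(c(exact_match, rounded_match))
    print(c(ms_a, ms_b))
    ```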

  • 2020-12-03 09:47

    So I guess you do need a little fudge factor added to my suggestion here: https://stackoverflow.com/a/7730759/210673. This seems to work, but it might introduce other bugs; test it carefully and think about what it is doing before using it for anything important.

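    # Format a POSIXct with `digits` decimal places in the seconds field.
    myformat.POSIXct <- function(x, digits=0) {
      # Round the underlying seconds-since-epoch value, keeping class and tzone.
      x2 <- round(unclass(x), digits)
      attributes(x2) <- attributes(x)
      x <- as.POSIXlt(x2)
      # Fudge factor: nudge the seconds up by a tenth of the last printed digit,
      # in case %OSn truncates rather than rounds.
      x$sec <- round(x$sec, digits) + 10^(-digits-1)
      format.POSIXlt(x, paste("%Y-%m-%d %H:%M:%OS",digits,sep=""))
    }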
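
    As a usage sketch, the function above can be checked with a character->POSIXct->character round trip at millisecond resolution. The sample strings and tz = "UTC" are assumptions made for this illustration, and the function is repeated so that the block runs on its own. It is a quick check, not a proof that every input survives.

    ```r
    # Copy of the function above, so this block is self-contained.
    myformat.POSIXct <- function(x, digits=0) {
      x2 <- round(unclass(x), digits)
      attributes(x2) <- attributes(x)
      x <- as.POSIXlt(x2)
      x$sec <- round(x$sec, digits) + 10^(-digits-1)
      format.POSIXlt(x, paste("%Y-%m-%d %H:%M:%OS",digits,sep=""))
    }

    # Illustration only: sample inputs and time zone are assumptions.
    s <- c("2011-10-11 07:49:36.300",
           "2013-01-04 17:22:08.138",
           "2013-01-04 17:22:08.139")
    ct   <- as.POSIXct(s, tz = "UTC")     # character -> POSIXct
    back <- myformat.POSIXct(ct, 3)       # POSIXct -> character, 3 decimals
    print(data.frame(input = s, output = back))
    print(identical(back, s))             # should be TRUE if the round trip is lossless
    ```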
    