Question
I already have a partial answer to this problem, which I understand as far as it is explained, in the question "How to most efficiently restructure a character string for fasttime in data.table".
However, the task has been extended and now needs to handle a variation of the original format.
I have a large dataset with a column of dates, stored as character, in the form:
01 Jan 2014
or:
dd MMM yyyy
I want to restructure these so I can feed them into fastPOSIXct, which only accepts character input in POSIXct order:
yyyy-mm-dd
The linked question notes that an efficient approach is to use a regex and then pass the output to fasttime. Do I need to extend this with a step that recognises the month abbreviations, converts them to numbers, and then rearranges the fields? How would I do this? I know that month.abb is a built-in constant. Should I be using it, or is there a smarter way?
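A minimal base-R sketch of the regex route described above, assuming every value is exactly "dd Mon yyyy" with English, title-case month abbreviations (the same ones stored in month.abb). The function name dmy_to_iso is hypothetical. It captures the three fields with one pattern, looks up the month number with match() against month.abb, and reassembles an ISO "yyyy-mm-dd" string; values that do not fit the pattern become NA.

```r
# Hypothetical helper: "01 Jan 2014" -> "2014-01-01".
# Assumes English, title-case month abbreviations as in base R's month.abb.
dmy_to_iso <- function(x) {
  pat  <- "^([0-9]{2}) ([A-Za-z]{3}) ([0-9]{4})$"
  day  <- sub(pat, "\\1", x)              # two-digit day
  mon  <- sub(pat, "\\2", x)              # month abbreviation, e.g. "Jan"
  year <- sub(pat, "\\3", x)              # four-digit year
  mnum <- match(mon, month.abb)           # "Jan" -> 1, ..., "Dec" -> 12
  out  <- sprintf("%s-%02d-%s", year, mnum, day)
  out[is.na(mnum)] <- NA_character_       # unmatched input or unknown month
  out
}

dmy_to_iso(c("01 Jan 2014", "15 Dec 1999"))
# [1] "2014-01-01" "1999-12-15"
```

The resulting strings can then be passed to fasttime::fastPOSIXct(dmy_to_iso(x), tz = "UTC"). If the abbreviations might arrive in other cases (e.g. "JAN"), comparing match(tolower(mon), tolower(month.abb)) instead would be one way to handle that.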
Answer 1:
What about using lubridate:
x <- "01 Jan 2014"
x
[1] "01 Jan 2014"
library(lubridate)
dmy(x)
[1] "2014-01-01 UTC"
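lubridate functions also accept a tz argument; for the complete list of valid time-zone names, see OlsonNames(). (In more recent lubridate versions, dmy() returns a Date rather than a POSIXct when tz is not supplied, so the output above may print without "UTC".)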
Benchmark
I decided to update this answer with some empirical data, using the microbenchmark package and the lubridate option to use fasttime.
library(microbenchmark)
microbenchmark(dmy(x), times = 10000)
Unit: milliseconds
   expr      min      lq     mean   median      uq     max neval
 dmy(x) 1.992639 2.02567 2.142212 2.041514 2.07153 39.1384 10000
options(lubridate.fasttime = TRUE)
microbenchmark(dmy(x), times = 10000)
Unit: milliseconds
   expr      min      lq     mean   median       uq      max neval
 dmy(x) 1.993326 2.02488 2.136748 2.039467 2.065326 163.2008 10000
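The benchmark above times dmy() on a single string, so it likely reflects per-call overhead more than throughput on a large column. On a large dataset where the same dates repeat, one common trick (a swapped-in technique, not part of this answer) is to parse only the distinct strings and map the results back with match(). A minimal base-R sketch, where parse_dmy_unique is a hypothetical name and the input is assumed to be English "dd Mon yyyy":

```r
# Hypothetical sketch: parse each distinct "dd Mon yyyy" string once,
# then expand back to the full column with match().
# Assumes English, title-case month abbreviations as in month.abb.
parse_dmy_unique <- function(x) {
  u   <- unique(x)                                   # distinct strings only
  pat <- "^([0-9]{2}) ([A-Za-z]{3}) ([0-9]{4})$"
  iso <- sprintf("%s-%02d-%s",
                 sub(pat, "\\3", u),                 # year
                 match(sub(pat, "\\2", u), month.abb),  # month number
                 sub(pat, "\\1", u))                 # day
  # Explicit ISO format: unparseable rows (e.g. unknown month) become NA.
  # fasttime::fastPOSIXct(iso, tz = "UTC") is an alternative for this step.
  parsed <- as.POSIXct(iso, format = "%Y-%m-%d", tz = "UTC")
  parsed[match(x, u)]                                # one value per input row
}

x <- rep(c("01 Jan 2014", "15 Dec 1999"), times = 50000)
res <- parse_dmy_unique(x)
format(res[1:2], "%Y-%m-%d")
# [1] "2014-01-01" "1999-12-15"
```

Whether this pays off depends on how many distinct dates the column holds; with mostly unique values it only adds the cost of unique() and match().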
Source: https://stackoverflow.com/questions/30938562/how-to-most-efficiently-convert-a-character-string-of-01-jan-2014-to-posixct-i