Using strptime %z with special timezone format

一笑奈何 提交于 2019-12-06 23:31:47

问题


I am working with .csv data that was exported from Teradata. Several columns were originally timestamps with timezones, so after loading the .csv in R I'd like to convert these columns (which are loaded as strings) to POSIXlt or POSIXct. I am using strptime, but the format of the timezone from the .csv file does not match what strptime is expecting. For example, it expects -0400 but the .csv has the format -04:00 where a colon separates the hours and minutes.

I can remove the colon, but this is an extra step and complication I'd like to avoid if possible. Is there a way to tell strptime to use a different format for the timezone (%z)?

Here is an example:

## Example data:
x <- c("2011-10-12 22:17:13.860746-04:00", "2011-10-12 22:17:13.860746+00:00")
format <- "%Y-%m-%d %H:%M:%OS%z"

## Doesn't work:
strptime(x,format)
## [1] NA NA

## Ignores the timezone:
as.POSIXct(x)
## [1] "2011-10-12 22:17:13 EDT" "2011-10-12 22:17:13 EDT"

## Remove the last colon:
x2 <- gsub("(.*):", "\\1", x)
x2
## [1] "2011-10-12 22:17:13.860746-0400" "2011-10-12 22:17:13.860746+0000"

## This works, but requires extra processing (removing the colon)
strptime(x2,format)
## [1] "2011-10-12 22:17:13" "2011-10-12 18:17:13"

So I'm looking to achieve this last result using something like strptime(x,"%Y-%m-%d %H:%M:%OS%zz"), where %zz is a custom expression for the timezone that recognizes the -04:00 format. Or %zH:%zM might be even better.

If this isn't possible, does anyone have a slick/flexible function for converting strings (of various formats) to dates for multiple columns of a data.frame/data.table?


回答1:


It turns out lubridate can handle this format:

library(lubridate)
ymd_hms(x)
## [1] "2011-10-13 02:17:13 UTC" "2011-10-12 22:17:13 UTC"

Or, to display in the local timezone:

with_tz(ymd_hms(x))
## [1] "2011-10-12 22:17:13 EDT" "2011-10-12 18:17:13 EDT"

For more flexibility (still using lubridate):

parse_date_time(x, "%Y-%m-%d %H:%M:%OS%z")

For faster speed (amongst lubridate options):

lubridate:::.strptime(x, "%Y-%m-%d %H:%M:%OS%OO")

Timings:

microbenchmark(
  ymd_hms(x),
  parse_date_time(x, "%Y-%m-%d %H:%M:%OS%z"),
  lubridate:::.strptime(x, "%Y-%m-%d %H:%M:%OS%OO"),
  strptime(gsub("(.*):", "\\1", x), format)
)

## Unit: microseconds
##                                               expr      min       lq       mean    median        uq      max neval
##                                         ymd_hms(x) 1523.819 1578.495 1715.14577 1629.5385 1744.3695 2850.393   100
##         parse_date_time(x, "%Y-%m-%d %H:%M:%OS%z") 1108.676 1150.633 1273.77301 1190.3315 1264.8050 5947.204   100
##  lubridate:::.strptime(x, "%Y-%m-%d %H:%M:%OS%OO")   89.838  103.390  112.45338  107.8425  115.2265  216.512   100
##        strptime(gsub("(.*):", "\\\\1", x), format)   46.716   58.294   71.90934   69.9415   86.5860  105.044   100



回答2:


I've just come across this question trying to achieve the same thing.

The only thing I've found to fix it is to use regex to remove the colon, as you have mentioned. You can tighten the regex a little to avoid making mistakes in the replacement.

x2 <- gsub('^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\\.[0-9]+[+-][0-9]{2}):([0-9]{2})$',
           '\\1\\2',
           x)
# [1] "2011-10-12 22:17:13.860746-0400" "2011-10-12 22:17:13.860746+0000"


来源:https://stackoverflow.com/questions/15032184/using-strptime-z-with-special-timezone-format

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!