dplyr - mutate_each - colwise coercion to POSIXlt fails

Submitted by 戏子无情 on 2020-01-02 07:16:07

Question


I recently came across dplyr and, as a newbie, like it very much, so I am trying to convert some of my base-R code into dplyr code.

Working with air traffic control data, I am struggling to coerce timestamps with lubridate and as.POSIXlt inside a mutate_each() call. I need the POSIXlt format because I later have to work with local times (at different locations). Reading in the data yields a data frame of character columns. The following is a simplified example:

library(dplyr)
library(lubridate)

ICAO_ADEP <- c("DGAA","ZSPD","UAAA","RJTT","KJFK","WSSS")
MVT_TIME_UTC <- c("01-Jan-2013 04:02:24", NA,"01-Jan-2013 04:08:18", NA,"01-Jan-2013 04:17:11","01-Jan-2013 04:21:52")
flights <- data.frame(ICAO_ADEP, MVT_TIME_UTC, stringsAsFactors = FALSE)  # keep character columns

The function I wrote reads as follows:

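make_POSIXlt <- function(vec, tz = "UTC") {
  # parse "01-Jan-2013 04:02:24"-style strings into POSIXct
  vec <- parse_date_time(vec, orders = "dmy_hms", tz = tz)
  # then convert to POSIXlt and return it
  as.POSIXlt(vec, tz = tz)
}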

The code works fine when executed with a single column:

flights$MVT_TIME_UTC <- make_POSIXlt(flights$MVT_TIME_UTC)

If I run the following dplyr code, the function fails:

flights$BLOCK_TIME_UTC <- mutate_each(flights, funs(make_POSIXlt(.)), MVT_TIME_UTC)
Error: wrong result size (9), expected 6 or 1

The issue seems to be linked to the as.POSIXlt call: if this line is commented out, the code works within mutate_each and coerces the timestamps to POSIXct.
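A likely explanation, hedged because the details depend on the R version: a POSIXlt object is internally a named list of date-time fields (seconds, minutes, hours and so on), so code that looks at the underlying list sees the number of fields rather than the number of timestamps. In older R versions, `length()` on a POSIXlt returned that field count, which would match the "wrong result size (9)" in the error. This base-R sketch contrasts the internal structure of POSIXlt with POSIXct, which is a plain numeric vector with one element per timestamp.

```r
# Two timestamps parsed as POSIXlt (base R only, no packages needed)
lt <- as.POSIXlt(c("2013-01-01 04:02:24", "2013-01-01 04:08:18"), tz = "UTC")

# Internally a POSIXlt is a named list with one vector per date-time field
comps <- unclass(lt)
print(names(comps))   # "sec", "min", "hour", "mday", "mon", "year", ...
print(length(comps))  # number of fields (9 or more), not the number of timestamps

# A POSIXct is a numeric vector with one element per timestamp,
# which is the shape dplyr expects for a data frame column
ct <- as.POSIXct(lt)
print(length(unclass(ct)))
```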

Any idea what is wrong? My real data has several timestamp columns that I would like to coerce with mutate_each (or any other suitable dplyr function).


Answer 1:


Revisiting my question about four years later, I realised that I had forgotten to mark it as answered. This also gives me the chance to document how this (relatively) simple type coercion can now be solved elegantly with dplyr and lubridate.

Key lessons learned:

  1. Never store POSIXlt in a data frame (or in its younger sibling, the tibble, although tibbles can now hold list columns); keep date-time columns as POSIXct.
  2. Coerce date-time stamps with the parser functions from the lubridate package.
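Since the question needed POSIXlt only to work with local times, here is a minimal sketch of one alternative that keeps the column as POSIXct in UTC and derives local clock times as text. A POSIXct column carries a single time zone attribute, so a different zone per row is applied with `format()` one element at a time. The ICAO-to-time-zone lookup `airport_tz`, the helper `to_local()` and the `LOCAL_TIME` column are hypothetical additions for illustration, not part of the original data.

```r
library(dplyr)
library(lubridate)

flights <- data.frame(
  ICAO_ADEP = c("DGAA", "ZSPD", "UAAA", "RJTT", "KJFK", "WSSS"),
  MVT_TIME_UTC = c("01-Jan-2013 04:02:24", NA, "01-Jan-2013 04:08:18", NA,
                   "01-Jan-2013 04:17:11", "01-Jan-2013 04:21:52"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: Olson time zone of each departure aerodrome
airport_tz <- c(DGAA = "Africa/Accra", ZSPD = "Asia/Shanghai",
                UAAA = "Asia/Almaty", RJTT = "Asia/Tokyo",
                KJFK = "America/New_York", WSSS = "Asia/Singapore")

# Hypothetical helper: format each UTC instant in its own zone;
# NA timestamps come back as NA_character_
to_local <- function(t, zones) {
  vapply(seq_along(t), function(i) {
    format(t[i], "%Y-%m-%d %H:%M:%S", tz = zones[i])
  }, character(1))
}

flights <- flights %>%
  mutate(MVT_TIME_UTC = dmy_hms(MVT_TIME_UTC),   # POSIXct in UTC (may warn about the NAs)
         LOCAL_TIME   = to_local(MVT_TIME_UTC, unname(airport_tz[ICAO_ADEP])))

print(flights)
```

The stored column stays a single-zone POSIXct, so sorting and time differences remain correct; only the display string changes per row.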

For the example above:

library(dplyr)  # for %>% and mutate()

ICAO_ADEP <- c("DGAA","ZSPD","UAAA","RJTT","KJFK","WSSS")
MVT_TIME_UTC <- c("01-Jan-2013 04:02:24", NA,"01-Jan-2013 04:08:18", NA,"01-Jan-2013 04:17:11","01-Jan-2013 04:21:52")
flights <- data.frame(ICAO_ADEP, MVT_TIME_UTC, stringsAsFactors = FALSE)

flights <- flights %>% mutate(MVT_TIME_UTC = lubridate::dmy_hms(MVT_TIME_UTC))

The last line converts the strings in MVT_TIME_UTC to POSIXct date-times (in UTC, lubridate's default time zone). Check the lubridate documentation for other parsers and for how to handle local time zones.
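The question also asked about coercing several timestamp columns at once. `mutate_each()` and `funs()` have since been deprecated in dplyr; assuming dplyr 1.0.0 or later, `across()` applies the same parser to every selected column. The second timestamp column `BLOCK_TIME_UTC` below is hypothetical sample data added only for illustration.

```r
library(dplyr)
library(lubridate)

flights <- data.frame(
  ICAO_ADEP      = c("DGAA", "ZSPD", "UAAA"),
  MVT_TIME_UTC   = c("01-Jan-2013 04:02:24", NA, "01-Jan-2013 04:08:18"),
  # Hypothetical second timestamp column, for illustration only
  BLOCK_TIME_UTC = c("01-Jan-2013 03:50:00", "01-Jan-2013 03:55:00", NA),
  stringsAsFactors = FALSE
)

# Parse every column whose name ends in "_TIME_UTC" with the same lubridate parser
flights <- flights %>%
  mutate(across(ends_with("_TIME_UTC"), dmy_hms))

str(flights)
```

Selecting columns by a naming pattern avoids listing each timestamp column by hand; `c(MVT_TIME_UTC, BLOCK_TIME_UTC)` would work equally well inside `across()`.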



Source: https://stackoverflow.com/questions/27641129/dplyr-mutate-each-colswise-coercion-to-posixlt-fails
