Converting multiple date formats into one in R

Submitted by 笑着哭i on 2021-02-16 23:39:09

Question


I am working with a messy Excel file that contains multiple date formats:

2016-10-17T12:38:41Z 
Mon Oct 17 08:03:08 GMT 2016
10-Sep-15
13-Oct-09
18-Oct-2016 05:42:26 UTC

I want to convert all of the above to yyyy-mm-dd format. I am using the following code for the conversion, but many values come out as NA.

as.Date(parse_date_time(df$date,c('mdy', 'ymd_hms','a b d HMS y','d b y HMS')))

How can I convert all of them together? I have read other threads on similar cases, but nothing seems to work for mine. Please help.


Answer 1:


If I add 'dmy' to the list, then at least all of the cases in your example are successfully parsed:

z <- c("2016-10-17T12:38:41Z", "Mon Oct 17 08:03:08 GMT 2016",
       "10-Sep-15", "13-Oct-09", "18-Oct-2016 05:42:26 UTC")

library(lubridate)
parse_date_time(z,c('mdy', 'dmy', 'ymd_HMS','a b d HMS y','d b y HMS'))
## [1] "2016-10-17 12:38:41 UTC" "2016-10-17 08:03:08 UTC"
## [3] "2015-09-10 00:00:00 UTC" "2009-10-13 00:00:00 UTC"
## [5] "2016-10-18 05:42:26 UTC"
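
When many values come back as NA, it can help to list exactly which raw strings failed, so you know which order is still missing. The following is only a sketch: the "unknown" entry is a made-up placeholder for a value that none of the orders can match, and `quiet = TRUE` just suppresses the "failed to parse" warning.

```r
library(lubridate)

# Sample values from the question plus one hypothetical unparseable entry
z <- c("2016-10-17T12:38:41Z", "Mon Oct 17 08:03:08 GMT 2016",
       "10-Sep-15", "13-Oct-09", "18-Oct-2016 05:42:26 UTC", "unknown")

orders <- c("mdy", "dmy", "ymd_HMS", "a b d HMS y", "d b y HMS")

# quiet = TRUE silences the parse-failure warning; failures are still NA
parsed <- parse_date_time(z, orders = orders, quiet = TRUE)

# Raw strings that no order could parse
unparsed <- unique(z[is.na(parsed)])
print(unparsed)
```

Looking at the strings in `unparsed` usually shows which format needs to be added to `orders`.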

Your big problem will be the third and fourth elements: are these actually meant to be 'ymd' or 'dmy'? I'm not sure how any logic will let you auto-detect the difference. Out of context, "15 Sep 2010" and "10 September 2015" both seem perfectly reasonable readings of "10-Sep-15".
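
Auto-detection may not be possible, but you can at least flag which values are ambiguous so they can be checked by hand. This is a rough base R sketch, not part of the lubridate approach above. It assumes an English locale (because `%b` matches locale-specific month abbreviations), and the "10-Sep-99" value is made up to show a case that only works as day-month-year.

```r
x <- c("10-Sep-15", "13-Oct-09", "10-Sep-99", "18-Oct-2016 05:42:26 UTC")

# Only strings shaped like NN-Mon-NN can have this ambiguity
short <- grepl("^[0-9]{2}-[A-Za-z]{3}-[0-9]{2}$", x)
cand <- x
cand[!short] <- NA

# Try both readings; an impossible day (e.g. 99) gives NA
as_dmy <- as.Date(cand, format = "%d-%b-%y")
as_ymd <- as.Date(cand, format = "%y-%b-%d")

# Ambiguous means both readings produce a valid date
ambiguous <- short & !is.na(as_dmy) & !is.na(as_ymd)
print(data.frame(x, as_dmy, as_ymd, ambiguous))
```

Rows where `ambiguous` is TRUE need outside knowledge (for example, which system exported them) to decide between the two readings.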

For what it's worth, I also tried the new anytime package; it only handled the first and last elements.




Answer 2:


Removing the times first makes it possible to parse the sample data in the question with only three alternatives in orders. This interprets 10-Sep-15 and 13-Oct-09 as dmy; if you want them interpreted as ymd, uncomment the commented-out line:

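library(lubridate)

orders <- c("dmy", "mdy", "ymd")
# orders <- c("ymd", "dmy", "mdy")

# x is the input vector given in the note at the end of this answer;
# gsub() blanks out any hh:mm:ss time before parsing
as.Date(parse_date_time(gsub("..:..:..", " ", x), orders = orders))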

giving:

[1] "2016-10-17" "2016-10-17" "2015-09-10" "2009-10-13" "2016-10-18"

or, if the commented-out line is uncommented:

[1] "2016-10-17" "2016-10-17" "2010-09-15" "2013-10-09" "2016-10-18"
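
To see what the time-stripping step does on its own, here is a small base R sketch. The pattern "..:..:.." matches any two characters around each colon, so it is fairly loose. The stricter digits-only pattern is my own variation rather than part of the answer, and on this sample both give the same result.

```r
x <- c("2016-10-17T12:38:41Z ", "Mon Oct 17 08:03:08 GMT 2016", "10-Sep-15",
       "13-Oct-09", "18-Oct-2016 05:42:26 UTC")

# Pattern used in the answer: any two characters, colon, two, colon, two
loose <- gsub("..:..:..", " ", x)

# Stricter alternative (an assumption, not from the answer): digits only
strict <- gsub("[0-9]{2}:[0-9]{2}:[0-9]{2}", " ", x)

print(loose)
print(identical(loose, strict))
```

The stricter pattern may be safer if the real column contains other colon-separated text that should not be removed.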

Note: The input is:

x <- c("2016-10-17T12:38:41Z ", "Mon Oct 17 08:03:08 GMT 2016", "10-Sep-15", 
"13-Oct-09", "18-Oct-2016 05:42:26 UTC")


Source: https://stackoverflow.com/questions/40222283/converting-multiple-date-formats-into-one-in-r
