Can I apply a function over a vector using base tryCatch?

既然无缘 2021-01-16 16:24

I'm trying to parse dates (using lubridate functions) from a vector that has mixed date formats.

departureDate <- c("Aug 17, 2020 12:00:00 AM", "Nov


        
3 Answers
  • 2021-01-16 17:05

    Ideally, the code should handle every format on its own, without relying on exception handling.
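
    For comparison, here is a minimal sketch of what the tryCatch route from the question title could look like in base R. The helper name parse_one and the two-element sample vector are assumptions made for illustration only. Because as.Date() with an explicit format returns NA instead of raising an error, the sketch has to signal the failure itself with stop() before tryCatch has anything to catch.

```r
# Illustrative sample only; the full vector from the question is not reproduced here.
departureDate <- c("Aug 17, 2020 12:00:00 AM", "28/06/2019")

# Hypothetical helper: try the first format, fall back to the second on error.
parse_one <- function(x) {
  tryCatch(
    {
      # Trailing characters (the time part) are ignored by as.Date().
      d <- as.Date(x, format = "%b %d, %Y")
      # as.Date() yields NA on a mismatch, so raise the error explicitly.
      if (is.na(d)) stop("first format did not match")
      d
    },
    error = function(e) as.Date(x, format = "%d/%m/%Y")
  )
}

# lapply() plus c() keeps the Date class; sapply() would return bare numbers.
dates <- do.call(c, lapply(departureDate, parse_one))
```

    This works element by element, so it is slower than the vectorised approaches in the answers here, which is one reason to prefer testing for NA over catching exceptions.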

    Another issue is that mdy_hms() returns POSIXct values, whereas dmy() returns Date values, so the two results cannot be combined directly.

    The code below applies mdy_hms() and converts the result to Date. It then tests for NA values and applies the second function, dmy(), to the elements that failed to parse. More rules can be added to the pipeline if more formats need to be recognized.

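    library(lubridate)  # mdy_hms(), dmy()
    library(dplyr)      # %>%
    
    dates.converted <- 
      mdy_hms(departureDate, tz = "UTC") %>%        # POSIXct; NA where the format does not match
      as.Date() %>%
      ifelse(!is.na(.), ., dmy(departureDate)) %>%  # ifelse() drops the Date class ...
      structure(class = "Date")                     # ... so restore it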
    
    print(dates.converted)
    

    Output

     [1] "2020-08-17" "2019-11-19" "2020-12-21" "2020-12-24" "2020-12-24" "2020-04-19" "2019-06-28" "2019-08-16"
     [9] "2019-02-04" "2019-04-10" "2019-07-28" "2019-07-26" "2020-06-22" "2020-04-05" "2021-05-01"
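
    The same fill-in-the-NAs idea can also be written with indexed assignment, which keeps the Date class and so needs no structure(class = "Date") step. This is a base R sketch rather than the lubridate pipeline above, and the two-element sample vector is an assumption for illustration.

```r
# Illustrative sample only; one value per format seen in the question.
departureDate <- c("Aug 17, 2020 12:00:00 AM", "28/06/2019")

# First pass: month-name format. The trailing time part is ignored by as.Date().
dates <- as.Date(departureDate, format = "%b %d, %Y")

# Second pass: only the elements that are still NA get the dd/mm/yyyy format.
missing <- is.na(dates)
dates[missing] <- as.Date(departureDate[missing], format = "%d/%m/%Y")

print(dates)
```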
    
  • 2021-01-16 17:09

    One method is to iterate through a vector of candidate formats and apply each one only to the elements that have not yet been parsed successfully.

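    # Candidate formats, tried in order. Note that %p is only honoured together
    # with %I; with %H the AM/PM marker is not applied, so "12:00:00 AM" is read
    # as 12:00 (see the output below). Use %I instead of %H if midnight is intended.
    fmts <- c("%b %d, %Y %H:%M:%S %p", "%d/%m/%Y")
    dates <- rep(Sys.time()[NA], length(departureDate))  # all-NA POSIXct vector
    for (fmt in fmts) {
      isna <- is.na(dates)
      if (!any(isna)) break  # everything parsed, stop early
      dates[isna] <- as.POSIXct(departureDate[isna], format = fmt)
    }
    dates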
    #  [1] "2020-08-17 12:00:00 PDT" "2019-11-19 12:00:00 PST" "2020-12-21 12:00:00 PST"
    #  [4] "2020-12-24 12:00:00 PST" "2020-12-24 12:00:00 PST" "2020-04-19 12:00:00 PDT"
    #  [7] "2019-06-28 00:00:00 PDT" "2019-08-16 00:00:00 PDT" "2019-02-04 00:00:00 PST"
    # [10] "2019-04-10 00:00:00 PDT" "2019-07-28 00:00:00 PDT" "2019-07-26 00:00:00 PDT"
    # [13] "2020-06-22 12:00:00 PDT" "2020-04-05 12:00:00 PDT" "2021-05-01 12:00:00 PDT"
    as.Date(dates)
    #  [1] "2020-08-17" "2019-11-19" "2020-12-21" "2020-12-24" "2020-12-24" "2020-04-19" "2019-06-28"
    #  [8] "2019-08-16" "2019-02-04" "2019-04-10" "2019-07-28" "2019-07-26" "2020-06-22" "2020-04-05"
    # [15] "2021-05-01"
    

    I encourage you to put the most-likely formats first in the fmts vector.

    Because of the break, the loop stops as soon as every element has been parsed, so no further formats are attempted.
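
    If this is needed in more than one place, the loop can be wrapped in a small function. The sketch below is one possible shape, not part of the original answer: the name parse_mixed, the tz argument, and the sample input are assumptions, and it uses %I (12-hour clock) with %p so that "12:00:00 AM" is read as midnight.

```r
# Hypothetical helper: try each format in turn on the elements still unparsed.
parse_mixed <- function(x, fmts, tz = "UTC") {
  # Start from an all-NA POSIXct vector of the right length.
  out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)
  for (fmt in fmts) {
    todo <- is.na(out)
    if (!any(todo)) break  # nothing left to parse
    out[todo] <- as.POSIXct(x[todo], format = fmt, tz = tz)
  }
  out
}

# Illustrative input: one value per format, plus one that matches neither.
sample_input <- c("Aug 17, 2020 12:00:00 AM", "28/06/2019", "not a date")
res <- parse_mixed(sample_input, c("%b %d, %Y %I:%M:%S %p", "%d/%m/%Y"))
as.Date(res)
```

    Elements that match none of the formats stay NA, so is.na(res) shows which inputs still need a rule.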


    Edit: if your locale does not recognize the AM/PM markers, one option is to strip them from the strings first:

    departureDate <- gsub("\\s[AP]M$", "", departureDate)
    departureDate
    #  [1] "Aug 17, 2020 12:00:00" "Nov 19, 2019 12:00:00" "Dec 21, 2020 12:00:00"
    #  [4] "Dec 24, 2020 12:00:00" "Dec 24, 2020 12:00:00" "Apr 19, 2020 12:00:00"
    #  [7] "28/06/2019"            "16/08/2019"            "04/02/2019"           
    # [10] "10/04/2019"            "28/07/2019"            "26/07/2019"           
    # [13] "Jun 22, 2020 12:00:00" "Apr 5, 2020 12:00:00"  "May 1, 2021 12:00:00" 
    

    and then use a simpler format:

    fmts <- c("%b %d, %Y %H:%M:%S", "%d/%m/%Y")
    
  • 2021-01-16 17:16

    We can use lubridate::parse_date_time(), which accepts several candidate formats at once.

    lubridate::parse_date_time(departureDate, c('%b %d, %Y %I:%M:%S %p', '%d/%m/%Y'))
    
    #[1] "2020-08-17 UTC" "2019-11-19 UTC" "2020-12-21 UTC" "2020-12-24 UTC"
    #[5] "2020-12-24 UTC" "2020-04-19 UTC" "2019-06-28 UTC" "2019-08-16 UTC"
    #[9] "2019-02-04 UTC" "2019-04-10 UTC" "2019-07-28 UTC" "2019-07-26 UTC"
    #[13] "2020-06-22 UTC" "2020-04-05 UTC" "2021-05-01 UTC"
    

    Because the month names in departureDate are in English, the locale must be English as well.

    See "How to change the locale of R?" if you have a non-English locale.
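
    As a rough sketch of that locale change in base R: the "C" locale uses English month abbreviations, so switching LC_TIME to it for the duration of the parse is one option. The single sample string is an assumption for illustration, and the original setting is restored afterwards.

```r
# Remember the current time locale so it can be restored.
old_locale <- Sys.getlocale("LC_TIME")

# The "C" locale provides English month names such as "Aug".
Sys.setlocale("LC_TIME", "C")
parsed <- as.Date("Aug 17, 2020", format = "%b %d, %Y")

# Put the previous locale back.
Sys.setlocale("LC_TIME", old_locale)

print(parsed)
```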
