as.Date yields NA for month name “März” (March)

挽巷 2021-01-20 05:46

I have a scraped character vector of dates. My problem: when I use as.Date(), every date containing the month name "März" (German for "March") comes back as NA.

3 answers
  • 2021-01-20 06:23

    I could reproduce this on Windows 7 x64. There are many issues with how R and Windows interact with character encoding, and I don't pretend to understand them. In your case, simply converting to latin1 encoding before converting to a Date should work.

    as.Date(iconv(dates,from='UTF-8',to='latin1'),'%d. %B %Y')
    #  [1] "2009-02-12" "2006-11-12" "2010-03-19" "2007-06-30" "2006-03-07" "2007-03-19"
    #  [7] "2006-01-22" "2005-09-24" "2012-02-15" "2007-03-28"
    

    There might be a way to get as.Date to recognize different encodings in Windows, but I don't know it.
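
    To see why the conversion helps, it can be useful to inspect the declared encoding of the strings first. The following is only a sketch: the sample vector is hypothetical, enc2native() is swapped in as a portable alternative to the explicit iconv() call above, and whether the dates parse still depends on LC_TIME being a German locale.

    ```r
    # Hypothetical sample data; "\u00e4" is the escaped form of "ä".
    dates <- c("12. Februar 2009", "19. M\u00e4rz 2010")

    # Show the encoding R has recorded for each element.
    print(Encoding(dates))

    # Re-encode to the session's native encoding before parsing.
    # The result is NA wherever the month name is not known to the
    # current LC_TIME locale, so no output is shown here.
    parsed <- as.Date(enc2native(dates), format = "%d. %B %Y")
    ```

    On a Windows session with a latin1 (code page 1252) native encoding, enc2native() should have the same effect as the iconv() call, without hard-coding the target encoding.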

  • 2021-01-20 06:26

    This is a long comment/answer extension.

    I had almost the same problem.

    For example, with

    months <- c("JAN", "FEB", "MAR", "APR", "MAY", "JUN", 
                "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")
    for (month in months) print(
         as.Date(iconv(paste("01", month, "2014", sep=""), 
                      from='UTF-8', to='latin1'), "%d%b%Y"))
    

    I got

    [1] "2014-01-01"
    [1] "2014-02-01"
    [1] NA
    [1] "2014-04-01"
    [1] NA
    [1] "2014-06-01"
    [1] "2014-07-01"
    [1] "2014-08-01"
    [1] "2014-09-01"
    [1] NA
    [1] "2014-11-01"
    [1] "2014-12-01"
    

    So I got no dates for March, May and October (with these arguments, it made no difference whether I used iconv() or not).

    What solved it was:

    Sys.setlocale("LC_TIME", "en_US.UTF-8")
    

    After that, every month parsed correctly (iconv() wasn't necessary).
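
    Locale names differ between platforms, so "en_US.UTF-8" may not exist on every machine (Windows uses names such as "English_United States.1252"). A minimal sketch, assuming a hypothetical helper set_time_locale() that tries several candidates and falls back to "C", which also uses English month names:

    ```r
    # Hypothetical helper: try each candidate name in turn.
    # Sys.setlocale() returns "" (with a warning) when a name is rejected.
    set_time_locale <- function(candidates) {
      for (loc in candidates) {
        res <- suppressWarnings(Sys.setlocale("LC_TIME", loc))
        if (nzchar(res)) return(invisible(res))
      }
      stop("none of the candidate locales is available")
    }

    old <- Sys.getlocale("LC_TIME")                  # remember the current setting
    set_time_locale(c("en_US.UTF-8", "English_United States.1252", "C"))
    d <- as.Date("01MAR2014", format = "%d%b%Y")     # month names are now English
    Sys.setlocale("LC_TIME", old)                    # restore the previous setting
    print(d)
    ```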

  • 2021-01-20 06:31

    I had a very similar issue. I'm posting the solution I found in the hope that it helps users whose system locale is Italian:

     Sys.getlocale("LC_TIME")
    

    [1] "Italian_Italy.1252"

    I had to convert a factor to dates. Its levels were:

    levels(dates)
    

    [1] "1. Jun. 2012"  "11. Sep. 2012" "19. Oct. 2012" "20. Mar. 2013" "28. Jun. 2012"
    [6] "7. May. 2012"

    The conversion produced NA for every month except March (whose abbreviation is the same in Italian):

     head(as.Date(dates, format= "%d. %b. %Y"))
    

    [1] NA NA NA NA NA NA

     summary(GEM_variability$date)
    
            Min.      1st Qu.       Median         Mean      3rd Qu.         Max.         NA's 
    "2013-03-20" "2013-03-20" "2013-03-20" "2013-03-20" "2013-03-20" "2013-03-20"        "559" 

    I found the solution in the help page ?strftime:

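    # Save the current time locale, then switch to "C" (English month names)
    lct <- Sys.getlocale("LC_TIME")
    Sys.setlocale("LC_TIME", "C")
    # Parse the factor `dates` from above (the original snippet passed `date`)
    dates <- as.Date(dates, format = "%d. %b. %Y")
    # dates <- strptime(dates, format = "%d. %b. %Y")
    # Restore the original locale
    Sys.setlocale("LC_TIME", lct)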
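
    The same save/switch/restore pattern can be wrapped in a function, so the locale is restored even if parsing fails. This is a sketch only: parse_dates_c() is a hypothetical name, and the sample factor reuses the levels shown above.

    ```r
    # Hypothetical wrapper: parse with English month names, then restore LC_TIME.
    parse_dates_c <- function(x, format) {
      old <- Sys.getlocale("LC_TIME")
      # on.exit() runs even when as.Date() raises an error.
      on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
      Sys.setlocale("LC_TIME", "C")
      as.Date(as.character(x), format = format)
    }

    dates <- factor(c("1. Jun. 2012", "11. Sep. 2012", "19. Oct. 2012",
                      "20. Mar. 2013", "28. Jun. 2012", "7. May. 2012"))
    parsed <- parse_dates_c(dates, "%d. %b. %Y")
    print(parsed)
    ```

    Converting with as.character() first makes it explicit that the factor labels, not the internal integer codes, are being parsed.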
    