Match ISO 8601 week-of-year numbers to month-of-year numbers on Windows with German locale

Posted by 纵饮孤独 on 2019-12-11 05:49:35

Question


This is directly related to my question "POSIX date from dates in weekly time format".

However, in this question I'd like to ask specifically how to map ISO 8601 week numbers to month-of-the-year numbers.

To me, it seems that this is either not possible or involves non-intuitive hacks (and even these don't work reliably), so IMO it should be considered something that needs to be fixed in base R. Please correct me if I'm wrong, though.

EDIT: it seems the issue is closely related to running on Windows and/or to the locale you're on (standard German, in my case).

posix <- as.POSIXct(c("2015-12-24", "2015-12-31", "2016-01-01", "2016-01-08"))

ISO 8601

(yw <- format(posix, "%Y-%V"))
# [1] "2015-52" "2015-53" "2016-53" "2016-01"
ywd <- sprintf("%s-1", yw)
(as.POSIXct(ywd, format = "%Y-%V-%u"))
# [1] "2015-01-12 CET" "2015-01-12 CET" "2016-01-12 CET" "2016-01-12 CET"
# -> utterly wrong!!!

ywd <- sprintf("%s-4", yw)
(as.POSIXct(ywd, format = "%Y-%V-%u"))
# -> still wrong -> the day of the week is not the reason

# -> no way to use ISO 8601 convention to map week of the year to month of the year

For the sake of due diligence: it is also not possible with the US or UK conventions:

US convention

(yw <- format(posix, "%Y-%U"))
# [1] "2015-51" "2015-52" "2016-00" "2016-01"
ywd <- sprintf("%s-1", yw)
(as.POSIXct(ywd, format = "%Y-%U-%u"))
# [1] "2015-12-21 CET" "2015-12-28 CET" NA               "2016-01-04 CET"
# -> NA problem for week 00

ywd <- sprintf("%s-4", yw)
# -> does not work for week 00
(as.POSIXct(ywd, format = "%Y-%U-%u"))
# The day of the week is not the reason

# -> no way to use this convention to reliably map week of the year to month of the year

UK convention

(yw <- format(posix, "%Y-%W"))
# [1] "2015-51" "2015-52" "2016-00" "2016-01"
ywd <- sprintf("%s-1", yw)
(as.POSIXct(ywd, format = "%Y-%W-%u"))
# [1] "2015-12-21 CET" "2015-12-28 CET" NA               "2016-01-04 CET"
# -> NA problem for week 00

ywd <- sprintf("%s-4", yw)
# -> does not work for week 00
(as.POSIXct(ywd, format = "%Y-%W-%u"))
# The day of the week is not the reason

# -> no way to use this convention to reliably map week of the year to month of the year

Session info

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=German_Germany.1252     LC_CTYPE=German_Germany.1252       LC_MONETARY=German_Germany.1252   
[4] LC_NUMERIC=C                       LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] fva_0.1.0       digest_0.6.10   readxl_0.1.1    dplyr_0.5.0     plyr_1.8.4      magrittr_1.5   
 [7] memoise_1.0.0   testthat_1.0.2  roxygen2_5.0.1  devtools_1.12.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8     lubridate_1.6.0 assertthat_0.1  packrat_0.4.8-1 crayon_1.3.2    withr_1.0.2    
 [7] R6_2.2.0        DBI_0.5-1       stringi_1.1.2   rstudioapi_0.6  tools_3.3.2     stringr_1.1.0  
[13] tibble_1.2     

> devtools::session_info()
Session info -----------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.3.2 (2016-10-31)
 system   x86_64, mingw32             
 ui       RStudio (1.0.136)           
 language en                          
 collate  German_Germany.1252         
 tz       Europe/Berlin               
 date     2017-01-12                  

Packages ---------------------------------------------------------------------------------------------------
 package    * version date       source        
 assertthat   0.1     2013-12-06 CRAN (R 3.3.2)
 crayon       1.3.2   2016-06-28 CRAN (R 3.3.2)
 DBI          0.5-1   2016-09-10 CRAN (R 3.3.2)
 devtools   * 1.12.0  2016-06-24 CRAN (R 3.3.2)
 digest     * 0.6.10  2016-08-02 CRAN (R 3.3.2)
 dplyr      * 0.5.0   2016-06-24 CRAN (R 3.3.2)
 fva        * 0.1.0   <NA>       local         
 lubridate    1.6.0   2016-09-13 CRAN (R 3.3.2)
 magrittr   * 1.5     2014-11-22 CRAN (R 3.3.2)
 memoise    * 1.0.0   2016-01-29 CRAN (R 3.3.2)
 packrat      0.4.8-1 2016-09-07 CRAN (R 3.3.2)
 plyr       * 1.8.4   2016-06-08 CRAN (R 3.3.2)
 R6           2.2.0   2016-10-05 CRAN (R 3.3.2)
 Rcpp         0.12.8  2016-11-17 CRAN (R 3.3.2)
 readxl     * 0.1.1   2016-03-28 CRAN (R 3.3.2)
 roxygen2   * 5.0.1   2015-11-11 CRAN (R 3.3.2)
 stringi      1.1.2   2016-10-01 CRAN (R 3.3.2)
 stringr      1.1.0   2016-08-19 CRAN (R 3.3.2)
 testthat   * 1.0.2   2016-04-23 CRAN (R 3.3.2)
 tibble       1.2     2016-08-26 CRAN (R 3.3.2)
 withr        1.0.2   2016-06-20 CRAN (R 3.3.2)

Answer 1:


Disclosure: As mentioned in this answer I have created the ISOweek package to deal with ISO 8601 week-based dates.

The question contains several flaws:

  1. The ISO 8601 week-based year is different from the calendar year.
  2. Without specifying a day of the week, the conversion of year-week to year-month is ambiguous.

Week-based year vs calendar year

The OP has created sample data using

posix <- as.POSIXct(c("2015-12-24", "2015-12-31", "2016-01-01", "2016-01-08"))
(yw <- format(posix, "%Y-%V"))
[1] "2015-52" "2015-53" "2016-53" "2016-01"

The format specification %Y returns the calendar year, which is wrong for the third element: 2016-01-01 belongs to week 53 of the week-based year 2015, not to a week 53 of 2016.

With the correct format specification %G we do get

(yw <- format(posix, "%G-%V"))
[1] "2015-52" "2015-53" "2015-53" "2016-01"
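
As a small illustration of the difference between %Y and %G, the following sketch lists the dates around the 2015/2016 boundary whose week-based year differs from their calendar year. It uses only base R; the variable names are chosen for this example, and the Date class is used instead of POSIXct so that time zones play no role.

```r
# Dates around the 2015/2016 boundary
days <- seq(as.Date("2015-12-28"), as.Date("2016-01-10"), by = "day")

calendar_year <- format(days, "%Y")  # calendar year
iso_year      <- format(days, "%G")  # ISO 8601 week-based year

# Dates whose week-based year differs from their calendar year
mismatch <- days[calendar_year != iso_year]
(report <- format(mismatch, "%Y-%m-%d -> %G-W%V-%u"))
# [1] "2016-01-01 -> 2015-W53-5" "2016-01-02 -> 2015-W53-6" "2016-01-03 -> 2015-W53-7"
```

Only the first three days of January 2016 are affected here, which is why combining %Y with %V produces the impossible "2016-53" for 2016-01-01.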

Conversion of week-of-the-year to month-of-the-year

Just providing the ISO week-based year and week number without the day of week will yield ambiguous results.

This can be demonstrated with the (corrected) sample data which, once the duplicated "2015-53" is dropped, contain three consecutive weeks in the OP's own (non-standard) year-week format:

yw
[1] "2015-52" "2015-53" "2016-01"

With the help of the ISOweek2date() function from the ISOweek package, the data are converted to calendar dates. Note that ISOweek2date() requires a full ISO 8601 week-based date in the format yyyy-Www-d, including the day of the week. If we choose the first day of the week (Monday), we get:

library(ISOweek)
library(magrittr)
yw %>% 
  # insert "W" to conform with ISO 8601 format
  sub("-", "-W", .) %>% 
  # append day of week
  paste0("-1") %>%
  # convert to class Date and print as yyyy-mm 
  ISOweek2date() %>% 
  format("%Y-%m")
[1] "2015-12" "2015-12" "2016-01"

Now, we repeat this using the last day of the week (Sunday):

yw %>% 
  sub("-", "-W", .) %>% 
  paste0("-7") %>% 
  ISOweek2date() %>% 
  format("%Y-%m")
[1] "2015-12" "2016-01" "2016-01"

Note that the second element now refers to January 2016 instead of December 2015 because the Sunday of week 53 is in January and the Monday of this week still is in December.
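
The same ambiguity can be made visible without the ISOweek package. The sketch below starts from the original dates rather than from year-week strings, derives the Monday and the Sunday of each date's ISO week with %u, and flags the weeks that span two calendar months. It is only an illustration in base R; the column names are made up for this example.

```r
days <- as.Date(c("2015-12-24", "2015-12-31", "2016-01-01", "2016-01-08"))

# %u is the ISO weekday number (1 = Monday, ..., 7 = Sunday)
weekday <- as.integer(format(days, "%u"))
monday  <- days - (weekday - 1L)  # first day of the ISO week
sunday  <- monday + 6L            # last day of the ISO week

# One row per distinct ISO week
weeks <- unique(data.frame(
  iso_week     = format(days, "%G-W%V"),
  month_monday = format(monday, "%Y-%m"),
  month_sunday = format(sunday, "%Y-%m"),
  stringsAsFactors = FALSE
))

# TRUE where the week spans two calendar months
weeks$straddles <- weeks$month_monday != weeks$month_sunday
weeks
```

For the sample data, only week 2015-W53 is flagged, so that is the only week for which the choice of weekday changes the resulting month.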




Answer 2:


Pretty sure something else besides base R needs changing (see note at end tho):

some_dates <- as.POSIXct(c("2015-12-24", "2015-12-31", "2016-01-01", "2016-01-08"))

(year_week <- format(some_dates, "%Y %U"))
## [1] "2015 51" "2015 52" "2016 00" "2016 01"

(year_week_day <- sprintf("%s 1", year_week))
## [1] "2015 51 1" "2015 52 1" "2016 00 1" "2016 01 1"

(as.POSIXct(year_week_day, format = "%Y %U %u"))
## [1] "2015-12-21 EST" "2015-12-28 EST" "2016-01-04 EST" "2016-01-04 EST"

It works with the dashes, too:

(year_week <- format(some_dates, "%Y-%U"))
## [1] "2015-51" "2015-52" "2016-00" "2016-01"

(year_week_day <- sprintf("%s-1", year_week))
## [1] "2015-51-1" "2015-52-1" "2016-00-1" "2016-01-1"

(as.POSIXct(year_week_day, format = "%Y-%U-%u"))
## [1] "2015-12-21 EST" "2015-12-28 EST" "2016-01-04 EST" "2016-01-04 EST"

And although dashes are valid ISO form, they can confuse readers when the week number is between 00 and 12, because the string then looks like a year-month-day date.

NOTE

As the comment thread indicates, this is the behaviour on Windows:

(year_week <- format(some_dates, "%Y-%U"))
## [1] "2015-51" "2015-52" "2016-00" "2016-01"

(year_week_day <- sprintf("%s-1", year_week))
## [1] "2015-51-1" "2015-52-1" "2016-00-1" "2016-01-1"

(as.POSIXct(year_week_day, format = "%Y-%U-%u"))
## [1] "2015-12-21 PST" "2015-12-28 PST" NA               "2016-01-04 PST"

(Windows 10 64bit, R 3.3.2 for me/this example)
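
One possible reading of the NA, offered as an assumption rather than a confirmed explanation of the Windows behaviour: under %U, week 00 covers only the days before the first Sunday of the year. The sketch below lists the first seven days of 2016 with their %U week and %u weekday numbers; the data frame and its column names are made up for this illustration.

```r
first_days <- as.Date("2016-01-01") + 0:6

wk <- data.frame(
  date    = format(first_days),
  week_U  = format(first_days, "%U"),  # week of year, weeks start on Sunday
  weekday = format(first_days, "%u"),  # 1 = Monday, ..., 7 = Sunday
  stringsAsFactors = FALSE
)
wk

# Weekdays that actually occur in week 00 of 2016
wk$weekday[wk$week_U == "00"]
# [1] "5" "6"
```

Week 00 of 2016 contains only a Friday and a Saturday, so "2016-00-1" names a Monday that does not exist in that week. Returning NA for it is at least defensible, whereas the "2016-01-04" shown earlier is the Monday of week 01.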




Answer 3:


The R documentation for date-time format specifications (?strptime) says that "%V" is accepted but ignored on input, so as.POSIXct() cannot parse ISO 8601 week numbers.
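
Because the week number cannot be parsed by strptime(), the conversion has to be done by hand or with a package. A minimal base R sketch follows, relying on the ISO 8601 rule that 4 January always lies in week 1. The helper iso_week_to_date() is hypothetical (it is not part of base R or of any package) and does not validate its input, so for example week 53 of a year that has only 52 weeks silently rolls over into the next year.

```r
# Hypothetical helper: convert ISO 8601 week dates given as week-based
# year, week number and weekday (1 = Monday, ..., 7 = Sunday) to class Date.
iso_week_to_date <- function(year, week, weekday = 1L) {
  # 4 January always lies in ISO week 1 of its week-based year
  jan4 <- as.Date(sprintf("%04d-01-04", as.integer(year)))
  # Step back to the Monday of week 1
  week1_monday <- jan4 - (as.integer(format(jan4, "%u")) - 1L)
  # Add whole weeks, then the offset within the week
  week1_monday + 7L * (as.integer(week) - 1L) + (as.integer(weekday) - 1L)
}

# Year-week strings built with %G-%V, as in the first answer
yw <- c("2015-52", "2015-53", "2016-01")
parts <- do.call(rbind, strsplit(yw, "-", fixed = TRUE))

mondays <- iso_week_to_date(parts[, 1], parts[, 2], 1L)
sundays <- iso_week_to_date(parts[, 1], parts[, 2], 7L)

format(mondays, "%Y-%m")
# [1] "2015-12" "2015-12" "2016-01"
format(sundays, "%Y-%m")
# [1] "2015-12" "2016-01" "2016-01"
```

The month still depends on the weekday chosen, so any mapping from week to month has to fix a weekday by convention.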



Source: https://stackoverflow.com/questions/41616407/match-iso-8601-week-of-year-numbers-to-month-of-year-numbers-on-windows-with-ger
