I have a character vector of dates, where the formats are mostly dmY
(e.g. 27-09-2013), dmy
(e.g. 27-09-13), and occasionally some
This is actually intentional; I recall it now. The assumption is that if you have dates of the form 01-02-1845 and 01-02-03 in the same vector, then 01-02-03 probably means 01-02-0003. It also avoids confusion between dates from different centuries: you cannot know whether 17-05-13 refers to the 20th or the 21st century.
There might also have been a technical reason for this decision, but I don't remember it right now.
The select_formats argument is the way to go:
my_select <- function(trained){
  n_fmts <- nchar(gsub("[^%]", "", names(trained))) +
    grepl("%y", names(trained))*1.5
  names(trained[ which.max(n_fmts) ])
}
parse_date_time(c("27-09-13", "27-09-2013"), "dmy", select_formats = my_select)
## [1] "2013-09-27 UTC" "2013-09-27 UTC"
select_formats should return the formats to be applied sequentially to the input character vector. In the above example you give precedence to the %y format.
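To see why my_select favours %y, the scoring can be run by hand in base R. This is only a sketch: the trained vector below is a hypothetical stand-in, assuming the selector receives a named vector whose names are the candidate formats and whose values are match counts from the training subset.

```r
# Hypothetical stand-in for the selector's input (assumed shape):
# names are candidate formats, values are match counts.
trained <- c("%d-%m-%Y" = 2L, "%d-%m-%y" = 2L)

my_select <- function(trained) {
  # One point per "%" token, plus a 1.5 bonus for a two-digit year (%y).
  n_fmts <- nchar(gsub("[^%]", "", names(trained))) +
    grepl("%y", names(trained)) * 1.5
  names(trained[which.max(n_fmts)])
}

# The same score computed outside the function, for inspection.
scores <- nchar(gsub("[^%]", "", names(trained))) +
  grepl("%y", names(trained)) * 1.5

print(setNames(scores, names(trained)))  # 3.0 for %Y, 4.5 for %y
print(my_select(trained))                # "%d-%m-%y"
```

Both formats contain three "%" tokens, so the 1.5 bonus alone decides the winner. grepl() is case-sensitive, which is why "%y" does not match "%Y".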
I am adding this example to the docs.
It looks like a bug, though I am not sure, so you should contact the maintainer.
Building the package from source and changing one line in this internal function (I replaced which.max by which.min):
.select_formats <- function(trained){
  n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%Y", names(trained))*1.5
  names(trained[ which.min(n_fmts) ]) ## replace which.max by which.min
}
seems to correct the problem. Frankly, I don't know why this works, but I guess it is a kind of ranking.
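One possible reading of the ranking, sketched in base R: the score counts "%" tokens and adds 1.5 for a four-digit year, so which.max picks the %Y format and which.min picks the %y one. The trained vector and the default_score helper are hypothetical, introduced only for illustration.

```r
# Hypothetical stand-in for the formats that matched the training subset.
trained <- c("%d-%m-%Y" = 2L, "%d-%m-%y" = 2L)

# Scoring as in the internal function quoted above (helper name is made up).
default_score <- function(fmts) {
  nchar(gsub("[^%]", "", fmts)) + grepl("%Y", fmts) * 1.5
}

scores <- default_score(names(trained))    # 4.5 for %Y, 3.0 for %y

picked_max <- names(trained)[which.max(scores)]
picked_min <- names(trained)[which.min(scores)]

print(picked_max)  # "%d-%m-%Y"
print(picked_min)  # "%d-%m-%y"

# Side effect of which.min: the format with fewer "%" tokens also wins.
other <- c("%d-%m-%y", "%d-%m-%y %H")
print(other[which.min(default_score(other))])  # "%d-%m-%y"
```

Under this reading, which.min also prefers the format with the fewest specifiers, which may not be what you want once formats of different lengths match the same input.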
parse_date_time(c("27-09-13", "27-09-2013"), orders = c("d m y", "d m Y"))
[1] "2013-09-27 UTC" "2013-09-27 UTC"
parse_date_time(c("2013-09-27", "13-09-13"), orders = c("Y m d", "y m d"))
[1] "2013-09-27 UTC" "2013-09-13 UTC"