Question
I have a date variable that originally comes from an Excel file, and its values are inconsistent. Although every value displays as yyyy/mm/dd in Excel, once read into R the variable looks like this:
person_1 39257
person_2 2015/2/20
person_3 NA
How can I clean up the date variable so that every value is shown in yyyy/mm/dd format?
Answer 1:
One option uses anydate (from anytime) together with excel_numeric_to_date (from janitor):
library(janitor)
library(anytime)
library(dplyr)
coalesce( excel_numeric_to_date(as.numeric(dat$V2)), anydate(dat$V2))
#[1] "2007-06-24" "2015-02-20" NA
Data:
dat <- structure(list(V1 = c("person_1", "person_2", "person_3"),
                      V2 = c("39257", "2015/2/20", NA)),
                 class = "data.frame", row.names = c(NA, -3L))
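For reference, the serial-number half of this conversion can also be sketched in base R without janitor. This assumes the workbook uses Excel's default 1900 date system, for which as.Date() needs an origin of "1899-12-30"; the variable name serial is only illustrative.

```r
# Excel serial numbers (1900 date system assumed).
# 39257 is the value from the question; 61 is Excel's serial for 1900-03-01.
serial <- c(39257, 61)

# Origin "1899-12-30" absorbs Excel's 1-based counting and its
# nonexistent 1900-02-29, so serials from 61 onward map correctly.
converted <- as.Date(serial, origin = "1899-12-30")
converted
#[1] "2007-06-24" "1900-03-01"
```

Serials below 61 fall before Excel's fictitious leap day and would be off by one with this origin, which rarely matters for real data.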
Answer 2:
An iterative approach, similar to how packages like lubridate and others try to find a match. It tries several formats in turn, including Excel's serial-date model. Note that Excel serial numbers convert correctly in R with an origin of "1899-12-30", not "1900-01-01", which would shift the result by two days. The order of the formats matters somewhat: in the face of ambiguity, a better heuristic would find the format with the most matches and use that for all values, but that's over to you.
dat <- read.table(header=FALSE, stringsAsFactors=FALSE, text="
person_1 39257
person_2 2015/2/20
person_3 NA")
conv_dates <- function(dates, origin = "1899-12-30") {
  # start with an all-NA Date vector of the right length
  out <- Sys.Date()[rep(NA, length(dates))]
  notna0 <- !is.na(dates)
  # purely numeric strings are treated as Excel serial numbers
  allnum <- notna0 & grepl("^[.0-9]+$", dates)
  if (any(allnum)) out[allnum] <- suppressWarnings(as.Date(as.numeric(dates[allnum]), origin = origin))
  # try each text format in order on whatever is still unparsed
  fmts <- c("%Y/%m/%d", "%d/%m/%Y", "%m/%d/%Y")
  for (fmt in fmts) {
    isna <- notna0 & is.na(out)
    if (!any(isna)) break
    out[isna] <- as.Date(dates[isna], format = fmt)
  }
  out
}
str(conv_dates(dat$V2))
# Date[1:3], format: "2007-06-24" "2015-02-20" NA
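The "most matches" heuristic mentioned in this answer could look roughly like the following. This is only a sketch, not the answerer's code: pick_format is a hypothetical helper, the sample vector x is made up, and it only decides between day-first and month-first strings.

```r
# Hypothetical helper: choose between day-first and month-first by counting
# how many strings each candidate format parses. Ties go to the first format.
pick_format <- function(x, fmts = c("%d/%m/%Y", "%m/%d/%Y")) {
  hits <- vapply(fmts,
                 function(f) sum(!is.na(as.Date(x, format = f))),
                 integer(1))
  fmts[[which.max(hits)]]
}

# Made-up input: "13/..." and "25/..." cannot be months, so day-first wins.
x <- c("13/02/2016", "05/06/2017", "25/12/2018")
fmt <- pick_format(x)
parsed <- as.Date(x, format = fmt)
parsed
#[1] "2016-02-13" "2017-06-05" "2018-12-25"
```

Applying one winning format to the whole vector keeps ambiguous values such as "05/06/2017" consistent with their unambiguous neighbours, instead of letting each value fall through to whichever format happens to parse it first.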
Answer 3:
You can first convert the values that are already in YMD format, then convert the remaining numeric Excel dates using their origin.
dat$date <- as.Date(dat$V2, '%Y/%m/%d')
# Can also use
# dat$date <- lubridate::ymd(dat$V2)
inds <- is.na(dat$date)
dat$date[inds] <- as.Date(as.numeric(dat$V2[inds]), origin = "1899-12-30")
dat
# V1 V2 date
#1 person_1 39257 2007-06-24
#2 person_2 2015/2/20 2015-02-20
#3 person_3 <NA> <NA>
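The answers above return Date objects, which R prints as yyyy-mm-dd. If the literal yyyy/mm/dd display from the question is needed, a minimal sketch is to format the dates as character strings at the end; the vector dates below simply stands in for the cleaned column.

```r
# Stand-in for the cleaned Date column from any of the answers
dates <- as.Date(c("2007-06-24", "2015-02-20", NA))

# format() returns character strings; missing dates stay NA
formatted <- format(dates, "%Y/%m/%d")
formatted
#[1] "2007/06/24" "2015/02/20" NA
```

The result is character, not Date, so it is best kept as a display column while the Date column is used for sorting and arithmetic.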
Source: https://stackoverflow.com/questions/61689061/r-inconsistent-date-format