问题
Question:
Is there a way to specify the Date format when using the colClasses argument in read.table/read.csv?
(I realise I can convert after importing, but with many date columns like this, it would be easier to do it in the import step)
Example:
I have a .csv with date columns in the format %d/%m/%Y
.
dataImport <- read.csv(\"data.csv\", colClasses = c(\"factor\",\"factor\",\"Date\"))
This gets the conversion wrong. For example, 15/07/2008
becomes 0015-07-20
.
Reproducible code:
data <-
structure(list(func_loc = structure(c(1L, 2L, 3L, 3L, 3L, 3L,
3L, 4L, 4L, 5L), .Label = c(\"3076WAG0003\", \"3076WAG0004\", \"3076WAG0007\",
\"3076WAG0009\", \"3076WAG0010\"), class = \"factor\"), order_type = structure(c(3L,
3L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 1L), .Label = c(\"PM01\", \"PM02\",
\"PM03\"), class = \"factor\"), actual_finish = structure(c(4L, 6L,
1L, 2L, 3L, 7L, 1L, 8L, 1L, 5L), .Label = c(\"\", \"11/03/2008\",
\"14/08/2008\", \"15/07/2008\", \"17/03/2008\", \"19/01/2009\", \"22/09/2008\",
\"6/09/2007\"), class = \"factor\")), .Names = c(\"func_loc\", \"order_type\",
\"actual_finish\"), row.names = c(NA, 10L), class = \"data.frame\")
write.csv(data,\"data.csv\", row.names = F)
dataImport <- read.csv(\"data.csv\")
str(dataImport)
dataImport
dataImport <- read.csv(\"data.csv\", colClasses = c(\"factor\",\"factor\",\"Date\"))
str(dataImport)
dataImport
And here\'s what the output looks like:
回答1:
You can write your own function that accepts a string and converts it to a Date using the format you want, then use the setAs
to set it as an as
method. Then you can use your function as part of the colClasses.
Try:
setAs("character","myDate", function(from) as.Date(from, format="%d/%m/%Y") )
tmp <- c("1, 15/08/2008", "2, 23/05/2010")
con <- textConnection(tmp)
tmp2 <- read.csv(con, colClasses=c('numeric','myDate'), header=FALSE)
str(tmp2)
Then modify if needed to work for your data.
Edit ---
You might want to run setClass('myDate')
first to avoid the warning (you can ignore the warning, but it can get annoying if you do this a lot and this is a simple call that gets rid of it).
回答2:
If there is only 1 date format you want to change, you could use the Defaults
package to change the default format within as.Date.character
library(Defaults)
setDefaults('as.Date.character', format = '%d/%M/%Y')
dataImport <- read.csv("data.csv", colClasses = c("factor","factor","Date"))
str(dataImport)
## 'data.frame': 10 obs. of 3 variables:
## $ func_loc : Factor w/ 5 levels "3076WAG0003",..: 1 2 3 3 3 3 3 4 4 5
## $ order_type : Factor w/ 3 levels "PM01","PM02",..: 3 3 1 1 1 1 2 2 3 1
## $ actual_finish: Date, format: "2008-10-15" "2009-10-19" NA "2008-10-11" ...
I think @Greg Snow's answer is far better, as it does not change the default behaviour of an often used function.
回答3:
In case you need time also:
setClass('yyyymmdd-hhmmss')
setAs("character","yyyymmdd-hhmmss", function(from) as.POSIXct(from, format="%Y%m%d-%H%M%S"))
d <- read.table(colClasses="yyyymmdd-hhmmss", text="20150711-130153")
str(d)
## 'data.frame': 1 obs. of 1 variable:
## $ V1: POSIXct, format: "2015-07-11 13:01:53"
回答4:
A long time ago, in the meantime the problem has been solved by Hadley Wickham. So nowadays the solution is reduced to a oneliner:
library(readr)
data <- read_csv("data.csv",
col_types = cols(actual_finish = col_datetime(format = "%d/%m/%Y")))
Maybe we want even to get rid of unnecessary stuff:
data <- as.data.frame(data)
来源:https://stackoverflow.com/questions/13022299/specify-custom-date-format-for-colclasses-argument-in-read-table-read-csv