Specifying colClasses in the read.csv

我是研究僧i 提交于 2019-11-26 06:17:08

问题


I am trying to specify the colClasses options in the read.csv function in R. In my data, the first column \"time\" is basically a character vector while the rest of the columns are numeric.

data <- read.csv(\"test.csv\", comment.char=\"\" , 
                 colClasses=c(time=\"character\", \"numeric\"), 
                 strip.white=FALSE)

In the above command, I would want R to read in the \"time\" column as \"character\" and the rest as numeric. Although, the \"data\" variable did have the correct result after the command completed, R returned the following warnings. I am wondering how I could fix these warnings?

Warning messages:
 1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
    not all columns named in \'colClasses\' exist
 2: In tmp[i[i > 0L]] <- colClasses :
    number of items to replace is not a multiple of replacement length

Derek


回答1:


The colClasses vector must have length equal to the number of imported columns. Supposing the rest of your dataset columns are 5:

colClasses=c("character",rep("numeric",5))



回答2:


You can specify the colClasse for only one columns.

So in your example you should use:

data <- read.csv('test.csv', colClasses=c("time"="character"))



回答3:


Assuming your 'time' column has at least one observation with a non-numeric character and all your other columns only have numbers, then 'read.csv's default will be to read in 'time' as a 'factor' and all the rest of the columns as 'numeric'. Therefore setting 'stringsAsFactors=F' will have the same result as setting the 'colClasses' manually i.e.,

data <- read.csv('test.csv', stringsAsFactors=F)



回答4:


If you want to refer to names from the header rather than column numbers, you can use something like this:

fname <- "test.csv"
headset <- read.csv(fname, header = TRUE, nrows = 10)
classes <- sapply(headset, class)
classes[names(classes) %in% c("time")] <- "character"
dataset <- read.csv(fname, header = TRUE, colClasses = classes)



回答5:


For multiple datetime columns with no header, and a lot of columns, say my datetime fields are in columns 36 and 38, and I want them read in as character fields:

data<-read.csv("test.csv", head=FALSE,   colClasses=c("V36"="character","V38"="character"))                        



回答6:


I know OP asked about the utils::read.csv function, but let me provide an answer for these that come here searching how to do it using readr::read_csv from the tidyverse.

read_csv ("test.csv", col_names=FALSE, col_types = cols (.default = "c", time = "i"))

This should set the default type for all columns as character, while time would be parsed as integer.




回答7:


If we combine what @Hendy and @Oddysseus Ithaca contributed, we get cleaner and a more general (i.e., adaptable?) chunk of code.

    data <- read.csv("test.csv", head = F, colClasses = c(V36 = "character", V38 = "character"))                        


来源:https://stackoverflow.com/questions/2805357/specifying-colclasses-in-the-read-csv

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!