Read csv data file in R

为君一笑 posted on 2019-12-02 16:50:11

Question


I am using read.table to read a data file and got the following error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
scan() expected 'a real', got 'true'

I know this means there is an error somewhere in my data file; the problem is how to find it. The error message does not say which row is affected, so it is hard to track down. Alternatively, how can I skip these rows?

Here's my R code:

data <- read.csv("/home/jianfezhang/prod/conversion_yaap/data/part-r-00000",
                 sep = "\t",
                 col.names = c("site",
                               "treatment",
                               "mode",
                               "segment",
                               "source",
                               "itemId",
                               "leaf_categ_id",
                               "condition_id",
                               "auct_type_code",
                               "start_price_lstg_curncy",
                               "bin_price_lstg_curncy",
                               "start_price_variance",
                               "start_price_mean",
                               "start_price_media",
                               "bin_price_variance",
                               "bin_price_mean",
                               "bin_price_media",
                               "is_sold"),
                 colClasses = c(rep("factor", 5), "numeric", rep("factor", 3),
                                rep("numeric", 8), "factor"))

Answer 1:


The error you get is caused by the colClasses argument: some values in the file do not match the data types you specified.

When I run into something like this, the cause is usually a miscount in the colClasses argument. For example, the correct specification might be

colClasses=c(rep("factor",5),"numeric", rep("factor",4), rep("numeric",7),"factor")

instead of the values you specified. You can check this by carefully comparing the first few lines of your file with the data types you specified.
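One way to make that comparison less error-prone is to let R guess the types from the first few rows and print them in column order. The sketch below is only an illustration: the temporary file and its five columns are made up for the example and stand in for the real data file.

```r
# Hypothetical sample file standing in for the real data (tab-separated, no header).
path <- tempfile(fileext = ".tsv")
writeLines(c("US\tA\t12\t3.5\ttrue",
             "DE\tB\t17\t4.0\tfalse"), path)

# Read only the first rows and let R guess each column's type.
head_rows <- read.table(path, sep = "\t", header = FALSE, nrows = 5,
                        stringsAsFactors = FALSE)

# One class per column, in file order; compare this against colClasses.
guessed <- sapply(head_rows, class)
print(guessed)

# Number of fields on each line, to confirm the column count matches col.names.
n_fields <- count.fields(path, sep = "\t")
print(n_fields)
```

A column that R guesses as logical or character in a position where colClasses says "numeric" is a likely candidate for the miscount.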

If this does not do the trick, you probably have a value of the wrong type where you do not expect it. A simple, yet slow, approach is to remove the colClasses argument and first read the whole file without any type specification. Add stringsAsFactors=FALSE so that text columns are read as character rather than factor, or pass colClasses="character" to read every column as text. This should work in most cases.

Then you may try to convert each column one by one, like

data$itemId <- as.numeric(data$itemId)

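and then check the result for NA values, which is easily done with summary(data$itemId). This relies on the column having been read as character; calling as.numeric on a factor returns the internal level codes instead. If you get NA values, call which(is.na(data$itemId)) to get the row numbers, then check in your original file whether each NA is genuinely missing data or points to a data problem.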
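As a minimal sketch of this column-by-column check, assuming a small made-up file with three columns (the file contents and the price column are hypothetical, only itemId comes from the question):

```r
# Hypothetical sample file; the second row has "true" where a number is expected.
path <- tempfile(fileext = ".tsv")
writeLines(c("US\t101\t9.99",
             "DE\ttrue\t4.50",
             "UK\t103\t7.25"), path)

# Read every column as text so that nothing can fail during parsing.
data <- read.table(path, sep = "\t", header = FALSE,
                   col.names = c("site", "itemId", "price"),
                   colClasses = "character")

# Convert one column; text that is not a number becomes NA (with a warning).
item_num <- suppressWarnings(as.numeric(data$itemId))

# Rows where conversion failed although the original text was present.
bad_rows <- which(is.na(item_num) & !is.na(data$itemId) & data$itemId != "")
print(data[bad_rows, ])
```

Keeping the converted values in a separate vector (item_num) leaves the original text available, so the offending value can be inspected next to its row number.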

Most of the time you will be able to narrow down your problem this way.

If your file has a lot of columns, however, this quickly becomes a lot of work.
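For files with many columns, the same check can be run over all numeric columns in one pass. The following is a sketch under assumptions, not a definitive solution: the sample file, the column names and the numeric_cols vector are hypothetical, and dropping the offending rows is only one possible way to handle them.

```r
# Hypothetical sample file with one bad value in each numeric column.
path <- tempfile(fileext = ".tsv")
writeLines(c("US\t101\t9.99",
             "DE\ttrue\t4.50",
             "UK\t103\tn/a"), path)

col_names <- c("site", "itemId", "price")
numeric_cols <- c("itemId", "price")   # columns that are supposed to be numeric

# Read everything as text first.
raw <- read.table(path, sep = "\t", header = FALSE,
                  col.names = col_names, colClasses = "character")

# For each numeric column, list the rows whose text does not parse as a number.
bad <- lapply(raw[numeric_cols], function(x) {
  which(is.na(suppressWarnings(as.numeric(x))) & !is.na(x) & nzchar(x))
})
print(bad)

# Drop every row with at least one unparsable value, then convert the columns.
bad_any <- sort(unique(unlist(bad)))
clean <- if (length(bad_any) > 0) raw[-bad_any, , drop = FALSE] else raw
clean[numeric_cols] <- lapply(clean[numeric_cols], as.numeric)
```

The list bad reports the problem rows per column, which answers the "where is it" part of the question; clean is the data with those rows skipped.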



Source: https://stackoverflow.com/questions/16911343/read-csv-data-file-in-r
