Question
I am using read.table to read a data file and got the following error:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
scan() expected 'a real', got 'true'
I know this means there is an error somewhere in my data file; the problem is how to find it. The error message does not say which row has the issue, so it is hard for me to locate. Alternatively, how can I skip these rows?
Here's my R code:
data<-read.csv("/home/jianfezhang/prod/conversion_yaap/data/part-r-00000",
sep="\t",
col.names=c("site",
"treatment",
"mode",
"segment",
"source",
"itemId",
"leaf_categ_id",
"condition_id",
"auct_type_code",
"start_price_lstg_curncy",
"bin_price_lstg_curncy",
"start_price_variance",
"start_price_mean",
"start_price_media",
"bin_price_variance",
"bin_price_mean",
"bin_price_media",
"is_sold"),
colClasses=c(rep("factor",5),"numeric",rep("factor",3),rep("numeric",8),"factor")  # 5 + 1 + 3 + 8 + 1 = 18 classes, one per column name
)
Answer 1:
The error is caused by the colClasses argument: some values in the file do not match the data types you specified.
Most of the times I have run into this, I had simply miscounted the entries in the colClasses argument; for example, it might actually need to be
colClasses=c(rep("factor",5),"numeric", rep("factor",4), rep("numeric",7),"factor")
instead of what you have now. You can check this by carefully comparing the first few lines of your file against the data types in colClasses.
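That comparison can be partly automated. The sketch below is only an illustration under stated assumptions: the file is a hypothetical four-column stand-in written to a temp file, and `col_names` / `col_classes` play the role of the question's `col.names` and `colClasses` vectors. It reads the first rows with every column as character (so nothing can fail), then flags any column declared "numeric" whose sample values do not parse as numbers.

```r
# Hypothetical stand-in for the real file: 4 tab-separated columns,
# the last one holds true/false values.
tsv <- tempfile(fileext = ".tsv")
writeLines(c("US\tA\t1.5\ttrue",
             "DE\tB\t2.0\tfalse"), tsv)

col_names   <- c("site", "treatment", "price", "is_sold")
# Deliberately wrong: the last column is declared numeric by mistake.
col_classes <- c("factor", "factor", "numeric", "numeric")
stopifnot(length(col_classes) == length(col_names))  # the counts at least agree

# Read only the first rows, all as character, so no value can fail to parse.
peek <- read.table(tsv, sep = "\t", nrows = 5, colClasses = "character",
                   col.names = col_names)

# Line up each column's declared class with a sample value and a parse check.
check <- data.frame(
  column     = col_names,
  declared   = col_classes,
  sample     = vapply(peek, function(x) x[1], character(1)),
  numeric_ok = vapply(peek, function(x) !anyNA(suppressWarnings(as.numeric(x))),
                      logical(1)),
  row.names  = NULL,
  stringsAsFactors = FALSE
)
check$mismatch <- check$declared == "numeric" & !check$numeric_ok
print(check)
```

Note that a matching length check is not enough on its own: in the question both vectors have 18 entries, so a misplaced "numeric" only shows up when each position is compared with real values, as the `mismatch` column does here.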
If that does not solve it, some column probably contains a value of a type you do not expect. A simple, if slow, approach is to drop the colClasses argument and first read the whole file without type options, perhaps adding stringsAsFactors=FALSE so that non-numeric columns are read as character rather than factor. This should usually work.
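Once the file is read without colClasses, `read.table` guesses each column's type itself, so a column you expected to be numeric but that comes back as "character" contains at least one non-numeric value. A minimal sketch, assuming a small hypothetical file in which one price field holds the text "true" (the column names are borrowed from the question for illustration only):

```r
# Hypothetical sample file: start_price should be numeric, but row 2 says "true".
tsv <- tempfile(fileext = ".tsv")
writeLines(c("US\t101\t9.99",
             "DE\t102\ttrue",
             "UK\t103\t4.50"), tsv)

# No colClasses: read.table converts each column with its own type guessing.
data <- read.table(tsv, sep = "\t",
                   col.names = c("site", "itemId", "start_price"),
                   stringsAsFactors = FALSE)

# Columns expected to be numeric that show up as "character" are the suspects.
col_types <- sapply(data, class)
print(col_types)
```

This narrows the search to a few columns before any row-by-row checking.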
Then convert the columns one at a time, for example:
data$itemId <- as.numeric(data$itemId)  # assumes itemId was read as character; on a factor use as.numeric(as.character(...))
and then check the result for NA values, which summary(data$itemId) shows quickly. If there are NA values, which(is.na(data$itemId)) gives their row numbers, so you can check in the original file whether each NA is legitimate (a genuinely missing value) or points to a data problem.
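A small sketch of that single-column check, assuming a hypothetical character vector as it might look after reading with stringsAsFactors=FALSE: "" stands for a genuinely empty field and "true" for a bad value. Printing the raw text next to each NA row makes the two cases easy to tell apart. Keep in mind that these are row numbers within the data frame; if the file has a header line, the matching file line is one further down.

```r
# Hypothetical raw column: "" is a real gap, "true" is a data problem.
start_price <- c("9.99", "true", "4.50", "", "7")

converted <- suppressWarnings(as.numeric(start_price))
print(summary(converted))   # the summary includes the number of NA's

bad_rows <- which(is.na(converted))
# Show the original text behind each NA so real gaps stand out from bad values.
print(data.frame(row = bad_rows, raw = start_price[bad_rows]))
```

If the column had been read as a factor, `as.numeric()` would return the internal level codes instead of the values, which is why converting through `as.character()` first (or reading with stringsAsFactors=FALSE) matters.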
Most of the time you will be able to narrow down your problem this way.
If your file has a lot of columns, however, this quickly becomes a lot of work.
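The per-column check can be looped over every column that is meant to be numeric, which also answers the question of how to skip the offending rows. This is a sketch under assumptions: the data frame below is hypothetical and read entirely as character, and `numeric_cols` is a hypothetical list that, for the question's file, would hold itemId and the price/variance/mean/median columns. Empty fields are treated as real missing values, not as errors.

```r
# Hypothetical data frame as read with colClasses = "character".
data <- data.frame(
  itemId      = c("101", "102", "103", "104"),
  start_price = c("9.99", "true", "4.50", "3.00"),
  bin_price   = c("12.00", "15.00", "x", "8.00"),
  stringsAsFactors = FALSE
)
numeric_cols <- c("itemId", "start_price", "bin_price")  # columns meant to be numeric

# For each numeric column, find rows whose non-empty text is not a number.
bad_by_col <- lapply(data[numeric_cols], function(x) {
  num <- suppressWarnings(as.numeric(x))
  which(is.na(num) & !is.na(x) & x != "")
})
bad_by_col <- bad_by_col[lengths(bad_by_col) > 0]  # keep only columns with problems
print(bad_by_col)

# Convert the numeric columns, then (if you decide to skip them) drop the bad rows.
bad_rows <- sort(unique(unlist(bad_by_col)))
for (col in numeric_cols) data[[col]] <- suppressWarnings(as.numeric(data[[col]]))
clean <- if (length(bad_rows) > 0) data[-bad_rows, ] else data
```

Whether to drop the rows or fix them in the source file is a judgment call; printing `bad_by_col` first shows which column and row each problem came from.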
Source: https://stackoverflow.com/questions/16911343/read-csv-data-file-in-r