Question
In R, I'm trying to read in a basic CSV file of about 42,900 rows (confirmed by Unix's wc -l). The relevant code is
vecs <- read.csv("feature_vectors.txt", header=FALSE, nrows=50000)
where nrows is a slight overestimate because why not. However,
> dim(vecs)
[1] 16853     5
indicating that the resultant data frame has on the order of 17,000 rows. Is this a memory issue? Each row consists of a ~30 character hash code, a ~30 character string, and 3 integers, so the total size of the file is only about 4MB.
If it's relevant, I should also note that a lot of the rows have missing fields.
Thanks for your help!
Answer 1:
This sort of problem is often easy to diagnose with count.fields, which counts the number of fields in each line of the file, i.e. how many columns read.csv would see for each row.
(n_fields <- count.fields("feature_vectors.txt", sep = ","))
If not all the values of n_fields are the same, you have a problem.
if (any(diff(n_fields) != 0)) {
  warning("There's a problem with the file")
}
In that case, look at the values of n_fields that differ from what you expect: the problems occur in those rows.
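For example, assuming each well-formed row should have 5 fields (as the dim(vecs) output in the question suggests), something like this locates the offending lines:

table(n_fields)                   # distribution of field counts per line
bad_rows <- which(n_fields != 5)  # 5 is an assumption; use your expected column count
head(bad_rows)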
As Justin mentioned, a common problem is unmatched quotes. Open your CSV file and find out how strings are quoted there. Then call read.csv, specifying the quote argument.
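For instance, if the stray quote characters aren't meant as quoting at all (a guess about your file, not a certainty), disabling quote processing often recovers every row:

vecs <- read.csv("feature_vectors.txt", header = FALSE, quote = "", nrows = 50000)
dim(vecs)  # should now report roughly 42,900 rows if quoting was the culprit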
Answer 2:
My guess is that you have embedded unmatched " characters, so some of your rows are actually much longer than they should be. I'd do something like

apply(vecs, 2, function(x) max(nchar(as.character(x))))

to check.
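As a rough check on the raw file itself (this assumes " is the quoting character), you can count double quotes per line; lines with an odd count contain an unmatched quote:

raw_lines <- readLines("feature_vectors.txt")
n_quotes  <- nchar(raw_lines) - nchar(gsub('"', "", raw_lines, fixed = TRUE))
which(n_quotes %% 2 == 1)  # line numbers with an unmatched quote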
Source: https://stackoverflow.com/questions/11320372/rs-read-csv-omitting-rows