R's read.csv() omitting rows

Posted by 旧街凉风 on 2019-12-23 15:14:22

Question


In R, I'm trying to read in a basic CSV file of about 42,900 rows (confirmed by Unix's wc -l). The relevant code is

vecs <- read.csv("feature_vectors.txt", header=FALSE, nrows=50000)

where nrows is a slight overestimate because why not. However,

> dim(vecs)
[1] 16853     5

indicating that the resultant data frame has on the order of 17,000 rows. Is this a memory issue? Each row consists of a ~30 character hash code, a ~30 character string, and 3 integers, so the total size of the file is only about 4MB.

If it's relevant, I should also note that a lot of the rows have missing fields.

Thanks for your help!


Answer 1:


This sort of problem is often easy to resolve using count.fields, which tells you how many columns the resulting data frame would have if you called read.csv.

(n_fields <- count.fields("feature_vectors.txt"))

If not all the values of n_fields are the same, you have a problem.

if(any(diff(n_fields) != 0))
{
  warning("There's a problem with the file")
}

In that case, look at the rows where n_fields differs from what you expect: those are the rows where the problems occur.
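For instance, assuming the expected field count is 5 (as suggested by the dim() output in the question), the offending line numbers can be located with something like:

```r
# Line numbers in the file whose field count differs from the expected 5
bad_rows <- which(n_fields != 5)
head(bad_rows)
```

Inspecting those lines of the raw file (e.g. with readLines()) usually makes the cause obvious.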

As Justin mentioned, a common problem is unmatched quotes. Open your CSV file and find out how strings are quoted there. Then call read.csv, specifying the quote argument.
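A minimal sketch, assuming the stray quote characters carry no meaning and can simply be treated as ordinary data:

```r
# quote = "" disables quote handling entirely, so an embedded, unmatched "
# is read as part of the field instead of swallowing the following rows
vecs <- read.csv("feature_vectors.txt", header = FALSE,
                 quote = "", nrows = 50000)
```

If your strings are legitimately quoted with, say, single quotes only, pass quote = "'" instead of disabling quoting altogether.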




Answer 2:


My guess is that you have embedded unmatched ". So some of your rows are actually much longer than they should be. I'd check with something like apply(vecs, 2, function(x) max(nchar(as.character(x)))), which reports the longest value in each column.



Source: https://stackoverflow.com/questions/11320372/rs-read-csv-omitting-rows
