R - read.table imports half of the dataset - no errors nor warnings

流过昼夜 提交于 2019-12-23 17:23:37

问题


I have a csv file with ~200 columns and ~170K rows. The data has been extensively groomed and I know that it is well-formed. When read.table completes, I see that approximately half of the rows have been imported. There are no warnings nor errors. I set options( warn = 2 ). I'm using 64-bit latest version and I increased the memory limit to 10gig. Scratching my head here...no idea how to proceed debugging this.

Edit
When I said half the file, I don't mean the first half. The last observation read is towards the end of the file....so its seemingly random.


回答1:


You may have a comment character (#) in the file (try setting the option comment.char = "" in read.table). Also, check that the quote option is set correctly.




回答2:


I've had this problem before how I approached it was to read in a set number of lines at a time and then combine after the fact.

df1 <- read.csv(..., nrows=85000) 
df2 <- read.csv(..., skip=84999, nrows=85000) 
colnames(df1) <- colnames(df2)

df <- rbind(df1,df2) 
rm(df1,df2)



回答3:


I had a similar problem when reading in a large txt file which had a "|" separator. Scattered about the txt file were some text blocks that contained a quote (") which caused the read.xxx function to stop at the prior record without throwing an error. Note that the text blocks mentioned were not encased in double quotes; rather, they just contained one double quote character here and there (") which tripped it up.

I did a global search and replace on the txt file, replacing the double quote (") with a single quote ('), solving the problem (all rows were then read in without aborting).



来源:https://stackoverflow.com/questions/5684525/r-read-table-imports-half-of-the-dataset-no-errors-nor-warnings

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!