问题
I have a csv file with ~200 columns and ~170K rows. The data has been extensively groomed and I know that it is well-formed. When read.table completes, I see that approximately half of the rows have been imported. There are no warnings nor errors. I set options( warn = 2 ). I'm using 64-bit latest version and I increased the memory limit to 10gig. Scratching my head here...no idea how to proceed debugging this.
Edit
When I said half the file, I don't mean the first half. The last observation read is towards the end of the file....so its seemingly random.
回答1:
You may have a comment character (#) in the file (try setting the option comment.char = ""
in read.table). Also, check that the quote option is set correctly.
回答2:
I've had this problem before how I approached it was to read in a set number of lines at a time and then combine after the fact.
df1 <- read.csv(..., nrows=85000)
df2 <- read.csv(..., skip=84999, nrows=85000)
colnames(df1) <- colnames(df2)
df <- rbind(df1,df2)
rm(df1,df2)
回答3:
I had a similar problem when reading in a large txt file which had a "|" separator. Scattered about the txt file were some text blocks that contained a quote (") which caused the read.xxx function to stop at the prior record without throwing an error. Note that the text blocks mentioned were not encased in double quotes; rather, they just contained one double quote character here and there (") which tripped it up.
I did a global search and replace on the txt file, replacing the double quote (") with a single quote ('), solving the problem (all rows were then read in without aborting).
来源:https://stackoverflow.com/questions/5684525/r-read-table-imports-half-of-the-dataset-no-errors-nor-warnings