Question
I have a large file with 6 million rows and I'm trying to read the data in chunks for processing so I don't hit my RAM limit. Here is my code (note: temp.csv is just a dummy file with 41 records):
library(data.table)

infile <- file("data/temp.csv", open = "r")
headers <- as.character(read.table(infile, header = FALSE, nrows = 1, sep = ",", stringsAsFactors = FALSE))
while (length(temp <- read.table(infile, header = FALSE, nrows = 10, sep = ",", stringsAsFactors = FALSE)) > 0) {
  temp <- data.table(temp)
  setnames(temp, colnames(temp), headers)
  setkey(temp, Id)
  print(temp[1, Tags])
}
print("hi")
close(infile)
Everything runs smoothly until the final iteration. I get this error message:
Error in read.table(infile, header = FALSE, nrows = 10, sep = ",", stringsAsFactors = FALSE) :
no lines available in input
In addition: Warning message:
In read.table(infile, header = FALSE, nrows = 10, sep = ",", stringsAsFactors = FALSE) :
incomplete final line found by readTableHeader on 'data/temp.csv'
Presumably this is because the final iteration has only 1 row of records while read.table is expecting 10?
All the data is actually read in fine. Surprisingly, even in the final iteration, temp still gets converted to a data.table. But print("hi") and everything after it never gets executed. Is there something I can do to get around this?
Thank you.
Answer 1:
Ah got it!
repeat {
  temp <- read.table(infile, header = FALSE, nrows = 10, sep = ",", stringsAsFactors = FALSE)
  temp <- data.table(temp)
  setnames(temp, colnames(temp), headers)
  setkey(temp, Id)
  print(temp[1, Tags])
  if (nrow(temp) < 10) break
}
print("hi")
This still produces a warning message but no more errors:
Warning message:
In read.table(infile, header = FALSE, nrows = 10, sep = ",", stringsAsFactors = FALSE) :
incomplete final line found by readTableHeader on 'data/temp.csv'
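One caveat with the repeat/break pattern above: if the row count happens to be an exact multiple of the chunk size, the last call to read.table() still hits an exhausted connection and raises the same "no lines available in input" error. Wrapping the read in tryCatch() and treating that error as end-of-file covers both cases. Below is a minimal self-contained sketch under those assumptions; it builds a tiny dummy file so it can run anywhere, and uses a plain data.frame (the data.table conversion from the question would slot in where the comment indicates):

```r
# Build a small dummy file (4 data rows) so the sketch is self-contained.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Id,Tags", "1,a", "2,b", "3,c", "4,d"), tmp)

chunk_size <- 3
infile <- file(tmp, open = "r")
headers <- as.character(read.table(infile, header = FALSE, nrows = 1,
                                   sep = ",", stringsAsFactors = FALSE))
total <- 0
repeat {
  # read.table() errors once the connection is exhausted, so treat
  # that error as end-of-file instead of letting it propagate.
  chunk <- tryCatch(
    read.table(infile, header = FALSE, nrows = chunk_size,
               sep = ",", stringsAsFactors = FALSE),
    error = function(e) NULL
  )
  if (is.null(chunk)) break
  names(chunk) <- headers
  # ... process the chunk here (e.g. convert to data.table, set keys) ...
  total <- total + nrow(chunk)
  if (nrow(chunk) < chunk_size) break  # a short chunk means we read the tail
}
close(infile)
total
```

With 4 data rows and a chunk size of 3, the loop exits via the short-chunk test; with a multiple-of-3 row count it would exit via the tryCatch() branch instead, with no error either way.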
Source: https://stackoverflow.com/questions/19441236/read-table-in-chunks-error-message