I have very large tables (30 million rows) that I would like to load as data frames in R. read.table() has a lot of convenient features, but it seems like there is a lot of logic in the implementation that would slow things down.
Strangely, no one answered the bottom part of the question for years, even though it is an important one -- data.frames are simply lists with the right attributes, so if you have large data you don't want to use as.data.frame() or similar for a list. It's much faster to simply "turn" a list into a data frame in place:
attr(df, "row.names") <- .set_row_names(length(df[[1]]))
class(df) <- "data.frame"
This makes no copy of the data, so it's immediate (unlike all other methods). It assumes that you have already set names() on the list accordingly.
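
For illustration, here is a minimal, self-contained sketch of the whole pattern; the list and its columns (id, value) are made-up names, not something from the question:

# build a named list of equal-length column vectors
df <- list(id = 1:5, value = rnorm(5))

# set the internal row names and the class in place -- no copy of the columns is made
attr(df, "row.names") <- .set_row_names(length(df[[1]]))
class(df) <- "data.frame"

str(df)  # now a 5-row, 2-column data.frame
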
[As for loading large data into R -- personally, I dump it by column into binary files and use readBin() -- that is by far the fastest method (other than mmapping) and is only limited by the disk speed. Parsing ASCII files is inherently slow (even in C) compared to binary data.]
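
The answer gives no code for that part; a rough sketch of the idea, using a hypothetical file name and a single numeric column, might look like this, with writeBin() producing the per-column dump and readBin() loading it back:

# dump one numeric column to a raw binary file (hypothetical file name)
x <- rnorm(1e6)
con <- file("column1.bin", "wb")
writeBin(x, con)
close(con)

# later: read it back; n must be at least the number of stored values
con <- file("column1.bin", "rb")
y <- readBin(con, what = "double", n = 1e6)
close(con)

identical(x, y)  # TRUE -- doubles round-trip exactly through writeBin/readBin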