Quickly reading very large tables as dataframes

清歌不尽 2020-11-21 04:46

I have very large tables (30 million rows) that I would like to load as dataframes in R. read.table() has a lot of convenient features, but it seems like there is a lot of logic in the implementation that would slow things down.

11 Answers
  •  夕颜 (OP) 2020-11-21 05:11

    Strangely, no one answered the bottom part of the question for years, even though it is an important one: data.frames are simply lists with the right attributes, so if you have large data you don't want to use as.data.frame() or similar on a list. It is much faster to simply "turn" a list into a data frame in place:

    attr(df, "row.names") <- .set_row_names(length(df[[1]]))  # compact row names, no copy of the data
    class(df) <- "data.frame"                                  # the list now behaves as a data.frame
    

    This makes no copy of the data so it's immediate (unlike all other methods). It assumes that you have already set names() on the list accordingly.
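
    A minimal, self-contained sketch of the same trick; the column names x and y are made up for illustration, and only the two attribute assignments above come from the answer itself:

    df <- list(x = 1:5, y = letters[1:5])                      # plain named list of equal-length columns
    attr(df, "row.names") <- .set_row_names(length(df[[1]]))   # compact row names, no data copy
    class(df) <- "data.frame"                                   # mark the list as a data.frame
    str(df)                                                     # 'data.frame': 5 obs. of 2 variables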

    [As for loading large data into R: personally, I dump it by column into binary files and use readBin(). That is by far the fastest method (other than mmapping) and is limited only by disk speed. Parsing ASCII files is inherently slow (even in C) compared to reading binary data.]
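
    A hedged sketch of that column-wise binary approach (the file name and column content below are illustrative, not from the answer): each column is written once with writeBin() and read back with readBin(), which skips text parsing entirely, so the read is essentially disk-bound.

    x <- rnorm(1e6)                                  # one numeric column
    con <- file("col_x.bin", "wb")                   # illustrative file name
    writeBin(x, con)                                 # dump the column as raw doubles
    close(con)

    con <- file("col_x.bin", "rb")
    x2 <- readBin(con, what = "double", n = 1e6)     # type and length must be known up front
    close(con)
    identical(x, x2)                                 # TRUE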
