Quickly reading very large tables as dataframes

清歌不尽 2020-11-21 04:46

I have very large tables (30 million rows) that I would like to load as dataframes in R. read.table() has a lot of convenient features, but it seems like the …

11 Answers
  •  栀梦 (OP)
     2020-11-21 05:10

    An alternative is to use the vroom package, now on CRAN. vroom doesn't load the entire file; it indexes where each record is located, and the data is read later, when you actually use it.

    Only pay for what you use.

    See Introduction to vroom, Get started with vroom and the vroom benchmarks.
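    As a rough illustration of the basic workflow, here is a minimal sketch (the file name, delimiter and column types are placeholders, not taken from the question):

    library(vroom)

    # The initial "read" is fast because vroom only indexes the records;
    # column data is materialised lazily the first time it is accessed.
    # altrep = TRUE is the default in recent vroom versions.
    df <- vroom("big_table.tsv", delim = "\t", altrep = TRUE)

    # Supplying col_types up front skips the type-guessing pass and
    # silences the column-specification message.
    df <- vroom("big_table.tsv", delim = "\t",
                col_types = cols(.default = col_double()))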

    The basic overview is that the initial read of a huge file will be much faster, while subsequent operations on the data may be slightly slower. So depending on what your use is, it could be the best option.

    See a simplified example from the vroom benchmarks below; the key things to note are the super-fast read times and the slightly slower operations like aggregate, etc.

    package                 read    print   sample   filter  aggregate   total
    read.delim              1m      21.5s   1ms      315ms   764ms       1m 22.6s
    readr                   33.1s   90ms    2ms      202ms   825ms       34.2s
    data.table              15.7s   13ms    1ms      129ms   394ms       16.3s
    vroom (altrep) dplyr    1.7s    89ms    1.7s     1.3s    1.9s        6.7s
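
    For the vroom row, the filter and aggregate steps roughly correspond to ordinary dplyr verbs run on the vroom-loaded data; the first time a column is touched it gets materialised, which is where the extra cost shows up. A hedged sketch of that step (the column names payment_type and tip_amount are illustrative, not taken from the benchmark):

    library(dplyr)

    df %>%
      filter(payment_type == "CSH") %>%            # first access parses the column
      group_by(payment_type) %>%
      summarise(mean_tip = mean(tip_amount))       # reusing already-parsed data is cheap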
    
