I have very large tables (30 million rows) that I would like to load as dataframes in R. read.table() has a lot of convenient features, but it seems like there is a lot of logic in the implementation that would slow things down.
An alternative is to use the vroom package, now on CRAN.
vroom doesn't load the entire file; it indexes where each record is located, and the data is read later, when you actually use it. You only pay for what you use.
See Introduction to vroom, Get started with vroom, and the vroom benchmarks.
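For illustration, here is a minimal sketch of the lazy read (the file name `huge_file.tsv` is a placeholder, not from the benchmark):

```r
library(vroom)

# vroom() scans the file once to build an index of record locations
# and returns a tibble; values are materialized lazily via ALTREP,
# so this call returns quickly even for tens of millions of rows.
df <- vroom("huge_file.tsv", delim = "\t")
```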
The basic overview is that the initial read of a huge file will be much faster, while subsequent operations that force the data to be materialized may be slightly slower. So depending on what your use is, it could be the best option.
See a simplified example from the vroom benchmarks below; the key things to note are the super-fast read time and the slightly slower operations such as aggregate.
| package | read | print | sample | filter | aggregate | total |
|---|---|---|---|---|---|---|
| read.delim | 1m 21.5s | 1ms | 315ms | 764ms | | 1m 22.6s |
| readr | 33.1s | 90ms | 2ms | 202ms | 825ms | 34.2s |
| data.table | 15.7s | 13ms | 1ms | 129ms | 394ms | 16.3s |
| vroom (altrep) dplyr | 1.7s | 89ms | 1.7s | 1.3s | 1.9s | 6.7s |
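For context, the sample/filter/aggregate steps in the benchmark are ordinary dplyr verbs applied to the tibble vroom returns. A hedged sketch, assuming the `df` from above and illustrative column names (`payment_type`, `tip_amount`):

```r
library(dplyr)

# Each verb only materializes the columns it touches; this is where
# the cost that the initial read skipped gets paid.
df %>%
  slice_sample(n = 100)                    # sample a few rows

df %>%
  filter(payment_type == "CSH") %>%        # filter on one column
  summarise(mean_tip = mean(tip_amount))   # aggregate
```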