I am trying to read a single column of a CSV file into R as quickly as possible. I am hoping to cut down on the time that the standard methods take.
There is a speed comparison of methods for reading large CSV files in this blog post; fread (from the data.table package) is the fastest by an order of magnitude.
As mentioned in the comments above, you can use the select argument to choose which columns to read, so
library(data.table)
fread("main.csv", sep = ",", select = "f1")
will work.
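Here is a small self-contained sketch of the select argument. It writes a throwaway CSV to a temporary file, so the column names f1, f2 and f3 and their values are hypothetical and not the OP's data. It assumes the data.table package is installed.

```r
library(data.table)

# Write a small example CSV with a header row (hypothetical data).
path <- tempfile(fileext = ".csv")
writeLines(c("f1,f2,f3", "1,a,TRUE", "2,b,FALSE", "3,c,TRUE"), path)

# Keep only the column named "f1"; the other columns are dropped from the result.
# select also accepts column positions, e.g. select = 1.
dt <- fread(path, sep = ",", select = "f1")

print(dt)
```

Because the header row supplies the names, selecting by name keeps working if the column order in the file changes. Selecting by position avoids depending on the header text.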
I would suggest:
scan(pipe("cut -f1 -d, Main.csv"))
This differs from the original proposal (read.table(pipe("cut -f1 Main.csv"))) in a couple of ways:
- cut assumes tab separation by default, so you need -d, to make it split on commas.
- scan() is much faster than read.table() for simple, unstructured data reads.
According to the OP's comments, this takes about 4 seconds rather than 40+.
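The sketch below runs the cut + scan() approach end to end on a throwaway file. It assumes a Unix-like system with cut on the PATH, and the example data is hypothetical. It also assumes the file has a header row, as the fread answer's select = c("f1") implies. scan() reads numbers by default, so it would fail on the header text, and skip = 1 is added for that reason.

```r
# Write a small example CSV with a header row (hypothetical data).
path <- tempfile(fileext = ".csv")
writeLines(c("f1,f2", "1.5,x", "2.5,y", "3.5,z"), path)

# -f1 keeps the first field; -d, makes cut split on commas instead of tabs.
cmd <- paste("cut -f1 -d,", shQuote(path))

# scan() defaults to what = double(); skip = 1 drops the "f1" header line,
# which is not numeric. quiet = TRUE suppresses the "Read N items" message.
# The pipe connection is opened and closed by scan() itself.
x <- scan(pipe(cmd), skip = 1, quiet = TRUE)

print(x)
```

If the file has no header, drop skip = 1. If the column holds text rather than numbers, pass what = character() instead.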