I have very large tables (30 million rows) that I would like to load as a dataframes in R. read.table()
has a lot of convenient features, but it seems like the
Instead of the conventional read.table I feel fread is a faster function. Specifying additional attributes like select only the required columns, specifying colclasses and string as factors will reduce the time take to import the file.
data_frame <- fread("filename.csv",sep=",",header=FALSE,stringsAsFactors=FALSE,select=c(1,4,5,6,7),colClasses=c("as.numeric","as.character","as.numeric","as.Date","as.Factor"))