I am relatively new to "large data processing" in R and am hoping for some advice on how to deal with a 50 GB CSV file. The current problem is the following:
[table of sample data omitted]
You can use R with SQLite behind the scenes via the sqldf package. You'd use the read.csv.sql function from sqldf and then query the data however you want to obtain the smaller data frame.
The example from the docs:
library(sqldf)
iris2 <- read.csv.sql("iris.csv",
                      sql = "select * from file where Species = 'setosa' ")
I've used this library on VERY large CSV files with good results.
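As a rough sketch of how this might look for the 50 GB file in the question (the file name, column name, and filter value below are assumptions; adjust them to your data), note that read.csv.sql stages the CSV in a temporary on-disk SQLite database by default (dbname = tempfile()), so only the rows matching the query end up in an R data frame:

library(sqldf)

# Hypothetical file and filter; replace with your actual path and condition.
# The table is referred to as "file" inside the SQL statement.
big_subset <- read.csv.sql(
  "huge_file.csv",
  sql = "select * from file where some_column = 'some_value'",
  header = TRUE,
  sep = ","
)

dim(big_subset)  # check how many rows matched the filter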