Suppose I have a gigantic m-by-n matrix X, stored in a csv file that is too big to read into memory, and a binary numeric vector V of length m. My objective is to read into memory only the rows of X for which V == 1.
I think you can use the sqldf package to achieve what you want. sqldf reads the csv file directly into an SQLite database, bypassing the R environment altogether, so only the rows your query selects ever reach R.
library(sqldf)

# The connection's variable name (Xfile) is the table name inside the query
Xfile <- file('target.csv')

# Build "select * from Xfile where rowid in (1,3,...)" from the positions where V == 1;
# with a header row, SQLite's rowid 1 corresponds to the first data row
sqlcmd <- paste0('select * from Xfile where rowid in (', paste(which(V == 1), collapse = ','), ')')

sqldf(sqlcmd, file.format = list(header = TRUE))
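To see it work end to end, here is a minimal, self-contained setup (the file name target.csv matches the snippet above; the tiny X and V are just stand-ins for your real data):

# Illustration only: write a small stand-in for the giant csv
set.seed(1)
X <- matrix(rnorm(20), nrow = 5)
write.csv(as.data.frame(X), 'target.csv', row.names = FALSE)
V <- c(1, 0, 1, 0, 1)   # keep rows 1, 3 and 5

Running the sqldf call above on this file returns just rows 1, 3 and 5.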
or:
library(sqldf)

# Wrap V in a data frame so sqldf can load it into SQLite as a table
Vdf <- data.frame(V)

# read.csv.sql exposes the csv under the table name 'file'; join on rowid and keep V == 1
sqlcmd <- "select file.* from file join Vdf on file.rowid = Vdf.rowid where Vdf.V = 1"

read.csv.sql("target.csv", sql = sqlcmd)
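The practical difference between the two is where the filtering happens: the first version builds the list of matching row numbers in R and pastes it into the query, which can make the in (...) clause very long when V contains many ones; the second loads V into SQLite as its own table and lets the join do the filtering, so the SQL stays the same size no matter how many rows match.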