I have a huge .csv file (~1.4 GB), and reading it with read.csv takes a long time. There are several variables in that file, and all I want is to extract the rows where one of them takes a particular value.
Check out the LaF package; it lets you read very large text files in blocks, so you don't have to read the entire file into memory.
library(LaF)

# Detect the file structure (column names and types); header = TRUE keeps
# the column names so that data_block$Variables works below
data_model <- detect_dm_csv("yourFile.csv", header = TRUE)

# Open a connection to the file; nothing is read into memory yet
dat <- laf_open(data_model)

# Read the file in blocks of 1000 rows, keeping only the rows you need
# (replace 100000 with the number of rows in your file)
block_list <- lapply(seq(1, 100000, 1000), function(row_num) {
    goto(dat, row_num)                           # jump to the start of the block
    data_block <- next_block(dat, nrows = 1000)  # read a block of 1000 rows
    data_block[data_block$Variables == "X", ]    # keep only the matching rows
})
your_df <- do.call("rbind", block_list)
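If the manual goto()/next_block() loop feels clunky, LaF also provides process_blocks(), which iterates over the file for you. Below is a minimal sketch under the same assumptions as above (a column named Variables, filtering for the value "X"); process_blocks() calls your function once per block, passing the previous result as the second argument (NULL on the first call), and one final time with an empty block:

collect_x <- function(block, result) {
    # rbind(NULL, df) simply returns df on the first call, and the final
    # empty block contributes zero rows, so no special-casing is needed
    rbind(result, block[block$Variables == "X", ])
}
your_df <- process_blocks(dat, collect_x)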
Admittedly, the package sometimes feels a bit bulky, and in some situations I had to find small hacks to get my results (you might have to adapt my solution to your data). Nevertheless, I found it an immensely useful solution for dealing with files that exceeded my RAM.