R: Is there a way to subset a file while reading

甜味超标 2021-01-14 14:10

I have a huge .csv file, about 1.4 GB, and reading it with read.csv takes a long time. There are several variables in that file, and all I want is to ext

3 Answers
  •  囚心锁ツ
    2021-01-14 14:53

    Check out the LaF package; it lets you read very large text files in blocks, so you don't have to load the entire file into memory.

    library(LaF)
    
    data_model <- detect_dm_csv("yourFile.csv", skip = 1) # detect the file structure
    dat <- laf_open(data_model)                           # open a connection to the file
    
    # seq(1, 100000, 1000) assumes the file has roughly 100,000 rows;
    # adjust the upper bound to match your file
    block_list <- lapply(seq(1, 100000, 1000), function(row_num) {
        goto(dat, row_num)
        data_block <- next_block(dat, nrows = 1000) # read a block of 1000 rows
        data_block[data_block$Variables == "X", ]   # keep only the rows you need
    })
    your_df <- do.call("rbind", block_list)
    

    Admittedly, the package sometimes feels a bit bulky, and in some situations I had to find small hacks to get my results (you might have to adapt my solution for your data). Nevertheless, I found it an immensely useful solution for dealing with files too large to fit in my RAM.
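
    The same block-wise idea can also be sketched in base R with read.csv's skip and nrows arguments, with no extra packages. This is only an illustration of the technique, not a drop-in replacement for LaF; the column name Variables and the value "X" are taken from the answer above, and the tiny generated CSV stands in for the real 1.4 GB file.

    ```r
    # Build a small stand-in CSV: 100 rows alternating "X" and "Y"
    csv_path <- tempfile(fileext = ".csv")
    write.csv(data.frame(Variables = rep(c("X", "Y"), 50), value = 1:100),
              csv_path, row.names = FALSE)

    header <- names(read.csv(csv_path, nrows = 1))  # read column names once
    block_size <- 30
    blocks <- list()
    skip <- 1  # skip the header line on the first read, then advance block by block
    repeat {
      block <- read.csv(csv_path, skip = skip, nrows = block_size,
                        header = FALSE, col.names = header)
      blocks[[length(blocks) + 1]] <- block[block$Variables == "X", ]
      if (nrow(block) < block_size) break  # last (possibly short) block reached
      skip <- skip + block_size
    }
    your_df <- do.call(rbind, blocks)
    nrow(your_df)  # 50: only the "X" rows survive
    ```

    This is much slower than LaF on a genuinely large file, because read.csv re-scans the skipped lines on every call, but it shows the filter-per-block-then-rbind pattern with nothing but base R.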
