Quickly reading very large tables as dataframes

清歌不尽 · 2020-11-21 04:46

I have very large tables (30 million rows) that I would like to load as dataframes in R. read.table() has a lot of convenient features, but it seems like there is a lot of logic in the implementation that would slow things down.

11 Answers
  •  终归单人心
    2020-11-21 05:08

    A minor additional point worth mentioning: if you have a very large file, you can calculate the number of rows on the fly (if there is no header) using wc -l (where bedGraph holds the name of your file in your working directory):

    > numRow <- as.integer(system(paste("wc -l", bedGraph, "| sed 's/[^0-9.]*\\([0-9.]*\\).*/\\1/'"), intern = TRUE))
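
    A simpler variant of the same trick (my own assumption, not part of the original answer) is to redirect the file into wc -l, which prints only the count, so no sed post-processing is needed:

    # Redirecting the file into wc -l prints just the line count, so the sed
    # step can be dropped. shQuote() guards against spaces in the file name.
    numRow <- as.integer(system(paste("wc -l <", shQuote(bedGraph)), intern = TRUE))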
    

    You can then use that value in read.csv, read.table, etc.:

    > system.time(BG <- read.table(bedGraph, nrows = numRow,
                                   col.names = c('chr', 'start', 'end', 'score'),
                                   colClasses = c('character', rep('integer', 3))))
       user  system elapsed 
     25.877   0.887  26.752 
    > object.size(BG)
    203949432 bytes
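
    Putting the two steps together, a minimal sketch of a helper could look like the following (the function name and the four-column bedGraph layout are illustrative assumptions matching the example above, not something prescribed by the answer):

    # Sketch: count rows first, then pass nrows and colClasses to read.table so
    # it can pre-allocate and skip type guessing. Assumes a header-less,
    # four-column bedGraph file as in the example above.
    read_bedgraph <- function(path) {
      n <- as.integer(system(paste("wc -l <", shQuote(path)), intern = TRUE))
      read.table(path,
                 nrows      = n,
                 col.names  = c('chr', 'start', 'end', 'score'),
                 colClasses = c('character', rep('integer', 3)))
    }
    # BG <- read_bedgraph("my_scores.bedGraph")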
    
