I have very large tables (30 million rows) that I would like to load as dataframes in R. read.table() has a lot of convenient features, but it seems like there is a lot of logic in the implementation that would slow things down.
A minor additional point worth mentioning: if you have a very large file, you can calculate the number of rows on the fly (assuming the file has no header) with the following, where bedGraph is the name of your file in your working directory:
>numRow=as.integer(system(paste("wc -l", bedGraph, "| sed 's/[^0-9.]*\\([0-9.]*\\).*/\\1/'"), intern=T))
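If wc and sed aren't available (on Windows, for example), a pure-R equivalent is to stream the file and count newline bytes. The sketch below is my own addition rather than part of the original one-liner; countLinesR and chunkSize are hypothetical names:

countLinesR = function(path, chunkSize=1e6) {
  con = file(path, open="rb")
  on.exit(close(con))
  n = 0
  repeat {
    chunk = readBin(con, what="raw", n=chunkSize)  # read up to chunkSize bytes
    if (length(chunk) == 0) break
    n = n + sum(chunk == as.raw(10))               # count '\n' bytes
  }
  n
}
numRow = countLinesR(bedGraph)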
You can then use that in either read.csv, read.table ...
>system.time((BG=read.table(bedGraph, nrows=numRow, col.names=c('chr', 'start', 'end', 'score'),colClasses=c('character', rep('integer',3)))))
user system elapsed
25.877 0.887 26.752
>object.size(BG)
203949432 bytes
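Since read.csv is mentioned above as well, here is a minimal sketch of the same idea applied to a comma-separated file: supplying nrows lets R size the result up front, and colClasses skips per-column type guessing, which together account for much of read.table's usual overhead. The file name scores.csv is an assumption for illustration, not from the original post:

# scores.csv is a hypothetical comma-separated file with the same four columns
# and no header; numRow would be computed for it exactly as shown above
BGcsv = read.csv('scores.csv', header=FALSE, nrows=numRow,
                 col.names=c('chr', 'start', 'end', 'score'),
                 colClasses=c('character', rep('integer', 3)))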