How to handle data with no space between separators when using fread in R

Submitted by 折月煮酒 on 2019-12-11 07:27:20

Question


I am reading a large .txt file (>1 GB) into R with fread. The file is read directly from a .zip archive via a shell command:

# col_namesMain is a character vector of column names defined elsewhere
base <- fread(cmd = 'unzip -p Folder.zip File.txt', sep = '|', header = FALSE,
              stringsAsFactors = FALSE, na.strings = "", quote = "",
              col.names = col_namesMain)

The text file separates entries via | so that a typical line might look like:

RRX|||02020||333293||||12123

However, there are many places where empty entries are denoted by separators with no space between them, e.g. || in the example line above.

When using fread, these adjacent separators are typically merged into the neighbouring fields, so that the above line returns the following entries:

RRX, ||02020|, 333293|||, 12123

when it should read in as:

RRX, NA, NA, 02020, NA, 333293, NA, NA, NA, 12123
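
For reference, the intended parse can be sketched by passing the example line to fread as a plain string. This is only an illustration under stated assumptions: it assumes a data.table version that supports the text argument, the names line and clean are made up here, and colClasses = "character" is added so that 02020 keeps its leading zero.

```r
library(data.table)

# The example line typed as an ordinary string (so it holds no stray bytes).
# The trailing newline makes fread treat the string as data, not a file name.
line <- "RRX|||02020||333293||||12123\n"

# na.strings = "" turns the empty fields between adjacent separators into NA.
clean <- fread(text = line, sep = "|", header = FALSE,
               na.strings = "", quote = "", colClasses = "character")

print(clean)
```

If a string typed this way parses into ten fields, fread handles adjacent separators as such, and the cause of the misparse is more likely something else in the file's bytes.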

I have tried using read.table with the option skipNul = TRUE, and this works perfectly. However, there doesn't seem to be any option similar to skipNul for fread. I would much prefer to use fread over read.table if possible, since I have several very large files. Despite my searching, I haven't come across much discussion of this problem. Any help much appreciated.
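
The read.table approach mentioned above can be sketched on a small synthetic file. The file contents are an assumption: the positions of the NUL bytes are invented for illustration, since the question does not show where they occur in the real data, and the names tmp, bytes and parsed are hypothetical.

```r
# Build a one-line test file with NUL bytes inside two of the empty fields.
# The NUL positions are assumed for illustration only.
tmp <- tempfile(fileext = ".txt")
bytes <- c(charToRaw("RRX|"), as.raw(0),
           charToRaw("||02020|"), as.raw(0),
           charToRaw("|333293||||12123\n"))
writeBin(bytes, tmp)

# skipNul = TRUE drops the NUL bytes while reading;
# na.strings = "" turns the resulting empty fields into NA.
parsed <- read.table(tmp, sep = "|", header = FALSE, quote = "",
                     na.strings = "", colClasses = "character",
                     skipNul = TRUE)

print(parsed)
unlink(tmp)
```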


Answer 1:


"I have tried using read.table with the option skipNul = TRUE, and this works perfectly. However, there doesn't seem to be any option similar to skipNul for fread."

This was fixed in the data.table development version 1.12.3 on 15 Apr 2019 (see NEWS):

  1. fread() now skips embedded NUL (\0), #3400. Thanks to Marcus Davy for reporting with examples, and Roy Storey for the initial PR.
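
For data.table versions without this fix, one possible workaround is to confirm that the file really contains NUL bytes and then delete them in the shell pipeline before fread sees the data. The sketch below is not part of the original answer: count_nul and strip_nul_cmd are hypothetical helper names, and it assumes a Unix-like shell where unzip and tr are available.

```r
# Hypothetical helper: count NUL bytes in a file by reading it as raw chunks.
count_nul <- function(path, chunk_size = 1e6) {
  con <- file(path, "rb")
  on.exit(close(con))
  total <- 0
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break
    total <- total + sum(chunk == as.raw(0))
  }
  total
}

# Hypothetical helper: build a pipeline that unzips one archive member to
# stdout and removes every NUL byte with `tr -d '\000'`.
strip_nul_cmd <- function(zip, member) {
  sprintf("unzip -p %s %s | tr -d '\\000'", shQuote(zip), shQuote(member))
}

# Possible usage with the call from the question (not run here):
# base <- data.table::fread(cmd = strip_nul_cmd("Folder.zip", "File.txt"),
#                           sep = "|", header = FALSE, na.strings = "",
#                           quote = "", col.names = col_namesMain)
```

Stripping the bytes in the pipeline avoids extracting and rewriting the large file on disk, at the cost of depending on external tools.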


Source: https://stackoverflow.com/questions/45973059/how-to-handle-data-with-no-space-between-separators-when-using-fread-in-r
