Question
I am reading a large .txt file (>1GB) into R via fread. I am reading the file directly from a .zip archive, via a bash command:
base = fread('unzip -p Folder.zip File.txt', sep = '|', header = FALSE,
stringsAsFactors = FALSE, na.strings="", quote = "", col.names = col_namesMain)
The text file separates entries via | so that a typical line might look like:
RRX|||02020||333293||||12123
However, there are many places where empty entries are denoted by separators with no space between them, e.g. || in the example line above.
When using fread, these adjacent separators are typically read in together, so that the above line returns the following entries:
RRX, ||02020|, 333293|||, 12123
when it should read in as:
RRX, NA, NA, 02020, NA, 333293, NA, NA, NA, 12123
I have tried using read.table with the option skipNul = TRUE, and this works perfectly. However, there doesn't seem to be any option similar to skipNul for fread. I would much prefer to use fread over read.table if possible, since I have several very large files. Despite searching, I haven't come across much discussion of this problem. Any help is much appreciated.
Answer 1:
"I have tried using read.table with the option skipNul = TRUE, and this works perfectly. However, there doesn't seem to be any option similar to skipNul for fread."
This was fixed in the development version of data.table (1.12.3) on 15 Apr 2019 (see NEWS):
- fread() now skips embedded NUL (\0), #3400. Thanks to Marcus Davy for reporting with examples, and Roy Storey for the initial PR.
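Since skipNul = TRUE made read.table behave, the file most likely contains embedded NUL bytes between the separators. The sketch below is one way to check that and then retry fread. It is only an illustration: the sample file, the placement of the NUL bytes, and the helper count_nul are assumptions made for the example and are not taken from the original data.

```r
# Hypothetical sample: the line from the question, with a NUL byte (\0)
# standing in for each empty field (an assumption about the real file).
fields <- c("RRX", "", "", "02020", "", "333293", "", "", "", "12123")
pieces <- lapply(fields, function(f) if (nzchar(f)) charToRaw(f) else as.raw(0))
line <- pieces[[1]]
for (p in pieces[-1]) line <- c(line, charToRaw("|"), p)
sample_path <- tempfile(fileext = ".txt")
writeBin(c(line, charToRaw("\n")), sample_path)

# Hypothetical helper: count NUL bytes in a file, reading it in chunks so
# that a file larger than 1GB does not have to fit in memory at once.
count_nul <- function(path, chunk_size = 1e6) {
  con <- file(path, "rb")
  on.exit(close(con))
  total <- 0
  repeat {
    bytes <- readBin(con, what = "raw", n = chunk_size)
    if (length(bytes) == 0) break
    total <- total + sum(bytes == as.raw(0))
  }
  total
}

nul_count <- count_nul(sample_path)
print(nul_count)

# Only try fread when a data.table version with the NUL fix is installed.
if (requireNamespace("data.table", quietly = TRUE) &&
    utils::packageVersion("data.table") >= "1.12.3") {
  dt <- data.table::fread(sample_path, sep = "|", header = FALSE,
                          na.strings = "", quote = "",
                          colClasses = "character")
  print(dt)
}
```

A non-zero count from count_nul on the real file would support the NUL explanation. Reading every column as character here is a choice made for the example so that values such as 02020 keep their leading zero.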
Source: https://stackoverflow.com/questions/45973059/how-to-handle-data-with-no-space-between-separators-when-using-fread-in-r