fread() fails with missing values in integer64 columns


When reading the text below, fread() fails to detect the missing values in columns 8 and 9. This happens only with the default option integer64="integer64".

2 Answers

予麋鹿

2021-02-14 08:20
This is apparently an issue with the bit64 package, not with fread() or data.table. From the bit64 documentation (http://cran.r-project.org/web/packages/bit64/bit64.pdf):

"Subscripting non-existing elements and subscripting with NAs is currently not supported. Such subscripting currently returns 9218868437227407266 instead of NA (the NA value of the underlying double code). Following the full R behaviour here would either destroy performance or require extensive C-coding."

I tried reassigning the 9218868437227407266 value to NA, thinking it would work, but it did not.

For example:
```
# Comparing against the literal value returns no rows
DT[V8==9218868437227407266, ]
# Comparing against max(V8) does return the rows with 9218868437227407266 in V8
DT[V8==max(V8), ]
# ...but assigning NA to those rows does not change the value
DT[V8==max(V8), V8:=NA]
# Not that this makes sense, but I tried a character NA just in case
DT[V8==max(V8), V8:=NA_character_]
```
So, as the documentation states pretty clearly, a vector of class integer64 won't recognize NA or missing values. I'm going to avoid bit64 just so I don't have to deal with this...
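
Since the question notes that the problem appears only with integer64="integer64", one way to sidestep it may be to have fread() return large integers as doubles instead. This is a minimal sketch, assuming the values stay below 2^53 so that a double can hold them exactly. The sample file and the column names `id` and `big` are made up for illustration.

```r
library(data.table)

# Hypothetical sample file: the second row has an empty field in a column
# whose other values are too large for a 32-bit integer.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,big", "1,3127133583", "2,", "3,9003982501"), csv_path)

# Ask fread() to read large integers as double rather than integer64
dt_dbl <- fread(csv_path, integer64 = "double")

class(dt_dbl$big)   # column type as read
is.na(dt_dbl$big)   # which rows have a missing value
```

The trade-off is that integers beyond 2^53 would lose precision as doubles, so this only suits data known to stay within that range.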

耶瑟儿～

2021-02-14 08:30

This bug, #488, is now fixed with this commit in the development version of data.table, v1.9.5. Missing values are now assigned (and displayed) properly as NA if bit64 is loaded.

```
require(data.table) # v1.9.5
require(bit64)
ans = fread("test.csv")
#      V1  V2 V3 V4 V5  V6 V7         V8         V9        V10 V11
# 1: 2012 276 NA  0 S1 001  1         NA  724135215 1590915056  NA
# 2: 2012 276  2  8 S1 001  1         NA         NA     154598   0
# 3: 2012 276  2 12 S1 001  1         NA    5118863   21819477  NA
# 4: 2012 276  2  0 S1 011  8 3127133583 3127133583 9003982501   0
```
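
To check the behaviour on an installed version, something along these lines may help. It is only a sketch: the sample file and the column names `id` and `big` are made up, and it relies on `is.na()` for integer64 vectors and the `NA_integer64_` constant, both from bit64.

```r
library(data.table)
library(bit64)

# Hypothetical sample file with an empty field in a large-integer column
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,big", "1,3127133583", "2,", "3,9003982501"), csv_path)

ans <- fread(csv_path, integer64 = "integer64")

class(ans$big)    # should report the integer64 class
is.na(ans$big)    # missing values detected by bit64's is.na() method
ans[is.na(big)]   # rows where the integer64 column is missing

# Assigning a missing value explicitly, using bit64's typed NA constant
ans[id == 3L, big := NA_integer64_]
is.na(ans$big)
```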
