When reading the text below, fread() fails to detect the missing values in columns 8 and 9. This happens only with the default option integer64="integer64".
This apparently is an issue with the bit64 package, not with fread() or data.table. From the bit64 documentation (http://cran.r-project.org/web/packages/bit64/bit64.pdf):
"Subscripting non-existing elements and subscripting with NAs is currently not supported. Such subscripting currently returns 9218868437227407266 instead of NA (the NA value of the underlying double code). Following the full R behaviour here would either destroy performance or require extensive C-coding."
I tried reassigning the 9218868437227407266 value to NA, thinking that would work. For example:
DT[V8==9218868437227407266, ]
#actually returns nothing, but
DT[V8==max(V8), ]
#returns the rows with 9218868437227407266 in V8
#but this does not reassign the value
DT[V8==max(V8), V8:=NA]
#not that this makes sense, but I tried just in case...
DT[V8==max(V8), V8:=NA_character_]
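One plausible reason the first comparison returns nothing is that a bare numeric literal in R is a double, and a double cannot hold all 19 digits of 9218868437227407266, so the value being compared is not the one stored in the column. The sketch below works on a plain bit64 vector rather than on the data.table from the question; the vector v8 and the name sentinel are made up for illustration, and it assumes a bit64 version that exports NA_integer64_. On the data.table version affected by this problem, in-place assignment with := may still not behave this way.

```r
library(bit64)

# Hypothetical stand-in for column V8: two sentinel values and one real value.
# Building the sentinel from a string keeps every digit exact.
sentinel <- as.integer64("9218868437227407266")
v8 <- c(sentinel, as.integer64("3127133583"), sentinel)

# A bare numeric literal is a double, so it is rounded before any comparison.
# Printing it shows a value that differs from the one typed in.
print(sprintf("%.0f", 9218868437227407266))

# Comparing against the exact integer64 value finds the affected elements.
hits <- v8 == sentinel
print(hits)

# Assign a typed integer64 NA rather than a logical or character NA.
v8[hits] <- NA_integer64_
print(is.na(v8))
```

Constructing the comparison value with as.integer64() from a string is the key step; max(V8) in the example above happens to work for the same reason, since it returns an exact integer64 value.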
So, as the documentation pretty clearly states, a vector of class integer64 won't recognize NA or missing values here. I'm going to avoid bit64 just so I don't have to deal with this...
This bug, #488, is now fixed in the development version of data.table (v1.9.5), and values are assigned (and displayed) properly as NA if bit64 is loaded.
require(data.table) # v1.9.5
require(bit64)
ans = fread("test.csv")
#      V1  V2 V3 V4 V5  V6 V7         V8         V9        V10 V11
# 1: 2012 276 NA  0 S1 001  1         NA  724135215 1590915056  NA
# 2: 2012 276  2  8 S1 001  1         NA         NA     154598   0
# 3: 2012 276  2 12 S1 001  1         NA    5118863   21819477  NA
# 4: 2012 276  2  0 S1 011  8 3127133583 3127133583 9003982501   0
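To check whether a given installation has the fix, a small self-contained test can stand in for test.csv. This is only a sketch: the two-column input is invented, and it assumes a data.table version recent enough to accept fread(text = ...) as well as an installed bit64.

```r
library(data.table)
library(bit64)

# Made-up input: column b holds one value too large for a 32-bit integer
# and one empty field that should be read as missing.
csv <- "a,b\n1,3127133583\n2,\n"

DT <- fread(text = csv, integer64 = "integer64")

# Column b should come back as integer64 ...
print(class(DT$b))

# ... and the empty field should be reported as missing.
print(is.na(DT$b))
```

If is.na() reports the second element as missing, the empty field was stored as a proper integer64 NA rather than as the large sentinel value.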