fread() fails with missing values in integer64 columns

感情败类 2021-02-14 07:53

When reading the text below, fread() fails to detect the missing values in columns 8 and 9. This happens only with the default option integer64="integer64".

2 answers
  • 2021-02-14 08:20

    This appears to be an issue with the bit64 package rather than with fread() or data.table. From the bit64 documentation (http://cran.r-project.org/web/packages/bit64/bit64.pdf):

    "Subscripting non-existing elements and subscripting with NAs is currently not supported. Such subscripting currently returns 9218868437227407266 instead of NA (the NA value of the underlying double code). Following the full R behaviour here would either destroy performance or require extensive C-coding."

    I tried reassigning the 9218868437227407266 value to NA, thinking it would work, but it did not:

    Ex.

    DT[V8==9218868437227407266, ]
    #actually returns nothing, but
    DT[V8==max(V8), ]
    #returns the rows with 9218868437227407266 in V8
    #but this does not reassign the value 
    DT[V8==max(V8), V8:=NA]
    #not that this makes sense, but I tried just in case...
    DT[V8==max(V8), V8:=NA_character_]
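
One likely reason the first comparison above matches nothing is that R parses the literal 9218868437227407266 as a double, and a double has only 53 bits of precision, so it cannot hold that integer exactly. The following base R sketch illustrates the precision limit; it is only an illustration of the numeric issue, not a confirmed diagnosis of what bit64 does internally.

```r
# Base R only; no packages are assumed here.
x <- 9218868437227407266        # parsed as a double, not as an integer64

# A double carries 53 significant bits.
.Machine$double.digits

# Near 9.2e18, adjacent doubles are 1024 apart, so adding a small
# integer does not change the stored value at all.
x + 1 == x
x + 100 == x
```

If bit64 is loaded, building the comparison value from a string, for example as.integer64("9218868437227407266"), is one way to keep every digit, though whether that makes the subsetting work in the affected versions is not tested here.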
    

    So, as the documentation states pretty clearly, a vector of class integer64 won't recognize NA or missing values. I'm going to avoid bit64 just so I don't have to deal with this...
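
For anyone who wants to avoid integer64 columns while still using fread(), one option is its integer64 argument, which the question already refers to. This is a minimal sketch: the inline data and the column names id and big are made up for illustration, and it assumes a data.table version recent enough to accept the text= argument.

```r
library(data.table)

# Hypothetical two-row input; the second row has an empty value in `big`.
csv <- "id,big\n1,3127133583\n2,\n"

# Read large integers as doubles instead of integer64.
DT <- fread(text = csv, integer64 = "double")

class(DT$big)   # a plain numeric column, so bit64 is not involved
is.na(DT$big)   # the empty field is an ordinary NA
```

The trade-off is that doubles represent integers exactly only up to 2^53, so integer64 = "character" may be the safer choice for long identifiers.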

  • 2021-02-14 08:30

    This bug, #488, is now fixed by this commit in the development version of data.table, v1.9.5. Missing values are now assigned (and displayed) properly as NA if bit64 is loaded.

    require(data.table) # v1.9.5
    require(bit64)
    ans = fread("test.csv")
    #      V1  V2 V3 V4 V5  V6 V7         V8         V9        V10 V11
    # 1: 2012 276 NA  0 S1 001  1         NA  724135215 1590915056  NA
    # 2: 2012 276  2  8 S1 001  1         NA         NA     154598   0
    # 3: 2012 276  2 12 S1 001  1         NA    5118863   21819477  NA
    # 4: 2012 276  2  0 S1 011  8 3127133583 3127133583 9003982501   0
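
To check the fixed behaviour without the original test.csv, a small stand-in can be used. This is a sketch under assumptions: the inline data and the column names id and big are invented, and it assumes a data.table version that includes the fix and accepts text=, with bit64 installed.

```r
library(data.table)
library(bit64)

# Hypothetical input: `big` holds values beyond the 32-bit integer range,
# and the second row leaves it empty.
csv <- "id,big\n1,3127133583\n2,\n3,9003982501\n"

DT <- fread(text = csv)   # default integer64 = "integer64"

class(DT$big)             # the column should come back as integer64
is.na(DT$big)             # the empty field should be flagged as missing
DT[is.na(big)]            # subset the rows with a missing value
```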
    