Weird error in R when importing (64-bit) integer with many digits

孤独总比滥情好 2020-12-03 22:39

I am importing a csv that has a single column which contains very long integers (for example: 2121020101132507598)

a <- read.csv('temp.csv', as.is=T)

4 Answers
  • 2020-12-03 22:44

    The maximum value of a 32-bit signed integer is 2,147,483,647. Your numbers are much larger.

    Try importing them as floating point values instead.

    There are a few caveats to be aware of when dealing with floating point arithmetic in R or any other language:

    http://blog.revolutionanalytics.com/2009/11/floatingpoint-errors-explained.html

    http://blog.revolutionanalytics.com/2009/03/when-is-a-zero-not-a-zero.html

    http://floating-point-gui.de/basic/
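
    To make those caveats concrete, here is a minimal sketch of what importing as floating point does to the value from the question. The temporary file stands in for temp.csv and is an assumption of the example, not part of the original post.

    ```r
    # Write the example value from the question to a throwaway file
    tmp <- tempfile(fileext = ".csv")
    writeLines("2121020101132507598", tmp)

    # Force the column to be read as double precision
    a <- read.csv(tmp, header = FALSE, colClasses = "numeric")[[1]]
    is_double <- is.double(a)

    # Print the stored value without scientific notation and compare it
    # with the text in the file: the trailing digits no longer match
    printed <- sprintf("%.0f", a)
    matches_file <- (printed == "2121020101132507598")

    # For fractional values, compare with a tolerance instead of ==
    exact_equal <- (0.1 + 0.2 == 0.3)
    approx_equal <- isTRUE(all.equal(0.1 + 0.2, 0.3))
    ```

    Here `matches_file` and `exact_equal` come out FALSE while `approx_equal` is TRUE, so a double is fine for magnitudes but not for identifiers where every digit matters.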

  • 2020-12-03 22:51

    As others have noted, you can't represent integers that large. But R isn't reading those values into integers, it's reading them into double precision numerics.

    Double precision carries only about 15 to 17 significant decimal digits (whole numbers are exact only up to 2^53, roughly 9.0e15), which is why you see your numbers rounded after about 16 places. See the gmp, Rmpfr, and int64 packages for potential solutions. I don't see a function in any of them that reads from a file, but you may be able to cook something up by looking at their sources.
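
    A quick way to see that limit, as a sketch in base R only (the variable names are just for illustration):

    ```r
    # A double has a 53-bit significand
    bits <- .Machine$double.digits

    # Every whole number up to 2^53 is exact; past that, gaps appear
    limit <- 2^53
    below_distinct <- ((limit - 1) != limit)   # neighbours below the limit differ
    above_collapses <- (limit + 1 == limit)    # 2^53 + 1 is stored as 2^53

    # The value from the question is far beyond 2^53
    x <- 2121020101132507598
    beyond_limit <- (x > limit)
    plus_one_lost <- (x + 1 == x)              # adding 1 changes nothing
    ```

    All of `below_distinct`, `above_collapses`, `beyond_limit` and `plus_one_lost` are TRUE, which is the rounding described above.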

    UPDATE: Here's how you can get your file into an int64 object:

    # This assumes your numbers are the only column in the file
    # Read them in however, just ensure they're read in as character
    a <- scan("temp.csv", what="")
    ia <- as.int64(a)
    
  • 2020-12-03 22:59

    R's maximum integer value is about 2E9 (2,147,483,647). As @Joshua mentions in another answer, one of the potential solutions is the int64 package.

    Import the values as character instead. Then convert to type int64.

    require(int64)
    a <- read.csv('temp.csv', colClasses = 'character', header=FALSE)[[1]]
    a <- as.int64(a)
    print(a)
    [1] 4031320121153001444 4113020071082679601 4073020091116779570
    [4] 2081720101128577687 4041720081087539887 4011120071074301496
    [7] 4021520051054304372 4082520061068996911 4082620101129165548
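
    If int64 cannot be installed on your setup, the same read-as-character-then-convert approach can be sketched with the bit64 package instead. This is an assumption on my part rather than something used in the answer above: it requires bit64 to be installed, and the temporary file stands in for temp.csv.

    ```r
    library(bit64)  # assumed installed; provides the integer64 class

    # Stand-in for temp.csv: two values from the output above plus the
    # example from the question
    raw <- c("4031320121153001444", "4113020071082679601", "2121020101132507598")
    tmp <- tempfile(fileext = ".csv")
    writeLines(raw, tmp)

    # Read as character first so no digits are lost, then convert
    txt <- read.csv(tmp, colClasses = "character", header = FALSE)[[1]]
    ids <- as.integer64(txt)

    # Converting back to text reproduces every digit
    round_trip_ok <- identical(as.character(ids), raw)

    # Integer arithmetic stays exact at this magnitude
    next_id <- as.character(ids[3] + 1L)
    ```

    The key design choice is the same as above: never let the column pass through a double on the way in.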
    
  • 2020-12-03 22:59

    You simply cannot represent integers that big in R's native integer type. See

    .Machine
    

    which on my box has

    $integer.max
    [1] 2147483647
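
    A short base R sketch of what that limit means in practice (variable names are only illustrative):

    ```r
    # The native integer type is 32-bit signed
    int_max <- .Machine$integer.max
    is_32bit_max <- (int_max == 2^31 - 1)

    # Going past the limit does not wrap around; R returns NA with a warning
    overflow <- suppressWarnings(int_max + 1L)

    # Forcing the value from the question into an integer also gives NA
    parsed <- suppressWarnings(as.integer("2121020101132507598"))
    ```

    Both `overflow` and `parsed` are NA, so an integer column is not an option for values of this size.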
    