Question
Given a data frame, I am trying to cast it from long to wide format using the dcast.data.table function from library(data.table). However, when the left-hand side of the formula is a column of large numerics, distinct values somehow get combined.
Below is an example:
df <- structure(list(A = c(10000000007624, 10000000007619, 10000000007745,
10000000007624, 10000000007767, 10000000007729, 10000000007705,
10000000007711, 10000000007784, 10000000007745, 10000000007624,
10000000007762, 10000000007762, 10000000007631, 10000000007762,
10000000007619, 10000000007628, 10000000007705, 10000000007762,
10000000007624, 10000000007745, 10000000007706, 10000000007767,
10000000007777, 10000000007624, 10000000007745, 10000000007624,
10000000007777, 10000000007771, 10000000007631, 10000000007624,
10000000007640, 10000000007642, 10000000007708, 10000000007711,
10000000007745, 10000000007767, 10000000007655, 10000000007722,
10000000007745, 10000000007762, 10000000007771, 10000000007617
), B = c(4060697L, 7683673L, 7699192L, 1322422L, 7754939L, 7448486L,
2188027L, 1061376L, 2095950L, 7793530L, 2095950L, 6447861L, 2188027L,
7448451L, 7428427L, 7516354L, 7067801L, 2095950L, 6740142L, 405911L,
4057215L, 1061345L, 7754945L, 7501748L, 2188027L, 7780980L, 6651988L,
6649330L, 6655118L, 6556367L, 6463510L, 2347462L, 7675114L, 6556361L,
1061345L, 7224099L, 6463515L, 2188027L, 6463515L, 7311234L, 7764971L,
7224099L, 2347479L), C = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L,
3L, 3L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 25L, 2L, 1L, 2L,
1L, 1L, 1L)), .Names = c("A", "B", "C"), row.names = c(NA, -43L
), class = "data.frame")
library(data.table)
df <- as.data.table(df)
output <- dcast.data.table(df, A ~ B, value.var = "C",
                           fun.aggregate = sum, fill = NA)
This produces only 2 rows, 10000000007624 and 10000000007784, and all the values of C are summed into just those two rows.
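A smaller check of the same symptom compares the number of distinct values of A with the number of groups data.table forms for them. This is a sketch that assumes a recent data.table: as far as I know, versions from 1.9.8 onwards default to no rounding, so the 2-byte rounding described in the answer below has to be switched on explicitly with setNumericRounding(2) to reproduce the behaviour. The four IDs are taken from the question's data.

```r
library(data.table)

# Four IDs from the question: two below ...7744 and two above it
dt <- data.table(
  A = c(10000000007617, 10000000007624, 10000000007745, 10000000007784),
  C = 1L
)

old <- getNumericRounding()
setNumericRounding(2)                # round off the last 2 bytes, as in the question
n_exact   <- length(unique(dt$A))    # base R compares the doubles exactly
n_grouped <- nrow(dt[, .N, by = A])  # data.table groups on the rounded values
setNumericRounding(old)              # restore the previous setting

n_exact    # 4
n_grouped
```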
This does not happen with the reshape2::dcast function, which produces the correct result.
Is there a reason why dcast.data.table produces this result?
Answer 1:
The issue was raised on GitHub and answered by @jangorecki; this answer draws on the setNumericRounding help page, which says:
when joining or grouping, data.table rounds such data to apx 11 s.f. which is plenty of digits for many cases. This is achieved by rounding the last 2 bytes off the significand.
As a result, my 14-digit numerics were being rounded, and values that differed only beyond roughly the 11th significant digit were combined.
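To see why 14-digit IDs collide while smaller numbers do not, here is the arithmetic, assuming "the last 2 bytes" means the lowest 16 of the 52 stored significand bits of an IEEE 754 double. That leaves 36 stored bits plus the implicit leading bit, i.e. 37 bits, and 37 * log10(2) ≈ 11.1 significant decimal digits, which matches the "apx 11 s.f." in the quote. The IDs lie between 2^43 ≈ 8.8e12 and 2^44 ≈ 1.76e13, so after rounding they can only land on values spaced 2^(43 - 36) = 128 apart, while whole numbers below 2^37 ≈ 1.4e11 are unaffected. The helper round_last_bytes below is a hypothetical base-R emulation of that rounding, not data.table's actual C code; it breaks exact ties to even, which may differ from data.table at a tie.

```r
# Hypothetical helper: emulates rounding off the lowest `nbytes` bytes of the
# 52-bit stored significand by snapping x to the nearest multiple of the
# smallest step that survives. Assumes finite, non-zero x.
round_last_bytes <- function(x, nbytes = 2) {
  e    <- floor(log2(abs(x)))       # binary exponent of each value
  step <- 2^(e - 52 + 8 * nbytes)   # value of the lowest bit that is kept
  round(x / step) * step            # exact: dividing by a power of 2 loses nothing
}

ids <- c(10000000007617, 10000000007624, 10000000007745, 10000000007784)
rounded <- round_last_bytes(ids)

sprintf("%.0f", rounded)  # "10000000007680" "10000000007680" "10000000007808" "10000000007808"
length(unique(ids))       # 4 distinct IDs before rounding
length(unique(rounded))   # 2 distinct IDs after rounding

round_last_bytes(123456789)  # below 2^37, so it comes back unchanged
```

Under this emulation every ID in the question from ...7617 to ...7743 maps to 10000000007680 and every ID from ...7745 to ...7784 maps to 10000000007808, which is consistent with the two output rows seen in the question.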
As @jangorecki mentions, this can be avoided by calling setNumericRounding(0). Personally, however, I have converted my large numerics to factors, which makes more sense for my particular use case.
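Both workarounds can be sketched on a small table built from IDs in the question. The B and C values are hypothetical, chosen only so the cast has something to spread. This assumes a recent data.table, where (as far as I know) the default rounding is already 0, so the first workaround mainly matters on older versions or when rounding has been switched on elsewhere.

```r
library(data.table)

# IDs from the question; B and C are hypothetical illustration values
dt <- data.table(
  A = c(10000000007617, 10000000007624, 10000000007745, 10000000007784),
  B = c(1L, 2L, 1L, 2L),
  C = c(1L, 1L, 3L, 25L)
)

# Workaround 1: full-precision grouping on the double column
old <- getNumericRounding()
setNumericRounding(0)
wide_exact <- dcast(dt, A ~ B, value.var = "C", fun.aggregate = sum, fill = NA)
setNumericRounding(old)  # restore whatever was set before

# Workaround 2: treat the IDs as labels; factors group on integer codes
dt_f <- copy(dt)[, A := factor(A)]
wide_factor <- dcast(dt_f, A ~ B, value.var = "C", fun.aggregate = sum, fill = NA)

nrow(wide_exact)   # one row per distinct ID
nrow(wide_factor)
```

The factor route suits IDs that are never used arithmetically, since it sidesteps floating-point comparison altogether instead of depending on a global setting.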
Further to this, @jangorecki also advises using the bit64 package when dealing with large numerics.
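A sketch of the bit64 route, assuming both data.table and bit64 are installed: integer64 stores each ID as an exact 64-bit integer rather than a double, so the double rounding described above should not come into play, and building the column from character strings avoids losing digits on the way in. The B and C values are again hypothetical.

```r
library(data.table)
library(bit64)

# Parse the IDs from strings so every digit is kept exactly
dt <- data.table(
  A = as.integer64(c("10000000007617", "10000000007624",
                     "10000000007745", "10000000007784")),
  B = c(1L, 2L, 1L, 2L),   # hypothetical
  C = c(1L, 1L, 3L, 25L)   # hypothetical
)

wide <- dcast(dt, A ~ B, value.var = "C", fun.aggregate = sum, fill = NA)
nrow(wide)  # one row per distinct ID
```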
The original post is on GitHub.
Source: https://stackoverflow.com/questions/37941867/error-with-large-numerics-in-dcast-data-table