Splitting a data.table with the by-operator: functions that return numeric values and/or NAs fail

遇见更好的自我 2021-01-17 13:47

I have a data.table with two columns: one ID column and one value column. I want to split up the table by the ID column a

1 answer

说谎 (original poster)

2021-01-17 13:55

You can fix this by having your function return NA_real_ rather than a plain NA, which has the default (logical) type.

foo2 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA)}}
DT[, foo2(value), by=ID] #Throws error
# Error in `[.data.table`(DT, , foo2(value), by = ID) : 
# columns of j don't evaluate to consistent types for each group: 
# result for group 2 has column 1 type 'logical' but expecting type 'numeric'

foo3 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA_real_)}}
DT[, foo3(value), by=ID] #Works
#      ID V1
# [1,]  A  1
# [2,]  B NA
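
The snippets above rely on a DT that the answer never defines. As a minimal sketch, assuming a hypothetical two-group table (the values below are invented for illustration and are not the asker's data), the working version can be reproduced like this:

```r
library(data.table)

# Hypothetical data: group A has mean 2 (< 5), group B has mean 8 (>= 5)
DT <- data.table(ID = c("A", "A", "B", "B"), value = c(1, 3, 7, 9))

# Same logic as foo3 above: both branches return a double
foo3 <- function(x) if (mean(x) < 5) 1 else NA_real_

res <- DT[, foo3(value), by = ID]
print(res)
```

Because both branches return a double, every group yields the same column type and the grouped result is a single numeric column V1.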

Incidentally, the error message that foo2() produces is nicely informative: it tells you that the NA your function returned has the wrong type (logical, where numeric was expected). To fix the problem, use the NA constant of the matching type (or class):

NAs <- list(NA, NA_integer_, NA_real_, NA_character_, NA_complex_)
data.frame(constantName = sapply(NAs, deparse), 
           class        = sapply(NAs, class),
           type         = sapply(NAs, typeof))

#    constantName     class      type
# 1            NA   logical   logical
# 2   NA_integer_   integer   integer
# 3      NA_real_   numeric    double
# 4 NA_character_ character character
# 5   NA_complex_   complex   complex
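
If editing the function's return statements is inconvenient, coercing the result is another option. The sketch below uses base R only. The wrapper name foo2_numeric is hypothetical and not part of the original answer.

```r
# A plain NA is logical; the typed constants carry their type with them
typeof(NA)        # "logical"
typeof(NA_real_)  # "double"

# Coercing the default NA gives the same value as the typed constant
identical(as.numeric(NA), NA_real_)  # TRUE

# foo2 as defined in the answer: returns a double or a logical NA
foo2 <- function(x) if (mean(x) < 5) 1 else NA

# Hypothetical wrapper: force a double whichever branch fires
foo2_numeric <- function(x) as.numeric(foo2(x))

typeof(foo2_numeric(c(1, 3)))  # "double"
typeof(foo2_numeric(c(7, 9)))  # "double"
```

Calling the wrapper in j, as in DT[, foo2_numeric(value), by=ID], should then give every group the same column type.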
