Count distinct in a rxSummary

Submitted by ◇◆丶佛笑我妖孽 on 2019-12-12 02:54:00

Question


I want to count the distinct values of var2, grouped by var1, in an .xdf file.

I tried something like this:

myFun <- function(dataList) {
    UniqueLevel <<- unique(c(UniqueLevel, dataList$var2))
    SumUniqueLevel <<- length(UniqueLevel)
    return(NULL)
}

rxSummary(formula = ~ var1,
          data = "DefModelo2.xdf",
          transformFunc = myFun,
          transformObjects = list(UniqueLevel = NULL),
          removeZeroCounts = FALSE)

Thank you in advance

EDIT:

Using RevoPemaR is probably the faster way.


Answer 1:


Another option is to use rxCrossTabs. It gives you a cross-tabulation of the two factors, and you can then count the nonzero entries in each row to get the number of unique values of one factor within each level of the other.

censusWorkers <- file.path(rxGetOption("sampleDataDir"), "CensusWorkers.xdf")
censusXtabAge <- rxCrossTabs(~ F(age):F(wkswork1), data = censusWorkers, 
                             removeZeroCounts = FALSE, returnXtabs = TRUE)
apply(censusXtabAge != 0, MARGIN = 1, sum)
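
To see why counting nonzero cells gives the distinct count, here is a minimal in-memory sketch that uses base R's table() in place of rxCrossTabs. The data frame df and its values are made up for illustration, and this is not how the .xdf file itself would be processed.

```r
# Hypothetical toy data standing in for the .xdf file
df <- data.frame(
  var1 = factor(c("a", "a", "a", "b", "b", "c")),
  var2 = factor(c("x", "y", "x", "z", "z", "x"))
)

# Cross-tabulate var1 (rows) against var2 (columns)
xtab <- table(df$var1, df$var2)

# A nonzero cell means that var2 level occurs within that var1 group,
# so the row-wise count of nonzero cells is the distinct count per group
distinctByVar1 <- apply(xtab != 0, MARGIN = 1, sum)
print(distinctByVar1)
```

In this toy data, group a contains two distinct var2 values (x and y), while groups b and c contain one each.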



Answer 2:


Split by var1, and then for each group, count up the unique values of var2. This assumes that var1 and var2 are factors; if they're not, you'll have to run rxFactors first.

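# Split the file into one piece per level of var1, keeping only the two columns needed
xdflst <- rxSplit(xdf, splitByVars="var1", varsToKeep=c("var1", "var2"))

out <- rxExec(function(grp) {
        # every row in this piece shares the same var1, so read it from the first row
        var1 <- head(grp, 1)$var1
        var2 <- rxDataStep(grp, varsToKeep="var2")$var2
        # one row per group: the group key and its distinct count
        data.frame(var1, distinct=length(unique(var2)))
    },
    grp=rxElemArg(xdflst))

# Stack the per-group results into a single data frame
do.call(rbind, out)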
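
The same split-then-summarise idea can be sketched on an ordinary data frame with base R, which may help when checking the logic on a small sample before running it against the .xdf file. The toy data and the names groups and result are assumptions for illustration only; split() and lapply() stand in for rxSplit and rxExec.

```r
# Hypothetical toy data standing in for the .xdf file
df <- data.frame(
  var1 = factor(c("a", "a", "a", "b", "b", "c")),
  var2 = factor(c("x", "y", "x", "z", "z", "x"))
)

# One data frame per level of var1 (in-memory analogue of rxSplit)
groups <- split(df, df$var1)

# For each group, emit a single row: the group key and its distinct count
out <- lapply(groups, function(grp) {
  data.frame(var1 = grp$var1[1],
             distinct = length(unique(grp$var2)))
})

# Stack the per-group rows into one data frame
result <- do.call(rbind, out)
print(result)
```

The result has one row per level of var1, which is the shape the rxExec version is meant to produce.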

Or you could get my dplyrXdf package and use a dplyr group_by/summarise pipeline (which basically does all the above, including converting to factors if necessary):

xdf %>% group_by(var1) %>%
    summarise(distinct=n_distinct(var2),
              .rxArgs=list(varsToKeep=c("var1", "var2")))


Source: https://stackoverflow.com/questions/36328996/count-distinct-in-a-rxsummary
