Calculating most frequent level by category with plyr

失恋的感觉 2021-01-19 23:44

I would like to calculate the most frequent factor level by category with plyr using the code below. The data frame b shows the requested result. Why does

2 answers
  •  借酒劲吻你
    2021-01-20 00:29

    When you use summarise, plyr seems to "not see" the mode function you declared in the global environment. It finds the mode function in base first:
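A minimal base-R sketch of the lookup rule at play, assuming (as far as I can tell from plyr's source) that summarise evaluates each expression with the group's columns as the environment and a frame inside plyr as the enclosure, so the search passes through plyr's and base's namespaces before it reaches the global environment. The group data here is hypothetical, and using base's namespace directly as the enclosure is a simplification of that chain:

```r
# The answer's mode function: most frequent value, returned as a string
mode <- function(x) names(table(x))[which.max(table(x))]

grp  <- list(levels = c(6, 6, 2))  # hypothetical data for one group
expr <- quote(mode(levels))

# Enclosure = global environment: our mode() is found first
from_global <- eval(expr, grp, globalenv())
from_global  # [1] "6"

# Enclosure = base namespace (simplified stand-in for "inside plyr"):
# the search hits base::mode() before it ever reaches the global environment
from_base <- eval(expr, grp, asNamespace("base"))
from_base    # [1] "numeric"
```

The expression and the data are identical in both calls; only the enclosure changes, and that alone decides which mode() runs.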

    We can check this using Hadley's handy pryr package. You can install it with the following commands:

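    # pryr is on CRAN; the development version is at devtools::install_github("hadley/pryr")
    install.packages("pryr")
    
    
    library(pryr)
    library(plyr)
    c <- ddply(a, .(cat), summarise, print(where("mode")))
    # printed once per group: an environment belonging to base, not R_GlobalEnv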
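If you'd rather not install pryr, the idea behind where() can be sketched in base R by walking up the chain of enclosing environments until one of them binds the name. The helper name find_binding_env is hypothetical (it is not part of pryr or base R), and this is a rough stand-in, not pryr's actual implementation:

```r
# Hypothetical helper: return the first environment, starting at `env` and
# following parent.env(), that contains a binding for `name`
find_binding_env <- function(name, env = parent.frame()) {
  while (!identical(env, emptyenv())) {
    if (exists(name, envir = env, inherits = FALSE)) {
      return(env)
    }
    env <- parent.env(env)
  }
  stop("Can't find ", name)
}

mode <- function(x) names(table(x))[which.max(table(x))]

find_binding_env("mode", globalenv())          # <environment: R_GlobalEnv>
find_binding_env("mode", asNamespace("base"))  # <environment: namespace:base>
```

Starting the search from base's namespace never reaches the global environment's mode, which is the situation the where() check above reveals for summarise.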
    

    Basically, summarise doesn't find your mode function. There are two alternatives. The first is what @AnandaMahto suggested; I'd do the same and would advise you to stick with it. The other alternative is to skip summarise and pass an anonymous function(x) instead, so that the mode function in your global environment is "seen".

    c <- ddply(a, .(cat), function(x) mode(x$levels))
    #   cat V1
    # 1   1  6
    # 2   2  5
    # 3   3  9
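The first alternative is not spelled out in this thread. One common way to sidestep the clash (an assumption, not necessarily what was suggested) is to give the function a name that base R does not define, so summarise's lookup falls through to the global environment. A minimal sketch with a hypothetical function name most_frequent and hypothetical data a:

```r
library(plyr)

# Same logic as the answer's mode(), under a name that base does not define
most_frequent <- function(x) names(table(x))[which.max(table(x))]

# Hypothetical data: three categories, each with a clear winner
a <- data.frame(
  cat    = rep(1:3, each = 3),
  levels = c(6, 6, 2, 5, 5, 1, 9, 9, 4)
)

# Nothing in plyr's or base's namespace is called most_frequent,
# so the lookup reaches the global environment and finds our function
b <- ddply(a, .(cat), summarise, mode = most_frequent(levels))
print(b)
#   cat mode
# 1   1    6
# 2   2    5
# 3   3    9
```

Renaming keeps the summarise syntax intact and avoids shadowing base::mode for the rest of the session.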
    

    Why does this work?

    c <- ddply(a, .(cat), function(x) print(where("mode")))
    # <environment: R_GlobalEnv>
    # <environment: R_GlobalEnv>
    # <environment: R_GlobalEnv>
    

    Because, as you can see above, where() now finds your function, which sits in the global environment.

    > mode # your function
    # function(x)
    #     names(table(x))[which.max(table(x))]
    > environment(mode) # where it sits
    # <environment: R_GlobalEnv>
    

    as opposed to:

    > base::mode # base's mode function
    # function (x) 
    # {
    #     some lines of code to compute mode
    # }
    # <bytecode: ...>
    # <environment: namespace:base>
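The comparison above comes down to each function's enclosing environment: a function created at top level, like the function(x) passed to ddply, looks up free names such as mode starting from the global environment. A minimal sketch with a hypothetical helper per_group and hypothetical group data, re-pointing the same function at base's namespace to show that the enclosure alone changes which mode() is called:

```r
mode <- function(x) names(table(x))[which.max(table(x))]

# Created at top level, so its enclosing environment is the global one
per_group <- function(x) mode(x$levels)
environment(per_group)   # <environment: R_GlobalEnv>

grp <- data.frame(levels = c(6, 6, 2))   # hypothetical group
per_group(grp)           # [1] "6"

# Same body, new enclosure: now the lookup finds base::mode() first
environment(per_group) <- asNamespace("base")
per_group(grp)           # [1] "numeric"
```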
    

    Hadley has an excellent wiki page on environments if you'd like to read further.
