I would like to calculate the most frequent factor level by category with plyr using the code below. The data frame b
shows the requested result. Why does
When you use summarise, plyr seems to "not see" the function declared in the global environment before checking for the function in base:
We can check this using Hadley's handy pryr package. You can install it with these commands:
library(devtools)
install_github("hadley/pryr")  # current devtools needs the "user/repo" form
require(pryr)
require(plyr)
c <- ddply(a, .(cat), summarise, print(where("mode")))
# (prints the base environment once per group)
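For a rough idea of what where() is doing here, the following is a minimal base R sketch of the same kind of lookup. The name where_sketch is hypothetical and this is not pryr's code; it only assumes that the lookup walks up the chain of enclosing environments until it finds one that defines the name.

```r
# Hypothetical helper: walk up the chain of enclosing environments and
# return the first one that defines `name`.
where_sketch <- function(name, env = parent.frame()) {
  while (!identical(env, emptyenv())) {
    if (exists(name, envir = env, inherits = FALSE)) {
      return(env)
    }
    env <- parent.env(env)
  }
  stop("Can't find ", name)
}

# Starting from an environment whose parent is base, the lookup only
# ever reaches base's mode().
e <- new.env(parent = baseenv())
where_sketch("mode", e)

# If an environment defines its own mode(), that one is found first.
g <- new.env(parent = baseenv())
assign("mode", function(x) names(table(x))[which.max(table(x))], envir = g)
where_sketch("mode", g)
```

The starting environment decides the answer, which is why the same call can report different environments depending on where it is evaluated.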
Basically, it doesn't read/know/see your mode function. There are two alternatives. The first is what @AnandaMahto suggested; I'd do the same and would advise you to stick with it. The other alternative is to not use summarise and instead call it using function(.) so that the mode function in your global environment is "seen".
c <- ddply(a, .(cat), function(x) mode(x$levels))
# cat V1
# 1 1 6
# 2 2 5
# 3 3 9
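If you want to try the same idea without plyr, here is a small base R sketch. The data frame below is made up for illustration (the question's a is not reproduced here), and split() with sapply() is only a rough stand-in for ddply, not how plyr does it.

```r
# Hypothetical data: three categories with a factor column `levels`.
a <- data.frame(
  cat    = c(1, 1, 1, 2, 2, 3),
  levels = factor(c(6, 6, 2, 5, 5, 9))
)

# User-defined mode(): the most frequent value, returned as a string.
mode <- function(x) names(table(x))[which.max(table(x))]

# Base R analogue of ddply(a, .(cat), function(x) mode(x$levels)).
# The anonymous function is created here, so it looks up mode() in the
# environment where it was defined and finds the user-defined version.
res <- sapply(split(a, a$cat), function(x) mode(x$levels))
res
```

With this made-up data, res is a named character vector with one entry per category.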
Why does this work?
c <- ddply(a, .(cat), function(x) print(where("mode")))
# <environment: R_GlobalEnv>
# <environment: R_GlobalEnv>
# <environment: R_GlobalEnv>
Because, as you can see above, it reads your function that sits in the global environment.
> mode # your function
# function(x)
# names(table(x))[which.max(table(x))]
> environment(mode) # where it sits
# <environment: R_GlobalEnv>
as opposed to:
> base::mode # base's mode function
# function (x)
# {
# some lines of code to compute mode
# }
# <environment: namespace:base>
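To see how the choice of enclosing environment alone changes which mode is called, here is a minimal base R sketch. It is not plyr's actual implementation of summarise; the environments e_base and e_here are hypothetical and are set up by hand just to show the two lookup paths.

```r
# A user-defined mode(), like the one in the question.
mode <- function(x) names(table(x))[which.max(table(x))]
here <- environment()  # where our mode() lives (the global env at top level)

x <- c("a", "b", "b")

# Parent is base: the lookup never passes through `here`.
e_base <- new.env(parent = baseenv())
assign("x", x, envir = e_base)
from_base <- eval(quote(mode(x)), e_base)

# Parent is `here`: our mode() is found before base's.
e_here <- new.env(parent = here)
assign("x", x, envir = e_here)
from_here <- eval(quote(mode(x)), e_here)

from_base  # "character" (base::mode reports the storage mode)
from_here  # "b" (the most frequent value)
```

The expression mode(x) is identical in both calls; only the environment it is evaluated in differs.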
Here's an awesome wiki on environments from Hadley if you're interested in reading and exploring further.