How to aggregate data in R with mode (most common) value for each row?

刺人心 2020-12-21 06:34

I have a data set, for example:

Data <- data.frame(
  groupname = as.factor(sample(c("a", "b", "c"), 10, replace = TRUE)),
  someuser = sample(c("x", "y", "z"), 10, replace = TRUE))


        
3 Answers
  • 2020-12-21 06:42

    There are many options. Here is one that uses table to compute the frequencies and which.max to select the most frequent value, within the data.table framework:

    library(data.table)
    setDT(Data)[,list(someuser={
      tt <- table(someuser)
      names(tt)[which.max(tt)]
    }),groupname]
    

    Using plyr (nearly the same):

    library(plyr)
    ddply(Data,.(groupname),summarize,someuser={
      tt <- table(someuser)
      names(tt)[which.max(tt)]
    })
    
  • 2020-12-21 06:50

    You can combine this function for finding the mode with aggregate.

    Mode <- function(x) {
      ux <- unique(x)
      ux[which.max(tabulate(match(x, ux)))]
    }
    
    aggregate(someuser ~ groupname, Data, Mode)
    
      groupname someuser
    1         a        x
    2         b        x
    3         c        x
    

    Note that in the event of a tie, Mode returns only one value: the tied value that appears first in the data.
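
    To make the tie behavior concrete, here is a minimal sketch. Mode is copied from the answer above; Modes is a hypothetical helper (not part of the original answer) that returns every tied value instead of just the first one, and the vector v is made up for illustration.

    ```r
    # Mode() as defined in the answer: returns the first most frequent value.
    Mode <- function(x) {
      ux <- unique(x)
      ux[which.max(tabulate(match(x, ux)))]
    }

    # Hypothetical helper (an assumption, not from the answer):
    # returns all values that share the highest count.
    Modes <- function(x) {
      ux <- unique(x)
      counts <- tabulate(match(x, ux))
      ux[counts == max(counts)]
    }

    v <- c("y", "x", "x", "y", "z")  # "x" and "y" both occur twice
    Mode(v)   # "y", because "y" appears first in v
    Modes(v)  # "y" "x"
    ```

    Because Modes can return more than one value per group, it would not drop into aggregate as a direct replacement for Mode without deciding how to collapse ties.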

  • 2020-12-21 07:01

    This might work for you, using base R:

    set.seed(1)
    Data <- data.frame(
      groupname = as.factor(sample(c("a", "b", "c"), 10, replace = TRUE)),
      someuser = sample(c("x", "y", "z"), 10, replace = TRUE))
    Data
       groupname someuser
    1          a        x
    2          b        x
    3          b        z
    4          c        y
    5          a        z
    6          c        y
    7          c        z
    8          b        z
    9          b        y
    10         a        z
    
    res <- lapply(split(Data, Data$groupname), function(x) 
      data.frame(groupname=x$groupname[1], someuser=names(sort(table(x$someuser),
                 decreasing=TRUE))[1]))
    
    do.call(rbind, res)
      groupname someuser
    a         a        z
    b         b        z
    c         c        y
    

    And using ddply from plyr:

    library(plyr)
    # Return the most frequent someuser within one group's subset
    sort_fn2 <- function(x) {names(sort(table(x$someuser), decreasing=TRUE))[1]}
    ddply(Data, .(groupname), .fun=sort_fn2)
      groupname V1
    1         a  z
    2         b  z
    3         c  y
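
    The same split, tabulate, pick-the-largest idea can also be sketched more compactly with tapply. This is only an illustration, not the answerer's code: the small fixed data set below is made up so that the result does not depend on random sampling, and most_common and res are names chosen for the example.

    ```r
    # Made-up data set (assumption): fixed values instead of sample().
    Data <- data.frame(
      groupname = factor(c("a", "a", "a", "b", "b", "c")),
      someuser  = c("x", "z", "z", "y", "y", "x")
    )

    # For each group, tabulate someuser and keep the name of the largest count.
    most_common <- tapply(Data$someuser, Data$groupname,
                          function(v) names(which.max(table(v))))

    # Convert the named result into a data frame with the original column names.
    res <- data.frame(groupname = names(most_common),
                      someuser  = as.vector(most_common))
    res
    ```

    With this data the groups a, b and c map to z, y and x respectively. On a tie, which.max picks the first entry of the table, which is ordered by sorted value rather than by order of appearance.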
    