Create a variable capturing the most frequent occurrence by group

Submitted by 人走茶凉 on 2019-11-27 03:32:08

Question


Define:

df1 <- data.frame(
  id = c(rep(1, 3), rep(2, 3)),
  v1 = as.character(c("a", "b", "b", rep("c", 3)))
)

which gives:

> df1
  id v1
1  1  a
2  1  b
3  1  b
4  2  c
5  2  c
6  2  c

I want to create a third variable, freq, that holds the most frequent value of v1 within each id, so that:

> df2
  id v1 freq
1  1  a    b
2  1  b    b
3  1  b    b
4  2  c    c
5  2  c    c
6  2  c    c

Answer 1:


You can do this using ddply from the plyr package and a custom function that picks out the most frequent value:

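library(plyr)

# Append the most frequent v1 value of a group to every row of that group
myFun <- function(x) {
    tbl <- table(x$v1)
    x$freq <- rep(names(tbl)[which.max(tbl)], nrow(x))
    x
}

# Split df1 by id, apply myFun to each piece, and recombine
ddply(df1, .(id), .fun = myFun)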

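Note that, in the case of ties, which.max returns the first occurrence of the maximum value. Because table() sorts its entries by value, "first" here means first in that sorted order, not first in the data. See ??which.is.max in the nnet package for an option that breaks ties randomly.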
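
To make the tie behaviour concrete, here is a minimal sketch. The vector x and the names tbl, first_mode and all_modes are made up for illustration and are not part of the original answer; the nnet call is guarded because it assumes that package is installed.

```r
# Hypothetical data with a tie: "a" and "b" both occur twice
x <- c("b", "b", "a", "a", "c")
tbl <- table(x)

# which.max() keeps only the first maximum in the table's sorted order
first_mode <- names(tbl)[which.max(tbl)]

# Comparing against the maximum count keeps every tied value instead
all_modes <- names(tbl)[as.vector(tbl) == max(tbl)]

print(first_mode)
print(all_modes)

# nnet::which.is.max() picks one of the tied positions at random
if (requireNamespace("nnet", quietly = TRUE)) {
  random_mode <- names(tbl)[nnet::which.is.max(as.vector(tbl))]
  print(random_mode)
}
```

Although "b" appears first in x, first_mode is "a", since table() orders the values alphabetically before which.max sees them.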




Answer 2:


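# Most frequent value of x (note: this name masks base::mode in the workspace)
mode <- function(x) names(table(x))[ which.max(table(x)) ]
# ave() applies mode() within each id and recycles the result to every row
df1$freq <- ave(df1$v1, df1$id, FUN=mode)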
> df1
  id v1 freq
1  1  a    b
2  1  b    b
3  1  b    b
4  2  c    c
5  2  c    c
6  2  c    c



Answer 3:


Another way consists of using tidyverse functions:

  • grouping first, using group_by(), and counting the occurrence of the second variable using tally()
  • arranging by the number of occurrences with arrange()
  • summarizing and picking out the first row with summarize() and first()

Therefore:

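library(dplyr)

df1 %>%
  group_by(id, v1) %>%
  tally() %>%                 # count rows per (id, v1); result stays grouped by id
  arrange(id, desc(n)) %>%    # most frequent v1 first within each id
  summarize(freq = first(v1)) # keep the top v1 per id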

This will give you just the mapping (which I find cleaner):

# A tibble: 2 x 2
     id   freq
  <dbl> <fctr>
1     1      b
2     2      c

You can then left_join your original data frame with that table.
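
The join step might look like the following sketch. The names modes and df2 are chosen here for illustration, and stringsAsFactors = FALSE is an assumption added so that freq comes back as a character column regardless of R version.

```r
library(dplyr)

# Same data as in the question; stringsAsFactors = FALSE is an added assumption
df1 <- data.frame(
  id = c(rep(1, 3), rep(2, 3)),
  v1 = c("a", "b", "b", rep("c", 3)),
  stringsAsFactors = FALSE
)

# One row per id holding its most frequent v1 (the mapping shown above)
modes <- df1 %>%
  group_by(id, v1) %>%
  tally() %>%
  arrange(id, desc(n)) %>%
  summarize(freq = first(v1))

# Attach the per-id mode to every original row
df2 <- left_join(df1, modes, by = "id")
print(df2)
```

Keeping the mapping in its own table means the original rows are left untouched until the join, and the same table can be reused elsewhere.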



Source: https://stackoverflow.com/questions/6513378/create-a-variable-capturing-the-most-frequent-occurence-by-group
