How to get group-level statistics while preserving the original dataframe?

不思量自难忘° 2021-01-29 01:46

I have the following dataframe

one <- c('one',NA,NA,NA,NA,'two',NA,NA)
group1 <- c('A','A','A','A','B','B','B','B')
group2 <- c('C


        
3 Answers
  • 2021-01-29 02:25

    Let's not forget that a lot of things can be done in base R, although sometimes not as efficiently as data.table or dplyr:

    # Flag non-missing values first so this works for character and factor columns alike
    df$count <- ave(as.integer(!is.na(df$one)), df$group1, df$group2, FUN = sum)
    #   one group1 group2 count
    #1  one      A      C     1
    #2 <NA>      A      C     1
    #3 <NA>      A      C     1
    #4 <NA>      A      D     0
    #5 <NA>      B      E     1
    #6  two      B      E     1
    #7 <NA>      B      F     0
    #8 <NA>      B      F     0
    
  • 2021-01-29 02:29
    library(dplyr)
    
    df %>% group_by(group1, group2) %>% mutate(count = sum(!is.na(one)))
    
    Source: local data frame [8 x 4]
    Groups: group1, group2 [4]
    
         one group1 group2 count
      <fctr> <fctr> <fctr> <int>
    1    one      A      C     1
    2     NA      A      C     1
    3     NA      A      C     1
    4     NA      A      D     0
    5     NA      B      E     1
    6    two      B      E     1
    7     NA      B      F     0
    8     NA      B      F     0
    
  • 2021-01-29 02:33

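    With data.table:

    library(data.table)
    setDT(df)  # convert df to a data.table by reference
    # := adds the per-group count as a new column without collapsing rows
    df[, count_B := sum(!is.na(one)), by = c("group1", "group2")]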
    

    gives:

       one group1 group2 count_B
    1: one      A      C       1
    2:  NA      A      C       1
    3:  NA      A      C       1
    4:  NA      A      D       0
    5:  NA      B      E       1
    6: two      B      E       1
    7:  NA      B      F       0
    8:  NA      B      F       0
    

    The idea is to sum the TRUE values (each counts as 1 once converted to integer) where one is not NA, grouping by group1 and group2.
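
    The counting idea can be sketched in base R without any packages. The data frame below is an assumption: it is rebuilt from the output printed in the answers, because the definition in the question is cut off. The names not_missing and collapsed are illustrative only.

    ```r
    # Toy data rebuilt from the printed output (assumed, not the asker's exact code)
    df <- data.frame(
      one    = c("one", NA, NA, NA, NA, "two", NA, NA),
      group1 = c("A", "A", "A", "A", "B", "B", "B", "B"),
      group2 = c("C", "C", "C", "D", "E", "E", "F", "F"),
      stringsAsFactors = FALSE
    )

    # !is.na() gives a logical vector; sum() counts TRUE as 1 and FALSE as 0
    not_missing <- !is.na(df$one)
    sum(not_missing)  # 2 non-missing values overall

    # A grouped summary collapses the data to one row per group (4 rows here)
    collapsed <- aggregate(not_missing,
                           by = list(group1 = df$group1, group2 = df$group2),
                           FUN = sum)
    nrow(collapsed)

    # ave() returns a vector as long as its input, so all 8 rows are kept
    # and each row carries the count for its own group
    df$count <- ave(as.integer(not_missing), df$group1, df$group2, FUN = sum)
    df$count  # 1 1 1 0 1 1 0 0
    ```

    The contrast between aggregate() and ave() is the same as between a summarising and a mutating operation: only the latter preserves the original rows.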
