dplyr: Use a custom function in summarize() after group_by()

深忆病人 2021-01-17 22:07

How can we use a custom function after group_by()? I checked similar posts (1, 2, and 3), but my current code returns the same values for all groups.

         


        
1 Answer
  •  挽巷 (original poster)
    2021-01-17 22:16

    It's easier to understand if you start by writing it without an extra function. In that case it would be:

    df %>%
      group_by(village) %>%
      summarize(Y_village = mean(Y[Z == z]))
    
    ## A tibble: 2 x 2
    #  village Y_village
    #1 a            450.
    #2 b            700.
    

    So your function should look something like this:

    # Ycol and Zcol are plain vectors; z is the value of Z to condition on
    Y_hat_village <- function(Ycol, Zcol, z){
      mean(Ycol[Zcol == z])
    }
    

    And then use it:

    df %>%
      group_by(village) %>%
      summarize(Y_village = Y_hat_village(Y, Z, z))
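
    For a self-contained run of the steps above, here is a minimal sketch. The question's data is not shown, so the data frame and the value of z below are invented for illustration; the numbers are chosen so that the group means match the output shown earlier.

```r
library(dplyr)

# Hypothetical data: invented values, not the data from the question.
df <- data.frame(
  village = c("a", "a", "a", "b", "b", "b"),
  Z       = c(1, 1, 0, 1, 1, 0),
  Y       = c(400, 500, 100, 650, 750, 200)
)
z <- 1  # assumed value of Z to condition on

# Same function as in the answer: works on plain vectors only.
Y_hat_village <- function(Ycol, Zcol, z) {
  mean(Ycol[Zcol == z])
}

# Inside a grouped summarize(), Y and Z are the current group's values.
res <- df %>%
  group_by(village) %>%
  summarize(Y_village = Y_hat_village(Y, Z, z))

print(res)
```

    With this made-up data, village a averages 400 and 500, and village b averages 650 and 750, so the two groups get different values.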
    

    Note that the function only works on atomic vectors, which you can pass directly from inside summarise(). Within a grouped summarise(), Y and Z refer to the current group's rows only, so there is no need to pass the whole data.frame to the function.
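
    As a hedged illustration of why this matters: one common way to get the same value in every group is a helper that takes the whole data frame and indexes it with $, which ignores the grouping. The question's code is not shown, so this may or may not be what went wrong there. The helper Y_hat_whole, the data frame toy, and the value of z are all hypothetical.

```r
library(dplyr)

# Hypothetical data and conditioning value, invented for this sketch.
toy <- data.frame(
  village = c("a", "a", "a", "b", "b", "b"),
  Z       = c(1, 1, 0, 1, 1, 0),
  Y       = c(400, 500, 100, 650, 750, 200)
)
z <- 1

# Hypothetical helper that takes the full data frame.
# data$Y and data$Z are the complete columns, so grouping has no effect.
Y_hat_whole <- function(data, z) {
  mean(data$Y[data$Z == z])
}

# Every group gets the overall mean, because the ungrouped toy is passed in.
same_everywhere <- toy %>%
  group_by(village) %>%
  summarize(Y_village = Y_hat_whole(toy, z))

# Bare column names are evaluated per group, so each group gets its own mean.
per_group <- toy %>%
  group_by(village) %>%
  summarize(Y_village = mean(Y[Z == z]))

print(same_everywhere)
print(per_group)
```

    The first result repeats one number for both villages, while the second differs by village, which is the behaviour the vector-based function relies on.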
