Summarising by a group variable in R

广开言路 2021-01-27 05:22

I have a data frame as follows:

 head(newStormObject)
     FATALITIES   INJURIES    PROPVALDMG CROPVALDMG      EVTYPE     total
 1           0          15    2.5         


        
2 answers
  • 2021-01-27 05:45

    We can replace 'total' with its per-group sum using mutate, and then keep the first row of each 'EVTYPE' group with slice, which takes the first value of all the other columns.

    library(dplyr)
    df1 %>% 
       group_by(EVTYPE) %>% 
       mutate(total = sum(total)) %>%
       slice(1L) %>%
       arrange(desc(total))
    #      FATALITIES INJURIES PROPVALDMG CROPVALDMG    EVTYPE total
    #       <int>    <int>      <dbl>      <int>     <chr> <int>
    #1          0       15     250000          0   TORNADO    21
    #2          0        0          0          0      HAIL    12
    #3          0        0          0          0 TSTM WIND     1
    

    NOTE: Based on the example data, the 'total' for the 'EVTYPE' "HAIL" is 12.
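
    For anyone who wants to run the pipeline above end to end, here is a minimal sketch. The data frame df1 below is hypothetical toy data, made up only to have the same columns as the question and to reproduce the totals shown above; it is not the asker's actual data. The ungroup() call is an addition that drops the grouping from the result.

    ```r
    library(dplyr)

    # Hypothetical toy data with the same columns as the question's data frame
    df1 <- data.frame(
      FATALITIES = c(0L, 0L, 0L, 0L, 0L),
      INJURIES   = c(15L, 0L, 2L, 0L, 0L),
      PROPVALDMG = c(250000, 0, 25000, 0, 0),
      CROPVALDMG = c(0L, 0L, 0L, 0L, 0L),
      EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "HAIL", "TSTM WIND"),
      total      = c(15L, 5L, 6L, 7L, 1L),
      stringsAsFactors = FALSE
    )

    res <- df1 %>%
      group_by(EVTYPE) %>%
      mutate(total = sum(total)) %>%  # per-group sum, repeated on every row of the group
      slice(1L) %>%                   # keep only the first row of each group
      ungroup() %>%                   # drop the grouping from the result
      arrange(desc(total))            # largest total first

    print(res)
    ```

    Because mutate runs before slice, the first row of each group already carries the group sum when the other rows are dropped.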

  • 2021-01-27 06:02

    Here is a base R solution that returns the same values (in a slightly different order):

    merge(df[!duplicated(df$EVTYPE), -length(df)],
          aggregate(total ~ EVTYPE, data=df, sum), by="EVTYPE")
         EVTYPE FATALITIES INJURIES PROPVALDMG CROPVALDMG total
    1      HAIL          0        0          0          0    12
    2   TORNADO          0       15     250000          0    21
    3 TSTM_WIND          0        0          0          0     1
    

    duplicated is used to select the first observation of each EVTYPE level (with -length(df) dropping the last column, total), and aggregate is used to calculate the sum of the total variable within each level. The two results are then merged on EVTYPE.

    The rows come out sorted by EVTYPE, because merge sorts on the by variables by default; for a factor this follows the order of its levels, which is alphabetical by default. The columns are slightly out of order relative to the desired output, because merge puts the by variables at the front of the resulting data set. Restoring the column order is a matter of indexing the result with the names of the original data.frame.

    merge(df[!duplicated(df$EVTYPE), -length(df)],
          aggregate(total ~ EVTYPE, data=df, sum), by="EVTYPE")[, names(df)]
      FATALITIES INJURIES PROPVALDMG CROPVALDMG    EVTYPE total
    1          0        0          0          0      HAIL    12
    2          0       15     250000          0   TORNADO    21
    3          0        0          0          0 TSTM_WIND     1
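
    The one-liner above can also be split into its steps, which may make it easier to follow. This is only a sketch: df is hypothetical toy data invented to match the totals shown, the intermediate names first_rows, totals and out are illustrative, and the final sort by descending total is an extra step added here to reproduce the row order of the dplyr answer.

    ```r
    # Hypothetical toy data with the same columns as the question's data frame
    df <- data.frame(
      FATALITIES = c(0L, 0L, 0L, 0L, 0L),
      INJURIES   = c(15L, 0L, 2L, 0L, 0L),
      PROPVALDMG = c(250000, 0, 25000, 0, 0),
      CROPVALDMG = c(0L, 0L, 0L, 0L, 0L),
      EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "HAIL", "TSTM_WIND"),
      total      = c(15L, 5L, 6L, 7L, 1L),
      stringsAsFactors = FALSE
    )

    # Step 1: first row of each EVTYPE, without the last column ('total')
    first_rows <- df[!duplicated(df$EVTYPE), -length(df)]

    # Step 2: sum of 'total' within each EVTYPE
    totals <- aggregate(total ~ EVTYPE, data = df, FUN = sum)

    # Step 3: join on EVTYPE, then restore the original column order
    out <- merge(first_rows, totals, by = "EVTYPE")[, names(df)]

    # Step 4 (extra): sort by descending total and reset the row names
    out <- out[order(out$total, decreasing = TRUE), ]
    rownames(out) <- NULL

    print(out)
    ```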
    