Summarising by a group variable in R

Posted by 十年热恋 on 2021-02-17 06:31:27

Question


I have a data frame as follows:

 head(newStormObject, 15)
     FATALITIES   INJURIES    PROPVALDMG CROPVALDMG      EVTYPE     total
 1           0          15    2.5e+05          0        TORNADO        15
 2           0           0    2.5e+04          0        TORNADO         0
 3           0           3    2.5e+07          0        TORNADO         3 
 4           0           3    2.5e+07          0        TORNADO         3
 5           0           0    0.0e+00          0      TSTM WIND         1
 6           0           0    0.0e+00          0           HAIL         2
 7           0           0    0.0e+00          0           HAIL         3
 8           0           0    0.0e+00          0      TSTM WIND         0
 9           0           0    0.0e+00          0           HAIL         0
10           0           0    0.0e+00          0      TSTM WIND         0
11           0           0    0.0e+00          0      TSTM WIND         0
12           0           0    0.0e+00          0           HAIL         1
13           0           0    0.0e+00          0           HAIL         1
14           0           0    0.0e+00          0           HAIL         5
15           0           0    0.0e+00          0      TSTM WIND         0

What I am trying to do is group by the event type (EVTYPE) and sum the total column within each group, so that printing the data frame would look as follows:

       FATALITIES   INJURIES  PROPVALDMG CROPVALDMG      EVTYPE     total
 1           0          15    2.5e+05          0        TORNADO       21
 2           0           0    0.0e+00          0           HAIL       11
 3           0           0    0.0e+00          0      TSTM WIND        0

To try to do this, I wrote the following:

newStormObject %>% group_by(EVTYPE, total) %>% summarise(EVTYPE, sum(total))

but I got an error saying 'Error: cannot modify grouping variable'.

The first two steps of the pipe appear to work fine, but they just give the same output as the first block, so the error seems to come from the 'summarise' step.

Any suggestions for solving this would be appreciated.


Answer 1:


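We can group by 'EVTYPE' only, replace 'total' with the per-group sum of 'total' using mutate, and then use slice to keep the first row of each group, which takes the first value of all the other columns. Here df1 is the data frame from the question.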

library(dplyr)
df1 %>% 
   group_by(EVTYPE) %>% 
   mutate(total = sum(total)) %>%
   slice(1L) %>%
   arrange(desc(total))
#      FATALITIES INJURIES PROPVALDMG CROPVALDMG    EVTYPE total
#       <int>    <int>      <dbl>      <int>     <chr> <int>
#1          0       15     250000          0   TORNADO    21
#2          0        0          0          0      HAIL    12
#3          0        0          0          0 TSTM WIND     1

NOTE: Based on the example data, the 'total' for 'EVTYPE' "HAIL" is 12 and for "TSTM WIND" is 1, not 11 and 0 as shown in the question's expected output.
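The error in the question most likely comes from two things: 'total' is included in group_by, so every group contains a single value of 'total', and 'EVTYPE' is passed to summarise even though it is already a grouping variable. As a minimal sketch of the same result using summarise alone, the block below rebuilds the 15 rows from the question under the hypothetical name storm and takes the first value of each remaining column explicitly. It assumes dplyr 1.0.0 or later for the .groups argument.

```r
library(dplyr)

# Hypothetical reconstruction of the 15 rows shown in the question
storm <- data.frame(
  FATALITIES = rep(0L, 15),
  INJURIES   = c(15L, 0L, 3L, 3L, rep(0L, 11)),
  PROPVALDMG = c(2.5e5, 2.5e4, 2.5e7, 2.5e7, rep(0, 11)),
  CROPVALDMG = rep(0L, 15),
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "TORNADO", "TSTM WIND",
                 "HAIL", "HAIL", "TSTM WIND", "HAIL", "TSTM WIND",
                 "TSTM WIND", "HAIL", "HAIL", "HAIL", "TSTM WIND"),
  total      = c(15L, 0L, 3L, 3L, 1L, 2L, 3L, 0L, 0L, 0L, 0L, 1L, 1L, 5L, 0L),
  stringsAsFactors = FALSE
)

by_type <- storm %>%
  group_by(EVTYPE) %>%                    # group by the event type only
  summarise(
    FATALITIES = first(FATALITIES),       # first value within each group
    INJURIES   = first(INJURIES),
    PROPVALDMG = first(PROPVALDMG),
    CROPVALDMG = first(CROPVALDMG),
    total      = sum(total),              # per-group sum
    .groups    = "drop"
  ) %>%
  arrange(desc(total))

by_type
```

Unlike the mutate and slice version, summarise places the grouping column 'EVTYPE' first in the result.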




Answer 2:


Here is a base R solution that returns the same values (in a slightly different order):

merge(df[!duplicated(df$EVTYPE), -length(df)],
         aggregate(total ~ EVTYPE, data=df, sum), by="EVTYPE")
     EVTYPE FATALITIES INJURIES PROPVALDMG CROPVALDMG total
1      HAIL          0        0          0          0    12
2   TORNADO          0       15     250000          0    21
3 TSTM_WIND          0        0          0          0     1

duplicated is used to select the first observation of each EVTYPE level, aggregate is used to calculate the sum of the total variable. These results are merged on EVTYPE.
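To see what each piece contributes before the merge, the two intermediate results can be inspected separately. This is only a sketch: storm is a hypothetical reconstruction of the 15 rows in the question, and first_rows and totals are illustrative names that do not appear in the answer.

```r
# Hypothetical reconstruction of the 15 rows shown in the question
storm <- data.frame(
  FATALITIES = rep(0L, 15),
  INJURIES   = c(15L, 0L, 3L, 3L, rep(0L, 11)),
  PROPVALDMG = c(2.5e5, 2.5e4, 2.5e7, 2.5e7, rep(0, 11)),
  CROPVALDMG = rep(0L, 15),
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "TORNADO", "TSTM WIND",
                 "HAIL", "HAIL", "TSTM WIND", "HAIL", "TSTM WIND",
                 "TSTM WIND", "HAIL", "HAIL", "HAIL", "TSTM WIND"),
  total      = c(15L, 0L, 3L, 3L, 1L, 2L, 3L, 0L, 0L, 0L, 0L, 1L, 1L, 5L, 0L),
  stringsAsFactors = FALSE
)

# First observation of each EVTYPE; -length(storm) drops the last column (total)
first_rows <- storm[!duplicated(storm$EVTYPE), -length(storm)]

# One row per EVTYPE holding the summed total
totals <- aggregate(total ~ EVTYPE, data = storm, FUN = sum)

first_rows
totals
```

Note that -length(storm) only drops 'total' because it happens to be the last column; selecting it by name would be more robust if the column order changes.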

The rows are ordered alphabetically by EVTYPE: merge sorts its result on the by column by default (sort = TRUE), and factor levels are stored in alphabetical order. The columns are slightly out of order compared with the desired output because merge puts the by variables at the front of the resulting data set. Restoring the column order is a matter of indexing the result with the names of the original data.frame.

merge(df[!duplicated(df$EVTYPE), -length(df)],
      aggregate(total ~ EVTYPE, data=df, sum), by="EVTYPE")[, names(df)]
  FATALITIES INJURIES PROPVALDMG CROPVALDMG    EVTYPE total
1          0        0          0          0      HAIL    12
2          0       15     250000          0   TORNADO    21
3          0        0          0          0 TSTM_WIND     1
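If the rows should also appear in descending order of 'total', as in the dplyr answer, the merged result can be reordered with order. The following is a sketch under the same assumptions as before: storm is a hypothetical reconstruction of the question's data, and merged and result are illustrative names.

```r
# Hypothetical reconstruction of the 15 rows shown in the question
storm <- data.frame(
  FATALITIES = rep(0L, 15),
  INJURIES   = c(15L, 0L, 3L, 3L, rep(0L, 11)),
  PROPVALDMG = c(2.5e5, 2.5e4, 2.5e7, 2.5e7, rep(0, 11)),
  CROPVALDMG = rep(0L, 15),
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "TORNADO", "TSTM WIND",
                 "HAIL", "HAIL", "TSTM WIND", "HAIL", "TSTM WIND",
                 "TSTM WIND", "HAIL", "HAIL", "HAIL", "TSTM WIND"),
  total      = c(15L, 0L, 3L, 3L, 1L, 2L, 3L, 0L, 0L, 0L, 0L, 1L, 1L, 5L, 0L),
  stringsAsFactors = FALSE
)

# Same merge as in the answer: first row per EVTYPE joined with the summed totals
merged <- merge(storm[!duplicated(storm$EVTYPE), -length(storm)],
                aggregate(total ~ EVTYPE, data = storm, FUN = sum),
                by = "EVTYPE")

# Largest total first, columns in the original order
result <- merged[order(merged$total, decreasing = TRUE), names(storm)]
rownames(result) <- NULL   # renumber the rows 1..n

result
```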


Source: https://stackoverflow.com/questions/41197741/summarising-by-a-group-variable-in-r
