问题
I have a data frame as follows:
head(newStormObject)
FATALITIES INJURIES PROPVALDMG CROPVALDMG EVTYPE total
1 0 15 2.5e+05 0 TORNADO 15
2 0 0 2.5e+04 0 TORNADO 0
3 0 3 2.5e+07 0 TORNADO 3
4 0 3 2.5e+07 0 TORNADO 3
5 0 0 0.0e+00 0 TSTM WIND 1
6 0 0 0.0e+00 0 HAIL 2
7 0 0 0.0e+00 0 HAIL 3
8 0 0 0.0e+00 0 TSTM WIND 0
9 0 0 0.0e+00 0 HAIL 0
10 0 0 0.0e+00 0 TSTM WIND 0
11 0 0 0.0e+00 0 TSTM WIND 0
12 0 0 0.0e+00 0 HAIL 1
13 0 0 0.0e+00 0 HAIL 1
14 0 0 0.0e+00 0 HAIL 5
15 0 0 0.0e+00 0 TSTM WIND 0
What I am attempting to do is group by the event type (EVTYPE) and sum the totals column accordingly so printing the data frame would look as follows:
FATALITIES INJURIES PROPVALDMG CROPVALDMG EVTYPE total
1 0 15 2.5e+05 0 TORNADO 21
2 0 0 0.0e+00 0 HAIL 11
3 0 0 0.0e+00 0 TSTM WIND 0
To try to do this, I wrote the following
newStormObject %>% group_by(EVTYPE, total) %>% summarise(EVTYPE, sum(total))
but I got an error saying 'Error: cannot modify grouping variable'.
The first two statements in the 'pipe statement' appear to work fine but just gives the output according to the first block, so the error seems to come from the 'summarise' statement.
Any suggestion to solve this would be appreciated.
回答1:
We can take the first value for all the other columns using slice
after updating the 'total' with the sum
of 'total'.
library(dplyr)
df1 %>%
group_by(EVTYPE) %>%
mutate(total = sum(total)) %>%
slice(1L) %>%
arrange(desc(total))
# FATALITIES INJURIES PROPVALDMG CROPVALDMG EVTYPE total
# <int> <int> <dbl> <int> <chr> <int>
#1 0 15 250000 0 TORNADO 21
#2 0 0 0 0 HAIL 12
#3 0 0 0 0 TSTM WIND 1
NOTE: The 'total' for 'EVTYPE' "HAIL" is 12 based on the example
回答2:
Here is a base R solution that returns the same values (in a slightly different order)
merge(df[!duplicated(df$EVTYPE), -length(df)],
aggregate(total ~ EVTYPE, data=df, sum), by="EVTYPE")
EVTYPE FATALITIES INJURIES PROPVALDMG CROPVALDMG total
1 HAIL 0 0 0 0 12
2 TORNADO 0 15 250000 0 21
3 TSTM_WIND 0 0 0 0 1
duplicated
is used to select the first observation of each EVTYPE level, aggregate
is used to calculate the sum of the total variable. These results are merged on EVTYPE.
The rows are ordered by the order that factor
automatically stores factor variables, that is alphabetically. The columns are slightly disordered from the desired output due to merge
which puts the by variables in the front of the resulting data set. Fixing the columns is a matter of passing the names of the original data.frame.
merge(df[!duplicated(df$EVTYPE), -length(df)],
aggregate(total ~ EVTYPE, data=df, sum), by="EVTYPE")[, names(df)]
FATALITIES INJURIES PROPVALDMG CROPVALDMG EVTYPE total
1 0 0 0 0 HAIL 12
2 0 15 250000 0 TORNADO 21
3 0 0 0 0 TSTM_WIND 1
来源:https://stackoverflow.com/questions/41197741/summarising-by-a-group-variable-in-r