I have a data frame as follows:
head(newStormObject)
FATALITIES INJURIES PROPVALDMG CROPVALDMG EVTYPE total
1 0 15 2.5
We can first replace 'total' with its sum within each 'EVTYPE' group using mutate, and then keep the first row of each group with slice, which takes the first value of all the other columns.
library(dplyr)
df1 %>%
  group_by(EVTYPE) %>%
  mutate(total = sum(total)) %>%
  slice(1L) %>%
  arrange(desc(total))
# FATALITIES INJURIES PROPVALDMG CROPVALDMG EVTYPE total
# <int> <int> <dbl> <int> <chr> <int>
#1 0 15 250000 0 TORNADO 21
#2 0 0 0 0 HAIL 12
#3 0 0 0 0 TSTM WIND 1
NOTE: Based on the example, the 'total' for 'EVTYPE' "HAIL" is 12.
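To make the intermediate state visible, here is a minimal sketch that runs the same pipeline in two stages. The rows of df1 below are made up for illustration (they are not the original storm data) and were chosen only so that the group sums match the totals shown above.

```r
library(dplyr)

# Hypothetical rows, NOT the original data: chosen so the group sums are
# TORNADO = 21, HAIL = 12, TSTM WIND = 1.
df1 <- data.frame(
  FATALITIES = c(0L, 0L, 0L, 0L, 0L),
  INJURIES   = c(15L, 0L, 2L, 0L, 0L),
  PROPVALDMG = c(250000, 0, 25000, 0, 0),
  CROPVALDMG = c(0L, 0L, 0L, 0L, 0L),
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "HAIL", "TSTM WIND"),
  total      = c(15L, 7L, 6L, 5L, 1L),
  stringsAsFactors = FALSE
)

# Stage 1: after the grouped mutate, every row carries its group's sum,
# so it no longer matters which row of the group is kept.
summed <- df1 %>%
  group_by(EVTYPE) %>%
  mutate(total = sum(total)) %>%
  ungroup()

# Stage 2: keep the first row per EVTYPE and sort by the summed total.
result <- summed %>%
  group_by(EVTYPE) %>%
  slice(1L) %>%
  ungroup() %>%
  arrange(desc(total))

print(summed$total)
print(result)
```

Because the sum is written to every row before slicing, slice(1L) only decides which values of the other columns are kept, not the value of 'total'.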
Here is a base R solution that returns the same values (in a slightly different order):
merge(df[!duplicated(df$EVTYPE), -length(df)],
      aggregate(total ~ EVTYPE, data=df, sum), by="EVTYPE")
EVTYPE FATALITIES INJURIES PROPVALDMG CROPVALDMG total
1 HAIL 0 0 0 0 12
2 TORNADO 0 15 250000 0 21
3 TSTM_WIND 0 0 0 0 1
duplicated is used to select the first observation of each EVTYPE level, and -length(df) drops the last column (total) from that subset. aggregate is used to calculate the sum of the total variable within each EVTYPE. These two results are then merged on EVTYPE. The rows come out in alphabetical order of EVTYPE: merge sorts its result on the by column by default, and factor levels are themselves stored alphabetically by default. The columns are slightly out of order compared with the desired output because merge puts the by variables at the front of the resulting data set. Fixing the column order is a matter of indexing the result with the names of the original data.frame.
merge(df[!duplicated(df$EVTYPE), -length(df)],
      aggregate(total ~ EVTYPE, data=df, sum), by="EVTYPE")[, names(df)]
FATALITIES INJURIES PROPVALDMG CROPVALDMG EVTYPE total
1 0 0 0 0 HAIL 12
2 0 15 250000 0 TORNADO 21
3 0 0 0 0 TSTM_WIND 1
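If the rows should also be sorted by descending total, as in the desired output, the merged result can be reordered with order. The following is a minimal sketch on made-up rows (the data frame df below is hypothetical, not the original storm data, and the name res is introduced only for illustration).

```r
# Hypothetical rows, NOT the original data: chosen so the group sums are
# TORNADO = 21, HAIL = 12, TSTM_WIND = 1.
df <- data.frame(
  FATALITIES = c(0L, 0L, 0L, 0L, 0L),
  INJURIES   = c(15L, 0L, 2L, 0L, 0L),
  PROPVALDMG = c(250000, 0, 25000, 0, 0),
  CROPVALDMG = c(0L, 0L, 0L, 0L, 0L),
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "HAIL", "TSTM_WIND"),
  total      = c(15L, 7L, 6L, 5L, 1L),
  stringsAsFactors = FALSE
)

# First row per EVTYPE (without 'total'), merged with the per-EVTYPE sums,
# then columns restored to their original order.
res <- merge(df[!duplicated(df$EVTYPE), -length(df)],
             aggregate(total ~ EVTYPE, data = df, sum),
             by = "EVTYPE")[, names(df)]

# Sort rows by descending total; merge had sorted them by EVTYPE.
res <- res[order(-res$total), ]
print(res)
```

Note that -length(df) relies on total being the last column; if that is not guaranteed, dropping it by name (for example with setdiff(names(df), "total")) would be a safer choice.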