Question
Consider the following dataframe:
df <- data.frame(numeric=c(1,2,3,4,5,6,7,8,9,10), string=c("a", "a", "b", "b", "c", "d", "d", "e", "d", "f"))
print(df)
   numeric string
1        1      a
2        2      a
3        3      b
4        4      b
5        5      c
6        6      d
7        7      d
8        8      e
9        9      d
10      10      f
It has a numeric variable and a string variable. Now I would like to create another data frame in which the string variable lists each unique value ("a", "b", "c", "d", "e", "f") only once, and the numeric variable is the sum of the numeric values for that string in the original data frame. The result should be this data frame:
print(new_df)
  numeric string
1       3      a
2       7      b
3       5      c
4      22      d
5       8      e
6      10      f
This can be done with a for loop, but that would be rather inefficient on large datasets, so I would prefer other options. I tried the dplyr package, but I did not get the expected result:
library(dplyr)
df %>% group_by(string) %>% summarize(result = sum(numeric))
  result
1     55
Answer 1:
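It could be a function-masking issue caused by plyr, which also has summarise/mutate functions. If plyr is attached after dplyr, its summarise masks dplyr's version and ignores the grouping, so you get a single overall sum. We can explicitly call summarise from dplyr: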
library(dplyr)
df %>%
group_by(string) %>%
dplyr::summarise(numeric = sum(numeric))
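To check whether masking is happening, base R's find() function lists every attached package that defines a given name, in search-path order; an unqualified call resolves to the first entry. The sketch below assumes dplyr is installed and reuses the df from the question; it is one way to diagnose the problem, not the only one.

```r
# Assumes the dplyr package is installed.
library(dplyr)

df <- data.frame(numeric = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 string = c("a", "a", "b", "b", "c", "d", "d", "e", "d", "f"))

# find() (from the utils package) lists every attached package that defines
# "summarise", in search-path order; an unqualified call uses the first one.
print(find("summarise"))

# Calling dplyr::summarise explicitly avoids the masking,
# regardless of which other packages are attached.
new_df <- df %>%
  group_by(string) %>%
  dplyr::summarise(numeric = sum(numeric))
print(new_df)
```

If plyr had been attached after dplyr, "package:plyr" would appear before "package:dplyr" in the find() output, which is the situation that produces the single-row result in the question.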
Answer 2:
You can do this in base R, without loading any extra packages, using tapply or aggregate.
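A minimal base R sketch of both functions, using the df from the question: tapply() returns a named one-dimensional array of per-group sums, while aggregate() with a formula returns a data frame. Because aggregate() puts the grouping column first, the columns are reordered here so the result matches new_df.

```r
df <- data.frame(numeric = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 string = c("a", "a", "b", "b", "c", "d", "d", "e", "d", "f"))

# tapply() splits df$numeric by the values of df$string and applies sum()
# to each group; names are the sorted unique strings.
sums <- tapply(df$numeric, df$string, sum)
print(sums)

# aggregate() with a formula: sum 'numeric' within each level of 'string'.
# The result has one row per group, with the grouping column first.
new_df <- aggregate(numeric ~ string, data = df, FUN = sum)

# Reorder the columns to match the layout of new_df in the question.
new_df <- new_df[, c("numeric", "string")]
print(new_df)
```

Both functions group by the sorted unique values of string, so the rows come out in the order a to f.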
Source: https://stackoverflow.com/questions/56028131/how-to-sum-the-values-of-a-numeric-variable-based-on-a-string-variable