问题
I have a dataset in R of student weekly allowances by class, which looks like:
Year ID Class Allowance
2013 123 Freshman 100
2013 234 Freshman 110
2013 345 Sophomore 150
2013 456 Sophomore 200
2013 567 Junior 250
2014 678 Junior 100
2014 789 Junior 230
2014 890 Freshman 110
2014 891 Freshman 250
2014 892 Sophomore 220
How can I summarize the results by group (Year/Class) to get sum and % (by group)? Getting sum seems easy with ddply
by just couldn't get the % by group part right.
It works for sum
:
summary <- ddply(my_data, .(Year, Class), summarize, Sum_Allow=sum(Allowance))
But it doesn't work for the percentage by group part:
summary <- ddply(my_data, .(Year, Class), summarize, Sum_Allow=sum(Allowance),
Allow_Pct=Allowance/sum(Allowance))
Ideal result should look like:
Year Class Sum_Allow Allow_Pct
2013 Freshman 210 26%
2013 Junior 250 31%
2013 Sophomore 350 43%
2014 Freshman 360 40%
2014 Junior 330 36%
2014 Sophomore 220 24%
I tried ddply from the plyr package, but please let me know of any way that this may work.
回答1:
You could do this in two steps
my_data <- read.table(header = TRUE,
text = "Year ID Class Allowance
2013 123 Freshman 100
2013 234 Freshman 110
2013 345 Sophomore 150
2013 456 Sophomore 200
2013 567 Junior 250
2014 678 Junior 100
2014 789 Junior 230
2014 890 Freshman 110
2014 891 Freshman 250
2014 892 Sophomore 220")
library(plyr)
(summ <- ddply(my_data, .(Year, Class), summarize, Sum_Allow=sum(Allowance)))
# Year Class Sum_Allow
# 1 2013 Freshman 210
# 2 2013 Junior 250
# 3 2013 Sophomore 350
# 4 2014 Freshman 360
# 5 2014 Junior 330
# 6 2014 Sophomore 220
ddply(summ, .(Year), mutate, Allow_pct = Sum_Allow / sum(Sum_Allow) * 100)
# Year Class Sum_Allow Allow_pct
# 1 2013 Freshman 210 25.92593
# 2 2013 Junior 250 30.86420
# 3 2013 Sophomore 350 43.20988
# 4 2014 Freshman 360 39.56044
# 5 2014 Junior 330 36.26374
# 6 2014 Sophomore 220 24.17582
I don't know if it happens for the rest of you, but when I run the original attempt, R crashes rather than throwing a warning. Or if I misspell Allow instead of allow, it crashes. I really hate that; hadley pls fix
base r forever
回答2:
Here's a possible solution using data.table
package (assuming your data called df
)
library(data.table)
setDT(df)[, list(Sum_Allow = sum(Allowance)), keyby = list(Year, Class)][,
Allow_Pct := paste0(round(Sum_Allow/sum(Sum_Allow), 2)*100, "%"), by = Year][]
# Year Class Sum_Allow Allow_Pct
# 1: 2013 Freshman 210 26%
# 2: 2013 Junior 250 31%
# 3: 2013 Sophomore 350 43%
# 4: 2014 Freshman 360 40%
# 5: 2014 Junior 330 36%
# 6: 2014 Sophomore 220 24%
Contributed to @rawr, here's a possible base R solution
df2 <- aggregate(Allowance ~ Class + Year, df, sum)
transform(df2, Allow_pct = ave(Allowance, Year, FUN = function(x) paste0(round(x/sum(x), 2)*100, "%")))
# Class Year Allowance Allow_pct
# 1 Freshman 2013 210 26%
# 2 Junior 2013 250 31%
# 3 Sophomore 2013 350 43%
# 4 Freshman 2014 360 40%
# 5 Junior 2014 330 36%
# 6 Sophomore 2014 220 24%
回答3:
So assuming what you want is:
- get the sum of the Allowance column in all groups defined by Year AND Class, and
- take that sum and divide it by the overall sum in the relevant year
Then this could do the trick in dplyr:
library(dplyr)
my_data <- read.table(header = TRUE,
text =
'Year ID Class Allowance
2013 123 Freshman 100
2013 234 Freshman 110
2013 345 Sophomore 150
2013 456 Sophomore 200
2013 567 Junior 250
2014 678 Junior 100
2014 789 Junior 230
2014 890 Freshman 110
2014 891 Freshman 250
2014 892 Sophomore 220')
summary <- my_data %>%
group_by(Year) %>%
summarise(Year_Sum_Allow = sum(Allowance)) %>%
left_join(x = my_data, y = ., by = 'Year') %>%
group_by(Year, Class) %>%
summarise(Sum_Allow = sum(Allowance),
Allow_Pct = Sum_Allow/first(Year_Sum_Allow))
summary
# Results
Source: local data frame [6 x 4]
Groups: Year
Year Class Sum_Allow Allow_Pct
1 2013 Freshman 210 0.2592593
2 2013 Junior 250 0.3086420
3 2013 Sophomore 350 0.4320988
4 2014 Freshman 360 0.3956044
5 2014 Junior 330 0.3626374
6 2014 Sophomore 220 0.2417582
If you're not familiar with dplyr the syntax might seem strange. I suggest taking a look at the introduction. It's a big time-saver.
Edit: I should have added that, if you want the pretty rounded percent formatting in your example output, you can instead substitute Allow_Pct = paste0(round(Sum_Allow/first(Year_Sum_Allow), 2), '%')
in the last line.
Edit 2: As jbaums points out, this could be simplified to:
my_data %>%
group_by(Year, Class) %>%
summarise(sum_allow=sum(Allowance)) %>%
mutate(pct_allow=sum_allow/sum(sum_allow))
来源:https://stackoverflow.com/questions/26617843/aggregation-and-percentage-calculation-by-groups