Question
I have a data set (cost and distance). I want to aggregate the rows into classes depending on the distance and find the sum of the cost for each class.
Here are some example tables.
Nam Cost distance
1 1005 10
2 52505 52
3 51421 21
4 651 10
5 656 0
6 5448 1
Classes
Class From To
1 0 5
2 5 15
3 15 100
Result
Class Sum
1 6104
2 1656
3 103926
I am doing this, but it takes a lot of time to process. I am sure that there is a better way to do it:
for (i in 1:6) {
  for (j in 1:3) {
    # add the cost to class j when From <= distance < To
    if ((Table_numbers[i, 3] >= classes[j, 2]) && (Table_numbers[i, 3] < classes[j, 3])) {
      result_table[j, 2] <- result_table[j, 2] + Table_numbers[i, 2]
    }
  }
}
I used classIntervals as well, but for each class it gives me the count of the distances, whereas I need the sum of the cost.
I tried to use group_by as well, but I don't know if I can use the classes for grouping.
Do you have any idea how I can do this more efficiently?
Answer 1:
Here's a simple base R solution combining findInterval and tapply:
tapply(Table$Cost, findInterval(Table$distance, c(0, Classes$To)), sum)
# 1 2 3
# 6104 1656 103926
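The one-liner above assumes that data frames named Table and Classes already exist. Here is a minimal reproducible sketch that builds both from the question's tables and prints the intermediate bin index returned by findInterval. The data.frame construction and the helper names breaks, bin and sums are assumptions added for illustration, not part of the original answer.

```r
# Data copied from the question; object names follow the answer above.
Table <- data.frame(
  Nam      = 1:6,
  Cost     = c(1005, 52505, 51421, 651, 656, 5448),
  distance = c(10, 52, 21, 10, 0, 1)
)
Classes <- data.frame(Class = 1:3, From = c(0, 5, 15), To = c(5, 15, 100))

# Break points 0, 5, 15, 100. findInterval uses left-closed bins [From, To).
breaks <- c(0, Classes$To)
bin <- findInterval(Table$distance, breaks)
bin
# [1] 2 3 3 2 1 1

# Sum Cost within each bin.
sums <- tapply(Table$Cost, bin, sum)
sums
# 6104, 1656 and 103926 for bins 1, 2 and 3
```

Note that a distance exactly equal to the last break (100 here) would land in a fourth bin. Passing rightmost.closed = TRUE to findInterval folds that value into the last class instead.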
If the class names may differ (that is, they are not just a counter), you could modify this to:
tapply(Table$Cost, Classes$Class[findInterval(Table$distance, c(0, Classes$To))], sum)
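To see why indexing Classes$Class matters, here is a small sketch with text labels. The labels "near", "mid" and "far" are hypothetical, chosen only for illustration. Wrapping the labels in factor() with explicit levels is an extra step not in the answer above. Without it, tapply would order character groups alphabetically rather than in class order.

```r
Table <- data.frame(
  Cost     = c(1005, 52505, 51421, 651, 656, 5448),
  distance = c(10, 52, 21, 10, 0, 1)
)
# Hypothetical class labels (assumption, not from the question).
Classes <- data.frame(
  Class = c("near", "mid", "far"),
  From  = c(0, 5, 15),
  To    = c(5, 15, 100),
  stringsAsFactors = FALSE
)

# Bin index for every row, then translate the index into a label.
idx <- findInterval(Table$distance, c(0, Classes$To))
label <- factor(Classes$Class[idx], levels = Classes$Class)

res <- tapply(Table$Cost, label, sum)
res
# near = 6104, mid = 1656, far = 103926
```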
Answer 2:
Here is a solution that uses cut to produce the classes and dplyr::group_by to sum by group:
library(dplyr)
mutate(df, class = cut(distance, c(0, 5, 15, 100), include.lowest = TRUE)) %>%
  group_by(class) %>%
  summarize(sum = sum(Cost))
Data:
df <- read.table(text="Nam Cost distance
1 1005 10
2 52505 52
3 51421 21
4 651 10
5 656 0
6 5448 1", header = TRUE)
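One detail worth checking: by default cut builds right-closed bins such as (5,15], so a distance of exactly 5 would fall into the first class, whereas the question's From/To table reads more naturally as [From, To). The sketch below is a possible variation, not the answerer's code. It sets right = FALSE and takes the breaks from a Classes data frame. Base R's aggregate is swapped in for dplyr only to keep the example free of dependencies. The names Classes and out are assumptions.

```r
df <- data.frame(
  Cost     = c(1005, 52505, 51421, 651, 656, 5448),
  distance = c(10, 52, 21, 10, 0, 1)
)
Classes <- data.frame(Class = 1:3, From = c(0, 5, 15), To = c(5, 15, 100))

# right = FALSE gives left-closed bins [0,5), [5,15), [15,100].
# With right = FALSE, include.lowest = TRUE closes the last bin at 100.
df$class <- cut(
  df$distance,
  breaks = c(Classes$From[1], Classes$To),
  labels = Classes$Class,
  right = FALSE,
  include.lowest = TRUE
)

# Sum Cost per class (base R stand-in for group_by + summarize).
out <- aggregate(Cost ~ class, data = df, FUN = sum)
out
# classes 1, 2, 3 with Cost 6104, 1656, 103926
```

For the sample data both variants give the same sums, because no distance sits exactly on an inner break (5 or 15).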
Source: https://stackoverflow.com/questions/34653577/create-class-intervals-in-r-and-sum-values