Question
I want to sort the values of one data.frame column into predetermined bins, and then sum the values in a different column for the rows that fall into each bin. In other words, I want to bin items based on one value and then get, for each bin, the sum of a second value attached to those items. Can someone help me?
My data looks like this:
df =
Item valueX valueY
A 169849631 0.9086560
B 27612064 0.9298379
C 196651878 1.6516654
D 33007984 1.3397873
E 23019448 -0.2954385
F 54779712 -1.6888178
My bins look like this:
Bins=
start end
1 249982
249983 499963
499964 749945
749946 999926
999927 1249907
1249908 1499889
What I want is a data frame that looks like this (hypothetical values in the frequency and sumvalueY columns):
resultsdf=
binstart binend frequency sumvalueY
1 249982 0 0
249983 499963 5 200
499964 749945 6 400
749946 999926 0 0
999927 1249907 12 30
1249908 1499889 0 0
Here is my code (current iteration):
Start = Bins[,1]
End = Bins[,2]
myfunction <- function(Start,End) {
sum(df$valueX >= Start & df$valueX < End, df[,2])}
Binssorted = mapply(myfunction, Start,End)
Answer 1:
There are a number of ways to do this. Here's one using the dplyr package. I've created some fake data for illustration.
library(dplyr)
# Fake data
set.seed(5) # For reproducibility
dat = data.frame(valueX = runif(1000, 1, 2e6), valueY = rnorm(1000))
Now we'll bin the data and summarise it using the pipe operator %>%, which allows us to chain functions one after the other, so we can perform all the operations in one go.
dat %>%
  # Bin the data
  mutate(bins = cut(valueX, seq(0, 2e6, 250000))) %>%
  # Group data into the bins we just created
  group_by(bins) %>%
  # Count the number of rows in each bin and sum all the Y values in each bin
  summarise(freq = n(),
            sumY = sum(valueY))
bins freq sumY
1 (0,2.5e+05] 127 8.404150
2 (2.5e+05,5e+05] 127 14.988207
3 (5e+05,7.5e+05] 121 10.750580
4 (7.5e+05,1e+06] 134 -28.725949
5 (1e+06,1.25e+06] 106 17.377665
6 (1.25e+06,1.5e+06] 126 14.340313
7 (1.5e+06,1.75e+06] 119 -4.241991
8 (1.75e+06,2e+06] 140 9.312233
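For comparison, here is a minimal base R sketch that stays closer to the mapply() attempt in the question and keeps empty bins as rows of zeros. In the question's code, sum(condition, df[,2]) adds the number of matching rows to the total of the entire second column (valueX), so it never sums valueY for the rows in the bin. The Bins values are taken from the question; the toy data frame and the helper name summarise_bin are made up for illustration, and treating both bin edges as inclusive is an assumption based on how the start and end values abut.

```r
# Bin boundaries copied from the question
Bins <- data.frame(
  start = c(1, 249983, 499964, 749946, 999927, 1249908),
  end   = c(249982, 499963, 749945, 999926, 1249907, 1499889)
)

# Hypothetical toy data, chosen so that some bins are populated
toy <- data.frame(
  valueX = c(300000, 450000, 600000, 1100000),
  valueY = c(1.5, 2.5, -1, 4)
)

# For one bin: count the rows whose valueX lies inside it
# (both edges inclusive, an assumption) and sum their valueY
summarise_bin <- function(start, end) {
  in_bin <- toy$valueX >= start & toy$valueX <= end
  c(frequency = sum(in_bin), sumvalueY = sum(toy$valueY[in_bin]))
}

# mapply() returns a 2 x n matrix; transpose it to one row per bin
stats <- t(mapply(summarise_bin, Bins$start, Bins$end))
resultsdf <- data.frame(binstart = Bins$start, binend = Bins$end, stats)
print(resultsdf)
```

Because the loop runs over the rows of Bins rather than over the groups present in the data, a bin with no matching rows still appears, with a frequency and sum of 0.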
Source: https://stackoverflow.com/questions/28844070/how-collect-additional-row-data-on-binned-data-in-r