In R, how to sum certain rows of a data frame with certain logic?

情书的邮戳

Hi experienced R users,

It's kind of a simple thing. I want to sum x by Group.1 depending on one controllable variable.

I'd like

5 answers

一个人的身影

2021-01-06 17:52

If you want to sum only a subset of your data:

my_data <- data.frame(c("TRUE","FALSE","FALSE","FALSE","TRUE"), c(1,2,3,4,5))
names(my_data)[1] <- "DESCRIPTION" #Change Column Name
names(my_data)[2] <- "NUMBER"      #Change Column Name

sum(subset(my_data, my_data$DESCRIPTION=="TRUE")$NUMBER)

You should get 6.
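
The same idea can be sketched with a real logical column instead of the strings "TRUE" and "FALSE". This is only an illustrative variant that assumes you control how the column is created; the names `my_data` and `total` are placeholders. A logical column can be used directly as an index, so no string comparison is needed:

```r
# Assumption: DESCRIPTION is stored as logical values, not as strings.
my_data <- data.frame(
  DESCRIPTION = c(TRUE, FALSE, FALSE, FALSE, TRUE),
  NUMBER      = c(1, 2, 3, 4, 5)
)

# Index NUMBER with the logical column, then sum the selected values.
total <- sum(my_data$NUMBER[my_data$DESCRIPTION])
total  # 6, i.e. 1 + 5
```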


情深已故

2021-01-06 17:58
Not sure why Eggs are important here ;)
```
df1 <- data.frame(Gr=seq(4),
                  x=c(230299, 263066, 266504, 177196)
                  )
```
Now with n = 2, i.e. the first two rows:
```
n <- 2
sum(df1[, "x"][df1[, "Gr"]<=n]) 
```
The expression [df1[, "Gr"]<=n] creates a logical vector to subset the elements in df1[, "x"] before summing them.
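
To make that step visible, the logical vector can be stored in a variable first. This is just a sketch using the same example data; the names `keep` and `total` are introduced here for illustration:

```r
df1 <- data.frame(Gr = seq(4),
                  x  = c(230299, 263066, 266504, 177196))
n <- 2

# One TRUE/FALSE value per row: TRUE where Gr is at most n.
keep <- df1[, "Gr"] <= n
keep  # TRUE TRUE FALSE FALSE

# Only the x values where keep is TRUE are summed.
total <- sum(df1[keep, "x"])
total  # 493365
```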

Also, it appears your Group.1 is the same as the row number. If so, this may be simpler:
```
sum(df1[, "x"][1:n])
```
Or, to get all the running totals at once:
```
cumsum(df1[, "x"])
```
醉话见心

2021-01-06 18:02
Assuming your data is in mydata:
```
with(mydata, sum(x[Group.1 <= 2]))
```
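
Since the cutoff is meant to be controllable, one option is to wrap the expression in a small function. This is a sketch, not part of the original answer: `sum_up_to` is a hypothetical helper name, and the contents of `mydata` are made up here (the x values are borrowed from another answer) because the question's data is not shown in full.

```r
# Hypothetical data, assumed to have the columns Group.1 and x.
mydata <- data.frame(Group.1 = 1:4,
                     x       = c(230299, 263066, 266504, 177196))

# Hypothetical helper: sum x over all rows whose Group.1 is at most n.
sum_up_to <- function(data, n) {
  with(data, sum(x[Group.1 <= n]))
}

sum_up_to(mydata, 2)  # 493365
sum_up_to(mydata, 3)  # 759869
```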
梦毁少年i

2021-01-06 18:11
If the sums you want are always cumulative, there is a function for that: cumsum. It works like this:
```
> cumsum(c(1,2,3))
[1] 1 3 6
```
In this case you might want something like
```
> mysum <- cumsum(yourdata$x)
> mysum[2] # the sum of the first two rows
> mysum[3] # the sum of the first three rows
> mysum[number] # the sum of the first "number" rows
```
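
A runnable version of the lookup above might look like this. The data frame `yourdata` and its values are invented purely for illustration, and `number` stands for whatever cutoff you choose:

```r
# Hypothetical data; only the x column matters here.
yourdata <- data.frame(x = c(10, 20, 30, 40))

# Running totals: element i is the sum of the first i rows.
mysum <- cumsum(yourdata$x)
mysum  # 10 30 60 100

number <- 3
mysum[number]  # 60, the sum of the first three rows
```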

渐次进展

2021-01-06 18:16

You could use the by function.

For instance, given the following data.frame:

d <- data.frame(Group.1=c(1,1,2,1,3,3,1,3),Group.2=c('Eggs'),x=1:8)

> d
  Group.1 Group.2 x
1       1    Eggs 1
2       1    Eggs 2
3       2    Eggs 3
4       1    Eggs 4
5       3    Eggs 5
6       3    Eggs 6
7       1    Eggs 7
8       3    Eggs 8

You can do this:

num <- 3 # sum only the first 3 rows of each group

# The aggregation function:
# it is called for each group receiving the 
# data.frame subset as input and returns the aggregated row
innerFunc <- function(subDf){
  # we create the aggregated row by taking the first row of the subset
  row <- head(subDf,1)
  # we set the x column in the result row to the sum of the first "num"
  # elements of the subset
  row$x <- sum(head(subDf$x,num))
  return(row)
}
# Here we call the "by" function:
# it returns an object of class "by" that is a list of the resulting
# aggregated rows; we want to convert it to a data.frame, so we call
# rbind repeatedly by using "do.call(rbind, ... )"
d2 <- do.call(rbind,by(data=d,INDICES=d$Group.1,FUN=innerFunc))

> d2
  Group.1 Group.2  x
1       1    Eggs  7
2       2    Eggs  3
3       3    Eggs 19
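
If only the per-group sums are needed, and not the other columns, a shorter alternative can be sketched with base R's tapply. This swaps in a different function from the by approach shown above; the name `sums` is introduced here for illustration:

```r
d <- data.frame(Group.1 = c(1, 1, 2, 1, 3, 3, 1, 3),
                Group.2 = "Eggs",
                x       = 1:8)
num <- 3  # use at most the first 3 rows of each group

# Split x by Group.1, then sum the first num values within each group.
sums <- tapply(d$x, d$Group.1, function(v) sum(head(v, num)))
sums  # group 1: 7, group 2: 3, group 3: 19
```

The result is a named array indexed by group rather than a data.frame, so Group.2 is not carried along.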
