In R, how to sum certain rows of a data frame with certain logic?

情书的邮戳 2021-01-06 17:33

Hi experienced R users,

It's kind of a simple thing. I want to sum x by Group.1 depending on one controllable variable.

I'd like

5 Answers
  • 2021-01-06 17:52

    If you want to sum only a subset of your data:

    my_data <- data.frame(c("TRUE","FALSE","FALSE","FALSE","TRUE"), c(1,2,3,4,5))
    names(my_data)[1] <- "DESCRIPTION" #Change Column Name
    names(my_data)[2] <- "NUMBER"      #Change Column Name
    
    sum(subset(my_data, my_data$DESCRIPTION=="TRUE")$NUMBER)
    

    You should get 6.
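
The same subset sum can also be written with plain logical indexing. This is only a minimal sketch, and it assumes the flag is stored as a real logical column rather than as the strings "TRUE" and "FALSE" (that change is an assumption, not part of the answer above):

```r
# Same values as above, but DESCRIPTION is a logical vector (assumption)
my_data <- data.frame(
  DESCRIPTION = c(TRUE, FALSE, FALSE, FALSE, TRUE),
  NUMBER      = c(1, 2, 3, 4, 5)
)

# Keep only the NUMBER values whose DESCRIPTION is TRUE, then sum them
total <- sum(my_data$NUMBER[my_data$DESCRIPTION])
print(total)  # 6
```

With a logical column there is no string comparison, so a typo such as "True" cannot silently drop rows.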

  • 2021-01-06 17:58

    Not sure why Eggs are important here ;)

    df1 <- data.frame(Gr=seq(4),
                      x=c(230299, 263066, 266504, 177196)
                      )
    

    Now with n = 2, i.e. the first two rows:

    n <- 2
    sum(df1[, "x"][df1[, "Gr"]<=n]) 
    

    The expression [df1[, "Gr"]<=n] creates a logical vector to subset the elements in df1[, "x"] before summing them.

    Also, it appears your Group.1 is the same as the row number. If so, this may be simpler:

    sum(df1[, "x"][1:n])
    

    Or, to get all the cumulative sums at once:

    cumsum(df1[, "x"])
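
If the running totals are needed more than once, one option is to keep them next to the data and look them up by group. A small sketch, assuming the rows are already sorted by Gr; the column name cum_x is made up for illustration:

```r
df1 <- data.frame(Gr = seq(4),
                  x  = c(230299, 263066, 266504, 177196))

# Running total of x, stored as an extra column (hypothetical name)
df1$cum_x <- cumsum(df1$x)

n <- 2
# Running total at the row where Gr equals n
total_n <- df1$cum_x[df1$Gr == n]
print(total_n)  # 493365
```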
    
  • 2021-01-06 18:02

    Assuming your data is in mydata:

    with(mydata, sum(x[Group.1 <= 2]))
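
To make the threshold the "controllable variable" from the question, the same expression can be wrapped in a small function. This is a sketch only: the function name sum_up_to and the sample values in mydata are hypothetical, since the question's data is not shown here.

```r
# Hypothetical sample data with the column names used in the question
mydata <- data.frame(Group.1 = 1:4,
                     x       = c(10, 20, 30, 40))

# Sum x over all rows whose Group.1 is at most n
sum_up_to <- function(data, n) {
  sum(data$x[data$Group.1 <= n])
}

print(sum_up_to(mydata, 2))  # 30
print(sum_up_to(mydata, 3))  # 60
```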
    
  • 2021-01-06 18:11

    If the sums you want are always cumulative, there's a function for that, cumsum. It works like this.

    > cumsum(c(1,2,3))
    [1] 1 3 6
    

    In this case you might want something like

    > mysum <- cumsum(yourdata$x)
    > mysum[2] # the sum of the first two rows
    > mysum[3] # the sum of the first three rows
    > mysum[number] # the sum of the first "number" rows
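
If the running sums should restart within each group instead of running over the whole column, base R's ave() can apply cumsum group by group. A sketch, assuming a data frame with Group.1 and x columns; the values in yourdata are made up for illustration:

```r
# Hypothetical data: three groups interleaved across eight rows
yourdata <- data.frame(Group.1 = c(1, 1, 2, 1, 3, 3, 1, 3),
                       x       = 1:8)

# ave() splits x by Group.1, applies cumsum to each piece,
# and returns the results in the original row order
yourdata$running <- ave(yourdata$x, yourdata$Group.1, FUN = cumsum)
print(yourdata$running)  # 1 3 3 7 5 11 14 19
```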
    
  • 2021-01-06 18:16

    You could use the by function.

    For instance, given the following data.frame:

    d <- data.frame(Group.1=c(1,1,2,1,3,3,1,3),Group.2=c('Eggs'),x=1:8)
    
    > d
      Group.1 Group.2 x
    1       1    Eggs 1
    2       1    Eggs 2
    3       2    Eggs 3
    4       1    Eggs 4
    5       3    Eggs 5
    6       3    Eggs 6
    7       1    Eggs 7
    8       3    Eggs 8
    

    You can do this:

    num <- 3 # sum only the first 3 rows of each group
    
    # The aggregation function:
    # it is called for each group receiving the 
    # data.frame subset as input and returns the aggregated row
    innerFunc <- function(subDf){
      # we create the aggregated row by taking the first row of the subset
      row <- head(subDf,1)
      # we set the x column in the result row to the sum of the first "num"
      # elements of the subset
      row$x <- sum(head(subDf$x,num))
      return(row)
    }
    # Here we call the "by" function:
    # it returns an object of class "by" that is a list of the resulting
    # aggregated rows; we want to convert it to a data.frame, so we call
    # rbind repeatedly by using "do.call(rbind, ... )"
    d2 <- do.call(rbind,by(data=d,INDICES=d$Group.1,FUN=innerFunc))
    
    > d2
      Group.1 Group.2  x
    1       1    Eggs  7
    2       2    Eggs  3
    3       3    Eggs 19
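
When only the per-group sums are needed and not the other columns, tapply gives a shorter route to the same numbers. This is an alternative sketch rather than the method above; it returns a named vector instead of a data.frame:

```r
d <- data.frame(Group.1 = c(1, 1, 2, 1, 3, 3, 1, 3),
                Group.2 = "Eggs",
                x       = 1:8)
num <- 3  # sum only the first 3 rows of each group

# Split x by Group.1, then sum the first "num" values of each piece
res <- tapply(d$x, d$Group.1, function(v) sum(head(v, num)))
print(res)  # group sums: 1 -> 7, 2 -> 3, 3 -> 19
```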
    