How to get column mean for specific rows only?

不知归路 2020-12-30 09:05

I need to get the mean of one column (here: score) for specific rows (here: years). Specifically, I would like to know the average score for three periods:

  • per
3 Answers
  • 2020-12-30 09:14

    If the rows are ordered by year, I think the easiest way to accomplish this would be:

    m80_83 <- mean(dataframe[1:4,3]) #Finds the mean of the values of column 3 for rows 1 through 4
    m84_90 <- mean(dataframe[5:10,3])
    #etc.
    

    If the rows are not ordered by year, I would use tapply like this.

    list.of.means <- c(tapply(dataframe$score, cut(dataframe$year, c(0, 1983.5, 1990.5, 3000)), mean))
    

    Here, tapply takes three parameters:

    First, the data you want to do stuff with (in this case, dataframe$score).

    Second, a grouping factor, here produced by cut, that splits that data up into groups. In this case, it will cut the data into three groups based on the dataframe$year values. Group 1 will include all rows with dataframe$year values from 0 to 1983.5, Group 2 will include all rows with dataframe$year values from 1983.5 to 1990.5, and Group 3 will include all rows with dataframe$year values from 1990.5 to 3000.

    Third, a function that is applied to each group. This function will apply to the data you selected as your first parameter.

    So, list.of.means should be a named vector of the 3 values you are looking for, one per group.
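
To make the tapply and cut steps above concrete, here is a minimal sketch on made-up data. The data frame contents and the period labels are assumptions for illustration only and are not taken from the question.

```r
# Hypothetical toy data (not from the question): one row per year, 1980-1995
dataframe <- data.frame(year = 1980:1995, score = seq(10, 160, by = 10))

# cut() returns a factor that assigns each year to one of three intervals;
# the labels are illustrative names for those intervals
period <- cut(dataframe$year, breaks = c(0, 1983.5, 1990.5, 3000),
              labels = c("1980-1983", "1984-1990", "1991+"))

# tapply() splits score by that factor and applies mean() to each piece
list.of.means <- tapply(dataframe$score, period, mean)
print(list.of.means)
```

Because the grouping comes from the year values rather than from row positions, this should give the same result whatever order the rows are in.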

  • 2020-12-30 09:28
    datfrm$mean <-
      with (datfrm, ave( score, findInterval(year, c(-Inf, 1984, 1991, Inf)), FUN= mean) )
    

    The title question is a bit different than the real question and would be answered by using logical indexing. If one wanted only the mean for a particular subset, say year >= 1984 & year <= 1990, it could be computed via:

    mn84_90 <- with(datfrm, mean(score[year >= 1984 & year <= 1990]) )
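
The two approaches in this answer return different shapes, which a small sketch may make clearer. The data below is invented for illustration and is deliberately unsorted; only the function calls mirror the answer.

```r
# Hypothetical toy data (not from the question), deliberately unsorted by year
datfrm <- data.frame(year  = c(1992, 1980, 1985, 1983, 1990, 1984),
                     score = c(6, 1, 3, 2, 5, 4))

# findInterval() returns 1, 2 or 3 depending on which interval each year falls in
grp <- findInterval(datfrm$year, c(-Inf, 1984, 1991, Inf))

# ave() returns a vector as long as score, each element replaced by its group mean
datfrm$mean <- ave(datfrm$score, grp, FUN = mean)

# Logical indexing instead yields a single number for one chosen period
mn84_90 <- with(datfrm, mean(score[year >= 1984 & year <= 1990]))

print(datfrm)
print(mn84_90)
```

So ave is the better fit when the period mean is wanted as a new column next to every row, while logical indexing fits when only one summary value is needed.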
    
  • 2020-12-30 09:30

    Since findInterval requires year to be sorted (as it is in your example) I'd be tempted to use cut in case it isn't sorted [proved wrong, thanks @DWin]. For completeness the data.table equivalent (scales for large data) is :

    require(data.table)
    DT = as.data.table(DF)   # or just start with a data.table in the first place
    
    DT[, mean:=mean(score), by=cut(year,c(-Inf,1984,1991,Inf),right=FALSE)]   # right=FALSE so that 1984 and 1991 start a new group, as with findInterval
    

    Or, since findInterval is likely faster, as DWin used:

    DT[, mean:=mean(score), by=findInterval(year,c(-Inf,1984,1991,Inf))]
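
One detail worth checking when swapping between cut and findInterval with the same break points: cut closes its intervals on the right by default, whereas findInterval closes them on the left, so a year that sits exactly on a break (1984 or 1991) can land in different groups. The sketch below uses base R only and a made-up, unsorted year vector to show the difference.

```r
# Hypothetical years (not from the question), deliberately unsorted
year   <- c(1991, 1983, 1984, 1990)
breaks <- c(-Inf, 1984, 1991, Inf)

# findInterval(): intervals are [-Inf, 1984), [1984, 1991), [1991, Inf)
g_find <- findInterval(year, breaks)

# cut() with its default right = TRUE: (-Inf, 1984], (1984, 1991], (1991, Inf]
g_right <- as.integer(cut(year, breaks))

# cut() with right = FALSE uses the same left-closed intervals as findInterval()
g_left <- as.integer(cut(year, breaks, right = FALSE))

print(rbind(year, g_find, g_right, g_left))
```

With whole-number years, passing right = FALSE to cut should therefore reproduce the findInterval grouping.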
    