Aggregate data in one column based on values in another column

轮回少年 2021-02-14 00:10

I know there is an easy way to do this, but I can't figure it out.

I have a dataframe in my R script that looks something like this:

A    B  C
1.2  4  8
2.3  4  9
2.3  6  0
1.2  3  3
3.4  2  1
1.2  5  1

4 answers
  • 2021-02-14 00:12

    Here is a solution using the plyr package:

    library(plyr)  # attach plyr so that .() and summarize are found
    ddply(df, .(A), summarize, num = length(A), totalB = sum(B))
    
  • 2021-02-14 00:32

    In dplyr:

    library(tidyverse)
    A <- c(1.2, 2.3, 2.3, 1.2, 3.4, 1.2)
    B <- c(4, 4, 6, 3, 2, 5)
    C <- c(8, 9, 0, 3, 1, 1)
    
    df <- tibble(A, B, C)  # data_frame() is deprecated in favour of tibble()
    
    df %>%
        group_by(A) %>% 
        summarise(num = n(),
                  totalB = sum(B))
    
  • 2021-02-14 00:37

    I'd use aggregate to get the two aggregates and then merge them into a single data frame:

    > df
        A B C
    1 1.2 4 8
    2 2.3 4 9
    3 2.3 6 0
    4 1.2 3 3
    5 3.4 2 1
    6 1.2 5 1
    
    > num <- aggregate(B~A,df,length)
    > names(num)[2] <- 'num'
    
    > totalB <- aggregate(B~A,df,sum)
    > names(totalB)[2] <- 'totalB'
    
    > merge(num,totalB)
        A num totalB
    1 1.2   3     12
    2 2.3   2     10
    3 3.4   1      2
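
The two aggregates can also be computed in a single aggregate call, which avoids the merge step. This is only a sketch in base R, assuming the sample data shown above; the name res and the anonymous summary function are illustrative and not part of the original answer.

```r
# Sample data, copied from the data frame printed above
df <- data.frame(A = c(1.2, 2.3, 2.3, 1.2, 3.4, 1.2),
                 B = c(4, 4, 6, 3, 2, 5),
                 C = c(8, 9, 0, 3, 1, 1))

# Return both statistics from one function; aggregate() stores them
# as a two-column matrix inside the single result column B
res <- aggregate(B ~ A, data = df,
                 FUN = function(x) c(num = length(x), totalB = sum(x)))

# Flatten the matrix column into ordinary columns, then tidy the names
res <- do.call(data.frame, res)
names(res) <- c("A", "num", "totalB")

print(res)
```

This should give the same three rows as the merge result above, with one row per distinct value of A.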
    
  • 2021-02-14 00:38

    Here is a solution using data.table, for memory and time efficiency:

    library(data.table)
    DT <- as.data.table(df)
    DT[, list(totalB = sum(B), num = .N), by = A]
    

    To keep only rows where C == 1 (as per the comment on @aix's answer):

    DT[C==1, list(totalB = sum(B), num = .N), by = A]
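
For comparison, the same C == 1 filter can be sketched in base R without extra packages, using the subset argument of aggregate's formula method. The sample data is assumed to be the six-row data frame used in the other answers, and the names num, totalB, and res are illustrative.

```r
# Sample data, as used in the other answers
df <- data.frame(A = c(1.2, 2.3, 2.3, 1.2, 3.4, 1.2),
                 B = c(4, 4, 6, 3, 2, 5),
                 C = c(8, 9, 0, 3, 1, 1))

# subset = C == 1 drops the other rows before grouping by A
num <- aggregate(B ~ A, data = df, FUN = length, subset = C == 1)
names(num)[2] <- "num"

totalB <- aggregate(B ~ A, data = df, FUN = sum, subset = C == 1)
names(totalB)[2] <- "totalB"

# Join the count and the sum on the shared column A
res <- merge(num, totalB)
print(res)
```

Only two rows of the sample data have C equal to 1, so the result has one row each for A = 1.2 and A = 3.4.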
    