How to get the maximum value by group

太阳男子 2020-11-27 07:00

I have a data.frame with two columns: year and score. The years go from 2000-2012 and each year can be listed multiple times. In the s

5 Answers
  • 2020-11-27 07:31
    data <- data.frame(year = c(2000, 2001, 2000), score = c(18, 22, 21))
    new.year <- unique(data$year)
    new.score <- sapply(new.year, function(y) max(data[data$year == y, ]$score))
    data <- data.frame(year = new.year, score = new.score)
    
  • 2020-11-27 07:37

    If you know SQL, this is easier to understand:

    library(sqldf)
    sqldf('select year, max(score) from mydata group by year')
    

    Update (2016-01): You can now also use dplyr:

    library(dplyr)
    mydata %>% group_by(year) %>% summarise(max = max(score))
    
  • 2020-11-27 07:42

    Using plyr:

    library(plyr)
    set.seed(45)
    # 25 random rows: years drawn with replacement, scores are a permutation of 1..25
    df <- data.frame(year = sample(2000:2012, 25, replace = TRUE), score = sample(25))
    ddply(df, .(year), summarise, max.score = max(score))
    

    Using data.table:

    library(data.table)
    dt <- data.table(df, key = "year")
    dt[, list(max.score = max(score)), by = year]
    

    Using aggregate:

    o <- aggregate(df$score, list(df$year) , max)
    names(o) <- c("year", "max.score")
    

    Using ave:

    df1 <- df
    df1$max.score <- ave(df1$score, df1$year, FUN=max)
    df1 <- df1[!duplicated(df1$year), ]
    

    Edit: If there are more columns that you want to keep, a data.table solution would be the best (my opinion :))

    set.seed(45)
    df <- data.frame(year = sample(2000:2012, 25, replace = TRUE), score = sample(25),
                     alpha = sample(letters[1:5], 25, replace = TRUE), beta = rnorm(25))
    
    # convert to data.table with key=year
    dt <- data.table(df, key="year")
    # get the subset of data that matches this criterion
    dt[, .SD[score %in% max(score)], by=year]
    
    #     year score alpha       beta
    #  1: 2000    20     b  0.8675148
    #  2: 2001    21     e  1.5543102
    #  3: 2002    22     c  0.6676305
    #  4: 2003    18     a -0.9953758
    #  5: 2004    23     d  2.1829996
    #  6: 2005    25     b -0.9454914
    #  7: 2007    17     e  0.7158021
    #  8: 2008    12     e  0.6501763
    #  9: 2011    24     a  0.7201334
    # 10: 2012    19     d  1.2493954
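
For comparison, the same "keep the whole row" idea can be sketched in base R by reusing ave() from earlier in this answer. This is only a minimal sketch: the small data frame and the names group_max and top_rows are made up for illustration, and ties within a year would all be kept, just as with the %in% version above.

```r
# Small illustrative data set (values are invented for this sketch)
df <- data.frame(
  year  = c(2000, 2001, 2000, 2001, 2002),
  score = c(18, 22, 21, 15, 9),
  alpha = c("a", "b", "c", "d", "e")
)

# ave() returns, for every row, the maximum score within that row's year
group_max <- ave(df$score, df$year, FUN = max)

# Keep only the rows whose score equals their group's maximum,
# then sort by year for readability
top_rows <- df[df$score == group_max, ]
top_rows <- top_rows[order(top_rows$year), ]
print(top_rows)
```

No extra package is needed here, which may matter if data.table is not available.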
    
  • 2020-11-27 07:52

    Using base packages:

    > df
      year score
    1 2000    18
    2 2001    22
    3 2000    21
    > aggregate(score ~ year, data=df, max)
      year score
    1 2000    21
    2 2001    22
    

    EDIT

    If you have additional columns that you need to keep, you can use merge with aggregate to get those columns:

    > df <- data.frame(year = c(2000, 2001, 2000), score = c(18, 22, 21) , hrs = c( 10, 11, 12))
    > df
      year score hrs
    1 2000    18  10
    2 2001    22  11
    3 2000    21  12
    > merge(aggregate(score ~ year, data=df, max), df, all.x=T)
      year score hrs
    1 2000    21  12
    2 2001    22  11
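
One thing worth knowing about the merge approach: merge joins on every column the two data frames share (here year and score), so if a year has tied maximum scores, each tied row comes back. A small sketch with an invented tie in year 2000 (the data and the names best and res are hypothetical):

```r
# Year 2000 has two rows sharing the maximum score of 21
df <- data.frame(year  = c(2000, 2001, 2000, 2000),
                 score = c(18, 22, 21, 21),
                 hrs   = c(10, 11, 12, 13))

# Maximum score per year
best <- aggregate(score ~ year, data = df, max)

# merge() matches on the shared columns year and score,
# so both tied rows for 2000 are returned
res <- merge(best, df)
print(res)
```

If exactly one row per year is required, the result would still need to be de-duplicated, for example with !duplicated(res$year).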
    
  • 2020-11-27 07:52

    One-liner:

    df_2 <- data.frame(year = sort(unique(df$year)), score = tapply(df$score, df$year, max))
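
The one-liner works because tapply returns a named array with one element per group, ordered by the sorted group values, so it lines up with sort(unique(df$year)). A slightly more explicit sketch of the same idea, reading the years from the names of the result instead (the sample data and the name best are assumptions for illustration):

```r
df <- data.frame(year = c(2000, 2001, 2000), score = c(18, 22, 21))

# tapply() gives a named array: names are the years, values are the maxima
best <- tapply(df$score, df$year, max)

# Build the result from the names and values of that array;
# as.vector() drops the array attributes so score is a plain numeric column
df_2 <- data.frame(year = as.numeric(names(best)), score = as.vector(best))
print(df_2)
```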
    