How to select the row with the maximum value in each group

北荒 2020-11-21 04:18

In a dataset with multiple observations for each subject, I want to take a subset containing only the row with the maximum data value for each subject. For example, with the following dataset:
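
The dataset itself is not shown here. The dplyr answer further down defines it as below, and the other answers assume the same `group` data frame. The `tapply()` call at the end is only an added reference check of the per-subject maxima and is not part of the original question.

```r
# Example data, copied from the dplyr answer further down
ID    <- c(1, 1, 1, 2, 2, 2, 2, 3, 3)
Value <- c(2, 3, 5, 2, 5, 8, 17, 3, 5)
Event <- c(1, 1, 2, 1, 2, 1, 2, 2, 2)

group <- data.frame(Subject = ID, pt = Value, Event = Event)

# Largest pt per Subject, for reference
max_pt <- tapply(group$pt, group$Subject, max)
max_pt
```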

16 answers
  • 2020-11-21 04:47

    Here's a data.table solution:

    require(data.table) ## 1.9.2
    group <- as.data.table(group)
    

    If you want to keep all the entries corresponding to max values of pt within each group:

    group[group[, .I[pt == max(pt)], by=Subject]$V1]
    #    Subject pt Event
    # 1:       1  5     2
    # 2:       2 17     2
    # 3:       3  5     2
    

    If you'd like just the first max value of pt:

    group[group[, .I[which.max(pt)], by=Subject]$V1]
    #    Subject pt Event
    # 1:       1  5     2
    # 2:       2 17     2
    # 3:       3  5     2
    

    In this case, it doesn't make a difference, as there aren't multiple maximum values within any group in your data.
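
    The two forms do differ once a group contains ties. A minimal sketch, assuming a hypothetical table `tied` (not part of the question) in which subject 1 reaches its maximum twice:

```r
library(data.table)

# Hypothetical data with a tie: subject 1 reaches its maximum pt twice
tied <- data.table(
  Subject = c(1, 1, 1, 2, 2),
  pt      = c(5, 5, 2, 3, 7),
  Event   = c(1, 2, 1, 1, 2)
)

# pt == max(pt) keeps every row that attains the group maximum
all_max <- tied[tied[, .I[pt == max(pt)], by = Subject]$V1]

# which.max(pt) keeps only the first such row in each group
first_max <- tied[tied[, .I[which.max(pt)], by = Subject]$V1]

nrow(all_max)    # 3: both tied rows of subject 1, plus one row of subject 2
nrow(first_max)  # 2: one row per subject
```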

  • 2020-11-21 04:47

    Another data.table solution:

    library(data.table)
    setDT(group)[, head(.SD[order(-pt)], 1), by = .(Subject)]
    
  • 2020-11-21 04:49

    Since {dplyr} v1.0.0 (May 2020) there is the new slice_* syntax which supersedes top_n().

    See also https://dplyr.tidyverse.org/reference/slice.html.

    library(tidyverse)
    
    ID    <- c(1,1,1,2,2,2,2,3,3)
    Value <- c(2,3,5,2,5,8,17,3,5)
    Event <- c(1,1,2,1,2,1,2,2,2)
    
    group <- data.frame(Subject=ID, pt=Value, Event=Event)
    
    group %>% 
      group_by(Subject) %>% 
      slice_max(pt)
    #> # A tibble: 3 x 3
    #> # Groups:   Subject [3]
    #>   Subject    pt Event
    #>     <dbl> <dbl> <dbl>
    #> 1       1     5     2
    #> 2       2    17     2
    #> 3       3     5     2
    

    Created on 2020-08-18 by the reprex package (v0.3.0.9001)
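
    By default, slice_max() keeps every row tied for the maximum (with_ties = TRUE). A minimal sketch, using a hypothetical data frame `tied` with a tie in subject 1, of how with_ties = FALSE returns exactly one row per group instead:

```r
library(dplyr)

# Hypothetical data with a tie in subject 1
tied <- data.frame(
  Subject = c(1, 1, 1, 2, 2),
  pt      = c(5, 5, 2, 3, 7),
  Event   = c(1, 2, 1, 1, 2)
)

# Default (with_ties = TRUE): all rows sharing the maximum are returned
with_ties <- tied %>%
  group_by(Subject) %>%
  slice_max(pt, n = 1) %>%
  ungroup()

# with_ties = FALSE: exactly one row per group
one_each <- tied %>%
  group_by(Subject) %>%
  slice_max(pt, n = 1, with_ties = FALSE) %>%
  ungroup()

nrow(with_ties)  # 3
nrow(one_each)   # 2
```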

    Session info: R version 4.0.2 Patched (2020-06-30 r78761) on macOS Catalina 10.15.6, dplyr 1.0.1.
  • 2020-11-21 04:50

    Another base R solution:

    group_sorted <- group[order(group$Subject, -group$pt),]
    group_sorted[!duplicated(group_sorted$Subject),]
    
    # Subject pt Event
    #       1  5     2
    #       2 17     2
    #       3  5     2
    

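    This sorts the data frame by Subject and then by pt in descending order, so the first row of each Subject holds its maximum. Dropping the rows where Subject is duplicated then leaves exactly that first row per subject.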
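
    Sorting and de-duplicating returns one row per subject even when there are ties. As an alternative sketch (not from the answer above), base R's ave() can broadcast each group's maximum back onto its rows, which keeps all tied rows and preserves the original row order:

```r
group <- data.frame(
  Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
  Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2)
)

# ave() returns, for every row, the maximum pt of that row's Subject
group_max <- ave(group$pt, group$Subject, FUN = max)

# Keep the rows whose pt equals their group's maximum (ties are all kept)
result <- group[group$pt == group_max, ]
result
```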

  • 2020-11-21 04:50

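    Here's another data.table solution. It uses order() rather than which.max(), because which.max() does not work on character columns:

    library(data.table)
    group <- data.table(Subject=ID, pt=Value, Event=Event)
    
    # order(...)[1] is the position of the largest pt within each Subject
    group[, .SD[order(pt, decreasing = TRUE)[1]], by = Subject]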
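
    To illustrate the point about characters, here is a small base R sketch with a hypothetical character vector `grade`. order() sorts strings, whereas which.max() coerces its input to numeric first:

```r
# Hypothetical character-valued column
grade <- c("b", "d", "a", "c")

# order() sorts strings; the first element of the descending order
# is the position of the largest string
idx <- order(grade, decreasing = TRUE)[1]
grade[idx]  # "d"

# which.max() coerces to numeric, so these strings become NA (with a
# warning) and no position is returned
n_found <- length(suppressWarnings(which.max(grade)))
n_found  # 0
```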
    
  • 2020-11-21 04:53

    A shorter solution using data.table:

    setDT(group)[, .SD[which.max(pt)], by=Subject]
    #    Subject pt Event
    # 1:       1  5     2
    # 2:       2 17     2
    # 3:       3  5     2
    