For each row return the column name of the largest value

礼貌的吻别 2020-11-21 07:06

I have a roster of employees, and I need to know which department each employee is in most often. It is trivial to tabulate employee ID against department name, but it is trickier

8 Answers
  • 2020-11-21 08:00

    One solution is to reshape the data from wide to long, putting all the departments in one column and the counts in another, then group by the employee id (in this case, the row number) and filter to the department(s) with the max value. This approach also offers a couple of options for handling ties.

    library(tidyverse)
    
    # sample data frame with a tie
    df <- tibble(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 5))  # tibble() replaces the deprecated data_frame()
    
    # If you aren't worried about ties:  
    df %>% 
      rownames_to_column('id') %>%  # creates an ID number
      gather(dept, cnt, V1:V3) %>% 
      group_by(id) %>% 
      slice(which.max(cnt)) 
    
    # A tibble: 3 x 3
    # Groups:   id [3]
    #   id    dept    cnt
    #   <chr> <chr> <dbl>
    # 1 1     V3       9.
    # 2 2     V1       8.
    # 3 3     V2       5.
    
    
    # If you're worried about keeping ties:
    df %>% 
      rownames_to_column('id') %>%
      gather(dept, cnt, V1:V3) %>% 
      group_by(id) %>% 
      filter(cnt == max(cnt)) %>% # top_n(cnt, n = 1) also works
      arrange(id)
    
    # A tibble: 4 x 3
    # Groups:   id [3]
    #   id    dept    cnt
    #   <chr> <chr> <dbl>
    # 1 1     V3       9.
    # 2 2     V1       8.
    # 3 3     V2       5.
    # 4 3     V3       5.
    
    
    # If there may be ties but you want exactly one department per id,
    # use rank() with ties.method = "first" or "last"
    df %>% 
      rownames_to_column('id') %>%
      gather(dept, cnt, V1:V3) %>% 
      group_by(id) %>% 
      mutate(dept_rank  = rank(-cnt, ties.method = "first")) %>% # or 'last'
      filter(dept_rank == 1) %>% 
      select(-dept_rank) 
    
    # A tibble: 3 x 3
    # Groups:   id [3]
    #   id    dept    cnt
    #   <chr> <chr> <dbl>
    # 1 2     V1       8.
    # 2 3     V2       5.
    # 3 1     V3       9.
    
    # If you want to keep the original wide data frame, join the result back on id
    df %>% 
      rownames_to_column('id') %>%
      left_join(
        df %>% 
          rownames_to_column('id') %>%
          gather(max_dept, max_cnt, V1:V3) %>% 
          group_by(id) %>% 
          slice(which.max(max_cnt)), 
        by = 'id'
      )
    
    # A tibble: 3 x 6
    #   id       V1    V2    V3 max_dept max_cnt
    #   <chr> <dbl> <dbl> <dbl> <chr>      <dbl>
    # 1 1        2.    7.    9. V3            9.
    # 2 2        8.    3.    6. V1            8.
    # 3 3        1.    5.    5. V2            5.
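
    For readers on a recent tidyverse, the same reshape-then-filter idea might be sketched with `pivot_longer()` and `slice_max()`, which supersede `gather()` and `top_n()`. This is only an illustration, assuming dplyr 1.0.0 or later; the names `long`, `one_per_id`, and `all_ties` are made up for the example and are not part of the answer above.

    ```r
    library(dplyr)
    library(tidyr)
    library(tibble)

    # Same sample data as above, with a tie in row 3
    df <- tibble(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 5))

    # Wide to long: one row per (id, dept) pair
    long <- df %>%
      rowid_to_column("id") %>%   # integer row number as the id
      pivot_longer(V1:V3, names_to = "dept", values_to = "cnt") %>%
      group_by(id)

    # Exactly one row per id, even when the max is tied
    one_per_id <- long %>%
      slice_max(cnt, n = 1, with_ties = FALSE) %>%
      ungroup()

    # Every row that attains the max, so ties are kept
    all_ties <- long %>%
      slice_max(cnt, n = 1, with_ties = TRUE) %>%
      ungroup()
    ```

    With the sample data, `one_per_id` has three rows and `all_ties` has four, because id 3 is tied between V2 and V3.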
    
  • 2020-11-21 08:02

    A simple for loop can also be handy:

    > df<-data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))
    > df
      V1 V2 V3
    1  2  7  9
    2  8  3  6
    3  1  5  4
    > df2<-data.frame()
    > for (i in 1:nrow(df)){
    +   df2[i,1]<-colnames(df[which.max(df[i,])])
    + }
    > df2
      V1
    1 V3
    2 V1
    3 V2
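
    If the loop becomes slow on a large roster, a vectorized base R variant is one possible alternative. The sketch below is not part of the loop above; it assumes every column is numeric and uses `max.col()`, whose `ties.method` argument accepts "random", "first", or "last". The names `m`, `by_max_col`, and `by_apply` are made up for the example.

    ```r
    # Same sample data as in the loop example
    df <- data.frame(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 4))
    m <- as.matrix(df)

    # max.col() gives, for each row, the index of the column holding the maximum;
    # ties.method = "first" makes the choice deterministic when values are tied
    by_max_col <- colnames(df)[max.col(m, ties.method = "first")]

    # An equivalent row-wise form: which.max() picks the first maximum in each row
    by_apply <- unname(colnames(df)[apply(m, 1, which.max)])

    by_max_col
    # [1] "V3" "V1" "V2"
    ```

    Both forms return a plain character vector, so the result can be attached to the data frame as a new column if needed.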
    