Finding the index or unique values from a dataframe column

慢半拍i 2021-01-22 06:33

I have a data frame:

TableName Function Argument
A         func1    3
B         func1    4
A         func2    6
B         func2    2
C         func1    5
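To run the answers, the table first has to exist in R; most answers refer to it as df. Below is a minimal reconstruction typed in from the printed table. stringsAsFactors = TRUE is an assumption: it mirrors the <fct> column type in the dplyr answer's output, which was the default before R 4.0.0.

```r
# The question's table, typed in by hand.
# stringsAsFactors = TRUE is an assumption that mirrors the <fct> column
# type in the dplyr output; R >= 4.0.0 defaults to character columns.
df <- data.frame(
  TableName = c("A", "B", "A", "B", "C"),
  Function  = c("func1", "func1", "func2", "func2", "func1"),
  Argument  = c(3, 4, 6, 2, 5),
  stringsAsFactors = TRUE
)
str(df)
```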


        
4 Answers
  • 2021-01-22 07:11

    No extra packages are needed. Try:

    aggregate(rownames(df) ~ TableName, df, c)
    #   TableName rownames(df)
    # 1         A         1, 3
    # 2         B         2, 4
    # 3         C            5

    # or
    setNames(aggregate(rownames(df) ~ TableName, df, c),
             c("TableName", "Index"))
    #   TableName Index
    # 1         A  1, 3
    # 2         B  2, 4
    # 3         C     5
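rownames(df) returns character labels, and they equal the row positions only while df keeps its default row names (after subsetting or reordering they no longer do). A base R sketch that groups the positions themselves with split(); the names idx and res are illustrative:

```r
df <- data.frame(
  TableName = c("A", "B", "A", "B", "C"),
  Argument  = c(3, 4, 6, 2, 5)
)

# Split the row positions 1..nrow(df) by TableName:
# the result is a named list of integer vectors.
idx <- split(seq_len(nrow(df)), df$TableName)
idx$A            # [1] 1 3

# Collapse each vector to a "1, 3" string and rebuild the two-column layout.
res <- data.frame(TableName = names(idx),
                  Index     = vapply(idx, toString, character(1)),
                  row.names = NULL)
res
```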
    
  • 2021-01-22 07:11

    I'd suggest using either simply

    (vec <- tapply(df$Argument, df$TableName, FUN = `identity`))
    # $A
    # [1] 3 6
    #
    # $B
    # [1] 4 2
    #
    # $C
    # [1] 5
    

    or

    # names(vec) follows tapply's sorted group order; unique() follows order of appearance
    (dfNew <- data.frame(TableName = names(vec), Index = vec))
    #   TableName Index
    # A         A  3, 6
    # B         B  4, 2
    # C         C     5
    

    vec is a list (rather than a character vector of concatenated numbers), so names(vec) and the individual groups are easy to access, e.g.,

    vec$A
    # [1] 3 6
    

    while dfNew is a data frame whose second column is also a list:

    dfNew[2]
    #   Index
    # A  3, 6
    # B  4, 2
    # C     5
    
    dfNew[,2]
    # [[1]]
    # [1] 3 6
    #
    # [[2]]
    # [1] 4 2
    #
    # [[3]]
    # [1] 5
    
    dfNew[2]["A",][[1]]
    # [1] 3 6
    

    With dfNew, however, it is less convenient to look up the values by TableName, so I'd stick with vec.
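Note that this tapply() call groups the Argument values (3, 6 for A), whereas the other answers return row positions (1, 3 for A). If positions are what you need, the same approach works with seq_len(nrow(df)) as the first argument. A sketch, with an optional vapply(..., toString) step for when a single string per table is wanted; pos and pos_str are illustrative names:

```r
df <- data.frame(
  TableName = c("A", "B", "A", "B", "C"),
  Argument  = c(3, 4, 6, 2, 5)
)

# Group row positions (not Argument values) by TableName.
# FUN = identity keeps each group's full vector, so the result is a list.
pos <- tapply(seq_len(nrow(df)), df$TableName, FUN = identity)
pos$B                                   # [1] 2 4

# Optional: one "2, 4"-style string per table name.
pos_str <- vapply(pos, toString, character(1))
pos_str
```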

  • 2021-01-22 07:17

    Here is a dplyr solution: create a variable holding row_number() and use it as the index:

    library(dplyr)  # provides %>%, mutate(), group_by(), summarise()

    df %>% 
     mutate(new = row_number()) %>% 
     group_by(TableName) %>% 
     summarise(Index = toString(new))
    

    which gives:

    # A tibble: 3 x 2
      TableName Index
      <fct>     <chr>
    1 A         1, 3 
    2 B         2, 4 
    3 C         5    
    

    You can also store the indices as lists rather than strings, which makes later operations easier:

    df %>% 
     mutate(new = row_number()) %>% 
     group_by(TableName) %>% 
     summarise(Index = list(new))
    

    which gives:

    # A tibble: 3 x 2
      TableName Index    
      <fct>     <list>   
    1 A         <int [2]>
    2 B         <int [2]>
    3 C         <int [1]>
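To illustrate why the list form is easier to build on: each element is an integer vector that can index df directly, whereas the string form has to be parsed back first. A base R sketch; idx is a hand-written, hypothetical stand-in holding the same positions as the Index column (1, 3 / 2, 4 / 5):

```r
df <- data.frame(
  TableName = c("A", "B", "A", "B", "C"),
  Argument  = c(3, 4, 6, 2, 5)
)

# Hypothetical stand-in for the Index list column: row positions per TableName.
idx <- list(A = c(1L, 3L), B = c(2L, 4L), C = 5L)

# List form: the stored positions index df directly.
args_by_table <- lapply(idx, function(i) df$Argument[i])
args_by_table$A                  # [1] 3 6
df[idx$B, ]                      # rows 2 and 4

# String form: "1, 3" has to be split and converted before it can be used.
pos_A <- as.integer(strsplit("1, 3", ", ", fixed = TRUE)[[1]])
pos_A                            # [1] 1 3
```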
    
  • 2021-01-22 07:29

    Using data.table:

    library(data.table)
    # setDT() converts df to a data.table in place (by reference)
    setDT(df)[, .(Index = toString(.I)), TableName]
    #    TableName Index
    # 1:         A  1, 3
    # 2:         B  2, 4
    # 3:         C     5
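Inside a grouped data.table call, .I holds each row's position in the whole table, so it yields the same positions as row_number() in the dplyr answer. For a single table name, base R's which() gives those positions directly, and which(!duplicated(...)) gives where each distinct value first appears. A minimal sketch:

```r
df <- data.frame(
  TableName = c("A", "B", "A", "B", "C"),
  Argument  = c(3, 4, 6, 2, 5)
)

# Positions of the rows whose TableName equals "A".
which(df$TableName == "A")              # [1] 1 3

# Position of the first occurrence of each distinct TableName.
which(!duplicated(df$TableName))        # [1] 1 2 5
```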
    