Finding the index of unique values from a dataframe column

后端未结


I have a dataframe:

TableName Function Argument
A         func1    3
B         func1    4
A         func2    6
B         func2    2
C         func1    5
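
For anyone who wants to run the snippets on this page, here is a minimal sketch that rebuilds the table above. The name df and the column types are assumptions (the post does not show how the data was created); TableName is made a factor only because one of the printed outputs further down labels it as <fct>.

```r
# Rebuild the example table; column types are assumed, not taken from the post
df <- data.frame(
  TableName = c("A", "B", "A", "B", "C"),
  Function  = c("func1", "func1", "func2", "func2", "func1"),
  Argument  = c(3, 4, 6, 2, 5),
  stringsAsFactors = TRUE
)

# Row positions at which one particular table name occurs
which(df$TableName == "A")
```

The last line only handles a single name at a time; the answers group the positions for all names at once.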


4 Answers

猫巷女王i

2021-01-22 07:11

No need for any package. Try out:

aggregate(rownames(df) ~ TableName, df, c)
  TableName rownames(df)
1         A         1, 3
2         B         2, 4
3         C            5
# or
setNames(aggregate(rownames(df) ~ TableName, df, c),
         c("TableName", "Index"))
  TableName Index
1         A  1, 3
2         B  2, 4
3         C     5


温柔的废话

2021-01-22 07:11

I'd suggest simply using one of the following:

(vec <- tapply(seq_along(df$TableName), df$TableName, FUN = identity))
# $A
# [1] 1 3
#
# $B
# [1] 2 4
#
# $C
# [1] 5

(dfNew <- data.frame(TableName = names(vec), Index = vec))
#   TableName Index
# A         A  1, 3
# B         B  2, 4
# C         C     5

vec is a list (rather than a character string of concatenated numbers), so the group names are available through names(vec) and each group can be accessed directly, e.g.,

vec$A
# [1] 1 3

while dfNew is a data frame whose second column is also a list:

dfNew[2]
#   Index
# A  1, 3
# B  2, 4
# C     5

dfNew[,2]
# [[1]]
# [1] 1 3
#
# [[2]]
# [1] 2 4
#
# [[3]]
# [1] 5

dfNew[2]["A",][[1]]
# [1] 1 3

With dfNew, however, it is less convenient to look up the indices by TableName, so I'd stick with vec.
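
As a rough sketch of why a named list is handy, the grouped positions can be used directly to pull the matching rows. This assumes row positions (not Argument values) are what is wanted; df is rebuilt here so the snippet runs on its own, and idx is a hypothetical name.

```r
# Example data, rebuilt so this block is self-contained (types are assumed)
df <- data.frame(
  TableName = c("A", "B", "A", "B", "C"),
  Function  = c("func1", "func1", "func2", "func2", "func1"),
  Argument  = c(3, 4, 6, 2, 5)
)

# Group the row positions 1..nrow(df) by TableName
idx <- tapply(seq_len(nrow(df)), df$TableName, identity)

names(idx)       # the distinct table names
idx[["A"]]       # row positions where TableName is "A"
df[idx[["A"]], ] # the rows at those positions
```

Because each element is an integer vector, it can be passed straight to [ without any parsing.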


礼貌的吻别

2021-01-22 07:17

Here is a dplyr solution where we create a variable with the row_number(), and use that as our index, i.e.

library(dplyr)

df %>%
 mutate(new = row_number()) %>%
 group_by(TableName) %>%
 summarise(Index = toString(new))

which gives:

# A tibble: 3 x 2
  TableName Index
  <fct>     <chr>
1 A         1, 3 
2 B         2, 4 
3 C         5

You can also save them as lists rather than strings, which will make future operations easier, i.e.

df %>% 
 mutate(new = row_number()) %>% 
 group_by(TableName) %>% 
 summarise(Index = list(new))

which gives:

# A tibble: 3 x 2
  TableName Index    
  <fct>     <list>   
1 A         <int [2]>
2 B         <int [2]>
3 C         <int [1]>
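
To illustrate the difference between the two result shapes, here is a small base R sketch using the values for table A written out by hand. The variable names are hypothetical, and the string form assumes the ", " separator that toString() uses by default.

```r
# The two shapes for table "A", written out by hand
index_string <- "1, 3"           # string form, as produced by toString()
index_list   <- list(c(1L, 3L))  # list form, as produced by list()

# A string has to be split and converted before it can be used as an index
parsed <- as.integer(strsplit(index_string, ", ", fixed = TRUE)[[1]])

# A list element is already an integer vector
direct <- index_list[[1]]

identical(parsed, direct)
```

Both routes end with the same integer vector, but the list form skips the parsing step.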


萌比男神i

2021-01-22 07:29

Using data.table:

library(data.table)
setDT(df)[, .(Index = toString(.I)), by = TableName]
   TableName Index
1:         A  1, 3
2:         B  2, 4
3:         C     5
