Find all the combinations of a particular column and find their frequencies

Question

My file looks like this:

Pcol       Mcol
P1      M1,M2,M5,M6,M1,M2,M1,M5
P2      M1,M2,M3,M5,M1,M2,M1,M3
P3      M4,M5,M7,M6,M5,M7,M4,M7

I want to find all pairwise combinations of the Mcol elements and count how many rows each combination appears in.

Expected output:

Mcol        freq
M1,M2        2
M1,M5        2
M1,M6        1
M2,M5        2
M2,M6        1
M5,M6        2
M1,M3        1
M2,M3        1
M4,M5        1
M4,M7        1
M4,M6        1
M7,M6        1

I have tried this:

x <- read.csv("file.csv" ,header = TRUE, stringsAsFactors = FALSE)
xx <- do.call(rbind.data.frame, 
              lapply(x$Mcol, function(i){
                n <- sort(unlist(strsplit(i, ",")))
                t(combn(n, 2))
              }))

data.frame(table(paste(xx[, 1], xx[, 2], sep = ",")))

It doesn't give the expected output.

I have also tried this:

library(tidyverse)
df1 %>%
   separate_rows(Mcol) %>%
   group_by(Pcol) %>%
   summarise(Mcol = list(combn(Mcol, 2, FUN= toString, simplify = FALSE))) %>% 
   unnest %>% 
   unnest %>%
   count(Mcol)

But this does not give the number of rows in which each combination is present. I want the frequency to be the number of rows that contain the combination. For example, M1,M2 is present in both P1 and P2, so its frequency should be 2.

Answer 1:

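One tidyverse option is to split 'Mcol' into rows with separate_rows, drop repeated elements within each 'Pcol' with distinct, group by 'Pcol', build all pairs with combn, and, after unnesting, count each 'Mcol' pair. Without the distinct step, a pair is counted once for every repeated occurrence inside a row rather than once per row.

library(tidyverse)
df1 %>%
   separate_rows(Mcol) %>%     # one row per element of Mcol
   distinct() %>%              # keep each element only once per Pcol
   group_by(Pcol) %>%
   summarise(Mcol = list(combn(Mcol, 2, FUN= toString, simplify = FALSE))) %>% 
   unnest %>% 
   unnest %>%
   count(Mcol)                 # number of rows (Pcol values) containing each pair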
# A tibble: 14 x 2
#   Mcol       n
#   <chr>  <int>
# 1 M1, M2     2
# 2 M1, M3     1
# 3 M1, M5     2
# 4 M1, M6     1
# 5 M2, M3     1
# 6 M2, M5     2
# 7 M2, M6     1
# 8 M3, M5     1
# 9 M4, M5     1
#10 M4, M6     1
#11 M4, M7     1
#12 M5, M6     2
#13 M5, M7     1
#14 M7, M6     1
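
For comparison, here is a minimal base R sketch of the same idea, adapted from the first attempt in the question. It is not part of the original answer. It assumes every row of Mcol is comma-separated, and the names pairs_per_row and freq are made up for illustration. The key change is calling unique() before combn(), so that each pair is counted at most once per row. Because the elements are sorted within each row, the pair from P3 appears as M6,M7 rather than M7,M6.

```r
# Sample data reconstructed from the question (assumed to match file.csv)
x <- data.frame(
  Pcol = c("P1", "P2", "P3"),
  Mcol = c("M1,M2,M5,M6,M1,M2,M1,M5",
           "M1,M2,M3,M5,M1,M2,M1,M3",
           "M4,M5,M7,M6,M5,M7,M4,M7"),
  stringsAsFactors = FALSE
)

# For each row: split on commas, drop repeats, sort, then build all pairs
pairs_per_row <- lapply(strsplit(x$Mcol, ",", fixed = TRUE), function(items) {
  items <- sort(unique(trimws(items)))
  if (length(items) < 2) return(character(0))  # no pair possible
  combn(items, 2, FUN = paste, collapse = ",")
})

# Each pair occurs at most once per row, so the table counts rows
freq <- as.data.frame(table(Mcol = unlist(pairs_per_row)),
                      responseName = "freq", stringsAsFactors = FALSE)
freq
```

Sorting inside each row makes M1,M2 and M2,M1 collapse into the same key, which matters if the elements appear in different orders across rows.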

Source: https://stackoverflow.com/questions/56794136/find-all-the-combinations-of-a-particular-column-and-find-their-frequencies

Tags

read.csv