Find all the combinations of a particular column and find their frequencies

Submitted by 帅比萌擦擦 on 2019-12-11 07:58:58

Question


My file looks like this:

Pcol       Mcol
P1      M1,M2,M5,M6,M1,M2,M1,M5
P2      M1,M2,M3,M5,M1,M2,M1,M3
P3      M4,M5,M7,M6,M5,M7,M4,M7

I want to find all pairwise combinations of the Mcol elements and count how many rows each combination is present in.

Expected output:

Mcol        freq
M1,M2        2
M1,M5        2
M1,M6        1
M2,M5        2
M2,M6        1
M5,M6        2
M1,M3        1
M2,M3        1
M4,M5        1
M4,M7        1
M4,M6        1
M7,M6        1

I have tried this:

x <- read.csv("file.csv" ,header = TRUE, stringsAsFactors = FALSE)
xx <- do.call(rbind.data.frame, 
              lapply(x$Mcol, function(i){
                n <- sort(unlist(strsplit(i, ",")))
                t(combn(n, 2))
              }))

data.frame(table(paste(xx[, 1], xx[, 2], sep = ",")))

It doesn't give the expected output.

I have also tried this:

library(tidyverse)
df1 %>%
   separate_rows(Mcol) %>%
   group_by(Pcol) %>%
   summarise(Mcol = list(combn(Mcol, 2, FUN= toString, simplify = FALSE))) %>% 
   unnest %>% 
   unnest %>%
   count(Mcol)

But it does not give the number of rows in which each combination is present. I want the frequency of rows that contain each combination. For example, M1,M2 is present in both P1 and P2, so its frequency should be 2.


Answer 1:


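An option in tidyverse would be to split 'Mcol' into rows with separate_rows, drop elements that are repeated within a 'Pcol' with distinct, group by 'Pcol', get the pairwise combn of 'Mcol', and after unnesting take the count of the 'Mcol' column. Without the distinct step, an element that is repeated inside a row produces the same pair several times, so the count would no longer be the number of rows.

library(tidyverse)
df1 %>%
   separate_rows(Mcol) %>%   # one row per Mcol element
   distinct() %>%            # keep each element once per Pcol
   group_by(Pcol) %>%
   summarise(Mcol = list(combn(Mcol, 2, FUN = toString, simplify = FALSE))) %>% 
   unnest(Mcol) %>% 
   unnest(Mcol) %>%
   count(Mcol)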
# A tibble: 14 x 2
#   Mcol       n
#   <chr>  <int>
# 1 M1, M2     2
# 2 M1, M3     1
# 3 M1, M5     2
# 4 M1, M6     1
# 5 M2, M3     1
# 6 M2, M5     2
# 7 M2, M6     1
# 8 M3, M5     1
# 9 M4, M5     1
#10 M4, M6     1
#11 M4, M7     1
#12 M5, M6     2
#13 M5, M7     1
#14 M7, M6     1
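
For comparison, the same row-wise counting can be sketched in base R, close to the first attempt in the question. This is only a sketch under assumptions: `df1` below is rebuilt by hand from the sample data (assuming every separator is a comma), and the names `pairs_per_row` and `freq` are illustrative. The key step is calling `unique()` before `combn()`, so each pair is generated at most once per row. Because the elements are also sorted, the last pair comes out as M6,M7 rather than M7,M6.

```r
# Sample data rebuilt from the question (assumption: comma-separated Mcol values)
df1 <- data.frame(
  Pcol = c("P1", "P2", "P3"),
  Mcol = c("M1,M2,M5,M6,M1,M2,M1,M5",
           "M1,M2,M3,M5,M1,M2,M1,M3",
           "M4,M5,M7,M6,M5,M7,M4,M7"),
  stringsAsFactors = FALSE
)

# For each row: split, drop repeated elements, sort, then build every pair
pairs_per_row <- lapply(strsplit(df1$Mcol, ","), function(items) {
  items <- sort(unique(trimws(items)))
  if (length(items) < 2) return(character(0))  # combn() needs at least 2 elements
  combn(items, 2, FUN = paste, collapse = ",")
})

# Each pair occurs at most once per row, so the table counts rows
freq <- as.data.frame(table(Mcol = unlist(pairs_per_row)),
                      responseName = "freq", stringsAsFactors = FALSE)
freq
```

Sorting before `combn()` means that M1,M2 and M2,M1 can never be counted as different pairs, whatever order the elements have in the file.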


Source: https://stackoverflow.com/questions/56794136/find-all-the-combinations-of-a-particular-column-and-find-their-frequencies
