Question
My file looks like this:
Pcol Mcol
P1 M1,M2,M5,M6,M1,M2,M1,M5
P2 M1,M2,M3,M5,M1,M2,M1,M3
P3 M4,M5,M7,M6,M5,M7,M4,M7
I want to find all pairwise combinations of the Mcol elements and count how many rows each combination appears in.
Expected output:
Mcol freq
M1,M2 2
M1,M5 2
M1,M6 1
M2,M5 2
M2,M6 1
M5,M6 2
M1,M3 1
M2,M3 1
M4,M5 1
M4,M7 1
M4,M6 1
M7,M6 1
I have tried this:
x <- read.csv("file.csv", header = TRUE, stringsAsFactors = FALSE)
xx <- do.call(rbind.data.frame,
              lapply(x$Mcol, function(i) {
                n <- sort(unlist(strsplit(i, ",")))
                t(combn(n, 2))
              }))
data.frame(table(paste(xx[, 1], xx[, 2], sep = ",")))
It doesn't give the expected output.
I have also tried this:
library(tidyverse)
df1 %>%
  separate_rows(Mcol) %>%
  group_by(Pcol) %>%
  summarise(Mcol = list(combn(Mcol, 2, FUN = toString, simplify = FALSE))) %>%
  unnest %>%
  unnest %>%
  count(Mcol)
But this does not give the number of rows in which each combination is present. I want the row frequency: if M1,M2 is present in both P1 and P2, its frequency should be 2.
Answer 1:
One option in the tidyverse is to split 'Mcol' into rows with separate_rows, group by 'Pcol', get the pairwise combinations of 'Mcol' with combn, and, after unnesting, take the count of the 'Mcol' column:
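library(tidyverse)
df1 %>%
  separate_rows(Mcol) %>%
  # Drop elements repeated within a row so each pair is counted once per row;
  # the output below only matches the sample data with this step.
  distinct() %>%
  group_by(Pcol) %>%
  summarise(Mcol = list(combn(Mcol, 2, FUN = toString, simplify = FALSE))) %>%
  unnest(Mcol) %>%
  unnest(Mcol) %>%
  count(Mcol)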
# A tibble: 14 x 2
# Mcol n
# <chr> <int>
# 1 M1, M2 2
# 2 M1, M3 1
# 3 M1, M5 2
# 4 M1, M6 1
# 5 M2, M3 1
# 6 M2, M5 2
# 7 M2, M6 1
# 8 M3, M5 1
# 9 M4, M5 1
#10 M4, M6 1
#11 M4, M7 1
#12 M5, M6 2
#13 M5, M7 1
#14 M7, M6 1
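For comparison, here is a minimal base R sketch of the same idea that needs no extra packages. It is not part of the original answer. It assumes the data frame is called df1 (as in the code above), that the "M1.M5" in row P1 of the question is a typo for "M1,M5", and that the order inside a pair does not matter. The names pairs_per_row and freq are made up for illustration. The key step is unique(): once repeated elements are dropped within a row, each pair can appear at most once per row, so a plain table() counts rows.

```r
# Sample data copied from the question
# (assumption: "M1.M5" in row P1 is a typo for "M1,M5").
df1 <- data.frame(
  Pcol = c("P1", "P2", "P3"),
  Mcol = c("M1,M2,M5,M6,M1,M2,M1,M5",
           "M1,M2,M3,M5,M1,M2,M1,M3",
           "M4,M5,M7,M6,M5,M7,M4,M7"),
  stringsAsFactors = FALSE
)

# For each row: split on commas, drop repeated elements, and sort so that a
# pair always has a single spelling. Then build every 2-element combination
# as an "A,B" string.
pairs_per_row <- lapply(strsplit(df1$Mcol, ",", fixed = TRUE), function(items) {
  items <- sort(unique(trimws(items)))
  if (length(items) < 2) return(character(0))  # combn() errors when n < m
  combn(items, 2, FUN = paste, collapse = ",")
})

# Each pair occurs at most once per row, so tabulating counts rows.
freq <- as.data.frame(table(Mcol = unlist(pairs_per_row)),
                      responseName = "freq", stringsAsFactors = FALSE)
freq
```

Because the elements are sorted within each row, the last pair is spelled "M6,M7" here rather than "M7, M6" as in the tidyverse output. The counts should otherwise agree, for example a frequency of 2 for "M1,M2".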
Source: https://stackoverflow.com/questions/56794136/find-all-the-combinations-of-a-particular-column-and-find-their-frequencies