Create table with all pairs of values from one column in R, counting unique values [duplicate]

两盒软妹~` submitted on 2019-12-21 12:35:40

Question


I have data that shows which customers have purchased certain items. A customer can purchase the same item multiple times. What I need is a table that shows all possible pairwise combinations of items, along with the number of unique customers who have purchased each combination (the diagonal of the table will just be the number of unique customers purchasing each item).

Here is an example:

item <- c("h","h","h","j","j")
customer <- c("a","a","b","b","b")
test.data <- data.frame(item,customer)

Here is the test.data:

item customer
h    a
h    a
h    b
j    b
j    b

Result needed - a table with the items as row and column names, with the counts of unique customers purchasing the pair inside the table. So, 2 customers purchased item h, 1 purchased both item h and j, and 1 purchased item j.

item   h    j
h      2    1
j      1    1

I have tried using the table function, melt/cast, etc., but nothing gets me the counts I need within the table. My first step is using unique() to get rid of duplicate rows.


Answer 1:


Using data.table and the gtools package, we can generate all ordered item pairs (permutations with repetition) for each customer:

library(data.table)
library(gtools)

item <- c("h","h","h","j","j")
customer <- c("a","a","b","b","b")
test.data <- data.table(item,customer)

# Drop repeat purchases so each customer/item combination counts once
DT <- unique(test.data)

# All ordered pairs (with repetition) of the items in x
tuples <- function(x) {
  data.frame(permutations(length(x), 2, x, repeats.allowed = TRUE, set = FALSE),
             stringsAsFactors = FALSE)
}

DO <- DT[, tuples(item), by = customer]

This gives:

   customer X1 X2
1:        a  h  h
2:        b  h  h
3:        b  h  j
4:        b  j  h
5:        b  j  j

This is a list of every ordered item pair for each customer. As in your example, h x j is treated as distinct from j x h. We can now get the frequency of each pair using the table function:

table(DO$X1, DO$X2)
    h j
  h 2 1
  j 1 1
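
As a cross-check on the counts above, here is a minimal base R sketch that does not need gtools. It swaps in a different technique from the answer: build a customer-by-item incidence matrix and multiply it by its own transpose with crossprod(). The names incidence and pair_counts are made up for illustration.

```r
item <- c("h", "h", "h", "j", "j")
customer <- c("a", "a", "b", "b", "b")
test.data <- data.frame(item, customer, stringsAsFactors = FALSE)

# Customer-by-item incidence matrix: 1 if the customer bought the item at
# least once, 0 otherwise. Comparing the counts with 0 collapses repeat
# purchases, so unique() is not needed here.
incidence <- 1 * (unclass(table(test.data$customer, test.data$item)) > 0)

# crossprod(m) is t(m) %*% m: entry [i, j] sums, over customers, the product
# of the two indicator columns, i.e. the number of customers who bought both.
pair_counts <- crossprod(incidence)
print(pair_counts)
```

The diagonal holds the number of unique customers per item, and the off-diagonal entries hold the number of customers who bought both items.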



Answer 2:


Here's a base R solution:

n_intersect <- Vectorize( function(x,y) length(intersect(x,y)) )

cs_by_item <- with(test.data, tapply(customer, item, unique))

outer(cs_by_item , cs_by_item , n_intersect)
#   h j
# h 2 1
# j 1 1
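
To make it clearer what outer() and Vectorize() are doing here, the same computation can be sketched with explicit loops. This is only an illustrative unrolling, assuming the same test.data as in the question; the names items and result are made up for the example.

```r
item <- c("h", "h", "h", "j", "j")
customer <- c("a", "a", "b", "b", "b")
test.data <- data.frame(item, customer, stringsAsFactors = FALSE)

# One vector of unique customers per item, as a named list
cs_by_item <- lapply(split(test.data$customer, test.data$item), unique)
items <- names(cs_by_item)

# Empty item-by-item matrix to hold the counts
result <- matrix(0L, nrow = length(items), ncol = length(items),
                 dimnames = list(items, items))

# For every ordered pair of items, count the customers the two have in common
for (i in items) {
  for (j in items) {
    result[i, j] <- length(intersect(cs_by_item[[i]], cs_by_item[[j]]))
  }
}
print(result)
```

The loop version is longer, but it shows that each cell is simply the size of the intersection of two customer sets; an item intersected with itself gives its own unique customer count on the diagonal.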


Source: https://stackoverflow.com/questions/32976966/create-table-with-all-pairs-of-values-from-one-column-in-r-counting-unique-valu
