Program to obtain frequency matrix of categorical data

纵饮孤独 submitted on 2020-06-15 10:11:03

Question


I am working on data that contains more than 300 categorical features, which I have encoded as 0s and 1s. Now I need to create a feature-by-feature matrix in which each cell holds the frequency of joint occurrence of the two features.

In the end, I want to create a heatmap of this frequency matrix.

So, my dataframe in R looks like this:

id cat1 cat2 cat3 cat4
156   0    0    1    1
465   1    1    1    0
573   0    1    1    0

The output I want is:

      cat1  cat2  cat3 ...
cat1     0     1     1
cat2     1     0     2
cat3     1     2     0
  .
  .

where each cell value denotes the number of times the two categorical variables have appeared together.


Answer 1:


We can use outer

#The columns hold only 0s and 1s, so & marks the rows where both are 1
#(df1 is the data frame defined under "data" at the end of this page)
fun <- function(x, y) sum(df1[, x] & df1[, y])

#Get the indices of all the cat columns (everything except id)
n <- seq_along(df1)[-1]
#Apply the function to every pair of columns
mat <- outer(n, n, Vectorize(fun))
#Set the diagonal to 0
diag(mat) <- 0
#Assign row names and column names
dimnames(mat) <- list(names(df1)[n], names(df1)[n])

#     cat1 cat2 cat3 cat4
#cat1    0    1    1    0
#cat2    1    0    2    0
#cat3    1    2    0    1
#cat4    0    0    1    0



Answer 2:


We can use table with crossprod from base R:

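#TRUE wherever a cell in the cat columns is 1
i1 <- as.logical(unlist(df1[-1]))
#Tabulate id against column name for those cells, then take t(tab) %*% tab
out <- crossprod(table(df1$id[row(df1[-1])][i1], 
          names(df1)[-1][col(df1[-1])][i1]))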
diag(out) <- 0
out

#       cat1 cat2 cat3 cat4
#  cat1    0    1    1    0
#  cat2    1    0    2    0
#  cat3    1    2    0    1
#  cat4    0    0    1    0
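
Because the columns are already coded as 0/1, the same counts can probably be obtained by applying crossprod directly to the numeric matrix, without going through table. The following is a minimal sketch under that assumption; the data frame is the example data from this page, and the names m and co are introduced here only for illustration.

```r
# Example data from the question (an id column plus 0/1 indicator columns)
df1 <- data.frame(id   = c(156L, 465L, 573L),
                  cat1 = c(0L, 1L, 0L),
                  cat2 = c(0L, 1L, 1L),
                  cat3 = c(1L, 1L, 1L),
                  cat4 = c(1L, 0L, 0L))

# Drop the id column and convert the indicators to a matrix
m <- as.matrix(df1[-1])

# crossprod(m) is t(m) %*% m: entry [i, j] counts the rows
# in which column i and column j are both 1
co <- crossprod(m)

# The diagonal holds each column's own total; set it to 0 as in the answers
diag(co) <- 0
co
```

With several hundred columns this single matrix product should be considerably cheaper than calling a function for every pair of columns, although it only works as intended when every cell is strictly 0 or 1.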

Data:

df1 <- structure(list(id = c(156L, 465L, 573L), cat1 = c(0L, 1L, 0L), 
    cat2 = c(0L, 1L, 1L), cat3 = c(1L, 1L, 1L), cat4 = c(1L, 
    0L, 0L)), class = "data.frame", row.names = c(NA, -3L))
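
The question also asks for a heatmap of the frequency matrix, which neither answer shows. One possible way to draw it, sketched here with the base R heatmap function from the stats package, is given below; the choice of heatmap (rather than a plotting package) and the variable name co are assumptions made for illustration.

```r
# Example data from the question (an id column plus 0/1 indicator columns)
df1 <- data.frame(id   = c(156L, 465L, 573L),
                  cat1 = c(0L, 1L, 0L),
                  cat2 = c(0L, 1L, 1L),
                  cat3 = c(1L, 1L, 1L),
                  cat4 = c(1L, 0L, 0L))

# Co-occurrence counts with the diagonal set to 0
co <- crossprod(as.matrix(df1[-1]))
diag(co) <- 0

# Rowv = NA and Colv = NA keep the original column order (no dendrograms);
# scale = "none" colours the raw counts instead of row-standardised values
heatmap(co, Rowv = NA, Colv = NA, scale = "none")
```

Leaving Rowv and Colv at their defaults would instead reorder the features by hierarchical clustering, which may be more useful with 300 or more columns.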


Source: https://stackoverflow.com/questions/58201529/program-to-obtain-frequency-matrix-of-categorical-data
