Convert categorical data in data frame to weighted adjacency matrix

Posted by 醉酒当歌 on 2019-12-11 06:53:02

Question


I have the following data frame, call it DF, which consists of three columns: "Chunk", "Name", and "Frequency". I need to turn it into a NameXName adjacency matrix in which two Names are adjacent when they appear in the same chunk. For example, in the first rows, Gretel and Friedrich are adjacent because they are both in Chunk 2. The weight of the relationship should be based on "Frequency", specifically the number of times the two are co-present in the same chunk, so for the Gretel/Friedrich example, Frequency(Gretel) + Frequency(Friedrich) - 1 = 5.

    Chunk         Name Frequency  
1       2       Gretel         2  
2       2      Pollock         1 
3       2       Adorno         1   
4       2    Friedrich         4  
5       3          Max         1 
6       3   Horkheimer         1  
7       3       Adorno         1   
8       4    Friedrich         5  
9       4      Pollock         1 
10      4        March         1 
11      5        Comte         3  
12      7      Jaspers         1  
13      7       Huxley         2  
14      8    Nietzsche         1 
15      8         Sade         2 
16      8        Felix         1  
17      8         Weil         1 
18      8      Western         1 
19      8    Lowenthal         1 
20      8         Kant         1 
21      8       Hitler         1 

I started by splitting the data frame according to DF$Chunk:

> DF.split<-split(DF, DF$Chunk) 

$`2`
  Chunk      Name Frequency
1     2    Gretel         2
2     2   Pollock         1
3     2    Adorno         1
4     2 Friedrich         4

$`3`
  Chunk       Name Frequency
5     3        Max         1
6     3 Horkheimer         1
7     3     Adorno         1

$`4`
   Chunk      Name Frequency
8      4 Friedrich         5
9      4   Pollock         1
10     4     March         1

I thought this got me closer, but it returns a list whose items I am having trouble turning back into workable data frames.

I have also tried to start by turning this into a ChunkXName adjacency matrix:

> chunkbyname<-tapply(DF$Frequency , list(DF$Name,DF$Chunk) , as.character )

in the hope of multiplying chunkbyname by its transpose to get the NameXName matrix, but it seems the matrix is too sparse or complex (Error in a %*% b : requires numeric/complex matrix/vector arguments).

Any help getting this data frame into an adjacency matrix would be greatly appreciated.
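
A note on that error: it is most likely a type problem rather than a sparsity problem. `as.character` turns every cell into a string, and `%*%` only accepts numeric or complex input. The sketch below keeps the cells numeric by aggregating with `sum` and filling empty cells with 0. It uses only the first ten rows of the example data, and the names `inc` and `cooc` are made up for illustration. Be aware that the transpose product weights each pair by the product of the two frequencies, summed over shared chunks, which is not the same as the Frequency + Frequency - 1 weight described above.

```r
# First ten rows of the example data (chunks 2, 3 and 4).
DF <- data.frame(
  Chunk = c(2, 2, 2, 2, 3, 3, 3, 4, 4, 4),
  Name = c("Gretel", "Pollock", "Adorno", "Friedrich", "Max",
           "Horkheimer", "Adorno", "Friedrich", "Pollock", "March"),
  Frequency = c(2, 1, 1, 4, 1, 1, 1, 5, 1, 1),
  stringsAsFactors = FALSE
)

# Name x Chunk matrix with numeric cells; empty cells come back as NA.
inc <- tapply(DF$Frequency, list(DF$Name, DF$Chunk), sum)
inc[is.na(inc)] <- 0

# Name x Name matrix: sum over chunks of Frequency(i) * Frequency(j).
# This is a product weight, not the sum-minus-one weight from the question.
cooc <- inc %*% t(inc)
```

With these rows, `cooc["Gretel", "Friedrich"]` is 2 * 4 = 8 rather than the desired 5, which is why this route needs a different weighting step to match the question.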


Answer 1:


Is this what you are looking for?

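df3 <- by(DF, DF$Chunk, function(x){
  # Pairwise weight within one chunk: Frequency(i) + Frequency(j) - 1
  mm <- outer(x$Frequency, x$Frequency, "+") - 1
  # Label rows and columns with the names in this chunk
  rownames(mm) <- x$Name
  colnames(mm) <- x$Name
  mm
})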

df3

# $`2`
#           Gretel Pollock Adorno Friedrich
# Gretel         3       2      2         5
# Pollock        2       1      1         4
# Adorno         2       1      1         4
# Friedrich      5       4      4         7
# 
# $`3`
#            Max Horkheimer Adorno
# Max          1          1      1
# Horkheimer   1          1      1
# Adorno       1          1      1
# 
# $`4`
#           Friedrich Pollock March
# Friedrich         9       5     5
# Pollock           5       1     1
# March             5       1     1
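
The per-chunk matrices above can be folded into a single NameXName matrix. The following is only a sketch under some assumptions that the question does not settle: weights for a pair that shares several chunks are summed, each name appears at most once per chunk, and the diagonal (a name paired with itself) is set to 0. The names `all_names` and `adj` are introduced here for illustration.

```r
# Example data from the question.
DF <- data.frame(
  Chunk = c(2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8),
  Name = c("Gretel", "Pollock", "Adorno", "Friedrich", "Max", "Horkheimer",
           "Adorno", "Friedrich", "Pollock", "March", "Comte", "Jaspers",
           "Huxley", "Nietzsche", "Sade", "Felix", "Weil", "Western",
           "Lowenthal", "Kant", "Hitler"),
  Frequency = c(2, 1, 1, 4, 1, 1, 1, 5, 1, 1, 3, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Empty Name x Name matrix covering every name in the data.
all_names <- unique(DF$Name)
adj <- matrix(0, nrow = length(all_names), ncol = length(all_names),
              dimnames = list(all_names, all_names))

# Add each chunk's pairwise weights into the matching block of adj.
# Assumption: weights from different chunks are summed.
for (chunk in split(DF, DF$Chunk)) {
  w <- outer(chunk$Frequency, chunk$Frequency, "+") - 1
  adj[chunk$Name, chunk$Name] <- adj[chunk$Name, chunk$Name] + w
}

# Assumption: self-pairs carry no weight.
diag(adj) <- 0
```

Here `adj["Gretel", "Friedrich"]` is 5, matching the example in the question, while Friedrich and Pollock, who share chunks 2 and 4, get 4 + 5 = 9 under the summing assumption. Names that never share a chunk keep a weight of 0.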


Source: https://stackoverflow.com/questions/20623691/convert-categorical-data-in-data-frame-to-weighted-adjacency-matrix
