I am using Python pandas. I have a column of strings, and I would like to compute the crossing (co-occurrence) between the names in that column.
E.g. I have the following input:
You can generate the dummy columns first:
df['A'].str.get_dummies(', ')
Out:
   Andi  Cindy  Thomas
0     1      0       0
1     1      1       0
2     0      1       1
3     0      1       1
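The input frame is not shown above, so here is a minimal sketch with a hypothetical `df` that is consistent with the dummy table. The exact strings in column `A`, including the order of names within each string, are my assumption.

```python
import pandas as pd

# Hypothetical input, reconstructed to match the dummy table above;
# the original strings may have differed in order.
df = pd.DataFrame({'A': ['Andi', 'Andi, Cindy', 'Cindy, Thomas', 'Thomas, Cindy']})

# Split each string on ', ' and build one 0/1 indicator column per name.
# The resulting columns come out sorted alphabetically.
tab = df['A'].str.get_dummies(', ')
print(tab)
```

Note that the separator has to match the strings exactly: if the names were separated by a bare comma, `', '` would not split them.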
And use that in the dot product:
tab = df['A'].str.get_dummies(', ')
tab.T.dot(tab)
Out:
        Andi  Cindy  Thomas
Andi       2      1       0
Cindy      1      3       2
Thomas     0      2       2
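Why the dot product works: entry (i, j) of `tab.T.dot(tab)` sums the product of indicator columns i and j over all rows, and that product is 1 only for rows that contain both names. As a sanity check, the sketch below compares the result against an explicit pair count. The `df` here is a hypothetical input chosen to match the tables above, and `manual` is just an illustrative name.

```python
from collections import Counter
from itertools import product

import pandas as pd

# Hypothetical input consistent with the tables above.
df = pd.DataFrame({'A': ['Andi', 'Andi, Cindy', 'Cindy, Thomas', 'Cindy, Thomas']})

tab = df['A'].str.get_dummies(', ')
co = tab.T.dot(tab)

# Count every ordered pair of names (including a name with itself) per row.
counts = Counter()
for row in df['A']:
    names = set(row.split(', '))
    counts.update(product(names, repeat=2))

# Arrange the manual counts in the same row/column order as `co`.
manual = pd.DataFrame(
    [[counts[(a, b)] for b in co.columns] for a in co.index],
    index=co.index,
    columns=co.columns,
)

# Both approaches should agree cell by cell.
print((manual.values == co.values).all())
```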
The diagonal entries give the number of rows in which each person appears. If you need to set the diagonal to 1, there are several alternatives. One of them is np.fill_diagonal from NumPy, which modifies the array in place:
import numpy as np
co_occurrence = tab.T.dot(tab)
np.fill_diagonal(co_occurrence.values, 1)
co_occurrence
Out:
        Andi  Cindy  Thomas
Andi       1      1       0
Cindy      1      1       2
Thomas     0      2       1
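One caveat: filling the diagonal in place relies on `.values` handing back a writable view of the frame's data, which I would not count on in every pandas version (with Copy-on-Write enabled the array may be read-only). A more defensive sketch, using a hypothetical `df` that matches the tables above, fills an explicit copy and rebuilds the frame:

```python
import numpy as np
import pandas as pd

# Hypothetical input consistent with the tables above.
df = pd.DataFrame({'A': ['Andi', 'Andi, Cindy', 'Cindy, Thomas', 'Cindy, Thomas']})

tab = df['A'].str.get_dummies(', ')
co_occurrence = tab.T.dot(tab)

# Take an explicit, writable copy of the underlying data.
values = co_occurrence.to_numpy(copy=True)

# Overwrite the diagonal of the copy in place.
np.fill_diagonal(values, 1)

# Rebuild the frame with the original labels.
co_occurrence = pd.DataFrame(
    values, index=co_occurrence.index, columns=co_occurrence.columns
)
print(co_occurrence)
```

This costs one extra copy of the matrix, which should be negligible unless the number of distinct names is very large.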