Pandas Crosstabulation and counting

甜味超标 2021-02-06 13:52

I am using Python pandas. I have a column of strings, and I would like to compute a cross-tabulation of the values that appear in it, i.e. count how often each pair of values occurs together.

E.g., I have the following input:



        
1 answer
  •  隐瞒了意图╮
    2021-02-06 14:37

    You can generate the dummy columns first:

    df['A'].str.get_dummies(', ')
    Out: 
       Andi  Cindy  Thomas
    0     1      0       0
    1     1      1       0
    2     0      1       1
    3     0      1       1
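
To make the step above reproducible, here is a minimal sketch. The question's sample input is not shown, so the `df` below is a hypothetical reconstruction chosen only to be consistent with the dummy table above; the order of names inside each cell is an assumption.

```python
import pandas as pd

# Hypothetical input: reconstructed from the dummy table, not taken from the question.
df = pd.DataFrame({"A": ["Andi", "Andi, Cindy", "Cindy, Thomas", "Cindy, Thomas"]})

# Split every string on ", " and build one 0/1 indicator column per distinct name.
tab = df["A"].str.get_dummies(", ")
print(tab)
```

Note that the separator must match the data exactly: with `", "` as the separator, a cell written as `"Andi,Cindy"` (no space) would be treated as a single name.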
    

    Then take the dot product of the transposed dummy table with itself:

    tab = df['A'].str.get_dummies(', ')
    
    tab.T.dot(tab)
    Out: 
            Andi  Cindy  Thomas
    Andi       2      1       0
    Cindy      1      3       2
    Thomas     0      2       2
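
Why the dot product works: entry (i, j) of `tab.T.dot(tab)` sums `tab[i] * tab[j]` over all rows, and that product is 1 only for rows that contain both names. The sketch below cross-checks this against an explicit row-by-row count. The input `df` is the same hypothetical reconstruction as before, and the check assumes no name is repeated within a single cell.

```python
from collections import Counter
from itertools import product

import pandas as pd

# Hypothetical input, consistent with the tables shown in this answer.
df = pd.DataFrame({"A": ["Andi", "Andi, Cindy", "Cindy, Thomas", "Cindy, Thomas"]})
tab = df["A"].str.get_dummies(", ")

# Co-occurrence matrix via the dot product of the indicator table with itself.
co = tab.T.dot(tab)

# Cross-check: count every ordered (name, name) pair within each row.
counts = Counter()
for cell in df["A"]:
    names = cell.split(", ")
    counts.update(product(names, repeat=2))

manual = pd.DataFrame(
    [[counts[(r, c)] for c in co.columns] for r in co.index],
    index=co.index,
    columns=co.columns,
)

# Both approaches should agree entry by entry.
print((manual.values == co.values).all())
```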
    

    The diagonal entries give the number of occurrences of each person. If you need the diagonal set to 1 instead, there are several alternatives. One of them is np.fill_diagonal from NumPy, which modifies the array in place.

    import numpy as np

    co_occurrence = tab.T.dot(tab)
    # Overwrite the diagonal in place through the underlying array
    np.fill_diagonal(co_occurrence.values, 1)
    co_occurrence
    Out: 
            Andi  Cindy  Thomas
    Andi       1      1       0
    Cindy      1      1       2
    Thomas     0      2       1
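
Writing through `.values` relies on pandas handing back a writable view of the data, which may not hold in pandas versions with Copy-on-Write enabled. As a hedged alternative, the sketch below fills the diagonal on an explicit copy and rebuilds the frame. The input `df` and the name `result` are illustrative assumptions, not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical input, consistent with the tables shown in this answer.
df = pd.DataFrame({"A": ["Andi", "Andi, Cindy", "Cindy, Thomas", "Cindy, Thomas"]})
tab = df["A"].str.get_dummies(", ")
co_occurrence = tab.T.dot(tab)

# Work on an explicit, writable copy of the underlying array.
values = co_occurrence.to_numpy(copy=True)
np.fill_diagonal(values, 1)

# Rebuild a labelled frame; the original co_occurrence is left unchanged.
result = pd.DataFrame(
    values, index=co_occurrence.index, columns=co_occurrence.columns
)
print(result)
```

This keeps the per-person counts available in `co_occurrence` while `result` holds the version with the diagonal set to 1.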
    
