Compute co-occurrence matrix by counting values in cells

醉梦人生 2021-01-01 21:07

I have a dataframe like this

df = pd.DataFrame({'a': [1,1,0,0], 'b': [0,1,1,0], 'c': [0,0,1,1]})

I want to get

   a  b  c
a  2  1  0
b  1  2  1
c  0  1  2

4 answers
  • 2021-01-01 21:32

    np.einsum

    Not as pretty as df.T.dot(df) but how often do you see np.einsum amirite?

    pd.DataFrame(np.einsum('ij,ik->jk', df, df), df.columns, df.columns)
    
       a  b  c
    a  2  1  0
    b  1  2  1
    c  0  1  2
    
  • 2021-01-01 21:34

    Numpy matmul

    np.matmul(df.values.T,df.values)
    Out[87]: 
    array([[2, 1, 0],
           [1, 2, 1],
           [0, 1, 2]], dtype=int64)
    
    #pd.DataFrame(np.matmul(df.values.T,df.values), df.columns, df.columns)
    
  • 2021-01-01 21:36

    You can do the matrix multiplication with the @ operator on the underlying NumPy arrays.

    df = pd.DataFrame(df.values.T @ df.values, df.columns, df.columns)
    
  • You appear to want the matrix product, so leverage DataFrame.dot:

    df.T.dot(df)
       a  b  c
    a  2  1  0
    b  1  2  1
    c  0  1  2
    

    Alternatively, if you want to skip the overhead of pandas, you could compute the product on the underlying NumPy array with its dot method and wrap the result afterwards:

    v = df.values
    pd.DataFrame(v.T.dot(v), index=df.columns, columns=df.columns)
    

    Or, if you want to get cute,

    (lambda a, c: pd.DataFrame(a.T.dot(a), c, c))(df.values, df.columns)
    

       a  b  c
    a  2  1  0
    b  1  2  1
    c  0  1  2
    

    —piRSquared
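
All four answers compute the same thing: entry (j, k) is the sum over rows i of df[i, j] * df[i, k], which for 0/1 columns counts the rows where both columns are 1. As a rough sketch, the block below spells that sum out with explicit loops and checks that each one-liner above matches it. The helper name `cooccurrence_loops` is made up for illustration and is not part of pandas or NumPy; it is only meant as a slow reference, assuming the columns hold 0/1 indicators as in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 0, 0], 'b': [0, 1, 1, 0], 'c': [0, 0, 1, 1]})


def cooccurrence_loops(frame):
    """Hypothetical reference helper: out[j, k] = sum_i frame[i, j] * frame[i, k]."""
    values = frame.to_numpy()
    n_rows, n_cols = values.shape
    out = np.zeros((n_cols, n_cols), dtype=values.dtype)
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(n_cols):
                out[j, k] += values[i, j] * values[i, k]
    return pd.DataFrame(out, index=frame.columns, columns=frame.columns)


reference = cooccurrence_loops(df)

# The vectorized variants from the answers, all wrapped with the column labels.
v = df.to_numpy()
candidates = {
    "DataFrame.dot": df.T.dot(df),
    "np.einsum": pd.DataFrame(np.einsum('ij,ik->jk', v, v), df.columns, df.columns),
    "np.matmul": pd.DataFrame(np.matmul(v.T, v), df.columns, df.columns),
    "@ operator": pd.DataFrame(v.T @ v, df.columns, df.columns),
}

# Every variant should agree with the loop-based reference.
for name, result in candidates.items():
    assert (result.to_numpy() == reference.to_numpy()).all(), name

print(reference)
```

If the columns held values other than 0 and 1, the same code would still run, but the result would be the plain matrix product rather than a co-occurrence count.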
