How to compute the correlation between two dataframes with different column names

太阳男子 2020-12-29 15:35

I have a set of columns (col1, col2, col3) in dataframe df1 and another set of columns (col4, col5, col6) in dataframe df2. Assume these two dataframes have the same number of rows…

1 Answer
  • 2020-12-29 15:47

    pandas quick and dirty

    import pandas as pd
    pd.concat([df1, df2], axis=1, keys=['df1', 'df2']).corr().loc['df2', 'df1']
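The one-liner can be unpacked step by step. The sketch below is illustrative only: the seeded generator and the names `combined`, `full` and `block` are assumptions, not part of the answer. `keys=` puts an outer label on each frame's columns, `corr()` builds the full 7 x 7 matrix (including the df1-vs-df1 and df2-vs-df2 pairs nobody asked for, which is presumably why it is called quick and dirty), and `.loc['df2', 'df1']` cuts out the cross block.

```python
import numpy as np
import pandas as pd

# Illustrative data, seeded so the run is repeatable
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.random((10, 4)), columns=list("abcd"))
df2 = pd.DataFrame(rng.random((10, 3)), columns=list("xyz"))

# keys= puts df1's columns under an outer label 'df1' and df2's under 'df2'
combined = pd.concat([df1, df2], axis=1, keys=["df1", "df2"])

# corr() computes all 7 x 7 pairwise correlations, including within-frame pairs
full = combined.corr()

# Rows whose outer label is 'df2', columns whose outer label is 'df1';
# the outer level is dropped, leaving rows x, y, z and columns a, b, c, d
block = full.loc["df2", "df1"]

print(full.shape)   # (7, 7)
print(block.shape)  # (3, 4)
print(block)
```

Because the full matrix is symmetric, slicing with `.loc['df1', 'df2']` instead gives the transposed block, with df1's columns as rows.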
    

    numpy clean

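    import numpy as np
    import pandas as pd

    def corr(df1, df2):
        n = len(df1)
        v1, v2 = df1.to_numpy(), df2.to_numpy()
        # sums[i, j] = (sum of df2 column i) * (sum of df1 column j)
        sums = np.multiply.outer(v2.sum(0), v1.sum(0))
        # stds[i, j] = product of the population standard deviations (ddof=0)
        stds = np.multiply.outer(v2.std(0), v1.std(0))
        # (sum(x*y) - sum(x)*sum(y)/n) / n is the population covariance;
        # dividing it by the std product gives the Pearson correlation
        return pd.DataFrame((v2.T.dot(v1) - sums / n) / stds / n,
                            index=df2.columns, columns=df1.columns)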
    
    corr(df1, df2)
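For a column x of df2 and a column y of df1, the numpy function computes r = (Σxy - ΣxΣy/n) / (n·σx·σy), where σ is the population standard deviation (numpy's `std` default, `ddof=0`). The same numbers can be reached by standardizing every column to z-scores first, after which the whole correlation matrix is one matrix product divided by n. The sketch below uses that z-score formulation as an alternative; it is not the answer's code, and `corr_zscore` and the seeded data are hypothetical.

```python
import numpy as np
import pandas as pd


def corr_zscore(df1, df2):
    """Hypothetical helper: Pearson correlation of each df2 column with each df1 column."""
    v1 = df1.to_numpy(dtype=float)
    v2 = df2.to_numpy(dtype=float)
    n = len(v1)
    # Standardize every column: subtract its mean, divide by its population std (ddof=0)
    z1 = (v1 - v1.mean(axis=0)) / v1.std(axis=0)
    z2 = (v2 - v2.mean(axis=0)) / v2.std(axis=0)
    # Entry [i, j] is the mean of z2[:, i] * z1[:, j], i.e. the correlation of
    # df2 column i with df1 column j
    return pd.DataFrame(z2.T @ z1 / n, index=df2.columns, columns=df1.columns)


# Illustrative data, seeded so the run is repeatable
rng = np.random.default_rng(1)
df1 = pd.DataFrame(rng.random((10, 4)), columns=list("abcd"))
df2 = pd.DataFrame(rng.random((10, 3)), columns=list("xyz"))

r = corr_zscore(df1, df2)
# Spot-check one cell against pandas' own Series.corr (Pearson by default)
print(np.isclose(r.loc["x", "a"], df2["x"].corr(df1["a"])))  # True
```

Centering before multiplying avoids subtracting two large sums, which can lose precision when column means are large relative to their spread; for values like the example's `np.random.rand` data, either form is fine.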
    

    example

    df1 = pd.DataFrame(np.random.rand(10, 4), columns=list('abcd'))
    
    df2 = pd.DataFrame(np.random.rand(10, 3), columns=list('xyz'))
    

    pd.concat([df1, df2], axis=1, keys=['df1', 'df2']).corr().loc['df2', 'df1']
    
              a         b         c         d
    x  0.235624  0.844665 -0.647962  0.535562
    y  0.357994  0.462007  0.205863  0.424568
    z  0.688853  0.350318  0.132357  0.687038
    

    corr(df1, df2)
    
              a         b         c         d
    x  0.235624  0.844665 -0.647962  0.535562
    y  0.357994  0.462007  0.205863  0.424568
    z  0.688853  0.350318  0.132357  0.687038
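Both tables above agree. A minimal sketch that checks this with `np.allclose` instead of by eye follows; it repeats the answer's `corr` function and wraps the pandas one-liner in a hypothetical `corr_pandas` helper so the block runs on its own, and the seeded generator is an assumption added for repeatability. It also shows one behavioral difference: `DataFrame.corr` excludes missing values pair by pair, while in the numpy version a single NaN turns a whole column of the result into NaN.

```python
import numpy as np
import pandas as pd


def corr(df1, df2):
    # The answer's numpy version, repeated so this sketch runs on its own
    n = len(df1)
    v1, v2 = df1.to_numpy(), df2.to_numpy()
    sums = np.multiply.outer(v2.sum(0), v1.sum(0))
    stds = np.multiply.outer(v2.std(0), v1.std(0))
    return pd.DataFrame((v2.T.dot(v1) - sums / n) / stds / n,
                        index=df2.columns, columns=df1.columns)


def corr_pandas(df1, df2):
    # Hypothetical wrapper around the answer's pandas one-liner
    return pd.concat([df1, df2], axis=1, keys=["df1", "df2"]).corr().loc["df2", "df1"]


rng = np.random.default_rng(42)
df1 = pd.DataFrame(rng.random((10, 4)), columns=list("abcd"))
df2 = pd.DataFrame(rng.random((10, 3)), columns=list("xyz"))

via_pandas = corr_pandas(df1, df2)
via_numpy = corr(df1, df2)

# Same row/column labels, and values equal up to floating-point rounding
print(list(via_pandas.index) == list(via_numpy.index))           # True
print(list(via_pandas.columns) == list(via_numpy.columns))       # True
print(np.allclose(via_pandas.to_numpy(), via_numpy.to_numpy()))  # True

# Caveat: DataFrame.corr skips missing values pair by pair; the numpy version does not
df1.loc[0, "a"] = np.nan
print(corr(df1, df2)["a"].isna().all())          # True: the whole 'a' column is NaN
print(corr_pandas(df1, df2)["a"].notna().all())  # True: computed from the 9 complete rows
```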
    