How to calculate p-values for pairwise correlation of columns in Pandas?

执念已碎 2021-02-09 11:01

Pandas has a very handy function for computing pairwise correlations between columns: DataFrame.corr(). That means it is possible to compare correlations between columns of any length. For in

4 Answers
  •  有刺的猬
    2021-02-09 11:50

    Does this work for you?

    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    from scipy.stats import pearsonr

    # df_c is your DataFrame of numeric columns
    # correlation coefficients, rounded for display if needed
    rho = df_c.corr().round(1)
    # p-values: corr() with a callable puts 1 on the diagonal, so subtract the identity to get 0 there
    pval = df_c.corr(method=lambda x, y: pearsonr(x, y)[1]) - np.eye(*rho.shape)
    # significance stars: *** for p <= 0.001, ** for p <= 0.01, * for p <= 0.05
    # (pandas >= 2.1 prefers DataFrame.map over applymap)
    p = pval.applymap(lambda x: ''.join(['*' for t in [0.001, 0.01, 0.05] if x <= t]))
    # df_c2 holds the correlation coefficients with the stars appended
    df_c2 = rho.astype(str) + p

    # you could also plot the correlation matrix using sns.heatmap if you want
    # mask the upper triangle (diagonal included) so only the lower triangle is drawn
    matrix = np.triu(np.ones(rho.shape, dtype=bool))
    # convert the annotations to an array for the heatmap
    df_c3 = df_c2.to_numpy()

    # plot the heatmap
    plt.figure(figsize=(13, 8))
    sns.heatmap(rho, annot=df_c3, fmt='', vmin=-1, vmax=1, center=0, cmap='coolwarm', mask=matrix)
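
If you would rather keep the p-values as a matrix of their own, here is a minimal self-contained sketch you can run as-is. The helper name corr_pvalues and the demo columns x, y and z are made up for illustration and are not part of pandas or SciPy. It assumes Pearson correlation and drops missing values pair by pair, which is my reading of how DataFrame.corr() treats NaN.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def corr_pvalues(df: pd.DataFrame) -> pd.DataFrame:
    """Two-sided Pearson p-values for every pair of columns (hypothetical helper).

    The diagonal is left as NaN because testing a column against itself
    is not meaningful.
    """
    cols = df.columns
    pvals = pd.DataFrame(np.nan, index=cols, columns=cols, dtype=float)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            # keep only the rows where both columns have a value
            pair = df[[a, b]].dropna()
            if len(pair) < 2:
                continue  # pearsonr needs at least two observations
            p = pearsonr(pair[a], pair[b])[1]
            # the matrix is symmetric, so fill both halves
            pvals.loc[a, b] = p
            pvals.loc[b, a] = p
    return pvals


# made-up demo data: y depends on x, z is independent noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.1, size=200),
    "z": rng.normal(size=200),
})

rho = demo.corr()            # correlation coefficients
pvals = corr_pvalues(demo)   # matching p-values
print(rho.round(2))
print(pvals)
```

The explicit loop is slower than passing a callable to corr(), but it avoids the forced 1 on the diagonal and makes the NaN handling visible.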
    
