Is it possible to do running correlation with one fixed series in Python?

天大地大妈咪最大, posted on 2021-01-29 00:42:15

Question


I'm wondering whether there is a fast way to compute a running correlation in Python against one fixed series. I've tried pandas, for example df1.rolling(4).corr(df2), but that requires the two DataFrames to have the same length. Is there a way to do something similar to the pandas example above, but with one DataFrame held fixed?

To clarify, I want to calculate the correlation coefficient between df2 below and each successive window of values in df1.

Example: the first correlation is between df2 and df1.loc[0:3], the second is between df2 and df1.loc[1:4], and so on.

I've managed to do this with a loop, but I find it inefficient when working with larger DataFrames.

import pandas as pd

df1 = pd.DataFrame([1, 3, 2, 4, 5, 6, 3, 4])
df2 = pd.DataFrame([1, 2, 3, 2])
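
For reference, the loop described above might look roughly like the sketch below. The question does not show its actual loop, so this is an assumption: the names `fixed`, `n`, and `loop_corr` are illustrative, and `np.corrcoef` is used here simply as one way to get a Pearson coefficient.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([1, 3, 2, 4, 5, 6, 3, 4])
df2 = pd.DataFrame([1, 2, 3, 2])

values = df1[0].to_numpy()   # the long series
fixed = df2[0].to_numpy()    # the fixed series (hypothetical name)
n = len(fixed)

# One Pearson coefficient per window of df1 that is as long as df2.
# np.corrcoef returns a 2x2 matrix; entry [0, 1] is the cross-correlation.
loop_corr = [
    np.corrcoef(values[start:start + n], fixed)[0, 1]
    for start in range(len(values) - n + 1)
]
```

Each iteration calls `np.corrcoef` on a single small window, so the cost is dominated by Python-level overhead once df1 gets long.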

Answer 1:


You can use pandas.DataFrame.rolling, which returns a pandas.core.window.Rolling object that has an apply method. You can then pass apply() any function that calculates the correlation you want.

Example

  • Let's say you are interested in the Pearson correlation coefficient. That can be calculated using scipy.stats.pearsonr.
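import numpy as np
import pandas as pd
from scipy.stats import pearsonr


df1 = pd.DataFrame([1,3,2,4,5,6,3,4,1,2,3,2,2,3,2,5,1,2,1,2,8,8,8,8,8,8,8])
df2 = pd.DataFrame([1,2,3,2])

# Fixed series that every window of df1 is correlated against
CORR_VALS = df2[0].values

def get_correlation(vals):
    # pearsonr returns (coefficient, p-value); keep only the coefficient
    return pearsonr(vals, CORR_VALS)[0]

# Roll over the data column only; raw=True hands each window to the function as a NumPy array
df1['correlation'] = df1[0].rolling(window=len(CORR_VALS)).apply(get_correlation, raw=True)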

  • Note that the window argument of df1.rolling() must equal the length of the array you are calculating the correlation against.

This outputs:

In [5]: df1['correlation'].values
Out[5]:
array([        nan,         nan,         nan,  0.31622777,  0.31622777,
        0.71713717,  0.63245553, -0.63245553, -0.39223227, -0.63245553,
       -0.63245553,  1.        ,  0.        , -0.70710678,  0.81649658,
        0.        ,  0.47809144, -0.23570226, -0.64699664,  0.        ,
        0.        ,  0.7570333 ,  0.76509206,  0.11043153, -0.77302068,
       -0.11043153,  0.86164044])

Plotted, these values would look like this: [figure: line plot of the rolling correlation values]
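
Since the question asks for a fast approach, here is a vectorized alternative to rolling().apply(). It is a different technique from the answer above, not part of it: a minimal sketch that assumes NumPy 1.20 or later (for `sliding_window_view`) and computes the Pearson coefficient directly from its definition. The function name `rolling_corr_fixed` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_corr_fixed(series, fixed):
    """Pearson correlation of `fixed` with every same-length window of `series`."""
    x = np.asarray(series, dtype=float)
    y = np.asarray(fixed, dtype=float)

    # Shape (len(x) - len(y) + 1, len(y)): one row per window, no data copied.
    windows = sliding_window_view(x, len(y))

    # Center each window and the fixed series around their own means.
    xc = windows - windows.mean(axis=1, keepdims=True)
    yc = y - y.mean()

    numerator = xc @ yc
    denominator = np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum())

    # A constant window has zero variance, so 0/0 yields NaN there.
    with np.errstate(divide="ignore", invalid="ignore"):
        return numerator / denominator


# The data from the question: 8 values against a fixed series of 4.
result = rolling_corr_fixed([1, 3, 2, 4, 5, 6, 3, 4], [1, 2, 3, 2])
```

The result holds one value per complete window, so it is len(fixed) - 1 entries shorter than the input. To line it up with the rolling() output, prepend that many NaN values before assigning it to a column.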



Source: https://stackoverflow.com/questions/62942889/is-it-possible-to-do-running-correlation-with-one-fixed-series-in-python
