Calculate Pearson correlation in Python

Posted by 社会主义新天地 on 2021-01-28 09:22:34

Question


I have four columns: Country, Year, GDP, and CO2 emissions.

I want to measure the Pearson correlation between GDP and CO2 emissions for each country.

The Country column contains every country in the world, and the Year column contains the values 1990, 1991, ..., 2018.


Answer 1:


Use groupby with corr() as the aggregation function:

import pandas as pd

# Sample data: five yearly observations for each of two countries
country = ['India','India','India','India','India','China','China','China','China','China']
Year = [2018,2017,2016,2015,2014,2018,2017,2016,2015,2014]
GDP = [100,98,94,64,66,200,189,165,134,130]
CO2 = [94,96,90,76,64,180,172,150,121,117]
df = pd.DataFrame({'country':country,'Year':Year,'GDP':GDP,'CO2':CO2})
# Print the GDP/CO2 correlation matrix for each country
print(df.groupby('country')[['GDP','CO2']].corr())

With a little reshaping, this output can be reduced to a single correlation value per country:

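# Keep only the GDP-vs-CO2 entry of each country's 2x2 correlation matrix
df_corr = (df.groupby('country')[['GDP','CO2']].corr()).drop(columns='GDP').drop('CO2',level=1).rename(columns={'CO2':'Correlation'})
# Drop the leftover inner index level so that country is the only index
df_corr = df_corr.reset_index().drop(columns='level_1').set_index('country',drop=True)
print(df_corr)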

Output:

         Correlation
country             
China       0.999581
India       0.932202
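
A more direct route to the same table is to compute one coefficient per group with Series.corr, which uses the Pearson method by default. This is only a sketch on the sample data from this answer; the name corr_by_country is made up for illustration.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({
    "country": ["India"] * 5 + ["China"] * 5,
    "Year": [2018, 2017, 2016, 2015, 2014] * 2,
    "GDP": [100, 98, 94, 64, 66, 200, 189, 165, 134, 130],
    "CO2": [94, 96, 90, 76, 64, 180, 172, 150, 121, 117],
})

# For each country, correlate its GDP column with its CO2 column.
# Series.corr defaults to method="pearson".
corr_by_country = (
    df.groupby("country")[["GDP", "CO2"]]
      .apply(lambda g: g["GDP"].corr(g["CO2"]))
      .rename("Correlation")  # hypothetical name for the result column
)
print(corr_by_country)
```

The result is a Series indexed by country, so no dropping or renaming of index levels is needed afterwards.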



Answer 2:


My guess is that you want the Pearson coefficient for each country. Using pearsonr, you can loop over the countries and build a dictionary of results for each one.

import pandas as pd
from scipy.stats import pearsonr
df = pd.DataFrame({"column1":["value 1", "value 1","value 1","value 1","value 2", "value 2", "value 2", "value 2"], 
              "column2":[1,2,3,4,5, 1,2,3],
             "column3":[10,30,50, 60, 80, 10, 90, 20],
             "column4":[1, 3, 5, 6, 8, 5, 2, 3]})


results = {}
for country in df.column1.unique():
    results[country] = {}
    # pearsonr returns the correlation coefficient first and the two-sided p-value second
    pearsonr_value = pearsonr(df.loc[df["column1"]== country, "column3"],df.loc[df["column1"] == country, "column4"])
    results[country]["pearson"] = pearsonr_value[0]
    results[country]["pvalue"] = pearsonr_value[1]

print(results["value 1"])
# pearson is 1.0 (up to floating-point rounding) and pvalue is approximately 0.0,
# because column3 is exactly 10 times column4 for "value 1"

print(results["value 2"])
# pearson is approximately 0.0926 and pvalue is approximately 0.907
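
The same per-country calculation can also be written with groupby instead of an explicit loop, which returns the coefficient and the p-value as columns of one DataFrame. The following is a minimal sketch, assuming GDP and CO2 columns like those in the question; the helper pearson_with_pvalue and the sample numbers are illustrative, not part of the original answer.

```python
import pandas as pd
from scipy.stats import pearsonr

# Illustrative sample data with the column layout described in the question
df = pd.DataFrame({
    "country": ["India"] * 5 + ["China"] * 5,
    "Year": [2018, 2017, 2016, 2015, 2014] * 2,
    "GDP": [100, 98, 94, 64, 66, 200, 189, 165, 134, 130],
    "CO2": [94, 96, 90, 76, 64, 180, 172, 150, 121, 117],
})

def pearson_with_pvalue(group):
    """Hypothetical helper: Pearson r and two-sided p-value for one country."""
    r, p = pearsonr(group["GDP"], group["CO2"])
    return pd.Series({"pearson": r, "pvalue": p})

# One row per country, with columns "pearson" and "pvalue"
stats = df.groupby("country")[["GDP", "CO2"]].apply(pearson_with_pvalue)
print(stats)
```

One caveat for real data: pandas corr() skips missing values, whereas pearsonr does not handle NaN, so rows with gaps may need a dropna() on the two columns before this step.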



Answer 3:


Thank you @Celius, it worked and gave me the results I wanted.



Source: https://stackoverflow.com/questions/60116042/calculate-pearson-correlation-in-python
