Question
I have three lists that I have loaded into a pandas dataframe.
import pandas as pd
df = pd.DataFrame({'x': location})
df = df.assign(y1 = variable1)
df = df.assign(y2 = variable2)
I would like to plot the correlation of y1 with y2 against x. More precisely, I would like to bin the y1 and y2 values by x location, compute the correlation of y1 with y2 within each bin, and then plot those correlations as a line across the whole x domain. The final plot will have correlation on the y-axis and location on the x-axis.
I have previously done something similar using scipy's binned_statistic function to plot conditional means, but I don't think I can easily extend that to correlations. I would also like to get better at using pandas, so I'm trying to avoid that route if at all possible.
I'm sure this has been asked before, but everything I have come across seems to be about multiple distribution plots.
Answer 1:
I've more or less arrived at a solution. Implementing something similar to what was used here, I have:
nbins = 20
df['bins'] = pd.qcut(df['x'], q=nbins)
plotdatadf = df.groupby('bins')[['y1', 'y2']].corr().iloc[0::2, -1]
This gives me a Series holding the correlation coefficient of y1 and y2 for each bin, where the bins are quantile-based: they are divided along x so that each bin holds roughly the same number of observations.
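As a cross-check on the slicing above, the per-bin correlation can also be computed explicitly, one group at a time. This is only a minimal sketch: the random arrays are made-up stand-ins for the location, variable1 and variable2 lists from the question, and observed=True is passed so that only bins that actually occur are grouped.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the question's three lists (assumed data, not the original).
rng = np.random.default_rng(0)
location = rng.uniform(0.0, 100.0, size=2000)
variable1 = rng.normal(size=2000)
variable2 = 0.5 * variable1 + rng.normal(size=2000)

df = pd.DataFrame({'x': location, 'y1': variable1, 'y2': variable2})

nbins = 20
df['bins'] = pd.qcut(df['x'], q=nbins)  # quantile bins: equal counts per bin

# Explicit route: one Pearson correlation of y1 with y2 per bin.
corr_by_bin = (
    df.groupby('bins', observed=True)[['y1', 'y2']]
      .apply(lambda g: g['y1'].corr(g['y2']))
)

# Matrix route from the answer: take the y1 row, y2 column of each 2x2 block.
via_matrix = df.groupby('bins', observed=True)[['y1', 'y2']].corr().iloc[0::2, -1]
```

Both routes should give the same numbers. The explicit version returns a Series indexed directly by bin, which may be easier to work with than the MultiIndex produced by the matrix route.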
I can now go back to my original dataframe and add another column, of the original length, holding these correlation values: every row that falls in a given bin gets that bin's correlation. This column can then be plotted as y against my existing x as a line plot.
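The copy-back step described above might look something like the following. It is a sketch under a few assumptions: the data is synthetic, the column names bin_id and corr are invented for illustration, and qcut is called with labels=False so that bins are plain integer codes, which makes the lookup a simple Series.map rather than a match on interval labels.

```python
import numpy as np
import pandas as pd

# Synthetic data standing in for the question's lists (assumed, not the original).
rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({'x': rng.uniform(0.0, 10.0, size=n)})
df = df.assign(y1=rng.normal(size=n))
df = df.assign(y2=df['y1'] * np.sin(df['x']) + rng.normal(size=n))

nbins = 20
# labels=False returns integer bin codes 0 .. nbins-1 instead of intervals.
df['bin_id'] = pd.qcut(df['x'], q=nbins, labels=False)

# One correlation value per bin, indexed by the integer bin code.
corr_by_bin = df.groupby('bin_id')[['y1', 'y2']].apply(
    lambda g: g['y1'].corr(g['y2'])
)

# Broadcast each bin's correlation back to every row in that bin.
df['corr'] = df['bin_id'].map(corr_by_bin)

# Alternative for plotting: one point per bin, placed at the bin's mean x.
plot_df = pd.DataFrame({
    'x': df.groupby('bin_id')['x'].mean(),
    'corr': corr_by_bin,
})
```

With matplotlib installed, df.sort_values('x').plot(x='x', y='corr') should then draw a step-like line over the full x range, while plot_df.plot(x='x', y='corr') draws one point per bin.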
Source: https://stackoverflow.com/questions/64019645/plotting-binned-correlation-of-two-variables-using-common-axis