问题
I have two sets of dataArrays representing a value on three coordinate axes (lat, lon, time); one set of dataArrays represents variable varA, one set represents variable varB (example given below).
varA
<xarray.DataArray 'varA' (time: 32, lat: 20, lon: 18)>
array([[[...
... ]]])
Coordinates:
* lat (lat) float64 4.75 4.25 3.75 3.25 2.75 2.25 1.75 1.25 0.75 0.25 ...
* lon (lon) float64 33.25 33.75 34.25 34.75 35.25 35.75 36.25 36.75 ...
* time (time) datetime64[ns] 1979-01-01 1980-01-01 1981-01-01 ...
There are 4 different duos of varA and varB. I made a scatter plot which shows varA against varB for each lat, lon and time, and this for each duo (varA1 against varB1, varA2 against varB2...). ; and visualised it on 1 figure (all dataArrays have the exactly same coordinates).
for t in range(varA1['time'].size) :
for la in range(varA1['lat'].size) :
for lo in range(varA1['lon'].size) :
x = varA1.values[t,la,lo]
y = varB1.values[t,la,lo]
plt.scatter(x,y)
for t in range(varA2['time'].size) :
for la in range(varA2['lat'].size) :
for lo in range(varA2['lon'].size) :
x = varA2.values[t,la,lo]
y = varB2.values[t,la,lo]
plt.scatter(x,y)
...
plt.show()
The full scatterplot function works fine, but now I would like to add a trendline (and find its equation) for the full scatterplot. In fact, I want to investigate the long-term (over time) and large-scale (over lat and lon) relation between varA and varB; I know higher varA values are accompanied by higher varB values (no matter on which location or on what time), however I want to obtain 1 regression equation, a correlation coefficient and an RMSE (i.e. quantifying their relation).
Is this possible? I think I need an average of all varAs per cell [lat,lon,time], and the same for all varBs; so something like this:
avrR = wfdei_rain * 0
avrY = wfdei_rain * 0
for t in range(varA1['time'].size) :
for la in range(varA1['lat'].size) :
for lo in range(varA1['lon'].size) :
avrA[la,lo,t] = float(sum([varA1[la,lo,t],varA2[la,lo,t],varA3[la,lo,t],varA4[la,lo,t])) / 4
avrB[la,lo,t] = float(sum([varB1[la,lo,t],varB2[la,lo,t],varB3[la,lo,t],varB4[la,lo,t])) / 4
z = np.polyfit(avrA[:,:,:],avrB[:,:,:],1)
p = np.poly1d(z)
plt.plot(x,p(x))
print('y=%.6fx+(%.6f)'%(z[0],z[1])
This gives an error on the polyfit function ('expected 1D vector for x'). I do not find how to adjust polyfit to work with DataArrays - 3D. I found a related post: Applying numpy.polyfit to xarray Dataset but it didn't help me.
Any suggestions on a (better) way to approach this?
Thanks in advance!
来源:https://stackoverflow.com/questions/45863969/python-how-to-find-regression-equation-of-multiple-3d-lat-lon-time-value-data