Python: How to find regression equation of multiple 3D (lat-lon-time-value) dataArrays?

南楼画角 submitted on 2019-12-13 03:47:11

Question


I have two sets of dataArrays representing a value on three coordinate axes (lat, lon, time); one set of dataArrays represents variable varA, one set represents variable varB (example given below).

varA
<xarray.DataArray 'varA' (time: 32, lat: 20, lon: 18)>
array([[[... 
... ]]])
Coordinates:
 * lat      (lat) float64 4.75 4.25 3.75 3.25 2.75 2.25 1.75 1.25 0.75 0.25 ...
 * lon      (lon) float64 33.25 33.75 34.25 34.75 35.25 35.75 36.25 36.75 ...
 * time     (time) datetime64[ns] 1979-01-01 1980-01-01 1981-01-01 ...

There are four such pairs of varA and varB (varA1 with varB1, varA2 with varB2, and so on), and all dataArrays have exactly the same coordinates. I made a scatter plot of varA against varB for each lat, lon and time, and drew all four pairs on one figure.

for t in range(varA1['time'].size) :
    for la in range(varA1['lat'].size) :
        for lo in range(varA1['lon'].size) :
            x = varA1.values[t,la,lo]
            y = varB1.values[t,la,lo]
            plt.scatter(x,y)
for t in range(varA2['time'].size) :
    for la in range(varA2['lat'].size) :
        for lo in range(varA2['lon'].size) :
            x = varA2.values[t,la,lo]
            y = varB2.values[t,la,lo]
            plt.scatter(x,y)
... 
plt.show()

The full scatter plot works fine, but now I would like to add a trendline (and find its equation) for the whole plot. I want to investigate the long-term (over time) and large-scale (over lat and lon) relation between varA and varB. I know that higher varA values are accompanied by higher varB values, regardless of location or time, but I want to quantify the relation with a single regression equation, a correlation coefficient and an RMSE.
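
For reference, the three quantities named above can be computed from two 1D sample vectors. This is only a minimal sketch, assuming each (varA, varB) value pair counts as one sample of a single linear relation and that the inputs contain no NaNs; `linear_stats` is a hypothetical helper name, not part of NumPy or xarray.

```python
import numpy as np

def linear_stats(x, y):
    """Least-squares line y = slope*x + intercept, plus Pearson r and RMSE.

    x and y are assumed to be 1D, equally long and free of NaNs.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)     # coefficients, highest power first
    r = np.corrcoef(x, y)[0, 1]                # Pearson correlation coefficient
    residuals = y - (slope * x + intercept)
    rmse = np.sqrt(np.mean(residuals ** 2))    # root-mean-square error of the fit
    return slope, intercept, r, rmse

# Synthetic check: points lying exactly on y = 2x + 1
x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = 2.0 * x_demo + 1.0
slope, intercept, r, rmse = linear_stats(x_demo, y_demo)
print('y=%.6fx+(%.6f), r=%.3f, RMSE=%.3f' % (slope, intercept, r, rmse))
```

Note that the RMSE here is measured against the fitted line, in the units of varB.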

Is this possible? I think I need an average of all varAs per cell [lat,lon,time], and the same for all varBs; so something like this:

avrA = varA1 * 0
avrB = varB1 * 0
for t in range(varA1['time'].size) :
    for la in range(varA1['lat'].size) :
        for lo in range(varA1['lon'].size) :
            avrA[t,la,lo] = float(sum([varA1[t,la,lo],varA2[t,la,lo],varA3[t,la,lo],varA4[t,la,lo]])) / 4
            avrB[t,la,lo] = float(sum([varB1[t,la,lo],varB2[t,la,lo],varB3[t,la,lo],varB4[t,la,lo]])) / 4
z = np.polyfit(avrA[:,:,:],avrB[:,:,:],1)
p = np.poly1d(z)
plt.plot(x,p(x))
print('y=%.6fx+(%.6f)'%(z[0],z[1]))

This gives an error in the polyfit function ('expected 1D vector for x'). I could not find out how to make polyfit work with 3D DataArrays. I found a related post, "Applying numpy.polyfit to xarray Dataset", but it didn't help me.
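
Since np.polyfit only accepts a 1D x, one possible workaround is to flatten both 3D arrays into matching 1D vectors before fitting. The sketch below assumes that missing cells are stored as NaN and should be dropped; `flatten_pair` is a hypothetical helper, and the synthetic arrays only stand in for avrA and avrB.

```python
import numpy as np

def flatten_pair(a, b):
    """Flatten two same-shaped arrays to 1D and drop cells where either is not finite."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    keep = np.isfinite(a) & np.isfinite(b)
    return a[keep], b[keep]

rng = np.random.default_rng(1)
avr_a = rng.random((32, 20, 18))   # (time, lat, lon)
avr_b = 3.0 * avr_a - 0.5          # exact linear relation, for the demo only
avr_a[0, 0, 0] = np.nan            # pretend one cell is missing

x, y = flatten_pair(avr_a, avr_b)
z = np.polyfit(x, y, 1)            # now x is 1D, so the fit goes through
print('y=%.6fx+(%.6f)' % (z[0], z[1]))
```

Flattening discards the lat/lon/time position of each sample, which matches the goal of one overall equation but would not suit a per-cell regression.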

Any suggestions on a (better) way to approach this?
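
On the averaging step specifically, the cell-by-cell mean does not need explicit loops. As a sketch under assumptions: the four arrays share the same shape and axis order (time, lat, lon), random NumPy arrays stand in for the real data, and `members_a` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (32, 20, 18)  # (time, lat, lon), matching the varA repr

# Stand-ins for varA1..varA4 (for a DataArray, `.values` gives such an array)
members_a = [rng.random(shape) for _ in range(4)]

# Stack along a new leading axis, then average over that axis:
# avr_a[t, la, lo] is the mean of the four members at that cell
avr_a = np.mean(np.stack(members_a), axis=0)

print(avr_a.shape)  # (32, 20, 18)
```

With xarray objects that have identical coordinates, `(varA1 + varA2 + varA3 + varA4) / 4` gives the same elementwise mean while keeping the coordinates.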

Thanks in advance!

Source: https://stackoverflow.com/questions/45863969/python-how-to-find-regression-equation-of-multiple-3d-lat-lon-time-value-data
