Question
I am reading NetCDF files using xarray. Each variable has 4 dimensions (Times, lev, y, x). After reading a variable, I calculate the mean of the variable QVAPOR along the (Times, lev) dimensions. This gives QVAPOR_mean, a 2D variable with shape (y: 699, x: 639).
Xarray took only 10 microseconds to read the data with shape (Times: 2918, lev: 36, y: 699, x: 639), but took more than 60 minutes to plot the filled contour of the data with shape (y: 699, x: 639).
I am wondering why Xarray takes such an extremely long time (more than 60 minutes) to plot the contourf of an array of size (y: 699, x: 639).
I use the following code to read the files and perform the computation.
import xarray as xr
flnm = xr.open_mfdataset('./WRF_3D_2007_*.nc', chunks={'Times': 100})
QVAPOR_mean = flnm.QVAPOR.mean(dim=('Times', 'lev'))
QVAPOR_mean.plot.imshow()
The last command takes more than 60 minutes to complete. Help is appreciated. Thank you.
Answer 1:
When you open your dataset and provide the chunks argument, xarray returns a Dataset composed of dask arrays. These arrays are evaluated "lazily" (see the xarray/dask documentation). That is why opening the files appears to take only microseconds: no data has actually been read yet. It is not until you plot your data that the computation is triggered. To illustrate this, you can explicitly load your data after you take the mean:
flnm = xr.open_mfdataset('./WRF_3D_2007_*.nc', chunks={'Times': 100})
QVAPOR_mean = flnm.QVAPOR.mean(dim=('Times', 'lev')).load()
Now your QVAPOR_mean variable is backed by a numpy array instead of a dask array. Plotting this array will likely be much faster.
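The difference between the lazy and the loaded result can be checked on a small synthetic array. The sketch below is only an illustration: the array sizes and the names `qvapor`, `lazy_mean`, and `loaded_mean` are made up and stand in for the real WRF data. It uses `.compute()` rather than `.load()` so that the lazy object is left unchanged for comparison (`.load()` fills in the data of the object it is called on).

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for QVAPOR (hypothetical sizes, not the real WRF data)
data = np.random.rand(20, 4, 6, 5)
qvapor = xr.DataArray(data, dims=("Times", "lev", "y", "x"), name="QVAPOR")

# Chunking makes the array dask-backed, as open_mfdataset(..., chunks=...) does
lazy = qvapor.chunk({"Times": 10})

# This only builds a task graph; nothing is computed yet
lazy_mean = lazy.mean(dim=("Times", "lev"))
print("lazy chunks:", lazy_mean.chunks)      # not None for a dask-backed array

# compute() runs the graph and returns a new, numpy-backed DataArray
loaded_mean = lazy_mean.compute()
print("loaded chunks:", loaded_mean.chunks)  # None once the data is in memory
print("type of loaded data:", type(loaded_mean.data))
```

Any plot call on `lazy_mean` would have to run the whole reduction first, whereas plotting `loaded_mean` only has to draw an array that is already in memory.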
However, the computation of your mean is likely to still take quite a long time. There are ways to improve the throughput here as well.
Try using a larger chunk size. I often find that chunk sizes in the 10-100 MB range perform best.
Try a different scheduler. By default, you are using dask's threaded scheduler. Because of limitations with netCDF/HDF, this does not allow for parallel reads from disk. We have been finding that the distributed scheduler works well for these applications.
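To judge whether a chunking choice falls in the suggested size range, the size of one chunk can be estimated from its shape and the element size. The helper below is a hypothetical sketch, not part of xarray or dask, and it assumes 4-byte (float32) values, which may not match the actual files. The y and x sizes are taken from the question; chunking one time step at a time is just an example choice.

```python
import math

def chunk_nbytes(chunk_shape, itemsize=4):
    """Return the size in bytes of one chunk.

    chunk_shape: mapping of dimension name -> chunk length along that dimension
    itemsize: bytes per element (4 assumes float32; this is an assumption)
    """
    return math.prod(chunk_shape.values()) * itemsize

# One time step with all levels and the full horizontal grid from the question
one_step = {"Times": 1, "lev": 36, "y": 699, "x": 639}
size_mb = chunk_nbytes(one_step) / 1e6
print(f"{size_mb:.1f} MB per chunk")
```

The same function can be called with other chunk lengths to compare candidates before passing them to `chunks=`. Note that with `open_mfdataset` a chunk cannot span more than one file, so the effective chunk length along Times is also limited by how many time steps each file holds.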
Source: https://stackoverflow.com/questions/49271716/plotting-2d-data-using-xarray-takes-a-surprisingly-long-time