Plotting 2D data using Xarray takes a surprisingly long time?

Posted by 半腔热情 on 2020-01-25 07:12:41

Question


I am reading NetCDF files using xarray. Each variable has 4 dimensions (Times, lev, y, x). After reading the variable, I calculate the mean of the variable QVAPOR along the (Times, lev) dimensions. The result is QVAPOR_mean, a 2D variable with shape (y: 699, x: 639).

Xarray took only 10 microseconds to read the data with shape (Times: 2918, lev: 36, y: 699, x: 639), but took more than 60 minutes to plot the filled contour of the data of shape (y: 699, x: 639).

I am wondering why Xarray takes such an extremely long time (more than 60 minutes) to plot the contourf of an array of size (y: 699, x: 639).

I use the following code to read the files and perform the computation.

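import xarray as xr
# Open all files as one dataset, chunked along Times
flnm = xr.open_mfdataset('./WRF_3D_2007_*.nc', chunks={'Times': 100})
# Mean over Times and lev leaves a 2D (y, x) field
QVAPOR_mean = flnm.QVAPOR.mean(dim=('Times', 'lev'))
QVAPOR_mean.plot.imshow()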

The last command takes more than 60 minutes to complete. Any help is appreciated. Thank you.


Answer 1:


When you open your dataset and provide the chunks argument, xarray returns a Dataset composed of dask arrays. These arrays are evaluated "lazily" (see the xarray/dask documentation). The computation is not triggered until you plot your data, so the time you are attributing to plotting is mostly spent reading the files and computing the mean. To illustrate this, you can explicitly load your data after you take the mean:

flnm = xr.open_mfdataset('./WRF_3D_2007_*.nc', chunks={'Times': 100})
# .load() triggers the computation and keeps the result as a numpy array
QVAPOR_mean = flnm.QVAPOR.mean(dim=('Times', 'lev')).load()

Now your QVAPOR_mean variable is backed by a numpy array instead of a dask array. Plotting this array will likely be much faster.
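The lazy behaviour can be seen in isolation without the WRF files. The following is a minimal sketch that uses a small synthetic dask-backed array as a stand-in for QVAPOR; the array sizes, chunking and the names `qvapor`, `is_lazy` and `is_numpy` are assumptions made for illustration, not taken from the original data.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Small synthetic stand-in for QVAPOR (hypothetical sizes, not the real data).
data = da.random.random((20, 4, 30, 40), chunks=(5, 4, 30, 40))
qvapor = xr.DataArray(data, dims=("Times", "lev", "y", "x"), name="QVAPOR")

# Taking the mean only builds a task graph; nothing is computed yet.
qvapor_mean_lazy = qvapor.mean(dim=("Times", "lev"))
is_lazy = isinstance(qvapor_mean_lazy.data, da.Array)

# .load() runs the computation and swaps the dask array for a numpy array.
# It works in place and also returns the same object.
qvapor_mean = qvapor_mean_lazy.load()
is_numpy = isinstance(qvapor_mean.data, np.ndarray)

print("lazy before load:", is_lazy)
print("numpy after load:", is_numpy)
print("result sizes:", dict(qvapor_mean.sizes))
```

Timing the `.load()` call and the plot call separately on the real data should show which of the two steps actually dominates.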

However, the computation of your mean is likely to still take quite a long time. There are ways to improve the throughput here as well.

  • Try using a larger chunk size. I often find that chunk sizes in the 10-100 MB range perform best.

  • Try a different scheduler. You are by default using dask's threaded scheduler. Because of limitations with netCDF/HDF, this does not allow for parallel reads from disk. We have been finding that the distributed scheduler works well for these applications.
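To judge whether a given chunks argument lands in that size range, the in-memory size of one chunk can be estimated from its shape. The sketch below is only an estimate under stated assumptions: the helper `chunk_size_mb` is hypothetical, the dimension lengths are the ones quoted in the question, and `itemsize=4` assumes the variable is stored as float32, which the question does not state.

```python
from math import prod


def chunk_size_mb(chunk_shape, itemsize=4):
    """Approximate in-memory size of one chunk in MB (10**6 bytes).

    itemsize is the number of bytes per element; 4 assumes float32.
    """
    return prod(chunk_shape) * itemsize / 1e6


# Dimension lengths quoted in the question; the dtype is an assumption.
lev, y, x = 36, 699, 639

# Compare a few candidate chunk lengths along Times.
for times in (1, 10, 100):
    size = chunk_size_mb((times, lev, y, x))
    print(f"Times chunk of {times}: about {size:.1f} MB per chunk")
```

Note that with open_mfdataset a chunk cannot span more than one file, so the effective chunk length along Times is also limited by how many time steps each file holds.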



Source: https://stackoverflow.com/questions/49271716/plotting-2d-data-using-xarray-takes-a-surprisingly-long-time
