I am currently using xarray to make probability maps. I want to use a statistical assessment like a "counting" exercise: for all data points in NEU, count how many times …
I'm not sure how you want to process the quantiles, but here is a version you should be able to adapt.
Also, I chose to keep the Dataset structure when computing the quantiles, as it shows how to retrieve the values of the outliers, should that ever be relevant (and it is one step away from retrieving the values of the valid data points, which likely is).
import numpy as np
import xarray as xr

coords = ("time", "latitude", "longitude")
sizes = (500, 80, 120)
ds = xr.Dataset(
    coords={c: np.arange(s) for c, s in zip(coords, sizes)},
    data_vars=dict(
        precipitation=(coords, np.random.randn(*sizes)),
        temperature=(coords, np.random.randn(*sizes)),
    ),
)
View of the data:
<xarray.Dataset>
Dimensions: (latitude: 80, longitude: 120, time: 500)
Coordinates:
* time (time) int64 0 1 2 3 ... 496 497 498 499
* latitude (latitude) int64 0 1 2 3 ... 76 77 78 79
* longitude (longitude) int64 0 1 2 3 ... 117 118 119
Data variables:
precipitation (time, latitude, longitude) float64 -1.673 ... -0.3323
temperature (time, latitude, longitude) float64 -0.331 ... -0.03728
qt_dims = ("latitude", "longitude")
qt_values = (0.1, 0.9)
ds_qt = ds.quantile(qt_values, dim=qt_dims)
The result is still a Dataset: the dimensions of analysis ("latitude", "longitude") are gone, and a new "quantile" dimension has appeared:
<xarray.Dataset>
Dimensions: (quantile: 2, time: 500)
Coordinates:
* time (time) int64 0 1 2 3 ... 496 497 498 499
* quantile (quantile) float64 0.1 0.9
Data variables:
precipitation (quantile, time) float64 -1.305 ... 1.264
temperature (quantile, time) float64 -1.267 ... 1.254
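Note that each selected threshold is a 1-D DataArray along time, so comparing it with the full (time, latitude, longitude) fields broadcasts automatically. A quick check, using only the names defined above:

thr = ds_qt.precipitation.sel(quantile=0.1)
print(thr.dims)  # ('time',): one threshold per timestamp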
For the locations of outliers (edit: using np.logical_and, which is more readable than the & operator):
da_outliers_loc = np.logical_and(
    ds.precipitation > ds_qt.precipitation.sel(quantile=qt_values[0]),
    ds.temperature > ds_qt.temperature.sel(quantile=qt_values[1]),
)
The output is a boolean DataArray:
<xarray.DataArray (time: 500, latitude: 80, longitude: 120)>
array([[[False, ...]]])
Coordinates:
* time (time) int64 0 1 2 3 4 ... 496 497 498 499
* latitude (latitude) int64 0 1 2 3 4 ... 75 76 77 78 79
* longitude (longitude) int64 0 1 2 3 ... 116 117 118 119
And if ever the values are relevant:
ds_outliers = ds.where(
    (ds.precipitation > ds_qt.precipitation.sel(quantile=qt_values[0]))
    & (ds.temperature > ds_qt.temperature.sel(quantile=qt_values[1]))
)
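As a quick sanity check (an addition to the original answer), counting the non-NaN values left by where should match the sum of the boolean mask computed next:

# Non-NaN count per timestamp; equals da_outliers_loc.sum(dim=qt_dims)
n_valid = ds_outliers.precipitation.count(dim=qt_dims)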
outliers_count = da_outliers_loc.sum(dim=qt_dims)
Finally, here is the DataArray with only a time dimension, whose values are the number of outliers at each timestamp:
<xarray.DataArray (time: 500)>
array([857, ...])
Coordinates:
* time (time) int64 0 1 2 3 4 ... 495 496 497 498 499
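And since the goal is a probability map, the same boolean mask can instead be reduced along time (a sketch; which reduction is wanted depends on the intended map). The mean of booleans directly gives the fraction of flagged timestamps at each gridpoint:

# Empirical probability of being an outlier, per gridpoint
prob_map = da_outliers_loc.mean(dim="time")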
np.nanpercentile by default works on the flattened array. In this case, however, the goal is to reduce only the first dimension, producing a 2D array that contains the result at each gridpoint. To achieve this, the axis argument of nanpercentile can be used:
np.nanpercentile(NEU.rr, 1, axis=0)
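For instance, with the synthetic ds from the answer above standing in for NEU (an assumption, since NEU itself is not shown), the call returns a plain NumPy array:

np.nanpercentile(ds.precipitation, 1, axis=0).shape  # (80, 120), labels gone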
This, however, removes the labeled dimensions and coordinates. It is to preserve the dims and coords that apply_ufunc has to be used; note that it does not vectorize the function for you (here, nanpercentile already handles whole arrays through its axis argument):
xr.apply_ufunc(
    lambda x: np.nanpercentile(x, 1, axis=-1), NEU.rr, input_core_dims=[["time"]]
)
Note how the axis is now -1 and we are using input_core_dims, which tells apply_ufunc that this dimension will be reduced and also moves it to the last position (hence the -1). For a more detailed explanation of apply_ufunc, this other answer may help.
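To close the loop on the counting exercise, the labeled percentile field can then be compared against the data and reduced over time. A minimal sketch, again using ds.precipitation as a stand-in for NEU.rr:

# Per-gridpoint 1st percentile, with dims and coords preserved
p01 = xr.apply_ufunc(
    lambda x: np.nanpercentile(x, 1, axis=-1),
    ds.precipitation,
    input_core_dims=[["time"]],
)
# Count, at each gridpoint, how many timestamps fall below that threshold
n_below = (ds.precipitation < p01).sum(dim="time")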