Calculating percentile for each gridpoint in xarray

离开以前 2021-01-28 03:02

I am currently using xarray to make probability maps. I want to use a statistical assessment like a “counting” exercise. Meaning, for all data points in NEU count how many time

2 Answers
  •  走了就别回头了
    2021-01-28 03:30

    I'm not sure how you want to process quantiles, but here is a version you may be able to adapt.

    I chose to keep the Dataset structure when computing the quantiles, because it shows how to retrieve the values of the outliers if that is ever relevant (and it is one step away from retrieving the values of the valid data points, which is likely relevant too).

    1. Create some data

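    import numpy as np
    import xarray as xr

    # Dimension names and their sizes
    coords = ("time", "latitude", "longitude")
    sizes = (500, 80, 120)

    # Random values standing in for real precipitation and temperature fields
    ds = xr.Dataset(
        coords={c: np.arange(s) for c, s in zip(coords, sizes)},
        data_vars=dict(
            precipitation=(coords, np.random.randn(*sizes)),
            temperature=(coords, np.random.randn(*sizes)),
        ),
    )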
    

    View of the data:

    <xarray.Dataset>
    Dimensions:        (latitude: 80, longitude: 120, time: 500)
    Coordinates:
      * time           (time) int64 0 1 2 3 ... 496 497 498 499
      * latitude       (latitude) int64 0 1 2 3 ... 76 77 78 79
      * longitude      (longitude) int64 0 1 2 3 ... 117 118 119
    Data variables:
        precipitation  (time, latitude, longitude) float64 -1.673 ... -0.3323
        temperature    (time, latitude, longitude) float64 -0.331 ... -0.03728
    

    2. Compute quantiles

    qt_dims = ("latitude", "longitude")
    qt_values = (0.1, 0.9)
    
    ds_qt = ds.quantile(qt_values, dim=qt_dims)
    

    The result is a Dataset in which the reduced dimensions ("latitude", "longitude") are gone and a new "quantile" dimension has been added, so there is one pair of thresholds per timestep:

    <xarray.Dataset>
    Dimensions:        (quantile: 2, time: 500)
    Coordinates:
      * time           (time) int64 0 1 2 3 ... 496 497 498 499
      * quantile       (quantile) float64 0.1 0.9
    Data variables:
        precipitation  (quantile, time) float64 -1.305 ... 1.264
        temperature    (quantile, time) float64 -1.267 ... 1.254
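
The title asks about percentiles for each grid point, whereas the step above reduces over space and yields one threshold per timestep. A minimal sketch of the per-grid-point variant, assuming the thresholds should instead be computed over time; `ds_small` and `ds_qt_grid` are illustrative names that are not part of the original answer, and the sizes are arbitrary:

```python
import numpy as np
import xarray as xr

# Small synthetic dataset with the same layout as in step 1 (sizes are arbitrary).
rng = np.random.default_rng(0)
coords = ("time", "latitude", "longitude")
sizes = (50, 4, 6)
ds_small = xr.Dataset(
    coords={c: np.arange(s) for c, s in zip(coords, sizes)},
    data_vars=dict(
        precipitation=(coords, rng.standard_normal(sizes)),
        temperature=(coords, rng.standard_normal(sizes)),
    ),
)

# Reducing over "time" keeps one threshold per (latitude, longitude) cell.
ds_qt_grid = ds_small.quantile([0.1, 0.9], dim="time")
print(ds_qt_grid.precipitation.dims)
```

With thresholds shaped this way, the comparisons in the next step broadcast over time rather than over space.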
    

    3. Compute outliers co-occurrence

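    For the locations of outliers (edit: np.logical_and is used here because it is more readable than the & operator). As written, a point is flagged when precipitation exceeds its 10th percentile and temperature exceeds its 90th percentile for that timestep; adjust the comparisons to match your own definition of an outlier: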

    da_outliers_loc = np.logical_and(
        ds.precipitation > ds_qt.precipitation.sel(quantile=qt_values[0]),
        ds.temperature > ds_qt.temperature.sel(quantile=qt_values[1]),
    )
    

    The output is a boolean DataArray:

    
    array([[[False, ...]]])
    Coordinates:
      * time       (time) int64 0 1 2 3 4 ... 496 497 498 499
      * latitude   (latitude) int64 0 1 2 3 4 ... 75 76 77 78 79
      * longitude  (longitude) int64 0 1 2 3 ... 116 117 118 119
    

    And if the values themselves are ever needed (points that do not match become NaN):

    ds_outliers = ds.where(
        (ds.precipitation > ds_qt.precipitation.sel(quantile=qt_values[0]))
        & (ds.temperature > ds_qt.temperature.sel(quantile=qt_values[1]))
    )
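
The comparisons in this step work because a threshold that only has a "time" dimension broadcasts against the full (time, latitude, longitude) field. A toy sketch with hand-picked values, using the spatial median as the threshold purely for illustration (the names `threshold`, `above` and `count` are not from the original answer):

```python
import numpy as np
import xarray as xr

# Toy field: 2 timesteps, 1 latitude, 4 longitudes (values chosen by hand).
temperature = xr.DataArray(
    np.array([[[1.0, 2.0, 3.0, 4.0]], [[10.0, 20.0, 30.0, 40.0]]]),
    dims=("time", "latitude", "longitude"),
)

# One threshold per timestep: the spatial median (quantile 0.5).
threshold = temperature.quantile(0.5, dim=("latitude", "longitude"))

# The (time,) threshold broadcasts against the 3-D field along "time".
above = temperature > threshold

# Number of grid cells above the threshold at each timestep.
count = above.sum(dim=("latitude", "longitude"))
```

Each timestep is compared against its own threshold (2.5 and 25.0 here), so two of the four cells are flagged at both timesteps.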
    

    4. Count outliers per timestep

    outliers_count = da_outliers_loc.sum(dim=qt_dims)
    

    The result is a DataArray with only a time dimension, whose values are the number of outliers at each timestep.

    
    array([857, ...])
    Coordinates:
      * time     (time) int64 0 1 2 3 4 ... 495 496 497 498 499
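
Since the question mentions probability maps, the same boolean array can also be reduced over time instead of over space. A small sketch, assuming a map of "fraction of timesteps flagged per grid cell" is what is wanted; `flags`, `count_map` and `frequency_map` are hypothetical names and the values are made up:

```python
import numpy as np
import xarray as xr

# Hypothetical boolean flags: 4 timesteps on a 1 x 2 grid (values chosen by hand).
flags = xr.DataArray(
    np.array([[[True, False]], [[True, False]], [[False, False]], [[True, True]]]),
    dims=("time", "latitude", "longitude"),
)

# Number of flagged timesteps per grid cell.
count_map = flags.sum(dim="time")

# Fraction of flagged timesteps per grid cell (the mean of booleans).
frequency_map = flags.mean(dim="time")
```

Here the first cell is flagged in 3 of 4 timesteps and the second in 1 of 4, giving frequencies of 0.75 and 0.25.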
    
