Writing xarray multiindex data in chunks

夕颜 2021-02-20 01:11

I am trying to efficiently restructure a large multidimensional dataset. Let's assume I have a number of remotely sensed images over time, each with a number of bands, with coordinates x

2 answers
  •  你的背包
    2021-02-20 01:30

    I have a solution here (https://github.com/pydata/xarray/issues/1077#issuecomment-644803374) for writing multiindexed datasets to file.

    You'll have to manually "encode" the dataset into a form that can be written as netCDF, and then "decode" it when you read it back.

    import numpy as np
    import pandas as pd
    import xarray as xr
    
    
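    def encode_multiindex(ds, idxname):
        # Turn the MultiIndex levels into ordinary coordinates.
        encoded = ds.reset_index(idxname)
        coords = dict(zip(ds.indexes[idxname].names, ds.indexes[idxname].levels))
        # Store the unique values of each level as its own 1-D coordinate.
        for coord in coords:
            encoded[coord] = coords[coord].values
        shape = [encoded.sizes[coord] for coord in coords]
        # Collapse the per-level integer codes into one flat index per point.
        encoded[idxname] = np.ravel_multi_index(ds.indexes[idxname].codes, shape)
        # Record the level names, space-separated, in a "compress" attribute.
        encoded[idxname].attrs["compress"] = " ".join(ds.indexes[idxname].names)
        return encoded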
    
    
    def decode_to_multiindex(encoded, idxname):
        # Level names were stored space-separated in the "compress" attribute.
        names = encoded[idxname].attrs["compress"].split(" ")
        shape = [encoded.sizes[dim] for dim in names]
        # Expand each flat index back into one integer code per level.
        indices = np.unravel_index(encoded[idxname].values, shape)
        arrays = [encoded[dim].values[index] for dim, index in zip(names, indices)]
        mindex = pd.MultiIndex.from_arrays(arrays, names=names)
    
        # Rebuild the dataset, keeping only variables along the indexed dimension.
        decoded = xr.Dataset({}, {idxname: mindex})
        for varname in encoded.data_vars:
            if idxname in encoded[varname].dims:
                decoded[varname] = (idxname, encoded[varname].values)
        return decoded
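
The core of the encode/decode pair is the round trip between per-level integer codes and a single flat integer per point. A minimal sketch of just that step, using only numpy and pandas, may make it easier to follow. The index values and the level names `x` and `y` below are made up for illustration and are not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical example index: three (x, y) points on a 2 x 3 grid of levels.
mindex = pd.MultiIndex.from_arrays(
    [["a", "a", "b"], [10, 30, 20]], names=["x", "y"]
)

# Encode: one flat integer per point, computed from the per-level codes.
shape = tuple(len(level) for level in mindex.levels)
flat = np.ravel_multi_index(tuple(mindex.codes), shape)

# Decode: recover the per-level codes, then look up the level values.
codes = np.unravel_index(flat, shape)
arrays = [level.values[code] for level, code in zip(mindex.levels, codes)]
restored = pd.MultiIndex.from_arrays(arrays, names=mindex.names)

# The round trip reproduces the original index, including level names.
assert restored.equals(mindex)
```

Only the flat integers and the unique level values need to be stored, which is why the encoded form can be written as plain netCDF variables.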
    
