When to use multiindexing vs. xarray in pandas

隐瞒了意图╮ 2021-02-01 07:14

The pandas pivot tables documentation seems to recommend handling more than two dimensions of data by using multi-indexing:

In [1]: import pandas as pd

In [2         


        
1 Answer
  • 2021-02-01 08:00

    There does seem to be a transition to xarray for working with multi-dimensional arrays. Pandas is deprecating support for the 3D Panel data structure, and the documentation even suggests using xarray for multidimensional arrays:

    'Oftentimes, one can simply use a MultiIndex DataFrame for easily working with higher dimensional data.

    In addition, the xarray package was built from the ground up, specifically in order to support the multi-dimensional analysis that is one of Panel's main use cases. Here is a link to the xarray panel-transition documentation.'
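To make the MultiIndex suggestion in that quote concrete, here is a minimal sketch of holding 3-dimensional data in a DataFrame. The data, the dimension names (`site`, `day`) and the column names (`temp`, `humidity`) are invented for illustration and do not come from the pandas documentation.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-D data: 2 sites x 3 days x 2 variables.
sites = ["A", "B"]
days = pd.date_range("2021-01-01", periods=3, freq="D")
values = np.arange(12, dtype=float).reshape(2, 3, 2)

# Two of the three dimensions go into a row MultiIndex;
# the third dimension becomes the columns.
index = pd.MultiIndex.from_product([sites, days], names=["site", "day"])
df = pd.DataFrame(values.reshape(6, 2), index=index, columns=["temp", "humidity"])

# Label-based selection along one level of the MultiIndex.
site_a = df.xs("A", level="site")

# Reduce over the "site" dimension by grouping on the other level.
daily_mean = df.groupby(level="day").mean()
```

This works well while only one or two extra dimensions have to be folded into the index; each additional dimension means another index level to keep track of.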

    The xarray documentation states its aims and goals:

    xarray aims to provide a data analysis toolkit as powerful as pandas but designed for working with homogeneous N-dimensional arrays instead of tabular data...

    ...Our target audience is anyone who needs N-dimensional labelled arrays, but we are particularly focused on the data analysis needs of physical scientists – especially geoscientists who already know and love netCDF

    The main advantage of xarray over plain NumPy is that it labels every dimension, in the same way pandas labels rows and columns. For 3-dimensional data, multi-indexing and xarray may be more or less interchangeable. As the number of dimensions in your data set grows, xarray becomes much more manageable. I cannot comment on how the two compare in efficiency or speed.
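As a rough illustration of that comparison, the sketch below builds a labelled 3-D array in xarray, selects and reduces by dimension name, and converts to a MultiIndex-ed pandas Series and back. The array contents and the dimension names (`site`, `day`, `depth`) are made up for the example, and it assumes xarray is installed alongside pandas.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical 3-D data: 2 sites x 3 days x 4 depths.
days = pd.date_range("2021-01-01", periods=3, freq="D")
temp = xr.DataArray(
    np.arange(24, dtype=float).reshape(2, 3, 4),
    dims=("site", "day", "depth"),
    coords={"site": ["A", "B"], "day": days, "depth": [0, 10, 20, 30]},
    name="temp",
)

# Select by label on any dimension, without building a MultiIndex.
site_a_shallow = temp.sel(site="A", depth=[0, 10])

# Reduce over a dimension by name rather than by axis number.
mean_over_depth = temp.mean(dim="depth")

# Round trip to pandas: the Series carries a 3-level MultiIndex,
# and to_xarray() unstacks it back into a 3-D array.
as_series = temp.to_series()
back = as_series.to_xarray()
```

Because the two representations convert into each other, it is possible to start with a MultiIndex and move to xarray once the number of dimensions makes the index levels awkward to manage.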
