When to use multiindexing vs. xarray in pandas


The pandas pivot tables documentation seems to recommend dealing with more than two dimensions of data by using multiindexing:

In [1]: import pandas as pd

In [2


                      
1 answer

傲寒, 2021-02-01 08:00

There does seem to be a transition to xarray for work on multi-dimensional arrays. Pandas deprecated support for the 3D Panel data structure (it was removed in pandas 0.25), and the documentation even suggests using xarray for working with multidimensional arrays:


  'Oftentimes, one can simply use a MultiIndex DataFrame for easily
  working with higher dimensional data.
  
  In addition, the xarray package was built from the ground up,
  specifically in order to support the multi-dimensional analysis that
  is one of Panel's main use cases. Here is a link to the xarray
  panel-transition documentation.'
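
As a rough illustration of the MultiIndex approach the quote mentions, here is a minimal sketch. The data, the level names (`site`, `date`) and the column names (`temp`, `rain`) are made up for the example; only the pandas calls themselves are real.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-D data: 2 sites x 3 dates x 2 measured quantities.
sites = ["A", "B"]
dates = pd.date_range("2021-01-01", periods=3, freq="D")
values = np.arange(12, dtype=float).reshape(2, 3, 2)

# Fold the first two dimensions into a row MultiIndex;
# the third dimension becomes the columns.
index = pd.MultiIndex.from_product([sites, dates], names=["site", "date"])
df = pd.DataFrame(values.reshape(6, 2), index=index, columns=["temp", "rain"])

# Take the cross-section for one site (the "site" level is dropped).
site_a = df.xs("A", level="site")

# Aggregate over dates while keeping one row per site.
mean_by_site = df.groupby(level="site").mean()

print(site_a)
print(mean_by_site)
```

This works well while one or two dimensions can live in the row index, but every extra dimension has to be encoded as another index level.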


The xarray documentation states its aims and goals:


  xarray aims to provide a data analysis toolkit as powerful as pandas
  but designed for working with homogeneous N-dimensional arrays instead
  of tabular data...
  
  ...Our target audience is anyone who needs N-dimensional labelled
  arrays, but we are particularly focused on the data analysis needs of
  physical scientists – especially geoscientists who already know and
  love netCDF


The main advantage of xarray over plain numpy is that it uses labels, the same way pandas does, but across any number of dimensions.
If you are working with 3-dimensional data, multi-indexing and xarray may be interchangeable. As the number of dimensions in your data set grows, xarray becomes much more manageable.
I cannot comment on how each performs in terms of efficiency or speed. 
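
To make the comparison concrete, here is a small sketch of the same idea in xarray with four dimensions. It assumes xarray is installed; the dimension names (`site`, `date`, `depth`, `quantity`) and the values are invented for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical 4-D data: site x date x depth x quantity.
data = np.arange(2 * 3 * 2 * 2, dtype=float).reshape(2, 3, 2, 2)
da = xr.DataArray(
    data,
    dims=["site", "date", "depth", "quantity"],
    coords={
        "site": ["A", "B"],
        "date": pd.date_range("2021-01-01", periods=3, freq="D"),
        "depth": [0, 10],
        "quantity": ["temp", "salinity"],
    },
    name="obs",
)

# Select by label along named dimensions; the result keeps (date, depth).
temp_a = da.sel(site="A", quantity="temp")

# Reduce over a dimension by name instead of by axis number.
mean_over_time = da.mean(dim="date")

# Convert to a pandas Series with a 4-level MultiIndex, and back again.
s = da.to_series()
back = s.to_xarray()
```

Since `to_series()` and `Series.to_xarray()` convert between the two representations, it is possible to start with a MultiIndex and move to xarray later if the index levels become unwieldy.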
                                                                        
                                                        
            
            
              
                