Column order in pandas.concat

后端未结

关注

 6  1671

I do as below:

data1 = pd.DataFrame({ \'b\' : [1, 1, 1], \'a\' : [2, 2, 2]})
data2 = pd.DataFrame({ \'b\' : [1, 1, 1], \'a\' : [2, 2, 2]})
frames = [data1, data2


                      
              相关标签:


      
      
        
          6条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  佛祖请我去吃肉        
                
              
                            
                2021-02-04 00:50
              
            
            
                                                                       
Starting from version 0.23.0, you can prevent the concat() method to sort the returned DataFrame.  For example:

df1 = pd.DataFrame({ 'a' : [1, 1, 1], 'b' : [2, 2, 2]})
df2 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
df = pd.concat([df1, df2], sort=False)


A future version of pandas will change to not sort by default.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  悲哀的现实        
                
              
                            
                2021-02-04 00:50
              
            
            
                                                                       
Simplest way is firstly make the columns same order then concat:

df2=df2[df1.columns]
df=pd.concat((df1,df2),axis=0)

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  我寻月下人不归        
                
              
                            
                2021-02-04 00:55
              
            
            
                                                                       
you can also specify the order like this : 

import pandas as pd

data1 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
data2 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
listdf = [data1, data2]
data = pd.concat(listdf)
sequence = ['b','a']
data = data.reindex(columns=sequence)

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  野性不改        
                
              
                            
                2021-02-04 00:56
              
            
            
                                                                       
You are creating DataFrames out of dictionaries. Dictionaries are a unordered which means the keys do not have a specific order. So
d1 = {'key_a': 'val_a', 'key_b': 'val_b'}

and
d2 = {'key_b': 'val_b', 'key_a': 'val_a'}

are (probably) the same.
In addition to that I assume that pandas sorts the dictionary's keys descending by default (unfortunately I did not find any hint in the docs in order to prove that assumption) leading to the behavior you encountered.
So the basic motivation would be to resort / reorder the columns in your DataFrame. You can do this as follows:
import pandas as pd

data1 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
data2 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
frames = [data1, data2]
data = pd.concat(frames)

print(data)

cols = ['b' , 'a']
data = data[cols]

print(data)

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  情书的邮戳        
                
              
                            
                2021-02-04 00:56
              
            
            
                                                                       
You can create the original DataFrames with OrderedDicts

from collections import OrderedDict

odict = OrderedDict()
odict['b'] = [1, 1, 1]
odict['a'] = [2, 2, 2]
data1 = pd.DataFrame(odict)
data2 = pd.DataFrame(odict)
frames = [data1, data2]
data = pd.concat(frames)
data


    b    a
0   1    2
1   1    2
2   1    2
0   1    2
1   1    2
2   1    2

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  后悔当初        
                
              
                            
                2021-02-04 00:57
              
            
            
                                                                       
def concat_ordered_columns(frames):
    columns_ordered = []
    for frame in frames:
        columns_ordered.extend(x for x in frame.columns if x not in columns_ordered)
    final_df = pd.concat(frames)    
    return final_df[columns_ordered]       

# Usage
dfs = [df_a,df_b,df_c]
full_df = concat_ordered_columns(dfs)


This should work.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复