Column order in pandas.concat

既然无缘 2021-02-04 00:26

I do as below:

data1 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
data2 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
frames = [data1, data2]

6 Answers
  • 2021-02-04 00:50

    Starting from version 0.23.0, you can prevent concat() from sorting the columns of the returned DataFrame by passing sort=False. For example:

    df1 = pd.DataFrame({ 'a' : [1, 1, 1], 'b' : [2, 2, 2]})
    df2 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
    df = pd.concat([df1, df2], sort=False)
    

    Since pandas 1.0.0, sort=False is the default, so the columns are no longer sorted unless you pass sort=True.
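
To make the effect of the sort argument concrete, here is a minimal sketch that concatenates two frames whose columns are declared in different orders. The frame contents are made up for illustration, and the result assumes pandas 0.23 or later on Python 3.7+, where a DataFrame built from a dict keeps the dict's key order.

```python
import pandas as pd

# Illustrative frames: same column labels, declared in a different order.
df1 = pd.DataFrame({'b': [1, 1, 1], 'a': [2, 2, 2]})
df2 = pd.DataFrame({'a': [3, 3, 3], 'b': [4, 4, 4]})

# sort=False keeps columns in order of first appearance across the inputs.
unsorted = pd.concat([df1, df2], sort=False)
print(list(unsorted.columns))  # ['b', 'a']

# sort=True orders the column labels lexically.
sorted_ = pd.concat([df1, df2], sort=True)
print(list(sorted_.columns))   # ['a', 'b']
```

Note that the sort argument only matters when the inputs' columns are not already identical and in the same order.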

  • 2021-02-04 00:50

    The simplest way is to put the columns in the same order first, then concat:

    df2 = df2[df1.columns]
    df = pd.concat((df1, df2), axis=0)
    
  • 2021-02-04 00:55

    You can also specify the order like this:

    import pandas as pd
    
    data1 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
    data2 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
    listdf = [data1, data2]
    data = pd.concat(listdf)
    sequence = ['b','a']
    data = data.reindex(columns=sequence)
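
One practical difference between reindex(columns=...) and plain column selection is how they treat a label that does not exist. The sketch below reuses the frames from this answer; the extra label 'c' is a hypothetical column added only to show the difference.

```python
import pandas as pd

data = pd.concat([
    pd.DataFrame({'b': [1, 1, 1], 'a': [2, 2, 2]}),
    pd.DataFrame({'b': [1, 1, 1], 'a': [2, 2, 2]}),
])

# reindex tolerates absent labels: the hypothetical 'c' becomes an all-NaN column.
reindexed = data.reindex(columns=['b', 'a', 'c'])
print(reindexed.columns.tolist())
print(reindexed['c'].isna().all())

# Plain selection is stricter and raises KeyError for the missing label.
try:
    data[['b', 'a', 'c']]
    raised = False
except KeyError:
    raised = True
print(raised)
```

If a silently created empty column would hide a typo in your column list, plain selection is the safer choice.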
    
  • 2021-02-04 00:56

    You are creating DataFrames from dictionaries. Before Python 3.7, dictionaries were unordered, which means the keys had no guaranteed order. So

    d1 = {'key_a': 'val_a', 'key_b': 'val_b'}
    

    and

    d2 = {'key_b': 'val_b', 'key_a': 'val_a'}
    

    compare as equal, even though the keys are written in a different order.

    In addition, I assume that pandas sorted the dictionary's keys in ascending order by default (unfortunately I could not find anything in the docs to confirm this), which leads to the behavior you encountered. On Python 3.6+ with pandas 0.23 or later, the DataFrame constructor keeps the dict's insertion order instead.
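
As a quick check of the two points above, the following sketch shows that dict equality ignores key order while iteration does not, and that a DataFrame built from a dict follows the dict's key order. It assumes Python 3.7+ and pandas 0.23 or later; on older versions the column order may differ.

```python
import pandas as pd

d1 = {'key_a': 'val_a', 'key_b': 'val_b'}
d2 = {'key_b': 'val_b', 'key_a': 'val_a'}

# Equality compares key/value pairs and ignores the order they were written in.
print(d1 == d2)            # True
# Iteration follows insertion order, so the two key lists differ.
print(list(d1), list(d2))

# The DataFrame constructor uses the dict's key order for its columns.
frame = pd.DataFrame({'b': [1, 1, 1], 'a': [2, 2, 2]})
print(list(frame.columns))  # ['b', 'a']
```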

    So the fix is to reorder the columns of the concatenated DataFrame. You can do this as follows:

    import pandas as pd
    
    data1 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
    data2 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
    frames = [data1, data2]
    data = pd.concat(frames)
    
    print(data)
    
    cols = ['b' , 'a']
    data = data[cols]
    
    print(data)
    
  • 2021-02-04 00:56

    You can create the original DataFrames from OrderedDicts, which guarantee the key order:

    from collections import OrderedDict

    import pandas as pd
    
    odict = OrderedDict()
    odict['b'] = [1, 1, 1]
    odict['a'] = [2, 2, 2]
    data1 = pd.DataFrame(odict)
    data2 = pd.DataFrame(odict)
    frames = [data1, data2]
    data = pd.concat(frames)
    data
    
    
        b    a
    0   1    2
    1   1    2
    2   1    2
    0   1    2
    1   1    2
    2   1    2
    
  • 2021-02-04 00:57
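
    import pandas as pd

    def concat_ordered_columns(frames):
        # Collect column names in order of first appearance across all frames.
        columns_ordered = []
        for frame in frames:
            columns_ordered.extend(x for x in frame.columns if x not in columns_ordered)
        # Concatenate, then restore the first-appearance column order.
        final_df = pd.concat(frames)
        return final_df[columns_ordered]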
    
    # Usage (df_a, df_b and df_c are existing DataFrames)
    dfs = [df_a, df_b, df_c]
    full_df = concat_ordered_columns(dfs)
    

    This should work.
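
For a self-contained run, here is a sketch that repeats the helper and feeds it three small frames. The contents of df_a, df_b and df_c are invented for illustration; columns missing from a frame are filled with NaN by concat.

```python
import pandas as pd


def concat_ordered_columns(frames):
    # Same helper as in the answer: keep columns in order of first appearance.
    columns_ordered = []
    for frame in frames:
        columns_ordered.extend(x for x in frame.columns if x not in columns_ordered)
    return pd.concat(frames)[columns_ordered]


# Hypothetical inputs with partially overlapping columns.
df_a = pd.DataFrame({'b': [1, 1], 'a': [2, 2]})
df_b = pd.DataFrame({'a': [3, 3], 'c': [4, 4]})
df_c = pd.DataFrame({'d': [5, 5], 'b': [6, 6]})

full_df = concat_ordered_columns([df_a, df_b, df_c])
print(full_df.columns.tolist())  # ['b', 'a', 'c', 'd']
print(full_df.shape)             # (6, 4)
```

Because the final selection fixes the order explicitly, the result does not depend on whether concat sorts the columns internally.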
