I do as below:
data1 = pd.DataFrame({ \'b\' : [1, 1, 1], \'a\' : [2, 2, 2]})
data2 = pd.DataFrame({ \'b\' : [1, 1, 1], \'a\' : [2, 2, 2]})
frames = [data1, data2
Starting from version 0.23.0, you can prevent the concat() method to sort the returned DataFrame. For example:
df1 = pd.DataFrame({ 'a' : [1, 1, 1], 'b' : [2, 2, 2]})
df2 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
df = pd.concat([df1, df2], sort=False)
A future version of pandas will change to not sort by default.
Simplest way is firstly make the columns same order then concat:
df2=df2[df1.columns]
df=pd.concat((df1,df2),axis=0)
you can also specify the order like this :
import pandas as pd
data1 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
data2 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
listdf = [data1, data2]
data = pd.concat(listdf)
sequence = ['b','a']
data = data.reindex(columns=sequence)
You are creating DataFrames out of dictionaries. Dictionaries are a unordered which means the keys do not have a specific order. So
d1 = {'key_a': 'val_a', 'key_b': 'val_b'}
and
d2 = {'key_b': 'val_b', 'key_a': 'val_a'}
are (probably) the same.
In addition to that I assume that pandas sorts the dictionary's keys descending by default (unfortunately I did not find any hint in the docs in order to prove that assumption) leading to the behavior you encountered.
So the basic motivation would be to resort / reorder the columns in your DataFrame. You can do this as follows:
import pandas as pd
data1 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
data2 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
frames = [data1, data2]
data = pd.concat(frames)
print(data)
cols = ['b' , 'a']
data = data[cols]
print(data)
You can create the original DataFrames with OrderedDicts
from collections import OrderedDict
odict = OrderedDict()
odict['b'] = [1, 1, 1]
odict['a'] = [2, 2, 2]
data1 = pd.DataFrame(odict)
data2 = pd.DataFrame(odict)
frames = [data1, data2]
data = pd.concat(frames)
data
b a
0 1 2
1 1 2
2 1 2
0 1 2
1 1 2
2 1 2
def concat_ordered_columns(frames):
columns_ordered = []
for frame in frames:
columns_ordered.extend(x for x in frame.columns if x not in columns_ordered)
final_df = pd.concat(frames)
return final_df[columns_ordered]
# Usage
dfs = [df_a,df_b,df_c]
full_df = concat_ordered_columns(dfs)
This should work.