I have two pandas.DataFrames which I would like to combine into one. The dataframes have the same number of columns, in the same order, but have column headings in different
If the columns are always in the same order, you can mechanically rename the columns and the do an append like:
new_cols = {x: y for x, y in zip(df_uk.columns, df_ger.columns)}
df_out = df_ger.append(df_uk.rename(columns=new_cols))
df_ger = pd.read_fwf(StringIO(
u"""
index Datum Zahl1 Zahl2
0 1-1-17 1 2
1 2-1-17 3 4"""),
header=1).set_index('index')
df_uk = pd.read_fwf(StringIO(
u"""
index Date No1 No2
0 1-1-17 5 6
1 2-1-17 7 8"""),
header=1).set_index('index')
print(df_uk)
print(df_ger)
new_cols = {x: y for x, y in zip(df_uk.columns, df_ger.columns)}
df_out = df_ger.append(df_uk.rename(columns=new_cols))
print(df_out)
Date No1 No2
index
0 1-1-17 5 6
1 2-1-17 7 8
Datum Zahl1 Zahl2
index
0 1-1-17 1 2
1 2-1-17 3 4
Datum Zahl1 Zahl2
index
0 1-1-17 1 2
1 2-1-17 3 4
0 1-1-17 5 6
1 2-1-17 7 8
I am not sure if this will be simpler than what you had in mind, but if the main goal is for something general then this should be fine with one assumption: The columns in the two files match for example if date is the first column, the translated version will also be the first column.
# number of columns
n_columns = len(df_ger.columns)
# save final columns names
columns = df_uk.columns
# rename both columns to numbers
df_ger.columns = range(n_columns)
df_uk.columns = range(n_columns)
# concat columns
df_out = pd.concat([df_ger, df_uk], axis=0, ignore_index=True)
# rename columns in new dataframe
df_out.columns = columns
Provided you can be sure that the structures of the two dataframes remain the same, I see two options:
Keep the dataframe column names of the chosen default language (I assume en_GB) and just copy them over:
df_ger.columns = df_uk.columns
df_combined = pd.concat([df_ger, df_uk], axis=0, ignore_index=True)
This works whatever the column names are. However, technically it remains renaming.
Pull the data out of the dataframe using numpy.ndarrays, concatenate them in numpy, and make a dataframe out of it again:
np_ger_data = df_ger.as_matrix()
np_uk_data = df_uk.as_matrix()
np_combined_data = numpy.concatenate([np_ger_data, np_uk_data], axis=0)
df_combined = pd.DataFrame(np_combined_data, columns=["Date", "No1", "No2"])
This solution requires more resources, so I would opt for the first one.