Python Pandas - Concat dataframes with different columns ignoring column names

粉色の甜心 2020-12-31 00:55

I have two pandas.DataFrames which I would like to combine into one. The dataframes have the same number of columns, in the same order, but the column headings are in different languages.

3 Answers
  • 2020-12-31 01:41

    If the columns are always in the same order, you can mechanically rename the columns and then concatenate, like so:

    Code:

    new_cols = dict(zip(df_uk.columns, df_ger.columns))
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df_out = pd.concat([df_ger, df_uk.rename(columns=new_cols)])
    

    Test Code:

    from io import StringIO
    import pandas as pd

    df_ger = pd.read_fwf(StringIO(
        u"""
            index  Datum   Zahl1   Zahl2
            0      1-1-17  1       2
            1      2-1-17  3       4"""),
        header=1).set_index('index')
    
    df_uk = pd.read_fwf(StringIO(
        u"""
            index  Date    No1     No2
            0      1-1-17  5       6
            1      2-1-17  7       8"""),
        header=1).set_index('index')
    
    print(df_uk)
    print(df_ger)
    
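    # map UK column names to German ones by position
    new_cols = dict(zip(df_uk.columns, df_ger.columns))
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df_out = pd.concat([df_ger, df_uk.rename(columns=new_cols)])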
    
    print(df_out)
    

    Results:

             Date  No1  No2
    index                  
    0      1-1-17    5    6
    1      2-1-17    7    8
    
            Datum  Zahl1  Zahl2
    index                      
    0      1-1-17      1      2
    1      2-1-17      3      4
    
            Datum  Zahl1  Zahl2
    index                      
    0      1-1-17      1      2
    1      2-1-17      3      4
    0      1-1-17      5      6
    1      2-1-17      7      8
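One caveat with the zip-based mapping above: zip() stops at the shorter input, so a mismatch in column counts would pass silently and leave some columns unrenamed. A minimal sketch of a guard follows. The helper name `positional_rename_map` is hypothetical (it is not part of pandas), and the sample frames are rebuilt inline only so the snippet runs on its own.

```python
import pandas as pd


def positional_rename_map(source: pd.DataFrame, target: pd.DataFrame) -> dict:
    """Map source column names to target column names by position.

    Hypothetical helper for illustration: raises instead of letting zip()
    silently drop unmatched columns.
    """
    if len(source.columns) != len(target.columns):
        raise ValueError(
            f"column count mismatch: {len(source.columns)} vs {len(target.columns)}"
        )
    return dict(zip(source.columns, target.columns))


# Same sample data as in the test code, built from dicts for brevity
df_ger = pd.DataFrame({"Datum": ["1-1-17", "2-1-17"], "Zahl1": [1, 3], "Zahl2": [2, 4]})
df_uk = pd.DataFrame({"Date": ["1-1-17", "2-1-17"], "No1": [5, 7], "No2": [6, 8]})

new_cols = positional_rename_map(df_uk, df_ger)
df_out = pd.concat([df_ger, df_uk.rename(columns=new_cols)])
print(df_out)
```

The check only compares column counts. It cannot tell whether the columns really correspond, so the same-order assumption still has to hold.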
    
  • 2020-12-31 01:43

    I am not sure whether this is simpler than what you had in mind, but if the main goal is something general, it should work under one assumption: the columns in the two files match by position. For example, if the date is the first column in one file, its translated version is also the first column in the other.

    # number of columns
    n_columns = len(df_ger.columns)
    
    # save final columns names
    columns = df_uk.columns
    
    # rename both columns to numbers
    df_ger.columns = range(n_columns)
    df_uk.columns = range(n_columns)
    
    # concatenate the rows (axis=0) now that the column labels match
    df_out = pd.concat([df_ger, df_uk], axis=0, ignore_index=True)
    
    # rename columns in new dataframe
    df_out.columns = columns
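Note that the steps above overwrite the headers of df_ger and df_uk in place. If that is unwanted, the same idea can be sketched without touching the inputs. The function name `concat_by_position` and its `columns` parameter are hypothetical, chosen only for this illustration. It relies on `DataFrame.set_axis`, which returns a new object rather than modifying the original.

```python
import pandas as pd


def concat_by_position(frames, columns=None):
    """Stack frames row-wise, matching columns by position only.

    Hypothetical helper for illustration; `columns` defaults to the
    labels of the first frame.
    """
    frames = list(frames)
    if columns is None:
        columns = frames[0].columns
    labels = list(columns)
    # set_axis returns a new, relabelled object, so the inputs keep their
    # own headers; it raises ValueError if the number of labels differs
    aligned = [df.set_axis(labels, axis=1) for df in frames]
    return pd.concat(aligned, axis=0, ignore_index=True)


df_ger = pd.DataFrame({"Datum": ["1-1-17", "2-1-17"], "Zahl1": [1, 3], "Zahl2": [2, 4]})
df_uk = pd.DataFrame({"Date": ["1-1-17", "2-1-17"], "No1": [5, 7], "No2": [6, 8]})

# Use the UK headers for the result, as in the answer above
df_out = concat_by_position([df_ger, df_uk], columns=df_uk.columns)
print(df_out)
```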
    
  • 2020-12-31 01:55

    Provided you can be sure that the structures of the two dataframes remain the same, I see two options:

    1. Keep the dataframe column names of the chosen default language (I assume en_GB) and just copy them over:

      df_ger.columns = df_uk.columns
      df_combined = pd.concat([df_ger, df_uk], axis=0, ignore_index=True)
      

      This works whatever the column names are. Technically, though, it is still renaming.

    2. Pull the data out of the dataframe using numpy.ndarrays, concatenate them in numpy, and make a dataframe out of it again:

      import numpy as np

      # as_matrix() was removed in pandas 1.0; to_numpy() is the replacement
      np_ger_data = df_ger.to_numpy()
      np_uk_data = df_uk.to_numpy()
      np_combined_data = np.concatenate([np_ger_data, np_uk_data], axis=0)
      df_combined = pd.DataFrame(np_combined_data, columns=["Date", "No1", "No2"])
      

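      This solution requires more resources, since the data is copied into intermediate arrays, and columns of mixed types end up with the generic object dtype. I would therefore opt for the first one.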
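To make the trade-off between the two options concrete, here is a small comparison sketch. It assumes the sample data from this thread (one string column and two integer columns) and uses `set_axis` for option 1 so the inputs are not modified, which is a slight variation on the in-place assignment shown above. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df_ger = pd.DataFrame({"Datum": ["1-1-17", "2-1-17"], "Zahl1": [1, 3], "Zahl2": [2, 4]})
df_uk = pd.DataFrame({"Date": ["1-1-17", "2-1-17"], "No1": [5, 7], "No2": [6, 8]})

# Option 1: relabel, then concatenate; each column keeps its own dtype
df_cat = pd.concat(
    [df_ger.set_axis(list(df_uk.columns), axis=1), df_uk],
    axis=0,
    ignore_index=True,
)

# Option 2: go through NumPy; a single 2-D array has to hold strings and
# integers together, so it falls back to the generic object dtype
combined = np.concatenate([df_ger.to_numpy(), df_uk.to_numpy()], axis=0)
df_np = pd.DataFrame(combined, columns=df_uk.columns)

# Compare the resulting column dtypes of both routes
print(df_cat.dtypes)
print(df_np.dtypes)
```

With mixed column types, the NumPy route would typically need an extra conversion step (for example `astype`) to get numeric columns back, which is one more reason to prefer option 1.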
