Pandas concat failing

无人及你 2020-12-03 10:24

I am trying to concat dataframes based on the following 2 csv files:

df_a: https://www.dropbox.com/s/slcu7o7yyottujl/df_current.csv?dl=0

df_b: https://www.dropbo

4 answers
  • 2020-12-03 10:30

    I believe that this error occurs if the following two conditions are met:

    1. The data frames have different columns, i.e. df1.columns.equals(df2.columns) is False.
    2. At least one of the data frames has a repeated column name.

    Basically, if you concat dataframes with columns [A,B,C] and [B,C,D], pandas can work out how to make one series for each distinct column name. But if I then try to join a third dataframe with columns [B,B,C], it does not know which column to append to and ends up with fewer distinct columns than it thinks it needs.

    If your dataframes have exactly matching columns (same names in the same order), it will work anyway. So you can join [B,B,C] to [B,B,C], but not to [C,B,B]; when the columns are identical it probably just aligns them by integer position or something similar.
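Both conditions can be checked before calling pd.concat. The following is only a rough sketch: the helper name concat_diagnostics and the three sample frames are made up for illustration, and it reports the column situation without predicting what any particular pandas version will do.

```python
import pandas as pd


def concat_diagnostics(frames):
    """Hypothetical helper: list repeated column names per frame and
    whether each frame's columns match the first frame exactly."""
    first_cols = frames[0].columns
    report = []
    for position, frame in enumerate(frames):
        cols = frame.columns
        # Names that occur more than once in this frame
        repeated = cols[cols.duplicated()].unique().tolist()
        report.append({
            "frame": position,
            "repeated": repeated,
            "same_as_first": cols.equals(first_cols),
        })
    return report


# Sample frames mirroring the [A,B,C], [B,C,D] and [B,B,C] cases
frames = [
    pd.DataFrame([[1, 2, 3]], columns=["A", "B", "C"]),
    pd.DataFrame([[1, 2, 3]], columns=["B", "C", "D"]),
    pd.DataFrame([[1, 2, 3]], columns=["B", "B", "C"]),
]
report = concat_diagnostics(frames)
```

A frame with a non-empty "repeated" list and "same_as_first" set to False is the combination described above as the likely trigger.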

  • 2020-12-03 10:34

    You can get around this issue with a 'manual' concatenation. In this case your list of dataframes is

    list_of_dfs = [df_a, df_b]
    

    And instead of running

    giant_concat_df = pd.concat(list_of_dfs, axis=0)
    

    You can turn each dataframe into a list of dictionaries (one per row) and then make a new data frame from these lists (merged with chain):

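    from itertools import chain
    import pandas as pd

    # One dict per row for each dataframe; chain flattens them into a single sequence
    list_of_dicts = [cur_df.T.to_dict().values() for cur_df in list_of_dfs]
    giant_concat_df = pd.DataFrame(list(chain(*list_of_dicts)))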
    
  • 2020-12-03 10:38

    The answers here did not solve my issue, but this answer did.

    The issue was duplicated columns in one or both DataFrames.

    Here's a fix for duplicated columns (as per the answer above):

    df = df.loc[:,~df.columns.duplicated()]
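Applied to every frame before concatenating, the same one-liner might look like the sketch below. The helper name drop_duplicate_columns and the sample data are illustrative assumptions. Note that only the first column under each repeated name is kept, so the data in the later ones is discarded.

```python
import pandas as pd


def drop_duplicate_columns(frame):
    """Hypothetical helper: keep only the first column for each repeated name."""
    return frame.loc[:, ~frame.columns.duplicated()]


# Sample frames with a repeated 'id' column (illustrative data)
df1 = pd.DataFrame([["a", "b", "id", 1], ["a", "b", "id", 2]],
                   columns=["A", "B", "id", "id"])
df2 = pd.DataFrame([["b", "c", "id", 1], ["b", "c", "id", 2]],
                   columns=["B", "C", "id", "id"])

cleaned = [drop_duplicate_columns(df) for df in (df1, df2)]
combined = pd.concat(cleaned, ignore_index=True, sort=False)
```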
    
  • 2020-12-03 10:53

    Unfortunately, the source files are no longer available, so I can't check my solution against your case. In my case the error occurred when:

    1. A data frame has two columns with the same name (I had ID and id columns, which I then converted to lower case, so they became the same)
    2. The value types of the same-named columns are different

    Here is an example which gives me the error in question:

    df1 = pd.DataFrame(data=[
        ['a', 'b', 'id', 1],
        ['a', 'b', 'id', 2]
    ], columns=['A', 'B', 'id', 'id'])
    
    df2 = pd.DataFrame(data=[
        ['b', 'c', 'id', 1],
        ['b', 'c', 'id', 2]
    ], columns=['B', 'C', 'id', 'id'])
    pd.concat([df1, df2])
    >>> AssertionError: Number of manager items must equal union of block items
     # manager items: 4, # tot_items: 5
    

    Removing / renaming one of the columns makes this code work.
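If both columns need to be kept, renaming can be automated. This is a minimal sketch, assuming a simple numeric suffix is acceptable; the helper name make_unique is hypothetical, and it does not guard against a suffixed name such as id_1 already existing in the frame.

```python
import pandas as pd


def make_unique(names):
    """Hypothetical helper: suffix repeated names, e.g. ['id', 'id'] -> ['id', 'id_1']."""
    seen = {}
    result = []
    for name in names:
        count = seen.get(name, 0)
        # First occurrence keeps its name; later ones get _1, _2, ...
        result.append(name if count == 0 else f"{name}_{count}")
        seen[name] = count + 1
    return result


df1 = pd.DataFrame([["a", "b", "id", 1], ["a", "b", "id", 2]],
                   columns=["A", "B", "id", "id"])
df2 = pd.DataFrame([["b", "c", "id", 1], ["b", "c", "id", 2]],
                   columns=["B", "C", "id", "id"])

# Rename in place so every column name is distinct within each frame
df1.columns = make_unique(df1.columns)
df2.columns = make_unique(df2.columns)

combined = pd.concat([df1, df2], ignore_index=True, sort=False)
```

With distinct names in each frame, pd.concat can align the columns by name again, and both the string and the integer id columns survive.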
