Pandas concat yields ValueError: Plan shapes are not aligned

忘掉有多难 2020-12-03 02:23

In pandas, I am attempting to concatenate a set of dataframes and I am getting this error:

ValueError: Plan shapes are not aligned

My underst

6 answers
  • 2020-12-03 02:56

    I recently got this message too and, like users @jason and @user3805082 above, I found that I had duplicate columns in several of the hundreds of dataframes I was trying to concat, each with dozens of enigmatic varnames. Searching for the duplicates manually was not practical.

    In case anyone else has the same problem, I wrote the following function which might help out.

    def duplicated_varnames(df):
        """Return a dict of all variable names that 
        are duplicated in a given dataframe."""
        repeat_dict = {}
        var_list = list(df) # list of varnames as strings
        for varname in var_list:
            # make a list of all instances of that varname
            test_list = [v for v in var_list if v == varname] 
            # if more than one instance, report duplications in repeat_dict
            if len(test_list) > 1: 
                repeat_dict[varname] = len(test_list)
        return repeat_dict
    

    Then you can iterate over that dict to report how many duplicates there are, delete the duplicated variables, or rename them in some systematic way.
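
    As a rough sketch of the "rename them in some systematic way" option: the helper below appends a running number to each repeat of a column name, so every column ends up with its own label. The name rename_duplicates and the "_1", "_2" suffix scheme are my own choices for illustration, not something built into pandas.

```python
import pandas as pd

def rename_duplicates(df, sep="_"):
    """Return a copy of df in which repeated column names get a numeric suffix.

    Hypothetical helper: the first occurrence keeps its name, later ones
    become name_1, name_2, and so on.
    """
    seen = {}        # column name -> how many times it has appeared so far
    new_names = []
    for name in df.columns:
        count = seen.get(name, 0)
        new_names.append(name if count == 0 else f"{name}{sep}{count}")
        seen[name] = count + 1
    out = df.copy()
    out.columns = new_names
    return out

df = pd.DataFrame([[1, 2, 3, 4]], columns=["A", "B", "B", "C"])
renamed = rename_duplicates(df)
print(list(renamed.columns))  # ['A', 'B', 'B_1', 'C']
```

    Note that this does not check whether a suffixed name (such as "B_1") collides with a column that already exists, so treat it as a starting point only.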

  • 2020-12-03 02:58

    In case it helps, I have also hit this error when I tried to concatenate two dataframes (and, at the time of writing, this is the only related hit I can find on Google other than the source code).

    I don't know whether this answer would have solved the OP's problem (since they didn't post enough information), but for me it was caused by trying to concat dataframe df1 with columns ['A', 'B', 'B', 'C'] (see the duplicate column headings?) with dataframe df2 with columns ['A', 'B']. Understandably, the duplication caused pandas to throw a wobbly. Change df1 to ['A', 'B', 'C'] (i.e. drop one of the duplicate columns) and everything works fine.
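
    A minimal sketch of that situation, with made-up values. Whether the first concat raises, and with which exception type and message, seems to depend on the pandas version, so the snippet only reports what happens instead of assuming a particular error.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3, 4]], columns=["A", "B", "B", "C"])
df2 = pd.DataFrame([[5, 6]], columns=["A", "B"])

# The exception type and message differ between pandas releases,
# so catch broadly and just report the outcome.
try:
    pd.concat([df1, df2])
    print("concat succeeded on this pandas version")
except Exception as exc:
    print(f"{type(exc).__name__}: {exc}")

# Keep only the first occurrence of each column name, then concat again.
df1_unique = df1.loc[:, ~df1.columns.duplicated()]
result = pd.concat([df1_unique, df2], ignore_index=True)
print(result)
```

    Dropping with columns.duplicated() keeps the first 'B' and discards the second, so check that the discarded column really is redundant before doing this.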

  • 2020-12-03 02:58

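    One way to avoid the problem is to give all the dataframes you want to concat the same header names.

    For example, restrict Data to the columns of df:

    headername = list(df)  # column names of the reference dataframe

    Data = Data.filter(headername)  # keep only those columns in Data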
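
    To see where the headers of two dataframes differ before filtering, something like the following may help. The names df and Data follow the snippet above; the sample values are made up.

```python
import pandas as pd

df = pd.DataFrame({"A": [1], "B": [2]})
Data = pd.DataFrame({"A": [3], "B": [4], "C": [5]})

# Columns present in one frame but not in the other
only_in_data = Data.columns.difference(df.columns)
only_in_df = df.columns.difference(Data.columns)
print(list(only_in_data))  # ['C']
print(list(only_in_df))    # []

# Restrict Data to the columns of df, as in the filter() call above
Data = Data.filter(list(df))
combined = pd.concat([df, Data], ignore_index=True)
print(list(combined.columns))  # ['A', 'B']
```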

  • 2020-12-03 03:00

    The error is the result of having duplicate column names. The following function removes the duplicated columns, keeping the first occurrence of each name. Note that the data in the later duplicates is dropped.

    def duplicated_varnames(df):
        """Return df with duplicated column names removed (first occurrence kept)."""
        # columns.duplicated() is True for every repeat of an earlier column name
        has_duplicates = df.columns.duplicated().any()
        if has_duplicates:
            df = df.loc[:, ~df.columns.duplicated()]
        return df
    
  • 2020-12-03 03:09

    How to reproduce the above error from pandas.concat(...):

    ValueError: Plan shapes are not aligned

    The Python (3.6.8) code:

    import pandas as pd
    df = pd.DataFrame({"foo": [3] })
    print(df)
    df2 = pd.concat([df, df], axis="columns")
    print(df2)
    df3 = pd.concat([df2, df], sort=False) #ValueError: Plan shapes are not aligned
    

    which prints:

       foo
    0    3
    
       foo  foo
    0    3    3
    ValueError: Plan shapes are not aligned
    

    Explanation of error

    If the first pandas dataframe (here df2) has duplicate column names and the second dataframe passed to pd.concat does not have the same dimensions as the first, you get this error.
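
    When many dataframes are involved, it can help to find the offending ones before calling pd.concat. This is only a sketch; frames_with_duplicate_columns is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def frames_with_duplicate_columns(frames):
    """Map the position of each offending frame to its repeated column names.

    Hypothetical helper for illustration; frames is any sequence of DataFrames.
    """
    report = {}
    for i, frame in enumerate(frames):
        # duplicated() flags every repeat of an earlier column name
        dupes = frame.columns[frame.columns.duplicated()].unique().tolist()
        if dupes:
            report[i] = dupes
    return report

df = pd.DataFrame({"foo": [3]})
df2 = pd.concat([df, df], axis="columns")  # columns: foo, foo
print(frames_with_duplicate_columns([df2, df]))  # {0: ['foo']}
```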

    Solution

    Make sure there are no duplicate named columns:

    df_onefoo = pd.DataFrame({"foo": [3] })
    print(df_onefoo)
    df_onebar = pd.DataFrame({"bar": [3] })
    print(df_onebar)
    df2 = pd.concat([df_onefoo, df_onebar], axis="columns")
    print(df2)
    df3 = pd.concat([df2, df_onefoo], sort=False)
    print(df3)
    

    prints:

       foo
    0    3
    
       bar
    0    3
    
       foo  bar
    0    3    3
    
       foo  bar
    0    3  3.0
    0    3  NaN
    

    Pandas concat could have been more helpful with that error message. It's a straight up bubbleup-implementation-itis, which is textbook python.

  • 2020-12-03 03:16

    I wrote a small function that merges columns sharing the same name by concatenating their values as strings. A note on ordering: the output columns are sorted by name, even if the columns of the original dataframe were not.

    import pandas as pd

    def concat_duplicate_columns(df):
        merged = {}
        # visit each distinct column name once, in sorted order
        for name in sorted(set(df.columns)):
            # all columns carrying this name, in their original left-to-right order
            block = df.loc[:, df.columns == name]
            if block.shape[1] == 1:
                # unique name: keep the column unchanged
                merged[name] = block.iloc[:, 0].to_numpy()
            else:
                # duplicated name: concatenate the values as strings
                values = block.iloc[:, 0].astype(str)
                for k in range(1, block.shape[1]):
                    values = values + block.iloc[:, k].astype(str)
                merged[name] = values.to_numpy()
        return pd.DataFrame(merged, index=df.index)
    