Renaming columns in a Pandas dataframe with duplicate column names?

后端 未结 4 904
面向向阳花
面向向阳花 2021-01-04 20:19

I have a df X with columns with duplicate names:

In [77]: X_R
Out[77]: 
      dollars  dollars
   0   0.7085   0.5000

I want to rename it

相关标签:
4条回答
  • 2021-01-04 21:02

    Here is another dynamic solution that I think is nicer

    In [59]: df
    Out[59]:
       a  x  x  x  z
    0  6  2  7  7  8
    1  6  6  3  1  1
    2  6  6  7  5  6
    3  8  3  6  1  8
    4  5  7  5  3  0
    In [61]: class renamer():
                 def __init__(self):
                      self.d = dict()
    
                  def __call__(self, x):
                      if x not in self.d:
                          self.d[x] = 0
                          return x
                      else:
                          self.d[x] += 1
                          return "%s_%d" % (x, self.d[x])
    
              df.rename(columns=renamer())
    Out[61]:
       a  x  x_1  x_2  z
    0  6   2   7   7  8
    1  6   6   3   1  1
    2  6   6   7   5  6
    3  8   3   6   1  8
    4  5   7   5   3  0
    
    0 讨论(0)
  • 2021-01-04 21:03

    MaxU's answer helped me with this same problem. In this answer, I add in a way to find those duplicated column headers.

    First, we make a dictionary of the duplicated column names with values corresponding to the desired new column names. For this, the defaultdict subclass is required.

    import pandas as pd
    from collections import defaultdict
    
    renamer = defaultdict()
    

    We iterate over the duplicated column names to create a dictionary with keys being the duplicated column name and values being a list of new column names. I have chosen this list to be original name_0, original name_1, and so on.

    for column_name in df.columns[df.columns.duplicated(keep=False)].tolist():
        if column_name not in renamer:
            renamer[column_name] = [column_name+'_0']
        else:
            renamer[column_name].append(column_name +'_'+str(len(renamer[column_name])))
    
    print(renamer)
    defaultdict(None, {'b': ['b_0', 'b_1', 'b_2', 'b_3'], 'c': ['c_0', 'c_1']})
    

    Original dataframe:

    print(df)
            a   b   b   b   b   c   c   d
    Item 0  2   1   0   2   8   3   9   5
    Item 1  3   2   7   3   5   4   6   2
    Item 2  4   3   8   1   5   7   4   4
    Item 3  5   5   3   6   0   5   2   5
    

    Rename the duplicated columns by assigning the new names from our renamer defaultdict, leaving the unduplicated columns alone

    df.rename(
        columns=lambda column_name: renamer[column_name].pop(0)
        if column_name in renamer 
        else column_name
    )
            a   b_0 b_1 b_2 b_3 c_0 c_1 d
    Item 0  2   1   0   2   8   3   9   5
    Item 1  3   2   7   3   5   4   6   2
    Item 2  4   3   8   1   5   7   4   4
    Item 3  5   5   3   6   0   5   2   5
    

    (As a sidenote, a couple of people have questioned why duplicated column names existed in the first place. For myself, I encountered duplicated column names when importing with the xlwings package (to deal with password-protected Excel files). You could also inadvertently create duplicate column names by using pd.concat.

    0 讨论(0)
  • 2021-01-04 21:10
    X_R.columns = ['Retail','Cost']
    
    0 讨论(0)
  • 2021-01-04 21:10

    Here is a dynamic solution:

    In [59]: df
    Out[59]:
       a  x  x  x  z
    0  6  2  7  7  8
    1  6  6  3  1  1
    2  6  6  7  5  6
    3  8  3  6  1  8
    4  5  7  5  3  0
    
    In [60]: d
    Out[60]: {'x': ['x1', 'x2', 'x3']}
    
    In [61]: df.rename(columns=lambda c: d[c].pop(0) if c in d.keys() else c)
    Out[61]:
       a  x1  x2  x3  z
    0  6   2   7   7  8
    1  6   6   3   1  1
    2  6   6   7   5  6
    3  8   3   6   1  8
    4  5   7   5   3  0
    
    0 讨论(0)
提交回复
热议问题