Pandas: Change a specific column name in dataframe having multilevel columns

别那么骄傲 2021-02-14 02:05

I want to find a way to change the name of a specific column in a dataframe with multilevel columns.

With this data:

data = {
    ('A', '1', 'I'): [1, 2, 3, 4, 5], 
           


        
3 answers
  •  深忆病人
    2021-02-14 02:24

    This is my theory

    pandas does not want pd.Index objects to be mutable. We can see this if we try to change the first element of the index ourselves:

    dataDF.columns[0] = ('Z', '100', 'Z')
    
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
     in ()
    ----> 1 dataDF.columns[0] = ('Z', '100', 'Z')
    
    //anaconda/envs/3.5/lib/python3.5/site-packages/pandas/indexes/base.py in __setitem__(self, key, value)
       1372 
       1373     def __setitem__(self, key, value):
    -> 1374         raise TypeError("Index does not support mutable operations")
       1375 
       1376     def __getitem__(self, key):
    
    TypeError: Index does not support mutable operations
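
    The failure above can be reproduced with a small stand-in frame. This is only a sketch: the question's data is truncated, so just the first column is taken from it, and the other two columns are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for dataDF: only the first column comes from the
# question; the other two are made up so the frame has several columns.
data = {
    ('A', '1', 'I'): [1, 2, 3, 4, 5],
    ('A', '1', 'II'): [6, 7, 8, 9, 10],
    ('B', '2', 'I'): [11, 12, 13, 14, 15],
}
dataDF = pd.DataFrame(data)  # tuple keys give MultiIndex columns

try:
    # Item assignment on an Index (including a MultiIndex) is rejected
    dataDF.columns[0] = ('Z', '100', 'Z')
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")

# The first column label is unchanged after the failed assignment
first_col = dataDF.columns[0]
```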
    

    But pandas can't control what you do to the values attribute.

    dataDF.columns.values[0] = ('Z', '100', 'Z')
    

    We see that dataDF.columns looks the same, but dataDF.columns.values clearly reflects the change. Unfortunately, dataDF.columns.values isn't what shows up when the dataframe is displayed.


    On the other hand, this really does seem like it should work. The fact that it doesn't feels wrong to me.

    dataDF.rename(columns={('A', '1', 'I'): ('Z', '100', 'Z')}, inplace=True)
    

    I believe the reason this only works after the values have been changed is that rename forces the columns to be reconstructed from the values. Since we changed the values, it now works. This is exceptionally kludgy and I don't recommend building a process that relies on it.


    My recommendation

    • identify the location of the column name you want to change
    • assign the new column name into the array of values at that location
    • build new columns from scratch, explicitly

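    import pandas as pd

    from_col = ('A', '1', 'I')
    to_col = ('Z', '100', 'Z')
    # Integer position of the column to rename
    colloc = dataDF.columns.get_loc(from_col)
    # Copy the tuples so the index's cached array is not modified in place
    cvals = dataDF.columns.values.copy()
    cvals[colloc] = to_col
    
    # Rebuild the MultiIndex explicitly from the updated tuples
    dataDF.columns = pd.MultiIndex.from_tuples(cvals.tolist())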
    
    dataDF
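
    The three steps above could also be wrapped in a small helper. This is a minimal sketch, not part of pandas: rename_one_column is a hypothetical name, the sample frame is invented apart from its first column, and the helper assumes the column label is unique so that get_loc returns a single integer position.

```python
import numbers

import pandas as pd


def rename_one_column(df, from_col, to_col):
    """Return a copy of df with one MultiIndex column label replaced."""
    loc = df.columns.get_loc(from_col)
    # get_loc may return a slice or boolean mask for non-unique labels;
    # this sketch only handles the single-position case.
    if not isinstance(loc, numbers.Integral):
        raise KeyError(f"{from_col!r} does not identify exactly one column")
    new_cols = df.columns.tolist()  # plain list of tuples, safe to edit
    new_cols[loc] = to_col
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(new_cols, names=df.columns.names)
    return out


# Hypothetical sample data: only the first column comes from the question
dataDF = pd.DataFrame({
    ('A', '1', 'I'): [1, 2, 3, 4, 5],
    ('A', '1', 'II'): [6, 7, 8, 9, 10],
    ('B', '2', 'I'): [11, 12, 13, 14, 15],
})

renamed = rename_one_column(dataDF, ('A', '1', 'I'), ('Z', '100', 'Z'))
print(renamed.columns.tolist())
```

    Returning a copy leaves the original frame untouched, which avoids relying on any in-place change to the index.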
    
    
