Pandas set_levels, how to avoid sorting of labels?

Submitted by ╄→гoц情女王★ on 2019-12-12 09:47:29

Question


I ran into a problem using set_levels on a MultiIndex:

import pandas as pd
from io import StringIO

txt = '''Name,Height,Age
"",Metres,""
A,-1,25
B,95,-1'''

df = pd.read_csv(StringIO(txt),header=[0,1],na_values=['-1',''])

df.columns = df.columns.set_levels(df.columns.get_level_values(level=1).str.replace('Un.*', '', regex=True), level=1)

     Name Height   Age
   Metres             
0      A    NaN  25.0
1      B   95.0   NaN

If I run the same command again:

df.columns = df.columns.set_levels(df.columns.get_level_values(level=1).str.replace('Un.*', '', regex=True), level=1)

  Name Height   Age
       Metres      
0    A    NaN  25.0
1    B   95.0   NaN

Now I get the expected result. Why does it behave this way? Is it possible to keep the labels unsorted on the first try?


Answer 1:


I don't completely understand why this happens, but I found what causes the problem and a solution:

If we look at the column labels, we can see something odd:

>>> df = pd.read_csv(StringIO(txt),header=[0,1],na_values=['-1',''])
>>> df.columns
MultiIndex(levels=[['Age', 'Height', 'Name'], ['Metres', 'Unnamed: 0_level_1', 'Unnamed: 2_level_1']],
           labels=[[2, 1, 0], [1, 0, 2]])

The order of the second level's values doesn't match the column order of the first level. When you replace the strings, you do it on the array that is in column order:

>>> df.columns.get_level_values(level=1)
Index(['Unnamed: 0_level_1', 'Metres', 'Unnamed: 2_level_1'], dtype='object')

But you can get the array in the level's own (sorted, "incorrect") order just by indexing levels:

>>> df.columns.levels[1]
Index(['Metres', 'Unnamed: 0_level_1', 'Unnamed: 2_level_1'], dtype='object')
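To make the difference concrete, here is a small sketch that rebuilds the same two header rows with pd.MultiIndex.from_arrays instead of read_csv (an assumption, so the example does not depend on CSV parsing). It prints the level-1 table, the per-column codes (shown as labels in the repr above; pandas 0.24 renamed that attribute to codes, which is what this sketch uses) and get_level_values(1), and checks that the last is just the table looked up through the codes.

```python
import pandas as pd

# Same header as in the question, built directly instead of via read_csv
cols = pd.MultiIndex.from_arrays([
    ["Name", "Height", "Age"],
    ["Unnamed: 0_level_1", "Metres", "Unnamed: 2_level_1"],
])

level1 = cols.levels[1].tolist()                # unique values, stored sorted
codes1 = cols.codes[1].tolist()                 # per column: position in level1
per_column = cols.get_level_values(1).tolist()  # labels in column order

print(level1)      # ['Metres', 'Unnamed: 0_level_1', 'Unnamed: 2_level_1']
print(codes1)      # [1, 0, 2]
print(per_column)  # ['Unnamed: 0_level_1', 'Metres', 'Unnamed: 2_level_1']

# get_level_values(1) is levels[1] looked up through codes[1]
assert per_column == [level1[c] for c in codes1]
```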

So, to remove the Unnamed labels:

>>> df.columns = df.columns.set_levels(df.columns.levels[1].str.replace('Un.*', '', regex=True), level=1)
>>> df

  Name Height   Age
       Metres
0    A    NaN  25.0
1    B   95.0   NaN
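A variant that does not depend on the sorted order at all is to clean the per-column labels and rebuild the whole MultiIndex from them. This is a sketch, not the answer's method; it may matter on newer pandas, where, as far as I know, set_levels rejects a level containing duplicate values such as ['Metres', '', ''] and raises ValueError. The header is again rebuilt with from_arrays rather than read_csv.

```python
import pandas as pd

# Same header as in the question, built directly instead of via read_csv
cols = pd.MultiIndex.from_arrays([
    ["Name", "Height", "Age"],
    ["Unnamed: 0_level_1", "Metres", "Unnamed: 2_level_1"],
])

# Work on labels in column order, so level sorting never comes into play
level0 = cols.get_level_values(0)
level1 = cols.get_level_values(1).str.replace(r"^Unnamed.*", "", regex=True)

# from_arrays recomputes levels and codes, so repeated '' labels are fine
new_cols = pd.MultiIndex.from_arrays([level0, level1], names=cols.names)
print(new_cols.tolist())  # [('Name', ''), ('Height', 'Metres'), ('Age', '')]
```

On the DataFrame from the question this would be applied as df.columns = new_cols.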

However, I would love for someone to explain why combining get_level_values and set_levels behaves this way.
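A likely explanation: a MultiIndex level is stored as a table of unique values (levels[1], sorted) plus one integer code per column pointing into it (the labels in the repr above). get_level_values returns labels in column order, but set_levels treats the list it receives as the new table, so the unchanged codes now point at the wrong entries. The pure-Python model below (the helpers labels and clean are hypothetical, not pandas functions) reproduces both runs from the question. It models the older pandas used there, which accepted duplicate values in a level. The second run only looks right because the codes [1, 0, 2] swap the first two positions, and applying that swap twice undoes it.

```python
# Hypothetical model of level 1: a value table plus one code per column
codes = [1, 0, 2]  # columns Name, Height, Age
level = ["Metres", "Unnamed: 0_level_1", "Unnamed: 2_level_1"]  # sorted table

def labels(level, codes):
    """Per-column labels, i.e. what get_level_values returns."""
    return [level[c] for c in codes]

def clean(values):
    """Stand-in for str.replace('Un.*', '')."""
    return ["" if v.startswith("Unnamed") else v for v in values]

# First run: column-ordered labels become the new table, codes unchanged
level = clean(labels(level, codes))  # ['', 'Metres', '']
first_run = labels(level, codes)
print(first_run)   # ['Metres', '', '']  'Metres' ends up under Name

# Second run: the same swap applied again restores the column order
level = clean(labels(level, codes))  # ['Metres', '', '']
second_run = labels(level, codes)
print(second_run)  # ['', 'Metres', '']  'Metres' under Height
```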




Answer 2:


Sounds like you need rename; it modifies the labels based on your original structure:

df.rename(columns=lambda x : '' if 'Unnamed' in x else x , level=1)
Out[106]: 
  Name Height   Age
       Metres      
0    A    NaN  25.0
1    B   95.0   NaN
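A self-contained version of the rename approach, with the DataFrame built by hand from the values shown in the question (an assumption, to skip the CSV step). Because rename with level=1 applies the function to each column's own level-1 label, the result stays aligned with the columns however the level is sorted internally.

```python
import numpy as np
import pandas as pd

# Same columns and values as in the question, built directly
cols = pd.MultiIndex.from_arrays([
    ["Name", "Height", "Age"],
    ["Unnamed: 0_level_1", "Metres", "Unnamed: 2_level_1"],
])
df = pd.DataFrame([["A", np.nan, 25.0], ["B", 95.0, np.nan]], columns=cols)

# Blank out the placeholder names in level 1 only
out = df.rename(columns=lambda x: "" if "Unnamed" in x else x, level=1)
print(out.columns.tolist())  # [('Name', ''), ('Height', 'Metres'), ('Age', '')]
```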


Source: https://stackoverflow.com/questions/48061197/pandas-set-levels-how-to-avoid-sorting-of-labels
