Question
I came across a problem using set_levels on a MultiIndex:
import pandas as pd
from io import StringIO
txt = '''Name,Height,Age
"",Metres,""
A,-1,25
B,95,-1'''
df = pd.read_csv(StringIO(txt),header=[0,1],na_values=['-1',''])
df.columns = df.columns.set_levels(df.columns.get_level_values(level=1).str.replace('Un.*',''),level=1)
    Name Height   Age
  Metres
0      A    NaN  25.0
1      B   95.0   NaN
If I run the same command again:
df.columns = df.columns.set_levels(df.columns.get_level_values(level=1).str.replace('Un.*',''),level=1)
  Name Height   Age
       Metres
0    A    NaN  25.0
1    B   95.0   NaN
Now this yields the expected result. Why does it behave this way? Is it possible to keep the labels unsorted on the first try?
Answer 1:
I do not completely understand why this happens, but I found what causes the problem and a solution.
If we look at the column labels, we can see something weird:
>>> df = pd.read_csv(StringIO(txt),header=[0,1],na_values=['-1',''])
>>> df.columns
MultiIndex(levels=[['Age', 'Height', 'Name'], ['Metres', 'Unnamed: 0_level_1', 'Unnamed: 2_level_1']],
labels=[[2, 1, 0], [1, 0, 2]])
The order of the values stored in the second level doesn't match the order of the columns. And when you replace the strings, you do that on the array that is in column order:
>>> df.columns.get_level_values(level=1)
Index(['Unnamed: 0_level_1', 'Metres', 'Unnamed: 2_level_1'], dtype='object')
But you can get the array in its stored order (not the column order) just by indexing levels:
>>> df.columns.levels[1]
Index(['Metres', 'Unnamed: 0_level_1', 'Unnamed: 2_level_1'], dtype='object')
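The relationship between the two arrays can be sketched as follows. This is only an illustration: it rebuilds the index from the repr above by hand, and it assumes pandas 0.24 or newer, where the integer positions are called codes rather than labels.

```python
import pandas as pd

# Rebuild the column index shown above by hand.
# Assumption: pandas 0.24+, where `labels` is spelled `codes`.
cols = pd.MultiIndex(
    levels=[["Age", "Height", "Name"],
            ["Metres", "Unnamed: 0_level_1", "Unnamed: 2_level_1"]],
    codes=[[2, 1, 0], [1, 0, 2]],
)

stored_order = cols.levels[1]            # unique values, as stored in the level
column_order = cols.get_level_values(1)  # one value per column, left to right

# Looking the stored values up through the codes gives the column order.
rebuilt = stored_order.take(cols.codes[1])

print(list(column_order))
print(list(rebuilt))
```

Both print calls should show the same list, which suggests that get_level_values is the stored level viewed through the integer positions.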
So, to remove the Unnamed entries:
>>> df.columns = df.columns.set_levels(df.columns.levels[1].str.replace('Un.*', ''), level=1)
>>> df
  Name Height   Age
       Metres
0    A    NaN  25.0
1    B   95.0   NaN
However, I would love for someone to point out why using get_ and set_levels has this behavior.
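One likely explanation is that set_levels swaps the stored level values position by position and leaves the integer positions alone, so it needs values in stored order, not column order. Here is a minimal sketch of that, assuming pandas 0.24 or newer. The lowercase labels 'n', 'h' and 'a' are made up for the illustration so that the replacement values stay unique.

```python
import pandas as pd

# Toy index (hypothetical labels): the second level is stored as
# ['a', 'h', 'n'], while the columns read 'n', 'h', 'a' left to right.
cols = pd.MultiIndex(
    levels=[["Age", "Height", "Name"], ["a", "h", "n"]],
    codes=[[2, 1, 0], [2, 1, 0]],
)

# Values taken in column order: ['N', 'H', 'A'].
from_column_order = cols.get_level_values(1).str.upper()
# Values taken in stored order: ['A', 'H', 'N'].
from_stored_order = cols.levels[1].str.upper()

# set_levels keeps the codes and only replaces what they point to.
shuffled = cols.set_levels(from_column_order, level=1)
aligned = cols.set_levels(from_stored_order, level=1)

print(list(shuffled.get_level_values(1)))
print(list(aligned.get_level_values(1)))
```

In this sketch only the version fed from levels[1] keeps each label on its original column. In the question's data, the column-order values put 'Metres' at position 1 of the level, which is the position the Name column points to.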
Answer 2:
Sounds like you need rename. This will modify the labels based on your original structure:
df.rename(columns=lambda x : '' if 'Unnamed' in x else x , level=1)
Out[106]:
  Name Height   Age
       Metres
0    A    NaN  25.0
1    B   95.0   NaN
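A self-contained sketch of the same rename call is shown here. The frame is built by hand rather than with read_csv, which is an assumption made only to keep the example independent of the parser.

```python
import pandas as pd

# Hand-built stand-in for the frame produced by read_csv in the question.
cols = pd.MultiIndex.from_tuples([
    ("Name", "Unnamed: 0_level_1"),
    ("Height", "Metres"),
    ("Age", "Unnamed: 2_level_1"),
])
df = pd.DataFrame(
    [["A", float("nan"), 25.0], ["B", 95.0, float("nan")]],
    columns=cols,
)

# Blank out the placeholder labels on the second column level only.
renamed = df.rename(columns=lambda x: "" if "Unnamed" in x else x, level=1)

print(list(renamed.columns))
```

rename returns a new DataFrame, so the result has to be assigned (here to renamed, or back to df) for the change to persist.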
Source: https://stackoverflow.com/questions/48061197/pandas-set-levels-how-to-avoid-sorting-of-labels