Pandas Replace All But Middle Values per Category of a Level with Blank

Submitted by 拜拜、爱过 on 2019-12-12 03:28:38

Question


Given the following pivot table:

import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
                   'B': ['x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'],
                   'C': ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'a'],
                   'D': [7, 5, 3, 4, 1, 6, 5, 3, 1]})
table = pd.pivot_table(df, index=['A', 'B', 'C'], aggfunc='sum')
table

            D
A   B   C   
a   x   a   7
        b   4
    y   a   1
        b   5
    z   a   3
b   x   a   5
    y   b   3
    z   a   1
        b   6

I know that I can access the values of each level like so:

In [128]:    
table.index.get_level_values('B')

Out[128]:
Index(['x', 'x', 'y', 'y', 'z', 'x', 'y', 'z', 'z'], dtype='object', name='B')

In [129]:
table.index.get_level_values('A')

Out[129]:
Index(['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'], dtype='object', name='A')

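Next, I'd like to replace all values in each of the outer levels with a blank ('') except for the middle value of each group: the (n+1)/2-th value when the group has an odd number n of rows, or the n/2-th value when n is even.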

So that:

Index(['x', 'x', 'y', 'y', 'z', 'x', 'y', 'z', 'z'], dtype='object', name='B')

becomes:

Index(['x', '', 'y', '', 'z', 'x', 'y', 'z', ''], dtype='object', name='B')

and

Index(['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'], dtype='object', name='A')

becomes:

Index(['', '', 'a', '', '', '', 'b', '', ''], dtype='object', name='A')

Ultimately, I will attempt to use these as secondary and tertiary y-axis labels in a Matplotlib horizontal bar chart, something like this (though some of my labels may be shifted up):


Answer 1:


Finally took the time to figure this out...

# First, get the values of the index level.
A = table.index.get_level_values(0)

# Next, convert the values to a data frame.
ndf = pd.DataFrame({'A2': A.values})

# Next, get the count of rows per group.
ndf['A2Count'] = ndf.groupby('A2')['A2'].transform('count')

# Next, get the 1-based middle position based on the logic in the question.
ndf['A2Pos'] = ndf['A2Count'].apply(lambda x: x / 2 if x % 2 == 0 else (x + 1) / 2)

# Next, number the rows within each group, starting at 1.
ndf['A2GpOrdr'] = ndf.groupby('A2').cumcount() + 1

# And finally, create the column to use for plotting this level's axis label.
ndf['A2New'] = ndf.apply(lambda x: x['A2'] if x['A2GpOrdr'] == x['A2Pos'] else "", axis=1)
ndf

    A2  A2Count  A2Pos  A2GpOrdr   A2New
0   a   5        3.0       1    
1   a   5        3.0       2    
2   a   5        3.0       3       a
3   a   5        3.0       4    
4   a   5        3.0       5    
5   b   4        2.0       1    
6   b   4        2.0       2       b
7   b   4        2.0       3    
8   b   4        2.0       4    
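
The steps above cover level A only. For level B the groups have to be formed within each value of A, otherwise the 'x' rows under 'a' and under 'b' would be counted together. A minimal sketch of one way to generalize this is given here; it assumes the index is sorted (as the output of pivot_table is), so that grouping on a level together with every level to its left yields contiguous runs. The name middle_labels is a hypothetical helper, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
                   'B': ['x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'],
                   'C': ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'a'],
                   'D': [7, 5, 3, 4, 1, 6, 5, 3, 1]})
table = pd.pivot_table(df, index=['A', 'B', 'C'], aggfunc='sum')


def middle_labels(index, level):
    """Return the labels of `level` with all but each group's middle one blanked.

    Hypothetical helper; assumes `index` is a sorted MultiIndex with named levels.
    """
    frame = index.to_frame(index=False)
    pos = list(index.names).index(level)
    # Group on this level plus every level to its left.
    keys = list(frame.columns[:pos + 1])
    grouped = frame.groupby(keys, sort=False)[keys[-1]]
    order = grouped.cumcount() + 1      # 1-based position within the group
    size = grouped.transform('size')    # number of rows in the group
    middle = (size + 1) // 2            # n/2 for even n, (n+1)/2 for odd n
    return frame[keys[-1]].where(order == middle, '').tolist()


print(middle_labels(table.index, 'A'))
# ['', '', 'a', '', '', '', 'b', '', '']
print(middle_labels(table.index, 'B'))
# ['x', '', 'y', '', 'z', 'x', 'y', 'z', '']
```

Integer division keeps the middle position as an integer, so no float comparison is needed, and the returned lists line up row for row with the pivot table, which should make them usable as tick labels.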


Source: https://stackoverflow.com/questions/37191784/pandas-replace-all-but-middle-values-per-category-of-a-level-with-blank
