Pandas - How to flatten a hierarchical index in columns

忘掉有多难 2020-11-22 02:55

I have a data frame with a hierarchical index in axis 1 (columns) (from a groupby.agg operation):

     USAF   WBAN  year  month  day  s_PC  s_CL         


        
17 answers
  • 2020-11-22 03:06

    Here is a straightforward approach that worked for me:

    df.columns = [" ".join(str(elem) for elem in tup) for tup in df.columns.tolist()]
    # df = df.reset_index()  # if needed
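
    A minimal sketch of this approach on a toy frame. The column names loosely follow the question, but the values and the choice of aggregations are invented for illustration.

```python
import pandas as pd

# Toy data (invented for illustration).
df = pd.DataFrame({
    "USAF": [702730, 702730, 702735],
    "s_PC": [1.0, 2.0, 3.0],
    "s_CL": [10.0, 20.0, 30.0],
})

# groupby.agg with a dict of lists yields two-level (MultiIndex) columns.
agg = df.groupby("USAF").agg({"s_PC": ["sum"], "s_CL": ["sum", "max"]})

# Join each column tuple with a space; str() guards against non-string labels.
agg.columns = [" ".join(str(elem) for elem in tup) for tup in agg.columns.tolist()]

# Move the group key back into the columns after flattening.
agg = agg.reset_index()
print(agg.columns.tolist())
# ['USAF', 's_PC sum', 's_CL sum', 's_CL max']
```

    Flattening before reset_index() avoids a trailing space: calling reset_index() first would turn the key into the tuple ('USAF', ''), which joins to 'USAF '.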
    
  • 2020-11-22 03:08

    If you want to retain the aggregation info from the second level of the MultiIndex, you can try this:

    In [1]: new_cols = [''.join(t) for t in df.columns]

    In [2]: new_cols
    Out[2]:
    ['USAF',
     'WBAN',
     'day',
     'month',
     's_CDsum',
     's_CLsum',
     's_CNTsum',
     's_PCsum',
     'tempfamax',
     'tempfamin',
     'year']

    In [3]: df.columns = new_cols
    
  • 2020-11-22 03:11

    The other answers on this thread are somewhat dated. As of pandas 0.24.0, MultiIndex.to_flat_index() does what you need.

    From the pandas documentation:

    MultiIndex.to_flat_index()

    Convert a MultiIndex to an Index of Tuples containing the level values.

    A simple example from its documentation:

    import pandas as pd
    print(pd.__version__)  # to_flat_index() requires 0.24.0 or later
    index = pd.MultiIndex.from_product(
            [['foo', 'bar'], ['baz', 'qux']],
            names=['a', 'b'])
    
    print(index)
    # MultiIndex(levels=[['bar', 'foo'], ['baz', 'qux']],
    #           codes=[[1, 1, 0, 0], [0, 1, 0, 1]],
    #           names=['a', 'b'])
    

    Applying to_flat_index():

    index.to_flat_index()
    # Index([('foo', 'baz'), ('foo', 'qux'), ('bar', 'baz'), ('bar', 'qux')], dtype='object')
    

    Using it to replace the existing columns of a DataFrame

    An example of how you'd use it on dat, which is a DataFrame with a MultiIndex column:

    dat = df.loc[:,['name','workshop_period','class_size']].groupby(['name','workshop_period']).describe()
    print(dat.columns)
    # MultiIndex(levels=[['class_size'], ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']],
    #            codes=[[0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 2, 3, 4, 5, 6, 7]])
    
    dat.columns = dat.columns.to_flat_index()
    print(dat.columns)
    # Index([('class_size', 'count'),  ('class_size', 'mean'),
    #     ('class_size', 'std'),   ('class_size', 'min'),
    #     ('class_size', '25%'),   ('class_size', '50%'),
    #     ('class_size', '75%'),   ('class_size', 'max')],
    #  dtype='object')
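
    Note that to_flat_index() leaves each label as a tuple. If plain string names are wanted, the tuples can be joined afterwards. A small sketch, assuming an underscore separator and a toy frame invented for illustration:

```python
import pandas as pd

# Toy frame with two-level columns (values invented for illustration).
columns = pd.MultiIndex.from_product([["class_size"], ["count", "mean", "max"]])
dat = pd.DataFrame([[3, 20.0, 25]], columns=columns)

# Step 1: flatten to an Index of tuples.
flat = dat.columns.to_flat_index()

# Step 2: join each tuple into one string; map(str, ...) handles non-string levels.
dat.columns = ["_".join(map(str, tup)) for tup in flat]
print(dat.columns.tolist())
# ['class_size_count', 'class_size_mean', 'class_size_max']
```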
    
  • 2020-11-22 03:13
    df.columns = ['_'.join(tup).rstrip('_') for tup in df.columns.values]
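
    The rstrip('_') matters when some columns have an empty second level, which typically happens after reset_index(). A short sketch on invented data showing the difference:

```python
import pandas as pd

# Toy data (invented for illustration).
df = pd.DataFrame({"USAF": [1, 1, 2], "s_PC": [1.0, 2.0, 3.0]})

# reset_index() inserts the group key as ('USAF', '') into the MultiIndex columns.
agg = df.groupby("USAF").agg({"s_PC": ["sum", "max"]}).reset_index()
print(agg.columns.tolist())
# [('USAF', ''), ('s_PC', 'sum'), ('s_PC', 'max')]

# Without rstrip, the empty level leaves a dangling separator.
without_strip = ["_".join(tup) for tup in agg.columns.values]
with_strip = ["_".join(tup).rstrip("_") for tup in agg.columns.values]
print(without_strip)  # ['USAF_', 's_PC_sum', 's_PC_max']
print(with_strip)     # ['USAF', 's_PC_sum', 's_PC_max']
```

    One caveat: rstrip('_') also removes underscores that legitimately end a label, and the plain join assumes every level is a string.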
    
  • 2020-11-22 03:13

    In case you want to have a separator in the name between levels, this function works well.

    def flattenHierarchicalCol(col, sep='_'):
        # Labels that are already flat (not tuples) pass through unchanged.
        if not isinstance(col, tuple):
            return col
        # Join the non-empty levels; str() keeps non-string levels from failing.
        return sep.join(str(level) for level in col if level != '')
    
    df.columns = df.columns.map(flattenHierarchicalCol)
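
    Index.map passes only the label itself, so a different separator has to be bound beforehand. A sketch using functools.partial; flatten_label is a hypothetical compact stand-in for the function above, and the frame is invented for illustration:

```python
from functools import partial

import pandas as pd


def flatten_label(col, sep="_"):
    """Hypothetical stand-in: join the non-empty levels of a tuple label."""
    if not isinstance(col, tuple):
        return col
    return sep.join(str(level) for level in col if level != "")


# Toy frame whose first column has an empty second level (values invented).
columns = pd.MultiIndex.from_tuples([("USAF", ""), ("s_PC", "sum"), ("tempf", "amax")])
df = pd.DataFrame([[1, 2.0, 3.0]], columns=columns)

# Bind sep="." so that map() can call the function with a single argument.
df.columns = df.columns.map(partial(flatten_label, sep="."))
print(df.columns.tolist())
# ['USAF', 's_PC.sum', 'tempf.amax']
```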
    
  • 2020-11-22 03:14

    Following @jxstanford and @tvt173, I wrote a quick function which should do the trick, regardless of string/int column names:

    def flatten_cols(df):
        # Cast every level to str so that int labels do not break the join.
        df.columns = [
            '_'.join(map(str, t)).rstrip('_')
            for t in df.columns.values
        ]
        return df
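
    To see why the str cast is needed, here is a small sketch with an integer second level, as a pivot on years might produce. The frame is invented for illustration:

```python
import pandas as pd

# Toy frame with an integer second level (values invented for illustration).
columns = pd.MultiIndex.from_product([["sales"], [2019, 2020]])
df = pd.DataFrame([[1.5, 2.5]], columns=columns)

# A plain join raises TypeError because str.join only accepts strings.
try:
    ["_".join(t) for t in df.columns.values]
    plain_join_failed = False
except TypeError:
    plain_join_failed = True

# Casting each level to str first makes the join work for mixed label types.
flat = ["_".join(map(str, t)).rstrip("_") for t in df.columns.values]
print(plain_join_failed, flat)
# True ['sales_2019', 'sales_2020']
```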
    