Pandas - How to flatten a hierarchical index in columns

后端 未结 17 998
忘掉有多难
忘掉有多难 2020-11-22 02:55

I have a data frame with a hierarchical index in axis 1 (columns) (from a groupby.agg operation):

     USAF   WBAN  year  month  day  s_PC  s_CL         


        
相关标签:
17条回答
  • 2020-11-22 03:24

    A bit late maybe, but if you are not worried about duplicate column names:

    df.columns = df.columns.tolist()
    
    0 讨论(0)
  • 2020-11-22 03:25

    You could also do as below. Consider df to be your dataframe and assume a two level index (as is the case in your example)

    df.columns = [(df.columns[i][0])+'_'+(datadf_pos4.columns[i][1]) for i in range(len(df.columns))]
    
    0 讨论(0)
  • 2020-11-22 03:29
    pd.DataFrame(df.to_records()) # multiindex become columns and new index is integers only
    
    0 讨论(0)
  • 2020-11-22 03:29

    A general solution that handles multiple levels and mixed types:

    df.columns = ['_'.join(tuple(map(str, t))) for t in df.columns.values]
    
    0 讨论(0)
  • 2020-11-22 03:33

    Andy Hayden's answer is certainly the easiest way -- if you want to avoid duplicate column labels you need to tweak a bit

    In [34]: df
    Out[34]: 
         USAF   WBAN  day  month  s_CD  s_CL  s_CNT  s_PC  tempf         year
                                   sum   sum    sum   sum   amax   amin      
    0  702730  26451    1      1    12     0     13     1  30.92  24.98  1993
    1  702730  26451    2      1    13     0     13     0  32.00  24.98  1993
    2  702730  26451    3      1     2    10     13     1  23.00   6.98  1993
    3  702730  26451    4      1    12     0     13     1  10.04   3.92  1993
    4  702730  26451    5      1    10     0     13     3  19.94  10.94  1993
    
    
    In [35]: mi = df.columns
    
    In [36]: mi
    Out[36]: 
    MultiIndex
    [(USAF, ), (WBAN, ), (day, ), (month, ), (s_CD, sum), (s_CL, sum), (s_CNT, sum), (s_PC, sum), (tempf, amax), (tempf, amin), (year, )]
    
    
    In [37]: mi.tolist()
    Out[37]: 
    [('USAF', ''),
     ('WBAN', ''),
     ('day', ''),
     ('month', ''),
     ('s_CD', 'sum'),
     ('s_CL', 'sum'),
     ('s_CNT', 'sum'),
     ('s_PC', 'sum'),
     ('tempf', 'amax'),
     ('tempf', 'amin'),
     ('year', '')]
    
    In [38]: ind = pd.Index([e[0] + e[1] for e in mi.tolist()])
    
    In [39]: ind
    Out[39]: Index([USAF, WBAN, day, month, s_CDsum, s_CLsum, s_CNTsum, s_PCsum, tempfamax, tempfamin, year], dtype=object)
    
    In [40]: df.columns = ind
    
    
    
    
    In [46]: df
    Out[46]: 
         USAF   WBAN  day  month  s_CDsum  s_CLsum  s_CNTsum  s_PCsum  tempfamax  tempfamin  \
    0  702730  26451    1      1       12        0        13        1      30.92      24.98   
    1  702730  26451    2      1       13        0        13        0      32.00      24.98   
    2  702730  26451    3      1        2       10        13        1      23.00       6.98   
    3  702730  26451    4      1       12        0        13        1      10.04       3.92   
    4  702730  26451    5      1       10        0        13        3      19.94      10.94   
    
    
    
    
       year  
    0  1993  
    1  1993  
    2  1993  
    3  1993  
    4  1993
    
    0 讨论(0)
提交回复
热议问题