Pandas - unstack column values into new columns

小蘑菇 2021-01-04 15:27

I have a large DataFrame that stores a lot of redundant values, which makes my data hard to work with. The DataFrame has the following form:

import pandas          


        
2 Answers
  • 2021-01-04 16:12

    If you put your meta column names into a list, you can do this:

    metas = ['meta1', 'meta2']
    
    new_df = df.set_index(['name'] + metas).unstack('name')
    print(new_df)
    
                data    
    name          n1  n2
    meta1 meta2         
    a     g       y1  y2
    b     h       y3  y4
    

    This gets you most of the way there. A little more tailoring gets you the rest of the way:

    print(new_df['data'].rename_axis(None, axis=1).reset_index())
    
      meta1 meta2  n1  n2
    0     a     g  y1  y2
    1     b     h  y3  y4
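A self-contained sketch of the steps above. The sample frame is a hypothetical reconstruction from the printed output, since the question's own example is cut off. Selecting the data column before unstacking yields flat column labels directly, so no extra ['data'] step is needed afterwards. The last part shows this approach's limitation: unstack cannot put two values in one cell, so a duplicated (meta1, meta2, name) combination makes it raise ValueError.

```python
import pandas as pd

# Hypothetical sample frame, reconstructed from the output shown above.
df = pd.DataFrame(
    [["a", "g", "n1", "y1"], ["a", "g", "n2", "y2"],
     ["b", "h", "n1", "y3"], ["b", "h", "n2", "y4"]],
    columns=["meta1", "meta2", "name", "data"],
)

metas = ["meta1", "meta2"]

# Move the meta columns and 'name' into the index, keep only the 'data'
# Series, then pivot the 'name' level into columns.
wide = df.set_index(metas + ["name"])["data"].unstack("name")

# Drop the leftover 'name' label on the column axis and restore meta columns.
result = wide.rename_axis(None, axis=1).reset_index()
print(result)

# Append a copy of the first row so ("a", "g", "n1") occurs twice;
# unstack cannot decide which value goes into that cell and raises ValueError.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
try:
    dup.set_index(metas + ["name"])["data"].unstack("name")
    unstack_failed = False
except ValueError:
    unstack_failed = True
print(unstack_failed)
```

This duplicate-key limitation is what the aggregating pivot_table approach in the other answer works around.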
    
  • 2021-01-04 16:13

    You can use pivot_table with reset_index and rename_axis (new in pandas 0.18.0):

    print (df.pivot_table(index=['meta1','meta2'], 
                          columns='name', 
                          values='data', 
                          aggfunc='first')
             .reset_index()
             .rename_axis(None, axis=1))
    
      meta1 meta2  n1  n2
    0     a     g  y1  y2
    1     b     h  y3  y4
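Because aggfunc='first' keeps only one value per cell without any warning, it can be worth checking for repeated keys before relying on it. Here is a sketch on a hypothetical frame reconstructed from the output above. DataFrame.duplicated with subset= flags any (meta1, meta2, name) combination that appears more than once.

```python
import pandas as pd

# Hypothetical sample frame, reconstructed from the output shown above.
df = pd.DataFrame(
    [["a", "g", "n1", "y1"], ["a", "g", "n2", "y2"],
     ["b", "h", "n1", "y3"], ["b", "h", "n2", "y4"]],
    columns=["meta1", "meta2", "name", "data"],
)

keys = ["meta1", "meta2", "name"]

# True if any key combination repeats, i.e. 'first' would discard data.
has_duplicates = df.duplicated(subset=keys).any()

wide = (df.pivot_table(index=["meta1", "meta2"],
                       columns="name",
                       values="data",
                       aggfunc="first")
          .reset_index()
          .rename_axis(None, axis=1))
print(has_duplicates)
print(wide)
```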
    

    But it is usually better to use aggfunc=', '.join:

    print (df.pivot_table(index=['meta1','meta2'], 
                          columns='name', 
                          values='data', 
                          aggfunc=', '.join)
             .reset_index()
             .rename_axis(None, axis=1))
    
      meta1 meta2  n1  n2
    0     a     g  y1  y2
    1     b     h  y3  y4
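When every (meta1, meta2, name) combination is unique, both aggregations produce the same table. Joining a one-element group simply returns that element. The following sketch checks this on the same hypothetical frame. Note that ', '.join only accepts strings, so numeric data would need astype(str) first.

```python
import pandas as pd

# Hypothetical sample frame with unique (meta1, meta2, name) combinations.
df = pd.DataFrame(
    [["a", "g", "n1", "y1"], ["a", "g", "n2", "y2"],
     ["b", "h", "n1", "y3"], ["b", "h", "n2", "y4"]],
    columns=["meta1", "meta2", "name", "data"],
)

# Shared pivot_table arguments for both aggregations.
common = dict(index=["meta1", "meta2"], columns="name", values="data")

via_first = df.pivot_table(aggfunc="first", **common)
via_join = df.pivot_table(aggfunc=", ".join, **common)

# With one value per cell, ", ".join(["y1"]) == "y1", so the tables match.
print(via_first.equals(via_join))
```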
    

    Why join is generally better than first:

    With first, every value that is not the first in its (index, column) group is silently dropped, whereas join concatenates all of them:

    import pandas as pd
    
    df = pd.DataFrame([["a","g","n1","y1"], 
                       ["a","g","n2","y2"], 
                       ["a","g","n1","y3"], 
                       ["b","h","n2","y4"]], columns=["meta1", "meta2", "name", "data"])
    
    print (df)
      meta1 meta2 name data
    0     a     g   n1   y1
    1     a     g   n2   y2
    2     a     g   n1   y3
    3     b     h   n2   y4
    
    print (df.pivot_table(index=['meta1','meta2'], 
                          columns='name', 
                          values='data', 
                          aggfunc='first')
             .reset_index()
             .rename_axis(None, axis=1))
      meta1 meta2    n1  n2
    0     a     g    y1  y2
    1     b     h  None  y4
    
    print (df.pivot_table(index=['meta1','meta2'], 
                          columns='name', 
                          values='data', 
                          aggfunc=', '.join)
             .reset_index()
             .rename_axis(None, axis=1))
    
      meta1 meta2      n1  n2
    0     a     g  y1, y3  y2
    1     b     h    None  y4 
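If the duplicates should stay as separate items instead of one comma-separated string, one alternative (swapped in here, not part of the answer above) is an explicit groupby that collects each group into a Python list, followed by unstack. Here is a sketch on the same four-row frame. Combinations that never occur come out as NaN.

```python
import pandas as pd

# Same frame as above: ("a", "g", "n1") occurs twice.
df = pd.DataFrame([["a", "g", "n1", "y1"],
                   ["a", "g", "n2", "y2"],
                   ["a", "g", "n1", "y3"],
                   ["b", "h", "n2", "y4"]],
                  columns=["meta1", "meta2", "name", "data"])

# Collect all values per (meta1, meta2, name) into a list; each key is then
# unique, so unstack can move the 'name' level into columns.
wide = (df.groupby(["meta1", "meta2", "name"])["data"]
          .agg(list)
          .unstack("name")
          .rename_axis(None, axis=1)
          .reset_index())
print(wide)
```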
    