Pandas Melt with Multiple Value Vars

南笙 2020-12-30 14:52

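I have a data set in wide format, like this:

   Index Country     Variable 2000 2001 2002 2003 2004 2005
   0     Argentina   var1     12   15   18   17   23   29
   1     Argentina   var2     1    3    2    5    7    5
   2     Brazil      var1     20   23   25   29   31   32
   3     Brazil      var2     0    1    2    2    3    3

I want to reshape it to long format, so that Year, var1 and var2 become columns, with one row per Country and Year.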
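A minimal sketch for building this sample frame, e.g. to try the answers below. The values match the outputs shown in the answers; the row order and the use of string year labels (what you would get from a CSV header) are assumptions.

```python
import pandas as pd

# Assumed: year labels are strings, rows ordered var1, var2 within each country.
df = pd.DataFrame(
    [["Argentina", "var1", 12, 15, 18, 17, 23, 29],
     ["Argentina", "var2", 1, 3, 2, 5, 7, 5],
     ["Brazil", "var1", 20, 23, 25, 29, 31, 32],
     ["Brazil", "var2", 0, 1, 2, 2, 3, 3]],
    columns=["Country", "Variable", "2000", "2001", "2002", "2003", "2004", "2005"],
)
print(df)
```

If your year labels are integers instead, the Year column in every result below will be integer-valued as well.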


        
3 Answers
  • 2020-12-30 15:08

    Instead of melt, you can use a combination of stack and unstack:

    (df.set_index(['Country', 'Variable'])
       .rename_axis(['Year'], axis=1)
       .stack()
       .unstack('Variable')
       .reset_index())
    
    Variable    Country  Year  var1  var2
    0         Argentina  2000    12     1
    1         Argentina  2001    15     3
    2         Argentina  2002    18     2
    3         Argentina  2003    17     5
    4         Argentina  2004    23     7
    5         Argentina  2005    29     5
    6            Brazil  2000    20     0
    7            Brazil  2001    23     1
    8            Brazil  2002    25     2
    9            Brazil  2003    29     2
    10           Brazil  2004    31     3
    11           Brazil  2005    32     3
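To see what each step of the stack/unstack chain does, here is a self-contained sketch that rebuilds the question's data (row order assumed) and inspects the intermediate Series before unstacking.

```python
import pandas as pd

# Reconstructed sample data (values from the outputs above; row order assumed).
df = pd.DataFrame(
    [["Argentina", "var1", 12, 15, 18, 17, 23, 29],
     ["Argentina", "var2", 1, 3, 2, 5, 7, 5],
     ["Brazil", "var1", 20, 23, 25, 29, 31, 32],
     ["Brazil", "var2", 0, 1, 2, 2, 3, 3]],
    columns=["Country", "Variable", "2000", "2001", "2002", "2003", "2004", "2005"],
)

# 1. Move the identifiers into the index so only year columns remain,
#    and name the column axis "Year".
wide = df.set_index(["Country", "Variable"]).rename_axis(["Year"], axis=1)

# 2. stack() folds the year columns into the row index, giving a Series
#    indexed by (Country, Variable, Year).
stacked = wide.stack()
print(list(stacked.index.names))  # ['Country', 'Variable', 'Year']

# 3. unstack("Variable") lifts var1/var2 back out as columns,
#    leaving one row per (Country, Year).
result = stacked.unstack("Variable").reset_index()
print(result)
```

The name given by `rename_axis` is what turns into the "Year" column after `reset_index()`; without it the level would be unnamed.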
    
  • 2020-12-30 15:16

    Option 1

    Using melt, then unstack to turn var1, var2, etc. into columns:

    (df.melt(id_vars=['Country', 'Variable'], var_name='Year')
       .set_index(['Country', 'Year', 'Variable'])
       .squeeze()
       .unstack()
       .reset_index())
    

    Output:

    Variable    Country  Year  var1  var2
    0         Argentina  2000    12     1
    1         Argentina  2001    15     3
    2         Argentina  2002    18     2
    3         Argentina  2003    17     5
    4         Argentina  2004    23     7
    5         Argentina  2005    29     5
    6            Brazil  2000    20     0
    7            Brazil  2001    23     1
    8            Brazil  2002    25     2
    9            Brazil  2003    29     2
    10           Brazil  2004    31     3
    11           Brazil  2005    32     3
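The `.squeeze()` call is what makes Option 1 work: after `set_index`, the only remaining column is melt's default `value` column, so the object is still a one-column DataFrame. A sketch on the same reconstructed sample data (row order assumed), comparing unstacking with and without it:

```python
import pandas as pd

# Reconstructed sample data (values from the outputs above; row order assumed).
df = pd.DataFrame(
    [["Argentina", "var1", 12, 15, 18, 17, 23, 29],
     ["Argentina", "var2", 1, 3, 2, 5, 7, 5],
     ["Brazil", "var1", 20, 23, 25, 29, 31, 32],
     ["Brazil", "var2", 0, 1, 2, 2, 3, 3]],
    columns=["Country", "Variable", "2000", "2001", "2002", "2003", "2004", "2005"],
)

# melt() gives one row per (Country, Variable, Year); the number lands in "value".
long_df = df.melt(id_vars=["Country", "Variable"], var_name="Year")
print(long_df.columns.tolist())  # ['Country', 'Variable', 'Year', 'value']

indexed = long_df.set_index(["Country", "Year", "Variable"])

# Unstacking the one-column DataFrame directly gives two column levels,
# e.g. ('value', 'var1').
print(indexed.unstack().columns.nlevels)  # 2

# squeeze() turns it into a Series first, so unstack() yields plain var1/var2.
result = indexed.squeeze().unstack().reset_index()
print(result)
```

Selecting `indexed["value"]` instead of calling `.squeeze()` gives the same Series and is slightly more robust, since `squeeze()` would collapse a one-row, one-column frame all the way down to a scalar.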
    

    Option 2

    Using pivot then stack:

    (df.pivot(index='Country', columns='Variable')
       .stack(0)
       .rename_axis(['Country', 'Year'])
       .reset_index())
    

    Output:

    Variable    Country  Year  var1  var2
    0         Argentina  2000    12     1
    1         Argentina  2001    15     3
    2         Argentina  2002    18     2
    3         Argentina  2003    17     5
    4         Argentina  2004    23     7
    5         Argentina  2005    29     5
    6            Brazil  2000    20     0
    7            Brazil  2001    23     1
    8            Brazil  2002    25     2
    9            Brazil  2003    29     2
    10           Brazil  2004    31     3
    11           Brazil  2005    32     3
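Option 2 relies on `pivot` being called without `values`: every remaining column (the years) becomes a value column, so the result has two column levels, the unnamed year level outside and Variable inside. A sketch (same reconstructed data, row order assumed) that inspects those levels before `stack(0)` moves the years into the index:

```python
import pandas as pd

# Reconstructed sample data (values from the outputs above; row order assumed).
df = pd.DataFrame(
    [["Argentina", "var1", 12, 15, 18, 17, 23, 29],
     ["Argentina", "var2", 1, 3, 2, 5, 7, 5],
     ["Brazil", "var1", 20, 23, 25, 29, 31, 32],
     ["Brazil", "var2", 0, 1, 2, 2, 3, 3]],
    columns=["Country", "Variable", "2000", "2001", "2002", "2003", "2004", "2005"],
)

# No values= argument: all year columns are pivoted, giving (year, Variable) columns.
pivoted = df.pivot(index="Country", columns="Variable")
print(pivoted.columns.nlevels)      # 2
print(list(pivoted.columns.names))  # [None, 'Variable']

# stack(0) moves the outer (year) level into the row index, leaving var1/var2
# as columns; rename_axis names the resulting (Country, Year) index levels.
result = (pivoted.stack(0)
          .rename_axis(["Country", "Year"])
          .reset_index())
print(result)
```

Because the year level has no name, it has to be addressed by position (`0`), which is also why the `rename_axis` step is needed to get a "Year" column.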
    

    Option 3 (ayhan's solution)

    Using set_index, stack, and unstack:

    (df.set_index(['Country', 'Variable'])
       .rename_axis(['Year'], axis=1)
       .stack()
       .unstack('Variable')
       .reset_index())
    

    Output:

    Variable    Country  Year  var1  var2
    0         Argentina  2000    12     1
    1         Argentina  2001    15     3
    2         Argentina  2002    18     2
    3         Argentina  2003    17     5
    4         Argentina  2004    23     7
    5         Argentina  2005    29     5
    6            Brazil  2000    20     0
    7            Brazil  2001    23     1
    8            Brazil  2002    25     2
    9            Brazil  2003    29     2
    10           Brazil  2004    31     3
    11           Brazil  2005    32     3
    
  • 2020-12-30 15:20

    numpy

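    import numpy as np
    import pandas as pd

    years = df.drop(columns=['Country', 'Variable'])   # only the year columns
    y = years.values
    m = y.shape[1]                                     # number of years

    f0, u0 = pd.factorize(df.Country.values)           # country codes and labels
    f1, u1 = pd.factorize(df.Variable.values)          # variable codes and labels

    # (variable, country, year) cube; np.empty leaves unfilled slots
    # uninitialized, so this assumes every Country has every Variable
    w = np.empty((u1.size, u0.size, m), dtype=y.dtype)
    w[f1, f0] = y

    # one row per variable, then transpose: rows become (country, year) pairs
    results = pd.DataFrame(dict(
            Country=u0.repeat(m),
            Year=np.tile(years.columns.values, u0.size),
        )).join(pd.DataFrame(w.reshape(u1.size, -1).T, columns=u1))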
    
    results
    
          Country  Year  var1  var2
    0   Argentina  2000    12     1
    1   Argentina  2001    15     3
    2   Argentina  2002    18     2
    3   Argentina  2003    17     5
    4   Argentina  2004    23     7
    5   Argentina  2005    29     5
    6      Brazil  2000    20     0
    7      Brazil  2001    23     1
    8      Brazil  2002    25     2
    9      Brazil  2003    29     2
    10     Brazil  2004    31     3
    11     Brazil  2005    32     3
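The numpy approach scatters every row into a (variable, country, year) cube and then flattens it. The flattening step is the delicate part: reshaping by the number of variables (`u1.size`) keeps the rows aligned with Country/Year for any number of countries, whereas a shape built from `m * u1.size` only lines up when there happen to be as many countries as variables, as in this sample. Below is a sketch that wraps the idea in a hypothetical helper, `wide_to_long_numpy`, with one deliberate change: the cube is filled with NaN via `np.full` instead of `np.empty`, so a missing Country/Variable pair shows up as NaN rather than garbage, at the cost of float output.

```python
import numpy as np
import pandas as pd

# Reconstructed sample data (values from the outputs above; row order assumed).
df = pd.DataFrame(
    [["Argentina", "var1", 12, 15, 18, 17, 23, 29],
     ["Argentina", "var2", 1, 3, 2, 5, 7, 5],
     ["Brazil", "var1", 20, 23, 25, 29, 31, 32],
     ["Brazil", "var2", 0, 1, 2, 2, 3, 3]],
    columns=["Country", "Variable", "2000", "2001", "2002", "2003", "2004", "2005"],
)


def wide_to_long_numpy(frame):
    """Hypothetical helper: (Country, Variable, <years>...) -> (Country, Year, <variables>...)."""
    years = frame.drop(columns=["Country", "Variable"])
    y = years.to_numpy()
    m = y.shape[1]                                        # number of years
    f0, u0 = pd.factorize(frame["Country"].to_numpy())    # country codes / labels
    f1, u1 = pd.factorize(frame["Variable"].to_numpy())   # variable codes / labels

    # Scatter each input row into a (variable, country, year) cube.
    # NaN-filled, so missing pairs stay visible (result columns become float).
    w = np.full((u1.size, u0.size, m), np.nan)
    w[f1, f0] = y

    # One row per variable, then transpose: rows are (country, year), columns variables.
    values = w.reshape(u1.size, -1).T
    keys = pd.DataFrame({"Country": np.repeat(u0, m),
                         "Year": np.tile(years.columns.to_numpy(), u0.size)})
    return keys.join(pd.DataFrame(values, columns=u1))


full = wide_to_long_numpy(df)
# One country, two variables: the counts differ, so this exercises the reshape.
argentina_only = wide_to_long_numpy(df[df["Country"] == "Argentina"])
print(argentina_only)
```

Running it on the Argentina rows alone is a quick check that the reshape does not depend on the number of countries matching the number of variables.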
    