Merge multi-indexed with single-indexed data frames in pandas

别那么骄傲 2021-01-30 17:21

I have two dataframes. df1 is multi-indexed:

                value
first second    
a     x         0.471780
      y         0.774908
      z         0.563634
b

3 Answers
  • 2021-01-30 17:58

    According to the documentation, as of pandas 0.14 you can simply join a single-index DataFrame to a MultiIndex DataFrame: the join matches the single index against the MultiIndex level that has the same name. The how argument works as expected with 'inner' and 'outer', though interestingly 'left' and 'right' seem to behave reversed (could this be a bug?).

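    import pandas as pd
    
    # df1: MultiIndex ('first', 'second'); note the extra 'c' rows that df2 lacks
    df1 = pd.DataFrame([['a', 'x', 0.471780], ['a', 'y', 0.774908], ['a', 'z', 0.563634],
                        ['b', 'x', -0.353756], ['b', 'y', 0.368062], ['b', 'z', -1.721840],
                        ['c', 'x', 1], ['c', 'y', 2], ['c', 'z', 3]],
                       columns=['first', 'second', 'value1']
                       ).set_index(['first', 'second'])
    # df2: single index named 'first', so join matches it to df1's 'first' level
    df2 = pd.DataFrame([['a', 10], ['b', 20]],
                       columns=['first', 'value2']).set_index('first')
    
    # 'inner' keeps only 'first' labels present in both frames, so the 'c' rows are dropped
    print(df1.join(df2, how='inner'))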
                    value1  value2
    first second                  
    a     x       0.471780      10
          y       0.774908      10
          z       0.563634      10
    b     x      -0.353756      20
          y       0.368062      20
          z      -1.721840      20
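    A sketch of an alternative technique (a plain column merge, not the join shown above): move the index levels into ordinary columns with reset_index, merge on the shared 'first' column, then restore the MultiIndex with set_index. Using merge with how='left' sidesteps any doubt about join's left/right behaviour; rows whose 'first' label is missing from df2 (the 'c' rows) are kept with NaN in value2. It reuses the df1/df2 data from this answer.

```python
import pandas as pd

df1 = pd.DataFrame([['a', 'x', 0.471780], ['a', 'y', 0.774908], ['a', 'z', 0.563634],
                    ['b', 'x', -0.353756], ['b', 'y', 0.368062], ['b', 'z', -1.721840],
                    ['c', 'x', 1], ['c', 'y', 2], ['c', 'z', 3]],
                   columns=['first', 'second', 'value1']).set_index(['first', 'second'])
df2 = pd.DataFrame([['a', 10], ['b', 20]],
                   columns=['first', 'value2']).set_index('first')

merged = (
    df1.reset_index()                                   # 'first', 'second' become columns
    .merge(df2.reset_index(), on='first', how='left')  # keep every df1 row, SQL-style
    .set_index(['first', 'second'])                     # restore the MultiIndex
)
print(merged)
```

    Because value2 now contains NaN for the 'c' rows, pandas stores it as float64.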
    
  • 2021-01-30 18:03

    You could use get_level_values:

    firsts = df1.index.get_level_values('first')   # the 'first' label of every df1 row
    df1['value2'] = df2.loc[firsts].values         # one df2 row per df1 row, assigned by position
    

    Note: this is almost a join (except that df1 has a MultiIndex), so there may be a neater way to express it.

    Here is an example (similar to your data):

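    import pandas as pd
    
    # Note the duplicated ('a', 'x') entry: this lookup does not need a unique index
    df1 = pd.DataFrame([['a', 'x', 0.123], ['a', 'x', 0.234],
                        ['a', 'y', 0.451], ['b', 'x', 0.453]],
                       columns=['first', 'second', 'value1']
                       ).set_index(['first', 'second'])
    df2 = pd.DataFrame([['a', 10], ['b', 20]],
                       columns=['first', 'value']).set_index(['first'])
    
    # Repeat df2's rows once per df1 row, then assign positionally via .values
    firsts = df1.index.get_level_values('first')
    df1['value2'] = df2.loc[firsts].values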
    
    In [5]: df1
    Out[5]: 
                  value1  value2
    first second                
    a     x        0.123      10
          x        0.234      10
          y        0.451      10
    b     x        0.453      20
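    Picking up the note about a neater way: a minimal sketch (a swapped-in technique, not the answer's own code) that replaces the positional .loc[...].values assignment with a label-based lookup via Index.map, passing the df2['value'] Series as the mapping. Labels missing from df2 would map to NaN, whereas .loc raises a KeyError for missing labels in recent pandas versions.

```python
import pandas as pd

df1 = pd.DataFrame([['a', 'x', 0.123], ['a', 'x', 0.234],
                    ['a', 'y', 0.451], ['b', 'x', 0.453]],
                   columns=['first', 'second', 'value1']).set_index(['first', 'second'])
df2 = pd.DataFrame([['a', 10], ['b', 20]],
                   columns=['first', 'value']).set_index('first')

# Look up each row's 'first' label in df2['value'] (a Series keyed by 'first').
firsts = df1.index.get_level_values('first')
df1['value2'] = firsts.map(df2['value']).to_numpy()
print(df1)
```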
    
  • 2021-01-30 18:05

    The .ix/.loc lookup above is a powerful shortcut for reindexing, but since no combined row/column reindexing is needed here, the same result can be obtained a bit more elegantly (to my humble taste) with plain reindexing:

    Setup borrowed from hayden's answer:

    import pandas as pd
    
    df1 = pd.DataFrame([['a', 'x', 0.123], ['a', 'x', 0.234],
                        ['a', 'y', 0.451], ['b', 'x', 0.453]],
                       columns=['first', 'second', 'value1']
                       ).set_index(['first', 'second'])
    df2 = pd.DataFrame([['a', 10], ['b', 20]],
                       columns=['first', 'value']).set_index(['first'])
    

    In IPython, the two frames look like this:

    In [4]: df1
    Out[4]: 
                  value1
    first second        
    a     x        0.123
          x        0.234
          y        0.451
    b     x        0.453
    
    In [5]: df2
    Out[5]: 
           value
    first       
    a         10
    b         20
    
    In [7]: df2.reindex(df1.index, level=0)
    Out[7]: 
                  value
    first second       
    a     x          10
          x          10
          y          10
    b     x          20
    
    In [8]: df1['value2'] = df2.reindex(df1.index, level=0)
    
    In [9]: df1
    Out[9]: 
                  value1  value2
    first second                
    a     x        0.123      10
          x        0.234      10
          y        0.451      10
    b     x        0.453      20
    

    A mnemonic for the level argument of reindex: it names the level of the bigger (MultiIndex) index that the smaller index already covers. Here, df2's index covers level 0 ('first') of df1.index, hence level=0.
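    To illustrate the mnemonic: level can also be given by name, and reindexing the df2['value'] Series (rather than the whole df2 frame) yields a single column that assigns cleanly to df1. A minimal sketch under the same setup:

```python
import pandas as pd

df1 = pd.DataFrame([['a', 'x', 0.123], ['a', 'x', 0.234],
                    ['a', 'y', 0.451], ['b', 'x', 0.453]],
                   columns=['first', 'second', 'value1']).set_index(['first', 'second'])
df2 = pd.DataFrame([['a', 10], ['b', 20]],
                   columns=['first', 'value']).set_index('first')

# level='first' names the same df1 level as level=0: the one df2's index covers.
by_name = df2['value'].reindex(df1.index, level='first')
by_position = df2['value'].reindex(df1.index, level=0)

df1['value2'] = by_name   # index equals df1.index, so values line up row by row
print(df1)
```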
