I have two dataframes. df1 is multi-indexed:
                 value
first second
a     x       0.471780
      y       0.774908
      z       0.563634
b
According to the documentation, as of pandas 0.14, you can simply join single-index and MultiIndex dataframes. It will match on the common index name. The how argument works as expected with 'inner' and 'outer', though interestingly it seems to be reversed for 'left' and 'right' (could this be a bug?).
import pandas as pd

# df1 has a two-level MultiIndex (first, second)
df1 = pd.DataFrame([['a', 'x', 0.471780], ['a', 'y', 0.774908], ['a', 'z', 0.563634],
                    ['b', 'x', -0.353756], ['b', 'y', 0.368062], ['b', 'z', -1.721840],
                    ['c', 'x', 1], ['c', 'y', 2], ['c', 'z', 3]],
                   columns=['first', 'second', 'value1']
                   ).set_index(['first', 'second'])
# df2 has a single index whose name matches the first level of df1
df2 = pd.DataFrame([['a', 10], ['b', 20]],
                   columns=['first', 'value2']).set_index(['first'])
# join matches on the shared index name 'first'
print(df1.join(df2, how='inner'))
                value1  value2
first second
a     x       0.471780      10
      y       0.774908      10
      z       0.563634      10
b     x      -0.353756      20
      y       0.368062      20
      z      -1.721840      20
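If the behaviour of how='left' and how='right' is in doubt on a given pandas version, one way to sidestep the question is to move the index levels into columns and use merge, where the left frame is always the caller. The following is only a minimal sketch of that idea, reusing df1 and df2 from above; the name merged is just for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    [['a', 'x', 0.471780], ['a', 'y', 0.774908], ['a', 'z', 0.563634],
     ['b', 'x', -0.353756], ['b', 'y', 0.368062], ['b', 'z', -1.721840],
     ['c', 'x', 1], ['c', 'y', 2], ['c', 'z', 3]],
    columns=['first', 'second', 'value1'],
).set_index(['first', 'second'])
df2 = pd.DataFrame([['a', 10], ['b', 20]],
                   columns=['first', 'value2']).set_index('first')

# Turn the index levels into ordinary columns, merge on the shared key,
# then restore the MultiIndex. how='left' keeps every row of df1.
merged = (
    df1.reset_index()
       .merge(df2.reset_index(), on='first', how='left')
       .set_index(['first', 'second'])
)
print(merged)  # rows for 'c' have NaN in value2, since df2 has no 'c'
```

This costs two extra index round trips, so for large frames the direct join is likely the better choice once its behaviour has been checked.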
You could use get_level_values:
firsts = df1.index.get_level_values('first')
df1['value2'] = df2.loc[firsts].values
Note: you are almost doing a join here (except that df1 has a MultiIndex), so there may be a neater way to describe this.
In an example (similar to what you have):
import pandas as pd

df1 = pd.DataFrame([['a', 'x', 0.123], ['a', 'x', 0.234],
                    ['a', 'y', 0.451], ['b', 'x', 0.453]],
                   columns=['first', 'second', 'value1']
                   ).set_index(['first', 'second'])
df2 = pd.DataFrame([['a', 10], ['b', 20]],
                   columns=['first', 'value']).set_index(['first'])
# labels of the 'first' level, one per row of df1
firsts = df1.index.get_level_values('first')
# look up each label in df2 and assign the values by position
df1['value2'] = df2.loc[firsts].values
In [5]: df1
Out[5]:
              value1  value2
first second
a     x        0.123      10
      x        0.234      10
      y        0.451      10
b     x        0.453      20
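The assignment above passes a two-dimensional array, which relies on df2 having exactly one column. A slightly more defensive variant, sketched here on the same data, selects the column by name so that a one-dimensional array is assigned.

```python
import pandas as pd

df1 = pd.DataFrame([['a', 'x', 0.123], ['a', 'x', 0.234],
                    ['a', 'y', 0.451], ['b', 'x', 0.453]],
                   columns=['first', 'second', 'value1']
                   ).set_index(['first', 'second'])
df2 = pd.DataFrame([['a', 10], ['b', 20]],
                   columns=['first', 'value']).set_index(['first'])

# One label per row of df1, taken from the 'first' level
firsts = df1.index.get_level_values('first')
# Look up each label in df2 and assign the 1-D result by position
df1['value2'] = df2.loc[firsts, 'value'].to_numpy()
print(df1)
```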
The .ix syntax (removed in pandas 1.0 in favour of .loc and .iloc) is a powerful shortcut for reindexing, but in this case you are not doing any combined row/column reindexing, so to my taste this can be done a bit more elegantly with plain reindexing:
Preparation from hayden:
df1 = pd.DataFrame([['a', 'x', 0.123], ['a', 'x', 0.234],
                    ['a', 'y', 0.451], ['b', 'x', 0.453]],
                   columns=['first', 'second', 'value1']
                   ).set_index(['first', 'second'])
df2 = pd.DataFrame([['a', 10], ['b', 20]],
                   columns=['first', 'value']).set_index(['first'])
In IPython this looks as follows:
In [4]: df1
Out[4]:
              value1
first second
a     x        0.123
      x        0.234
      y        0.451
b     x        0.453

In [5]: df2
Out[5]:
       value
first
a         10
b         20

In [7]: df2.reindex(df1.index, level=0)
Out[7]:
              value
first second
a     x          10
      x          10
      y          10
b     x          20

In [8]: df1['value2'] = df2.reindex(df1.index, level=0)

In [9]: df1
Out[9]:
              value1  value2
first second
a     x        0.123      10
      x        0.234      10
      y        0.451      10
b     x        0.453      20
A mnemonic for which level to pass to the reindex method: it is the level of the bigger index that the smaller frame's index already covers. In this case, df2's index already covers level 0 of df1.index.
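To illustrate the rule with a different level, here is a hedged sketch in which a hypothetical frame df3 (not part of the original question, as are the column name other and the variable broadcast) is indexed by second. Its index covers level 1 of df1.index, so that is the level passed to reindex.

```python
import pandas as pd

df1 = pd.DataFrame([['a', 'x', 0.123], ['a', 'x', 0.234],
                    ['a', 'y', 0.451], ['b', 'x', 0.453]],
                   columns=['first', 'second', 'value1']
                   ).set_index(['first', 'second'])
# Hypothetical frame keyed by the 'second' level (assumption for illustration)
df3 = pd.DataFrame([['x', 100], ['y', 200]],
                   columns=['second', 'other']).set_index(['second'])

# df3's index covers level 1 of df1.index, so broadcast along level=1
broadcast = df3['other'].reindex(df1.index, level=1)
# Assign by position; broadcast has the same row order as df1
df1['other'] = broadcast.to_numpy()
print(df1)
```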