Question
Consider the following single-index DataFrame:
   energy  fat
1    2000   28
2    1900   17
3    2200   30
4    1750   15
5    1800   18
6    1600   12
I also have a MultiIndex Series:
1  vitamin-c    0.0004
   vitamin-a    0.0150
2  vitamin-c    0.0030
3  vitamin-d    1.2000
   vitamin-e    1.0007
   vitamin-c    1.2020
4  vitamin-a    0.0780
5  vitamin-b    0.9650
6  vitamin-e    1.9801
   vitamin-c    1.0011
How can I join the two so the result looks like this:
   energy  fat  vitamins
1    2000   28  vitamin-c  0.0004
                vitamin-a  0.0150
2    1900   17  vitamin-c  0.0030
3    2200   30  vitamin-d  1.2000
                vitamin-e  1.0007
                vitamin-c  1.2020
4    1750   15  vitamin-a  0.0780
5    1800   18  vitamin-b  0.9650
6    1600   12  vitamin-e  1.9801
                vitamin-c  1.0011
I tried df.join(series, how='inner'), but all I got was the following error message:
"ValueError: cannot join with no level specified and no overlapping names"
Can someone please explain what I'm doing wrong here and how I can combine the two? Thank you!
Answer 1:
Option 1
I don't suggest moving things into the index that shouldn't be there.
That said, you can use pd.DataFrame.join if your index levels are appropriately named, or rather if the names match so that pandas knows what to join on.
df.rename_axis('ord').join(s.rename_axis(['ord', 'vit']).rename('val'))
               energy  fat     val
ord vit
1   vitamin-c    2000   28  0.0004
    vitamin-a    2000   28  0.0150
2   vitamin-c    1900   17  0.0030
3   vitamin-d    2200   30  1.2000
    vitamin-e    2200   30  1.0007
    vitamin-c    2200   30  1.2020
4   vitamin-a    1750   15  0.0780
5   vitamin-b    1800   18  0.9650
6   vitamin-e    1600   12  1.9801
    vitamin-c    1600   12  1.0011
Or, split over a few more lines for readability:
s = s.rename_axis(['ord', 'vit']).rename('val')
df = df.rename_axis('ord')
df.join(s)
               energy  fat     val
ord vit
1   vitamin-c    2000   28  0.0004
    vitamin-a    2000   28  0.0150
2   vitamin-c    1900   17  0.0030
3   vitamin-d    2200   30  1.2000
    vitamin-e    2200   30  1.0007
    vitamin-c    2200   30  1.2020
4   vitamin-a    1750   15  0.0780
5   vitamin-b    1800   18  0.9650
6   vitamin-e    1600   12  1.9801
    vitamin-c    1600   12  1.0011
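For anyone who wants to run Option 1 end to end, here is a minimal sketch. The constructors are my assumption, since the question only shows printed output; the names df and s follow the answer above. It first shows that the join fails while the index levels are unnamed (the exact error text varies between pandas versions), then that it works once both sides share the level name "ord".

```python
import pandas as pd

# Rebuild the question's data. These constructors are assumed; the question
# only shows the printed frames.
df = pd.DataFrame(
    {"energy": [2000, 1900, 2200, 1750, 1800, 1600],
     "fat": [28, 17, 30, 15, 18, 12]},
    index=[1, 2, 3, 4, 5, 6],
)
s = pd.Series(
    [0.0004, 0.0150, 0.0030, 1.2000, 1.0007,
     1.2020, 0.0780, 0.9650, 1.9801, 1.0011],
    index=pd.MultiIndex.from_tuples([
        (1, "vitamin-c"), (1, "vitamin-a"), (2, "vitamin-c"),
        (3, "vitamin-d"), (3, "vitamin-e"), (3, "vitamin-c"),
        (4, "vitamin-a"), (5, "vitamin-b"),
        (6, "vitamin-e"), (6, "vitamin-c"),
    ]),
)

# With unnamed index levels, pandas cannot tell which level to join on.
try:
    df.join(s.rename("val"))
    join_failed = False
except ValueError:
    join_failed = True

# Giving the shared level the same name ("ord") on both sides resolves it.
joined = df.rename_axis("ord").join(
    s.rename_axis(["ord", "vit"]).rename("val")
)
print(joined)
```

The names "ord", "vit" and "val" are arbitrary; what matters is that the DataFrame's index name equals the name of one level of the Series' MultiIndex.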
Option 2
We can also use pd.concat with loc and pd.Index.get_level_values:
pd.concat(
    [df.loc[s.index.get_level_values(0)].set_index(s.index), s.rename('val')],
    axis=1
)
             energy  fat     val
1 vitamin-c    2000   28  0.0004
  vitamin-a    2000   28  0.0150
2 vitamin-c    1900   17  0.0030
3 vitamin-d    2200   30  1.2000
  vitamin-e    2200   30  1.0007
  vitamin-c    2200   30  1.2020
4 vitamin-a    1750   15  0.0780
5 vitamin-b    1800   18  0.9650
6 vitamin-e    1600   12  1.9801
  vitamin-c    1600   12  1.0011
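To make the steps in Option 2 easier to follow, here is the same idea split into named intermediate steps. This is a sketch on a shortened copy of the data; the constructors and the variable names level0, repeated and combined are mine, not part of the original answer.

```python
import pandas as pd

# A shortened, assumed reconstruction of the question's data.
df = pd.DataFrame(
    {"energy": [2000, 1900, 2200], "fat": [28, 17, 30]},
    index=[1, 2, 3],
)
s = pd.Series(
    [0.0004, 0.0150, 0.0030, 1.2000, 1.0007],
    index=pd.MultiIndex.from_tuples([
        (1, "vitamin-c"), (1, "vitamin-a"), (2, "vitamin-c"),
        (3, "vitamin-d"), (3, "vitamin-e"),
    ]),
)

# First level of s's index, one label per row of s: 1, 1, 2, 3, 3.
level0 = s.index.get_level_values(0)

# Look up the matching df row for every entry of s (rows repeat as needed),
# then replace the repeated index with s's MultiIndex.
repeated = df.loc[level0].set_index(s.index)

# Both pieces now carry the same index, so concat aligns them row by row.
combined = pd.concat([repeated, s.rename("val")], axis=1)
print(combined)
```

Unlike the join in Option 1, this does not need any index names, but it does assume that every first-level label in s exists in df; otherwise the loc lookup raises a KeyError.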
Answer 2:
If you add names to the index/MultiIndex, you can use a join:
In [11]: df
Out[11]:
   energy  fat
n
1    2000   28
2    1900   17
3    2200   30
4    1750   15
5    1800   18
6    1600   12
In [12]: df2
Out[12]:
                val
n vitamin
1 vitamin-c  0.0004
  vitamin-a  0.0150
2 vitamin-c  0.0030
3 vitamin-d  1.2000
  vitamin-e  1.0007
  vitamin-c  1.2020
4 vitamin-a  0.0780
5 vitamin-b  0.9650
6 vitamin-e  1.9801
  vitamin-c  1.0011
In [13]: df.join(df2)
Out[13]:
             energy  fat     val
n vitamin
1 vitamin-c    2000   28  0.0004
  vitamin-a    2000   28  0.0150
2 vitamin-c    1900   17  0.0030
3 vitamin-d    2200   30  1.2000
  vitamin-e    2200   30  1.0007
  vitamin-c    2200   30  1.2020
4 vitamin-a    1750   15  0.0780
5 vitamin-b    1800   18  0.9650
6 vitamin-e    1600   12  1.9801
  vitamin-c    1600   12  1.0011
Note: You can name the index levels by setting .index.names:
In [21]: df.index.names = ["n"] # or .name = "n"
In [22]: df2.index.names = ["n", "vitamin"]
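Put together as a plain script rather than an IPython session, the approach might look like the sketch below. The data is a shortened, assumed reconstruction, and df2 is built here as a one-column DataFrame to match the session above.

```python
import pandas as pd

# Shortened, assumed reconstruction of the two frames from the session above.
df = pd.DataFrame(
    {"energy": [2000, 1900, 2200], "fat": [28, 17, 30]},
    index=[1, 2, 3],
)
df2 = pd.DataFrame(
    {"val": [0.0004, 0.0150, 0.0030, 1.2000]},
    index=pd.MultiIndex.from_tuples([
        (1, "vitamin-c"), (1, "vitamin-a"),
        (2, "vitamin-c"), (3, "vitamin-d"),
    ]),
)

# Name the levels in place; "n" is the level the two indexes have in common.
df.index.name = "n"
df2.index.names = ["n", "vitamin"]

# The join now matches df's index against level "n" of df2's MultiIndex.
result = df.join(df2)
print(result)
```

If you build the MultiIndex yourself, passing names=["n", "vitamin"] to pd.MultiIndex.from_tuples should have the same effect as assigning .index.names afterwards.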
Answer 3:
Source data:
In [96]: s
Out[96]:
id   vitamins
1.0  vitamin-c    0.0004
     vitamin-a    0.0150
2.0  vitamin-c    0.0030
3.0  vitamin-d    1.2000
     vitamin-e    1.0007
     vitamin-c    1.2020
4.0  vitamin-a    0.0780
5.0  vitamin-b    0.9650
6.0  vitamin-e    1.9801
     vitamin-c    1.0011
Name: val, dtype: float64
In [97]: df
Out[97]:
   energy  fat
1    2000   28
2    1900   17
3    2200   30
4    1750   15
5    1800   18
6    1600   12
Solution:
In [99]: s.reset_index() \
          .merge(df, left_on='id', right_index=True) \
          .set_index(['id','energy','fat','vitamins'])
Out[99]:
                              val
id  energy fat vitamins
1.0 2000   28  vitamin-c   0.0004
               vitamin-a   0.0150
2.0 1900   17  vitamin-c   0.0030
3.0 2200   30  vitamin-d   1.2000
               vitamin-e   1.0007
               vitamin-c   1.2020
4.0 1750   15  vitamin-a   0.0780
5.0 1800   18  vitamin-b   0.9650
6.0 1600   12  vitamin-e   1.9801
               vitamin-c   1.0011
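The chained call can be unpacked into three steps, which may make it clearer what each one contributes. This is a sketch on a shortened, assumed copy of the data; I use integer ids instead of the floats shown above, and the names flat, merged and result are mine.

```python
import pandas as pd

# Shortened, assumed reconstruction of the source data (integer ids).
df = pd.DataFrame(
    {"energy": [2000, 1900, 2200], "fat": [28, 17, 30]},
    index=[1, 2, 3],
)
s = pd.Series(
    [0.0004, 0.0150, 0.0030, 1.2000],
    index=pd.MultiIndex.from_tuples(
        [(1, "vitamin-c"), (1, "vitamin-a"),
         (2, "vitamin-c"), (3, "vitamin-d")],
        names=["id", "vitamins"],
    ),
    name="val",
)

# Step 1: turn the index levels into ordinary columns: id, vitamins, val.
flat = s.reset_index()

# Step 2: match the "id" column against df's index to pull in energy and fat.
merged = flat.merge(df, left_on="id", right_index=True)

# Step 3: move everything except "val" into a four-level index.
result = merged.set_index(["id", "energy", "fat", "vitamins"])
print(result)
```

Stopping after step 2 gives a flat table with one row per vitamin, which is often easier to work with than the four-level index if you do not need the exact layout from the question.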
Source: https://stackoverflow.com/questions/47144979/how-to-join-a-multi-index-series-to-a-single-index-dataframe-with-pandas