Question
I have a MultiIndex pandas DataFrame df_multi like:
import pandas as pd
df_multi = pd.DataFrame([['A', 'A1', 0,234,2002],['A', 'A1', 1,324,2550],
['A', 'A1', 2,345,3207],['A', 'A1', 3,458,4560],['A', 'A2', 0,569,1980],
['A', 'A2', 1,657,2314],['A', 'A2', 2,768,4568],['A', 'A2', 3,823,5761]],
columns=['Product','Scenario','Time','Quantity','Price']).set_index(
['Product', 'Scenario'])
and a single-index DataFrame df_single like:
df_single = pd.DataFrame([['A', -3,100],['A', -2,100], ['A', -1,100]],
columns=['Product','Time','Quantity']).set_index(['Product'])
For every 'Product' in the first index level of df_multi, and for every 'Scenario' in its second level, I would like to append/concatenate the rows of df_single, which contain negative 'Time' values, so that they come before the rows where the non-negative 'Time' values of df_multi begin.
I would also like the resulting DataFrame to be MultiIndexed by ['Product','Scenario'] (just like df_multi), with the rows within each group ordered by ascending 'Time'. In other words, the desired result is:
import numpy as np
df_result = pd.DataFrame([['A', 'A1', -3, 100, np.nan], ['A', 'A1', -2, 100, np.nan],
['A', 'A1', -1, 100, np.nan], ['A', 'A1', 0, 234, 2002], ['A', 'A1', 1, 324, 2550],
['A', 'A1', 2, 345, 3207], ['A', 'A1', 3, 458, 4560], ['A', 'A2', -3, 100, np.nan],
['A', 'A2', -2, 100, np.nan], ['A', 'A2', -1, 100, np.nan], ['A', 'A2', 0, 569, 1980],
['A', 'A2', 1, 657, 2314], ['A', 'A2', 2, 768, 4568], ['A', 'A2', 3, 823, 5761]],
columns=['Product', 'Scenario', 'Time', 'Quantity', 'Price']).set_index(
['Product', 'Scenario'])
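Keeping the MultiIndex while ordering rows by 'Time' does not require resetting the index first: DataFrame.sort_values accepts index level names alongside column labels (supported since pandas 0.23). A minimal sketch on a small, deliberately shuffled frame; the data and the names df and df_sorted are made up for illustration:

```python
import pandas as pd

# Small MultiIndexed frame whose 'Time' values are deliberately out of order
df = pd.DataFrame(
    [['A', 'A2', 1, 657], ['A', 'A1', 0, 234], ['A', 'A2', -1, 100], ['A', 'A1', -2, 100]],
    columns=['Product', 'Scenario', 'Time', 'Quantity'],
).set_index(['Product', 'Scenario'])

# 'Product' and 'Scenario' are index levels, 'Time' is a column; sort_values accepts both,
# so the MultiIndex stays in place. kind='mergesort' is a stable sort.
df_sorted = df.sort_values(['Product', 'Scenario', 'Time'], kind='mergesort')
print(df_sorted)
```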
EDIT:
- df_single has no 'Scenario' values, which can be confusing. As long as 'Product' matches, the same rows of df_single are to be appended to every scenario in df_multi, and they simply "inherit" the Scenario values for free.
- The actual DataFrames I'm working with are rather large (a few thousand 'Product' values, a few thousand 'Scenario' values per product, and a few hundred 'Time' steps per scenario, plus extra columns not shown in the example), so I need to do this in a fully automated (and ideally fast) way.
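The "inherit the Scenario" behaviour described in the EDIT can be sketched as a merge between df_single and only the distinct (Product, Scenario) pairs of df_multi, rather than all of its rows. This is an illustration, not a method from the question: it uses a trimmed copy of the example data (two Time steps per scenario), and the names pairs and expanded are hypothetical.

```python
import pandas as pd

df_multi = pd.DataFrame([['A', 'A1', 0, 234, 2002], ['A', 'A1', 1, 324, 2550],
                         ['A', 'A2', 0, 569, 1980], ['A', 'A2', 1, 657, 2314]],
                        columns=['Product', 'Scenario', 'Time', 'Quantity', 'Price']
                        ).set_index(['Product', 'Scenario'])
df_single = pd.DataFrame([['A', -3, 100], ['A', -2, 100], ['A', -1, 100]],
                         columns=['Product', 'Time', 'Quantity']).set_index(['Product'])

# One row per distinct (Product, Scenario) pair, regardless of how many Time steps exist
pairs = df_multi.index.unique().to_frame(index=False)

# Merging on Product repeats each df_single row once per scenario of that product,
# so the rows pick up the Scenario value with no duplicates to clean up afterwards
expanded = pairs.merge(df_single.reset_index(), on='Product')
print(expanded)
```

Since pairs has one row per scenario, the merged frame has (number of scenarios) × (rows of df_single) rows for each product, independent of the number of Time steps in df_multi.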
I tried to implement this with join, concat and merge, but did not succeed. What would be the best way of achieving the desired result?
Answer 1:
Consider resetting the indexes to columns for a merge, followed by a groupby aggregation that returns only one occurrence per group to avoid duplicates. Afterwards, run a row-wise concatenation with concat, then sort by Product, Scenario and Time and set the MultiIndex back.
# MERGE AND AGGREGATION
df_temp = df_multi.reset_index().merge(df_single.reset_index(), on='Product', suffixes=['','_'])\
.groupby(['Product', 'Scenario', 'Time_'])['Quantity_'].max()\
.reset_index().rename(columns={'Time_':'Time','Quantity_':'Quantity'})
# ROW BIND CONCATENATION
df_final = pd.concat([df_multi.reset_index(), df_temp])\
.sort_values(['Product','Scenario', 'Time'])\
.set_index(['Product', 'Scenario'])[['Time', 'Quantity', 'Price']]
print(df_final)
#                   Time  Quantity   Price
# Product Scenario
# A       A1          -3       100     NaN
#         A1          -2       100     NaN
#         A1          -1       100     NaN
#         A1           0       234  2002.0
#         A1           1       324  2550.0
#         A1           2       345  3207.0
#         A1           3       458  4560.0
#         A2          -3       100     NaN
#         A2          -2       100     NaN
#         A2          -1       100     NaN
#         A2           0       569  1980.0
#         A2           1       657  2314.0
#         A2           2       768  4568.0
#         A2           3       823  5761.0
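The merge above pairs every df_single row with every df_multi row of the same Product, so in the example each scenario produces 4 × 3 = 12 intermediate rows before groupby reduces them to 3. With a few hundred Time steps per scenario, as described in the question, that intermediate frame can get large. A possible variant, not part of the original answer, merges df_single only with the distinct (Product, Scenario) pairs and then concatenates. It is a sketch assuming a reasonably recent pandas; pairs, df_neg and df_alt are hypothetical names.

```python
import pandas as pd

df_multi = pd.DataFrame([['A', 'A1', 0, 234, 2002], ['A', 'A1', 1, 324, 2550],
                         ['A', 'A1', 2, 345, 3207], ['A', 'A1', 3, 458, 4560],
                         ['A', 'A2', 0, 569, 1980], ['A', 'A2', 1, 657, 2314],
                         ['A', 'A2', 2, 768, 4568], ['A', 'A2', 3, 823, 5761]],
                        columns=['Product', 'Scenario', 'Time', 'Quantity', 'Price']
                        ).set_index(['Product', 'Scenario'])
df_single = pd.DataFrame([['A', -3, 100], ['A', -2, 100], ['A', -1, 100]],
                         columns=['Product', 'Time', 'Quantity']).set_index(['Product'])

# Distinct (Product, Scenario) pairs only, so df_single is not paired with every Time row
pairs = df_multi.index.unique().to_frame(index=False)
df_neg = pairs.merge(df_single.reset_index(), on='Product')

# Stack the broadcast negative-Time rows with the original rows; rows from df_single
# have no Price, so concat fills that column with NaN for them
df_alt = (pd.concat([df_neg, df_multi.reset_index()], ignore_index=True)
          .sort_values(['Product', 'Scenario', 'Time'], kind='mergesort')
          .set_index(['Product', 'Scenario'])[['Time', 'Quantity', 'Price']])
print(df_alt)
```

On the example data this should give the same rows as df_final above. The intermediate size now scales with the number of scenarios times the rows of df_single, not with the number of Time steps.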
Source: https://stackoverflow.com/questions/47561694/pandas-how-to-concatenate-a-multiindex-dataframe-with-a-single-index-dataframe