I have a pandas dataframe that contains dates, customers, products, and the dollar amount of each purchase.
date customer product amt
1/1/2017 tim
Maybe it's my SQL mindset, but I would left-join your data onto an expanded helper dataframe
that holds every date, customer, and product combination:
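import pandas as pd

# Assumes df['date'] is datetime64; if not: df['date'] = pd.to_datetime(df['date'])
# One helper frame per (customer, product) pair, covering every day in the range
helper_df_list = [pd.DataFrame({'date': pd.date_range(df['date'].min(), df['date'].max()),
                                'customer': c, 'product': p})
                  for c in df['customer'].unique()
                  for p in df['product'].unique()]
helper_df = pd.concat(helper_df_list, ignore_index=True)
# Left join keeps every helper row; combinations with no purchase get amt = 0
final_df = (pd.merge(helper_df, df, on=['date', 'customer', 'product'], how='left')
              .fillna(0).sort_values(['date', 'customer']).reset_index(drop=True))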
Output
print(final_df)
# customer date product amt
# 0 jim 2017-01-01 apple 0.0
# 1 jim 2017-01-01 melon 2.0
# 2 jim 2017-01-01 orange 0.0
# 3 tim 2017-01-01 apple 3.0
# 4 tim 2017-01-01 melon 0.0
# 5 tim 2017-01-01 orange 0.0
# 6 tom 2017-01-01 apple 5.0
# 7 tom 2017-01-01 melon 4.0
# 8 tom 2017-01-01 orange 0.0
# 9 jim 2017-01-02 apple 0.0
# 10 jim 2017-01-02 melon 0.0
# 11 jim 2017-01-02 orange 0.0
# 12 tim 2017-01-02 apple 0.0
# 13 tim 2017-01-02 melon 0.0
# 14 tim 2017-01-02 orange 0.0
# 15 tom 2017-01-02 apple 0.0
# 16 tom 2017-01-02 melon 0.0
# 17 tom 2017-01-02 orange 0.0
# 18 jim 2017-01-03 apple 0.0
# 19 jim 2017-01-03 melon 0.0
# 20 jim 2017-01-03 orange 0.0
# 21 tim 2017-01-03 apple 0.0
# 22 tim 2017-01-03 melon 0.0
# 23 tim 2017-01-03 orange 0.0
# 24 tom 2017-01-03 apple 0.0
# 25 tom 2017-01-03 melon 0.0
# 26 tom 2017-01-03 orange 0.0
# 27 jim 2017-01-04 apple 2.0
# 28 jim 2017-01-04 melon 0.0
# 29 jim 2017-01-04 orange 0.0
# 30 tim 2017-01-04 apple 0.0
# 31 tim 2017-01-04 melon 3.0
# 32 tim 2017-01-04 orange 0.0
# 33 tom 2017-01-04 apple 0.0
# 34 tom 2017-01-04 melon 1.0
# 35 tom 2017-01-04 orange 4.0
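If you would rather skip the helper dataframe, the same fill can probably be done by reindexing against a full MultiIndex. This is a different technique from the merge above, sketched under assumptions: the sample df is rebuilt from the non-zero rows of the printed output (the original input isn't shown in full), and repeated (date, customer, product) rows are summed first so the index is unique before reindexing.

```python
import pandas as pd

# Sample purchases reconstructed from the non-zero rows of the output above
# (an assumption; the original input table is not shown in full).
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-01", "2017-01-01", "2017-01-01", "2017-01-01",
                            "2017-01-04", "2017-01-04", "2017-01-04", "2017-01-04"]),
    "customer": ["jim", "tim", "tom", "tom", "jim", "tim", "tom", "tom"],
    "product": ["melon", "apple", "apple", "melon", "apple", "melon", "melon", "orange"],
    "amt": [2.0, 3.0, 5.0, 4.0, 2.0, 3.0, 1.0, 4.0],
})

# Every (date, customer, product) combination over the full daily range.
full_index = pd.MultiIndex.from_product(
    [
        pd.date_range(df["date"].min(), df["date"].max()),
        sorted(df["customer"].unique()),
        sorted(df["product"].unique()),
    ],
    names=["date", "customer", "product"],
)

# Sum duplicate keys so the index is unique, then reindex onto the full grid
# and fill the combinations that had no purchase with 0.
final_df = (
    df.groupby(["date", "customer", "product"])["amt"]
    .sum()
    .reindex(full_index, fill_value=0.0)
    .reset_index()
)

print(final_df)
```

The rows come back ordered by date, then customer, then product, because that is the order from_product builds the grid in. If your data can legitimately have several purchases for the same key and you want to keep them as separate rows, stick with the left join instead.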