Fill in missing data pandas

╄→尐↘猪︶ㄣ submitted on 2020-04-06 18:17:33

Question


How can I fill in the missing data in this DataFrame?

Rows are missing for days on which no sales were made. How can I add a row with Sales = 0 for every store, item, and date combination where none of that item was sold?

Input

Dates            Store            Item        Sales           
2017-01-01       Chicago          Apple       10
2017-01-02       NewYork          Pear        10 
2017-01-03       Chicago          Apple       10

Output

Dates            Store            Item        Sales           
2017-01-01       Chicago          Apple       10
2017-01-01       Chicago          Pear        0
2017-01-02       Chicago          Apple       0
2017-01-02       Chicago          Pear        0
2017-01-03       Chicago          Apple       10
2017-01-03       Chicago          Pear        0    
2017-01-01       NewYork          Apple       0
2017-01-01       NewYork          Pear        0 
2017-01-02       NewYork          Apple       0 
2017-01-02       NewYork          Pear        10 
2017-01-03       NewYork          Apple       0 
2017-01-03       NewYork          Pear        0 

Answer 1:


Use:

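  • first, use set_index to build a MultiIndex from Dates, Store, and Item
  • create a new MultiIndex with MultiIndex.from_product, covering every combination of the level values
  • reindex against it, passing fill_value=0 so the missing rows get 0
  • finally, sort by the Store level with sort_index and call reset_index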

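import pandas as pd
# Move the three key columns into a MultiIndex
df = df.set_index(['Dates','Store','Item'])
# Build every (Dates, Store, Item) combination from the existing level values
mux = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)
# Add the missing rows with Sales = 0, sort by Store, and restore the columns
df = df.reindex(mux, fill_value=0).sort_index(level='Store').reset_index()
print(df)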
        Dates    Store   Item  Sales
0  2017-01-01  Chicago  Apple     10
1  2017-01-01  Chicago   Pear      0
2  2017-01-02  Chicago  Apple      0
3  2017-01-02  Chicago   Pear      0
4  2017-01-03  Chicago  Apple     10
5  2017-01-03  Chicago   Pear      0
6  2017-01-01  NewYork  Apple      0
7  2017-01-01  NewYork   Pear      0
8  2017-01-02  NewYork  Apple      0
9  2017-01-02  NewYork   Pear     10
10 2017-01-03  NewYork  Apple      0
11 2017-01-03  NewYork   Pear      0
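
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. It assumes the three sample rows from the question; the names keys, indexed, full_index, and filled are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample input copied from the question.
df = pd.DataFrame({
    "Dates": ["2017-01-01", "2017-01-02", "2017-01-03"],
    "Store": ["Chicago", "NewYork", "Chicago"],
    "Item": ["Apple", "Pear", "Apple"],
    "Sales": [10, 10, 10],
})

keys = ["Dates", "Store", "Item"]  # illustrative name, not from the answer
indexed = df.set_index(keys)

# Every (Dates, Store, Item) combination built from the values seen in the data.
full_index = pd.MultiIndex.from_product(indexed.index.levels, names=keys)

# Combinations absent from the original get Sales = 0.
filled = (
    indexed.reindex(full_index, fill_value=0)
    .sort_index(level="Store")
    .reset_index()
)
print(filled)
```

Note that from_product only combines values that already occur in the data, so a date on which no store sold anything would still be absent from the result.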



Answer 2:


This one is a little harder to follow; it uses set_index, stack, and unstack:

df.set_index(['Dates','Store','Item']).unstack().stack(dropna=False).\
    unstack(1).stack(dropna=False).fillna(0).reset_index()
Out[258]: 
         Dates   Item    Store  Sales
0   2017-01-01  Apple  Chicago   10.0
1   2017-01-01  Apple  NewYork    0.0
2   2017-01-01   Pear  Chicago    0.0
3   2017-01-01   Pear  NewYork    0.0
4   2017-01-02  Apple  Chicago    0.0
5   2017-01-02  Apple  NewYork    0.0
6   2017-01-02   Pear  Chicago    0.0
7   2017-01-02   Pear  NewYork   10.0
8   2017-01-03  Apple  Chicago   10.0
9   2017-01-03  Apple  NewYork    0.0
10  2017-01-03   Pear  Chicago    0.0
11  2017-01-03   Pear  NewYork    0.0
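
Sales comes back as float in the output above (10.0, 0.0) because the missing combinations are first filled with NaN and only afterwards replaced by fillna(0). The sketch below reproduces that effect and casts the column back to integers. It is an illustration under assumptions: it uses the sample data from the question, it triggers the NaN step with reindex rather than the stack/unstack chain, and the names with_nan and restored are made up for this example.

```python
import pandas as pd

# Sample input copied from the question.
df = pd.DataFrame({
    "Dates": ["2017-01-01", "2017-01-02", "2017-01-03"],
    "Store": ["Chicago", "NewYork", "Chicago"],
    "Item": ["Apple", "Pear", "Apple"],
    "Sales": [10, 10, 10],
})

keys = ["Dates", "Store", "Item"]
indexed = df.set_index(keys)
full_index = pd.MultiIndex.from_product(indexed.index.levels, names=keys)

# Without fill_value, missing rows become NaN, which forces Sales to float.
with_nan = indexed.reindex(full_index)
print(with_nan["Sales"].dtype)

# Replace NaN with 0, then cast the column back to an integer dtype.
restored = with_nan.fillna(0).astype({"Sales": int}).reset_index()
print(restored["Sales"].dtype)
```

Appending .astype({'Sales': int}) after fillna(0) in the chain above should have the same effect on its result.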


Source: https://stackoverflow.com/questions/47368536/fill-in-missing-data-pandas
