Pandas- adding missing dates to DataFrame while keeping column/index values?

后端未结

关注

 4  1697

I have a pandas dataframe that incorporates dates, customers, items, and then dollar value for purchases.

   date     customer   product   amt  
 1/1/2017   tim


                      
              相关标签:


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  小鲜肉        
                
              
                            
                2021-01-22 16:36
              
            
            
                                                                       
Maybe because of my SQL mindset, consider a left join merge on an expanded helper dataframe:

helper_df_list = [pd.DataFrame({'date': pd.date_range(df['date'].min(), df['date'].max()), 
                                'customer': c, 'product': p }) 
                    for c in df['customer'].unique() 
                            for p in df['product'].unique()]

helper_df = pd.concat(helper_df_list, ignore_index=True)

final_df = pd.merge(helper_df, df, on=['date', 'customer', 'product'], how='left')\
                    .fillna(0).sort_values(['date', 'customer']).reset_index(drop=True)


Output

print(final_df)
#    customer       date product  amt
# 0       jim 2017-01-01   apple  0.0
# 1       jim 2017-01-01   melon  2.0
# 2       jim 2017-01-01  orange  0.0
# 3       tim 2017-01-01   apple  3.0
# 4       tim 2017-01-01   melon  0.0
# 5       tim 2017-01-01  orange  0.0
# 6       tom 2017-01-01   apple  5.0
# 7       tom 2017-01-01   melon  4.0
# 8       tom 2017-01-01  orange  0.0
# 9       jim 2017-01-02   apple  0.0
# 10      jim 2017-01-02   melon  0.0
# 11      jim 2017-01-02  orange  0.0
# 12      tim 2017-01-02   apple  0.0
# 13      tim 2017-01-02   melon  0.0
# 14      tim 2017-01-02  orange  0.0
# 15      tom 2017-01-02   apple  0.0
# 16      tom 2017-01-02   melon  0.0
# 17      tom 2017-01-02  orange  0.0
# 18      jim 2017-01-03   apple  0.0
# 19      jim 2017-01-03   melon  0.0
# 20      jim 2017-01-03  orange  0.0
# 21      tim 2017-01-03   apple  0.0
# 22      tim 2017-01-03   melon  0.0
# 23      tim 2017-01-03  orange  0.0
# 24      tom 2017-01-03   apple  0.0
# 25      tom 2017-01-03   melon  0.0
# 26      tom 2017-01-03  orange  0.0
# 27      jim 2017-01-04   apple  2.0
# 28      jim 2017-01-04   melon  0.0
# 29      jim 2017-01-04  orange  0.0
# 30      tim 2017-01-04   apple  0.0
# 31      tim 2017-01-04   melon  3.0
# 32      tim 2017-01-04  orange  0.0
# 33      tom 2017-01-04   apple  0.0
# 34      tom 2017-01-04   melon  1.0
# 35      tom 2017-01-04  orange  4.0

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  无人共我        
                
              
                            
                2021-01-22 16:47
              
            
            
                                                                       
Notice ,this using the stack and unstack couple of times 

df.set_index(['date','customer','product']).amt.unstack(-3).\
  reindex(columns=pd.date_range(df['date'].min(), 
    df['date'].max()),fill_value=0).\
      stack(dropna=False).unstack().stack(dropna=False).\
        unstack('customer').stack(dropna=False).reset_index().\
          fillna(0).sort_values(['level_1','customer','product'])
Out[314]: 
   product    level_1 customer    0
0    apple 2017-01-01      jim  0.0
12   melon 2017-01-01      jim  2.0
24  orange 2017-01-01      jim  0.0
1    apple 2017-01-01      tim  3.0
13   melon 2017-01-01      tim  0.0
25  orange 2017-01-01      tim  0.0
2    apple 2017-01-01      tom  5.0
14   melon 2017-01-01      tom  4.0
26  orange 2017-01-01      tom  0.0
3    apple 2017-01-02      jim  0.0
15   melon 2017-01-02      jim  0.0
27  orange 2017-01-02      jim  0.0
4    apple 2017-01-02      tim  0.0
16   melon 2017-01-02      tim  0.0
28  orange 2017-01-02      tim  0.0
5    apple 2017-01-02      tom  0.0
17   melon 2017-01-02      tom  0.0
29  orange 2017-01-02      tom  0.0
6    apple 2017-01-03      jim  0.0
18   melon 2017-01-03      jim  0.0
30  orange 2017-01-03      jim  0.0
7    apple 2017-01-03      tim  0.0
19   melon 2017-01-03      tim  0.0
31  orange 2017-01-03      tim  0.0
8    apple 2017-01-03      tom  0.0
20   melon 2017-01-03      tom  0.0
32  orange 2017-01-03      tom  0.0
9    apple 2017-01-04      jim  2.0
21   melon 2017-01-04      jim  0.0
33  orange 2017-01-04      jim  0.0
10   apple 2017-01-04      tim  0.0
22   melon 2017-01-04      tim  3.0
34  orange 2017-01-04      tim  0.0
11   apple 2017-01-04      tom  0.0
23   melon 2017-01-04      tom  1.0
35  orange 2017-01-04      tom  4.0

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  离开以前        
                
              
                            
                2021-01-22 16:50
              
            
            
                                                                       
Let's use product from itertools, pd.date_range, and merge:

from itertools import product

daterange = pd.date_range(df['date'].min(), df['date'].max(), freq='D')
d1 = pd.DataFrame(list(product(daterange, 
                               df['customer'].unique(),
                               df['product'].unique())), 
                  columns=['date', 'customer', 'product'])
d1.merge(df, on=['date', 'customer', 'product'], how='left').fillna(0)


Output:

         date customer product  amt
0  2017-01-01      tim   apple  3.0
1  2017-01-01      tim   melon  0.0
2  2017-01-01      tim  orange  0.0
3  2017-01-01      jim   apple  0.0
4  2017-01-01      jim   melon  2.0
5  2017-01-01      jim  orange  0.0
6  2017-01-01      tom   apple  5.0
7  2017-01-01      tom   melon  4.0
8  2017-01-01      tom  orange  0.0
9  2017-01-02      tim   apple  0.0
10 2017-01-02      tim   melon  0.0
11 2017-01-02      tim  orange  0.0
12 2017-01-02      jim   apple  0.0
13 2017-01-02      jim   melon  0.0
14 2017-01-02      jim  orange  0.0
15 2017-01-02      tom   apple  0.0
16 2017-01-02      tom   melon  0.0
17 2017-01-02      tom  orange  0.0
18 2017-01-03      tim   apple  0.0
19 2017-01-03      tim   melon  0.0
20 2017-01-03      tim  orange  0.0
21 2017-01-03      jim   apple  0.0
22 2017-01-03      jim   melon  0.0
23 2017-01-03      jim  orange  0.0
24 2017-01-03      tom   apple  0.0
25 2017-01-03      tom   melon  0.0
26 2017-01-03      tom  orange  0.0
27 2017-01-04      tim   apple  0.0
28 2017-01-04      tim   melon  3.0
29 2017-01-04      tim  orange  0.0
30 2017-01-04      jim   apple  2.0
31 2017-01-04      jim   melon  0.0
32 2017-01-04      jim  orange  0.0
33 2017-01-04      tom   apple  0.0
34 2017-01-04      tom   melon  1.0
35 2017-01-04      tom  orange  4.0

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  一个人的身影        
                
              
                            
                2021-01-22 16:57
              
            
            
                                                                       
IIUC you can do it this way:

In [63]: dates = pd.date_range(df['date'].min(), df['date'].max())

In [64]: idx = pd.MultiIndex.from_product((dates,
                                           df['customer'].unique(), 
                                           df['product'].unique()))

In [72]: (df.set_index(['date','customer','product'])
            .reindex(idx, fill_value=0)
            .reset_index()
            .set_axis(df.columns, axis=1, inplace=False))
Out[72]:
         date customer product  amt
0  2017-01-01      tim   apple    3
1  2017-01-01      tim   melon    0
2  2017-01-01      tim  orange    0
3  2017-01-01      jim   apple    0
4  2017-01-01      jim   melon    2
5  2017-01-01      jim  orange    0
6  2017-01-01      tom   apple    5
7  2017-01-01      tom   melon    4
8  2017-01-01      tom  orange    0
9  2017-01-02      tim   apple    0
..        ...      ...     ...  ...
26 2017-01-03      tom  orange    0
27 2017-01-04      tim   apple    0
28 2017-01-04      tim   melon    3
29 2017-01-04      tim  orange    0
30 2017-01-04      jim   apple    2
31 2017-01-04      jim   melon    0
32 2017-01-04      jim  orange    0
33 2017-01-04      tom   apple    0
34 2017-01-04      tom   melon    1
35 2017-01-04      tom  orange    4

[36 rows x 4 columns]

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复