Conditionally filling blank values in Pandas dataframes


I have a dataframe that looks as follows (the other columns have been dropped):

    memberID    shipping_country    
    264991      
    264991


                      
3 answers

陌清茗, 2021-01-22 21:00

For the following sample dataframe (I added a memberID group that only contains '' in the shipping_country column):

   memberID shipping_country
0    264991                 
1    264991           Canada
2       100              USA
3      5000                 
4      5000               UK
5        54                 


This should work for you. If a memberID group has only empty string values ('') in shipping_country, those are retained in the output df:

import numpy as np

df['shipping_country'] = df.replace('', np.nan).groupby('memberID')['shipping_country'].transform('first').fillna('')


Yields:

   memberID shipping_country
0    264991           Canada
1    264991           Canada
2       100              USA
3      5000               UK
4      5000               UK
5        54                 


If you would like to leave the empty strings '' as NaN in the output df, then just remove the fillna(''), leaving:

df['shipping_country'] = df.replace('',np.nan).groupby('memberID')['shipping_country'].transform('first')
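
As a self-contained sketch of the approach above, the following rebuilds the sample frame and applies the same idea step by step. The frame construction and the intermediate names `blanks_as_nan` and `filled` are my own illustration, not part of the original answer. It relies on the groupby `first` aggregation returning the first non-null value in each group.

```python
import numpy as np
import pandas as pd

# Sample data from the answer above, rebuilt by hand
df = pd.DataFrame({
    'memberID': [264991, 264991, 100, 5000, 5000, 54],
    'shipping_country': ['', 'Canada', 'USA', '', 'UK', ''],
})

# Treat blanks as missing, in the target column only
blanks_as_nan = df['shipping_country'].replace('', np.nan)

# Broadcast the first non-null value of each memberID group to all its rows
filled = blanks_as_nan.groupby(df['memberID']).transform('first')

# Groups with no country at all come back as NaN; turn them back into ''
df['shipping_country'] = filled.fillna('')
print(df)
```

One thing to keep in mind: `transform('first')` writes the same value to every row of a group, so if a memberID had two different non-blank countries, the later one would be overwritten by the first.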

                                                                        
                                                        
            
            
              
                
-上瘾入骨i, 2021-01-22 21:07

You can use chained groupbys, one with forward fill and one with backfill:

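import numpy as np
# Replace blank values with NaN first (pd.np is no longer available in pandas 2.x):
df['shipping_country'] = df['shipping_country'].replace('', np.nan)

# Forward fill, then backfill, within each memberID group:
filled = df.groupby('memberID')['shipping_country'].ffill()
df['shipping_country'] = filled.groupby(df['memberID']).bfill()
df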

   memberID shipping_country
0    264991           Canada
1    264991           Canada
2       100              USA
3      5000               UK
4      5000               UK


This method will also allow a group made up of all NaN to remain NaN:

>>> df
   memberID shipping_country
0    264991                 
1    264991           Canada
2       100              USA
3      5000                 
4      5000               UK
5         1                 
6         1                 

df['shipping_country'] = df['shipping_country'].replace('', np.nan)

filled = df.groupby('memberID')['shipping_country'].ffill()
df['shipping_country'] = filled.groupby(df['memberID']).bfill()
df

   memberID shipping_country
0    264991           Canada
1    264991           Canada
2       100              USA
3      5000               UK
4      5000               UK
5         1              NaN
6         1              NaN
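
For reference, here is a runnable sketch of the two chained group-wise fills on the seven-row sample shown above. The frame construction and the names `col`, `forward` and `both` are assumptions of mine for illustration. Each fill is grouped by `df['memberID']` explicitly, which avoids depending on whether the intermediate result still carries the memberID column.

```python
import numpy as np
import pandas as pd

# Seven-row sample from above, including a memberID (1) with no country at all
df = pd.DataFrame({
    'memberID': [264991, 264991, 100, 5000, 5000, 1, 1],
    'shipping_country': ['', 'Canada', 'USA', '', 'UK', '', ''],
})

# Blank strings become NaN so that ffill/bfill treat them as gaps
col = df['shipping_country'].replace('', np.nan)

# First pass: carry values forward within each memberID
forward = col.groupby(df['memberID']).ffill()

# Second pass: fill the remaining leading gaps from below, again per memberID
both = forward.groupby(df['memberID']).bfill()

df['shipping_country'] = both
print(df)
```

Unlike broadcasting a single value per group, this only touches the gaps, so existing non-blank entries are left as they are.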

                                                                        
                                                        
            
            
              
                
挽巷, 2021-01-22 21:12

You can use GroupBy + ffill / bfill (this assumes blank values have already been replaced with NaN):

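def filler(x):
    # Forward fill, then backfill, within one group
    return x.ffill().bfill()

# transform (rather than apply) keeps the original row index, so res can be assigned back to df
res = df.groupby('memberID')['shipping_country'].transform(filler)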


A custom function is necessary as there's no combined Pandas method to ffill and bfill sequentially.

This also caters for the situation where all values are NaN for a specific memberID; in this case they will remain NaN.
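
To make the order of the two fills concrete, here is a small sketch on made-up data: the memberID 7 rows and their country codes are hypothetical and not from the question. It uses `transform` rather than `apply`, on the assumption that the result is meant to be assigned back as a column, since `transform` returns a Series aligned with the original index.

```python
import numpy as np
import pandas as pd

# Hypothetical data: member 7 has two different countries with gaps around them,
# member 1 has no country at all
df = pd.DataFrame({
    'memberID': [7, 7, 7, 7, 7, 1, 1],
    'shipping_country': [np.nan, 'FR', np.nan, 'DE', np.nan, np.nan, np.nan],
})

def filler(x):
    # ffill runs first, so a gap between two values takes the earlier one;
    # bfill then only has the leading gap left to fill
    return x.ffill().bfill()

res = df.groupby('memberID')['shipping_country'].transform(filler)

# The result lines up row for row with df, so it can be assigned directly
df['shipping_country'] = res
print(df)
```

For member 7 the gap between 'FR' and 'DE' is filled with 'FR' because the forward fill is applied before the backfill; swapping the two calls in `filler` would fill it with 'DE' instead.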
                                                                        
                                                        
            
            
              
                