In Pandas, how can I patch a dataframe with missing values with values from another dataframe given a similar index?

前端未结

关注

 2  1249

一生所求 2021-01-20 02:44

From Fill in missing row values in pandas dataframe

I have the following dataframe and would like to fill in missing values.

mukey   hzdept_r    hzde


      
      
        
          2条回答        

        
                    
            
            
                         
                
              
              
                
                   一整个雨季
                                             
                
                
                (楼主)
            
              
              
                2021-01-20 03:34
              

            
            
                        
The problem is the duplicate index values. When you use df1.fillna(df2), if you have multiple NaN entries in df1 where both the index and the column label are the same, pandas will get confused when trying to slice df1, and throw that InvalidIndexError. 

Your sample dataframe works because even though you have duplicate index values there, only one of each index value is null. Your larger dataframe contains null entries that share both the index value and column label in some cases. 

To make this work, you can do this one column at a time. For some reason, when operating on a series, pandas will not get confused by multiple entries of the same index, and will simply fill the same value in each one. Hence, this should work:

import pandas as pd
df = pd.read_csv('www004.csv')
# CSV file is here: https://www.dropbox.com/s/w3m0jppnq74op4c/www004.csv?dl=0
df1 = df.set_index('mukey')
grouped = df.groupby('mukey').mean()
for col in ['sandtotal_r', 'silttotal_r']:
    df1[col] = df1[col].fillna(grouped[col])
df1.reset_index()


NOTE: Be careful using the combine_first method if you ever have "extra" data in the dataframe you're filling from. The combine_first function will include ALL indices from the dataframe you're filling from, even if they're not present in the original dataframe. 
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它2个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复