pandas merge dataframe with NaN (or “unknown”) for missing values

前端未结

关注

 4  844

I have 2 dataframes, one of which has supplemental information for some (but not all) of the rows in the other.

names = df({\'names\':[\'bob\',\'frank\',\'james\


                      
              相关标签:


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  时光取名叫无心        
                
              
                            
                2021-02-05 02:07
              
            
            
                                                                       
Think of it as an SQL join operation. You need a left-outer join[1].

names = pd.DataFrame({'names':['bob','frank','james','tim','ricardo','mike','mark','joan','joe'],'position':['dev','dev','dev','sys','sys','sys','sup','sup','sup']})

info = pd.DataFrame({'names':['joe','mark','tim','frank'],'classification':['thief','thief','good','thief']})

Since there are names for which there is no classification, a left-outer join will do the job.

a = pd.merge(names, info, how='left', on='names')

The result is ...

>>> a
     names position classification
0      bob      dev            NaN
1    frank      dev          thief
2    james      dev            NaN
3      tim      sys           good
4  ricardo      sys            NaN
5     mike      sys            NaN
6     mark      sup          thief
7     joan      sup            NaN
8      joe      sup          thief


... which is fine. All the NaN results are ok if you look at both the tables.

Cheers!

[1] - http://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  盖世英雄少女心        
                
              
                            
                2021-02-05 02:07
              
            
            
                                                                       
For outer or inner join also join function can be used. In the case above let's suppose that names is the main table (all rows from this table must occur in result). Then to run left outer join use:
what = names.set_index('names').join(info.set_index('names'), how='left')

resp.
what = names.set_index('names').join(info.set_index('names'), how='left').fillna("unknown")

set_index functions are used to create temporary index column (same in both tables). When dataframes would have contain such index column, then this step wouldn't be necessary. For example:
# define index when create dataframes
names = pd.DataFrame({'names':['bob',...],'position':['dev',...]}).set_index('names')
info = pd.DataFrame({'names':['joe',...],'classification':['thief',...]}).set_index('names')

what = names.join(info, how='left')

To perform other types of join just change how attribute (left/right/inner/outer are allowed). More info here
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  佛祖请我去吃肉        
                
              
                            
                2021-02-05 02:13
              
            
            
                                                                       
I think you want to perform an outer merge:

In [60]:

pd.merge(names, info, how='outer')
Out[60]:
     names position classification
0      bob      dev            NaN
1    frank      dev          thief
2    james      dev            NaN
3      tim      sys           good
4  ricardo      sys            NaN
5     mike      sys            NaN
6     mark      sup          thief
7     joan      sup            NaN
8      joe      sup          thief


There is section showing the type of merges can perform: http://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  半阙折子戏        
                
              
                            
                2021-02-05 02:23
              
            
            
                                                                       
In case you are still looking for an answer for this:

The "strange" things that you described are due to some minor errors in your code. For example, the first (appearance of "bobjames" and "devsys") is due to the fact that you don't have a comma between those two values in your source dataframes. And the second is because pandas doesn't care about the name of your dataframe but cares about the name of your columns when merging (you have a dataframe called "names" but also your columns are called "names"). Otherwise, it seems that the merge is doing exactly what you are looking for:

import pandas as pd
names = pd.DataFrame({'names':['bob','frank','bob','bob','bob', 'james','tim','ricardo','mike','mark','joan','joe'], 
                      'position':['dev','dev','dev','dev','dev','dev', 'sys','sys','sys','sup','sup','sup']})

info = pd.DataFrame({'names':['joe','mark','tim','frank','joe','bill'],
                     'classification':['thief','thief','good','thief','good','thief']})
what = pd.merge(names, info, how="outer")
what.fillna('unknown', inplace=True)


which will result in:

      names position classification
0       bob      dev        unknown
1       bob      dev        unknown
2       bob      dev        unknown
3       bob      dev        unknown
4     frank      dev          thief
5     james      dev        unknown
6       tim      sys           good
7   ricardo      sys        unknown
8      mike      sys        unknown
9      mark      sup          thief
10     joan      sup        unknown
11      joe      sup          thief
12      joe      sup           good
13     bill  unknown          thief

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复