Map multiple columns with a single dictionary in pandas

前端未结


I have a DataFrame with multiple columns containing 'yes' and 'no' strings. I want to convert all of them to a boolean dtype. To map one column, I would use


                      

3 answers

Answer by 猫巷女王i (2021-01-03 04:48)

You could use a stack/unstack idiom

df.stack().map(dict_map_yn_bool).unstack()


Using @jezrael's setup

df = pd.DataFrame({'nearby_subway_station':['yes','no'], 'Station':['no','yes']})
dict_map_yn_bool={'yes':True, 'no':False}


Then

df.stack().map(dict_map_yn_bool).unstack()

  Station nearby_subway_station
0   False                  True
1    True                 False
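A self-contained version of the stack/unstack idiom is sketched below, reusing the same two-column setup. One assumption worth flagging: unstack() may not return the columns in the input order (the output above lists Station first), so the sketch reindexes with the original df.columns to restore it.

```python
import pandas as pd

# Same setup as above: two columns of 'yes'/'no' strings.
df = pd.DataFrame({'nearby_subway_station': ['yes', 'no'],
                   'Station': ['no', 'yes']})
dict_map_yn_bool = {'yes': True, 'no': False}

# stack() turns the frame into one long Series indexed by (row, column),
# map() translates every value through the dict, and unstack() restores
# the row/column layout.
result = df.stack().map(dict_map_yn_bool).unstack()

# Put the columns back in the original order, in case unstack() reordered them.
result = result[df.columns]
print(result)
```

Because map() is applied once to a single Series, this avoids a Python-level loop over the columns.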




Timing: benchmark plots for small data and bigger data (images not included).


                                                                        
                                                        
            
            
              
                

Answer by 天命终不由人 (2021-01-03 04:48)

You can use applymap:

df = pd.DataFrame({'nearby_subway_station':['yes','no'], 'Station':['no','yes']})
print(df)
  Station nearby_subway_station
0      no                   yes
1     yes                    no

dict_map_yn_bool={'yes':True, 'no':False}

# Store the result separately so the later snippets still start from the original strings.
df_get = df.applymap(dict_map_yn_bool.get)
print(df_get)
  Station nearby_subway_station
0   False                  True
1    True                 False
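In pandas 2.1, DataFrame.applymap was deprecated and renamed to DataFrame.map, with the same element-wise behaviour. The sketch below assumes you may be on either side of that change and simply picks whichever method exists.

```python
import pandas as pd

df = pd.DataFrame({'nearby_subway_station': ['yes', 'no'],
                   'Station': ['no', 'yes']})
dict_map_yn_bool = {'yes': True, 'no': False}

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap.
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap

# dict.get is called once per cell; values missing from the dict become None.
result = elementwise(dict_map_yn_bool.get)
print(result)
```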


Another solution, mapping each column with Series.map:

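# Work on a copy so the original strings stay available for the next snippet.
df_map = df.copy()
for x in df_map:
    df_map[x] = df_map[x].map(dict_map_yn_bool)
print(df_map)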
  Station nearby_subway_station
0   False                  True
1    True                 False
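If only some columns hold 'yes'/'no' values, the same per-column map can be restricted to a list of columns with apply(). This is a sketch: the extra 'name' column and the yn_cols list are hypothetical, added only to show that other columns are left untouched.

```python
import pandas as pd

df = pd.DataFrame({'nearby_subway_station': ['yes', 'no'],
                   'Station': ['no', 'yes'],
                   'name': ['A', 'B']})  # hypothetical non-yes/no column
dict_map_yn_bool = {'yes': True, 'no': False}

# Hypothetical list of the columns to convert; 'name' is not in it.
yn_cols = ['nearby_subway_station', 'Station']

# apply() passes each selected column to the lambda as a Series,
# and Series.map translates its values through the dict.
df[yn_cols] = df[yn_cols].apply(lambda s: s.map(dict_map_yn_bool))
print(df)
```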


Thanks to Jon Clements for a very nice idea, using replace:

df_rep = df.replace({'yes': True, 'no': False})
print(df_rep)
  Station nearby_subway_station
0   False                  True
1    True                 False


Some differences when the data contains values that are not in the dict:

df = pd.DataFrame({'nearby_subway_station':['yes','no','a'], 'Station':['no','yes','no']})
print(df)
  Station nearby_subway_station
0      no                   yes
1     yes                    no
2      no                     a


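applymap with dict.get returns None for values missing from the dict. The None is kept as-is in boolean and string (object) columns, while in numeric columns it becomes NaN.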

df_get = df.applymap(dict_map_yn_bool.get)
print(df_get)
  Station nearby_subway_station
0   False                  True
1    True                 False
2   False                  None


map creates NaN:

df_map = df.copy()
for x in df_map:
    df_map[x] = df_map[x].map(dict_map_yn_bool)
print(df_map)
  Station nearby_subway_station
0   False                  True
1    True                 False
2   False                   NaN


replace creates neither NaN nor None; values that are not in the dict are left untouched:

df_rep = df.replace(dict_map_yn_bool)
print(df_rep)
  Station nearby_subway_station
0   False                  True
1    True                 False
2   False                     a
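When an unexpected value should be treated as an error rather than silently becoming None, NaN or staying a string, one option is to check the values first. The helper yn_to_bool_strict below is hypothetical, a sketch of that idea built on DataFrame.isin and Series.map.

```python
import pandas as pd


def yn_to_bool_strict(frame, mapping):
    """Map every cell of `frame` through `mapping`; raise if a value is not a key."""
    # isin() flags the cells whose value is one of the mapping's keys.
    known = frame.isin(list(mapping))
    if not known.to_numpy().all():
        # Boolean-mask the raw values to collect the offending ones.
        bad_values = sorted({str(v) for v in frame.to_numpy()[~known.to_numpy()]})
        raise ValueError(f"values not in mapping: {bad_values}")
    # Every value is a known key, so Series.map cannot produce NaN here.
    return frame.apply(lambda s: s.map(mapping))


dict_map_yn_bool = {'yes': True, 'no': False}
df = pd.DataFrame({'nearby_subway_station': ['yes', 'no', 'a'],
                   'Station': ['no', 'yes', 'no']})

try:
    yn_to_bool_strict(df, dict_map_yn_bool)
except ValueError as err:
    print(err)  # values not in mapping: ['a']

# The first two rows contain only 'yes'/'no', so they convert cleanly.
print(yn_to_bool_strict(df.iloc[:2], dict_map_yn_bool))
```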

                                                                        
                                                        
            
            
              
                

Answer by 有刺的猬 (2021-01-03 05:01)

I would use pandas.DataFrame.replace, as I think it is the simplest option and has built-in arguments that support this task. It is also a one-liner, as requested.

First case, replace all instances of 'yes' or 'no':

import pandas as pd
import numpy as np
from numpy import random

# Generating the data, 20 rows by 5 columns.
data = random.choice(['yes','no'], size=(20, 5), replace=True)
col_names = ['col_{}'.format(a) for a in range(1,6)]
df = pd.DataFrame(data, columns=col_names)

# Supply a list of values to find and a list of their replacements; no dict needed.
df_bool = df.replace(to_replace=['yes','no'], value=[True, False])
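Since every cell here is either 'yes' or 'no', a swapped-in alternative to replace (not the approach this answer proposes) is a plain element-wise comparison with DataFrame.eq. The sketch below assumes the same random data; be aware that it maps anything other than 'yes', including typos or NaN, to False without complaint.

```python
import pandas as pd
from numpy import random

# Same kind of data as above: 20 rows by 5 columns of 'yes'/'no'.
data = random.choice(['yes', 'no'], size=(20, 5), replace=True)
df = pd.DataFrame(data, columns=['col_{}'.format(a) for a in range(1, 6)])

# True where the cell equals 'yes', False everywhere else; the result has bool dtype.
df_bool = df.eq('yes')
print(df_bool.dtypes)
```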


Second case: you only want to replace values in a subset of columns. As described in the documentation for DataFrame.replace, use a nested dictionary whose top-level keys are the columns to change and whose values are dictionaries mapping old values to their replacements:

dict_map_yn_bool={'yes':True, 'no':False}
replace_dict = {'col_1': dict_map_yn_bool,
                'col_2': dict_map_yn_bool}
df_bool = df.replace(to_replace=replace_dict)
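With many columns, the nested dictionary can be built with a dict comprehension instead of typing each entry. A minimal sketch, assuming a small hypothetical frame and a hypothetical cols_to_convert list:

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['yes', 'no'],
                   'col_2': ['no', 'yes'],
                   'col_3': ['yes', 'yes']})
dict_map_yn_bool = {'yes': True, 'no': False}

# Hypothetical list of columns to convert; col_3 is deliberately left out.
cols_to_convert = ['col_1', 'col_2']

# Build {column: {old: new}} in one line; replace() only touches these columns.
replace_dict = {col: dict_map_yn_bool for col in cols_to_convert}
df_bool = df.replace(to_replace=replace_dict)
print(df_bool)
```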

                                                                        
                                                        
            
            
              
                