Replacing a few values in a column based on a list in Python


Here is a well-explained topic on Stack Overflow: Replacing few values in a pandas dataframe column with another value

The example is:

BrandName Special


                      
1 answer

南旧, 2021-01-23 19:01

Use regex=True for substring replacement:

df['BrandName'] = df['BrandName'].replace(['ABC', 'AB'], 'A', regex=True)
print (df)
  BrandName Specialty
0         A         H
1         B         I
2       A B         J
3         D         K
4         A         L
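
To make the effect of regex=True concrete, here is a minimal self-contained sketch. The sample DataFrame is an assumption, chosen only to resemble the output above; the original input data is not shown in the answer. It compares the default whole-value replacement with substring replacement.

```python
import pandas as pd

# Hypothetical sample data (not from the original question).
df = pd.DataFrame({
    "BrandName": ["ABC", "B", "ABC B", "D", "AB"],
    "Specialty": ["H", "I", "J", "K", "L"],
})
values = ["ABC", "AB"]

# Default behaviour: a cell is replaced only if the whole value
# equals one of the listed strings, so "ABC B" is left alone.
exact = df["BrandName"].replace(values, "A")

# regex=True: each listed string is treated as a pattern and every
# match inside a cell is replaced, so "ABC B" becomes "A B".
substring = df["BrandName"].replace(values, "A", regex=True)

print(pd.DataFrame({"exact": exact, "substring": substring}))
```

The only difference between the two results is row 2, where the value contains ABC as part of a longer string.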


A different solution is needed if you want to avoid replacing the values when they occur inside longer words, for example so that ABCD is not replaced. In that case, add regex word boundaries:

print (df)
  BrandName Specialty
0    A ABCD         H
1         B         I
2     ABC B         J
3         D         K
4        AB         L


L = [r"\b{}\b".format(x) for x in ['ABC', 'AB']]

df['BrandName1'] = df['BrandName'].replace(L, 'A', regex=True)
df['BrandName2'] = df['BrandName'].replace(['ABC', 'AB'], 'A', regex=True)
print (df)
  BrandName Specialty BrandName1 BrandName2
0    A ABCD         H     A ABCD       A AD
1         B         I          B          B
2     ABC B         J        A B        A B
3         D         K          D          D
4        AB         L          A          A
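
The list comprehension above inserts each value into the pattern verbatim. If a value could contain regex metacharacters, it may be safer to escape it first. The following is a sketch under that assumption: the value 'A.B' and the extra rows are hypothetical, and combining everything into one alternation pattern is a design choice, not part of the original answer.

```python
import re
import pandas as pd

# Hypothetical data: the last two rows are added to show the effect of escaping.
df = pd.DataFrame({"BrandName": ["A ABCD", "B", "ABC B", "D", "AB", "A.B", "AXB"]})
words = ["ABC", "AB", "A.B"]  # 'A.B' is an assumed value containing a metacharacter

# Escape each value, then wrap the alternation in word boundaries.
pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"

df["BrandName1"] = df["BrandName"].replace(pattern, "A", regex=True)
print(df)
```

Without re.escape, the dot in 'A.B' would match any character, and the row 'AXB' would be replaced as well.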


Edit (from the questioner):

To speed it up, you can have a look here: Speed up millions of regex replacements in Python 3

The best one is the trie approach:

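import re

# Trie is the helper class defined in the linked answer; it is not part
# of the standard library and has to be copied from there.
def trie_regex_from_words(words):
    trie = Trie()
    for word in words:
        trie.add(word)
    # One compiled, case-insensitive pattern that matches any of the words as a whole word
    return re.compile(r"\b" + trie.pattern() + r"\b", re.IGNORECASE)

# strings is the list of values to replace
union = trie_regex_from_words(strings)
df['BrandName'] = df['BrandName'].replace(union, 'A', regex=True)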
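
The snippet above depends on a Trie class from the linked answer, which is not reproduced here. As a rough illustration of the idea only, the sketch below uses a minimal stand-in Trie written for this example; it is not the class from the linked answer and makes no claim about its performance. The sample data and the `strings` list are assumptions. It also swaps in Series.str.replace, which is documented to accept a compiled pattern.

```python
import re
import pandas as pd

class Trie:
    """Minimal illustrative trie; NOT the class from the linked answer."""

    def __init__(self):
        self.root = {}

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # the empty key marks the end of a word

    def _to_regex(self, node):
        ends_here = "" in node
        # One alternative per child character, built recursively.
        branches = [re.escape(ch) + self._to_regex(child)
                    for ch, child in sorted(node.items()) if ch != ""]
        if not branches:
            return ""
        if len(branches) == 1 and not ends_here:
            return branches[0]
        group = "(?:" + "|".join(branches) + ")"
        # If a word ends at this node, the remaining suffix is optional.
        return group + "?" if ends_here else group

    def pattern(self):
        return self._to_regex(self.root)

def trie_regex_from_words(words):
    trie = Trie()
    for word in words:
        trie.add(word)
    return re.compile(r"\b" + trie.pattern() + r"\b", re.IGNORECASE)

strings = ["ABC", "AB"]  # assumed list of values to replace
union = trie_regex_from_words(strings)
print(union.pattern)

# Hypothetical data; "abc B" shows the effect of re.IGNORECASE.
df = pd.DataFrame({"BrandName": ["A ABCD", "B", "abc B", "D", "AB"]})
df["BrandName"] = df["BrandName"].str.replace(union, "A", regex=True)
print(df)
```

Because ABC and AB share the prefix AB, the trie collapses them into a single pattern with an optional C instead of two separate alternatives. With many words, this shared-prefix structure is what reduces the work the regex engine has to do.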

                                                                        
                                                        
            
            
              
                