Replacing a few values in a column based on a list in Python

粉色の甜心 2021-01-23 18:38

There is a well-explained topic on Stack Overflow: Replacing few values in a pandas dataframe column with another value

The example is:

BrandName Special         


        
1 Answer
  • 2021-01-23 19:01

    Use regex=True for substring replacement:

    df['BrandName'] = df['BrandName'].replace(['ABC', 'AB'], 'A', regex=True)
    print(df)
      BrandName Specialty
    0         A         H
    1         B         I
    2       A B         J
    3         D         K
    4         A         L
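
To make the difference concrete, here is a minimal sketch comparing the default behaviour of `Series.replace` with `regex=True`. The sample data is hypothetical (the original example table is not shown above), so treat it as an illustration rather than the asker's actual frame.

```python
import pandas as pd

# Hypothetical sample data, assumed for illustration only.
df = pd.DataFrame({"BrandName": ["ABC", "B", "AB B", "D", "AB"]})

# Default: a cell is replaced only when its whole value equals 'ABC' or 'AB'.
exact = df["BrandName"].replace(["ABC", "AB"], "A")

# regex=True: each list item is treated as a pattern and matched anywhere in the cell.
substring = df["BrandName"].replace(["ABC", "AB"], "A", regex=True)

print(pd.DataFrame({"exact": exact, "substring": substring}))
```

Only the `regex=True` variant touches 'AB B', because 'AB' is just part of that cell. As far as I can tell, the patterns are applied one after another, so listing 'ABC' before 'AB' is the safer order.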
    

    A different approach is needed if the values must not be replaced inside longer substrings (for example, ABCD should stay unchanged). In that case, use regex word boundaries:

    print(df)
      BrandName Specialty
    0    A ABCD         H
    1         B         I
    2     ABC B         J
    3         D         K
    4        AB         L
    
    
    L = [r"\b{}\b".format(x) for x in ['ABC', 'AB']]
    
    df['BrandName1'] = df['BrandName'].replace(L, 'A', regex=True)
    df['BrandName2'] = df['BrandName'].replace(['ABC', 'AB'], 'A', regex=True)
    print(df)
      BrandName Specialty BrandName1 BrandName2
    0    A ABCD         H     A ABCD       A AD
    1         B         I          B          B
    2     ABC B         J        A B        A B
    3         D         K          D          D
    4        AB         L          A          A
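
A possible variation on the word-boundary idea, sketched under the assumption that the search terms may contain regex metacharacters: escape each term with `re.escape` and join them into one alternation, so the column is scanned once instead of once per term. The names `words`, `alternation` and `pattern` are made up for this sketch; the data is the frame printed above.

```python
import re
import pandas as pd

# Same input as the frame printed above.
df = pd.DataFrame({"BrandName": ["A ABCD", "B", "ABC B", "D", "AB"]})
words = ["ABC", "AB"]

# Longest terms first; re.escape guards against metacharacters such as '.' or '+'.
alternation = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
pattern = rf"\b(?:{alternation})\b"

# Replace whole-word matches only, so 'ABCD' is left alone.
df["BrandName1"] = df["BrandName"].str.replace(pattern, "A", regex=True)
print(df)
```

This should give the same BrandName1 column as the list-based version, with 'A ABCD' left unchanged.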
    

    Edit (from the asker):

    To speed this up, have a look at: Speed up millions of regex replacements in Python 3

    The best option there is the trie approach:

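    import re

    # Trie is the helper class from the linked answer; it is not defined here.
    def trie_regex_from_words(words):
        trie = Trie()
        for word in words:
            trie.add(word)
        # One compiled pattern that matches any of the words as a whole word.
        return re.compile(r"\b" + trie.pattern() + r"\b", re.IGNORECASE)

    # strings is assumed to be the list of values to replace.
    union = trie_regex_from_words(strings)
    df['BrandName'] = df['BrandName'].replace(union, 'A', regex=True)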
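
The snippet above relies on a `Trie` class that is not shown here. The following is a hypothetical minimal stand-in, written only to illustrate the idea of collapsing shared prefixes into one pattern. It is not the code from the linked answer, and it makes no performance claims.

```python
import re


class Trie:
    """Minimal sketch: stores words and renders them as one regex fragment."""

    def __init__(self):
        self.root = {}

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # empty key marks the end of a word

    def _pattern(self, node):
        # A node holding only the end marker contributes nothing further.
        if "" in node and len(node) == 1:
            return ""
        branches = [
            re.escape(ch) + self._pattern(node[ch])
            for ch in sorted(k for k in node if k != "")
        ]
        # A single mandatory branch needs no group.
        if len(branches) == 1 and "" not in node:
            return branches[0]
        group = "(?:" + "|".join(branches) + ")"
        # If a word may also end here, the continuation is optional.
        return group + "?" if "" in node else group

    def pattern(self):
        return self._pattern(self.root)


def trie_regex_from_words(words):
    trie = Trie()
    for word in words:
        trie.add(word)
    return re.compile(r"\b" + trie.pattern() + r"\b", re.IGNORECASE)


union = trie_regex_from_words(["ABC", "AB"])
print(union.pattern)
print([union.sub("A", v) for v in ["A ABCD", "B", "ABC B", "D", "AB"]])
```

For 'ABC' and 'AB' the shared prefix is factored out, so the engine tries 'AB' once and then an optional 'C'. The compiled pattern can then be passed to `replace(..., regex=True)` as in the snippet above. Note that `re.IGNORECASE` also makes lowercase 'abc' match.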
    