Python emoji search and replace not working as expected

小蘑菇 2021-01-22 13:34

I am trying to separate emoji in given text from other characters/words/emojis. I want to use emoji later as features in text classification. So it is important that I treat eac

1 answer
  • 2021-01-22 13:50

    There are several issues here.

    • There are no capturing groups in the regex pattern, but the replacement pattern uses a \1 backreference to Group 1. The most natural fix is to reference Group 0, i.e. the whole match, with \g<0>.
    • The \1 in the replacement is not actually parsed as a backreference but as a character with octal value 1, because a backslash in a regular (non-raw) string literal starts an escape sequence. Here, it is an octal escape. Writing the replacement as a raw string literal (r"...") avoids this.
    • The + after the ] means that the regex engine must match 1 or more occurrences of text matching the character class, so you match sequences of emojis rather than each separate emoji.
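
    The three points above can be checked with a minimal sketch. The character class (the U+1F600 to U+1F64F emoticon block), the names EMOJI and EMOJI_RUN, and the sample string are assumptions made for illustration; they are not the pattern or text from the question.

```python
import re

# Hypothetical patterns and sample text (not the ones from the question).
EMOJI = re.compile("[\U0001F600-\U0001F64F]")       # one emoticon per match
EMOJI_RUN = re.compile("[\U0001F600-\U0001F64F]+")  # a whole run per match

sample = "so #happy\U0001F600\U0001F601 today"

# In a non-raw string literal, "\1" is the character U+0001,
# so the regex engine never sees a backreference.
assert " \1 " == " \x01 "

# With a raw string, \1 is a real backreference, but the pattern
# has no Group 1, so the substitution fails.
try:
    EMOJI.sub(r" \1 ", sample)
except re.error as exc:
    print("no group 1:", exc)

# \g<0> refers to the whole match, so no capturing group is needed.
separated = EMOJI.sub(r" \g<0> ", sample)

# With "+", adjacent emoji are matched as one unit and stay glued together.
glued = EMOJI_RUN.sub(r" \g<0> ", sample)

print(separated.split())
print(glued.split())
```

    Splitting on whitespace afterwards yields each emoji as its own token in the first case, while the "+" variant keeps the two adjacent emoji as a single token.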

    Use

    import re
    
    text = "I am very #happy man but