Match unicode emoji in python regex

时光说笑 2021-01-18 02:49

I need to extract the text between a number and an emoticon in a string.

Example text:

blah xzuyguhbc ibcbb bqw 2 extract1  ☺️ jbjhcb 6 extract2 🙅 bjvcvvv


        
3 Answers
  •  清酒与你
    2021-01-18 03:31

    Here's my stab at a solution; I'm not sure it will work in all circumstances. The trick is to convert every Unicode emoji into plain ASCII escape text (the unicode-escape codec does this). This can be done by following this post. Then you can match the emoji just like any normal text. Note that it won't work if the literal strings \u or \U already appear in the text you are searching.
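To make the trick and its caveat concrete, here is a minimal Python 3 sketch (the answer's own script targets Python 2). The variable names are illustrative only. It shows what the unicode-escape codec produces for the emoji in the question, and why a backslash-u that is already in the input still looks like an emoji afterwards.

```python
# Python 3 sketch; variable names are illustrative, not from the answer.
# U+263A U+FE0F is the smiley from the question's example text.
text = "2 extract1  \u263a\ufe0f jbjhcb"

# encode() yields ASCII bytes with \uXXXX escapes; decode back to str for regex use.
escaped = text.encode("unicode-escape").decode("ascii")
print(escaped)

# Caveat: a literal backslash-u in the input is escaped to two backslashes plus "u",
# so a pattern that looks for backslash + u still fires on it.
plain = r"see \u for details"
plain_escaped = plain.encode("unicode-escape").decode("ascii")
print(plain_escaped)
```

Note that the codec also writes any other character above U+00FF (for example CJK text) as a \uXXXX escape, so the approach cannot tell emoji apart from other non-ASCII text.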

    Example: copy your string into a file, say emo, then run this in a terminal:

    Chip chip@ 03:24:33@ ~: cat emo | python stackoverflow.py
    blah xzuyguhbc ibcbb bqw 2 extract1  \u263a\ufe0f jbjhcb 6 extract2 \U0001f645 bjvcvvv\n
    ------------------------
    [' extract1  ', ' extract2 ']
    

    where stackoverflow.py (Python 2) is:

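    import fileinput
    import re

    # Read stdin (or the files named on the command line). As in the
    # original, only the last line read is kept for matching.
    for line in fileinput.input():
        # Decode the UTF-8 bytes, then turn every non-ASCII character into
        # a literal \uXXXX or \UXXXXXXXX escape sequence.
        teststring = unicode(line, 'utf-8').encode('unicode-escape')

    print teststring
    print "------------------------"
    # Lazily capture everything between "whitespace + digit" and the next
    # literal backslash followed by u or U.
    m = re.findall(r'(?<=\s\d)(.*?)(?=\\[uU])', teststring)
    print m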
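The script above relies on Python 2 (`unicode()` and the `print` statement). A possible Python 3 port is sketched below, assuming the same regular expression; the helper name `extract_between_digit_and_emoji` is hypothetical, and the sample string is rebuilt from the escaped output shown earlier.

```python
import re

# Same pattern as the answer's script, written as a raw string:
# lookbehind for whitespace + digit, lazy capture, lookahead for backslash + u/U.
PATTERN = re.compile(r"(?<=\s\d)(.*?)(?=\\[uU])")


def extract_between_digit_and_emoji(line):
    """Hypothetical helper: escape non-ASCII characters, then apply the pattern."""
    # In Python 3, encode() returns bytes, so decode back to str before matching.
    escaped = line.encode("unicode-escape").decode("ascii")
    return PATTERN.findall(escaped)


# Sample rebuilt from the escaped output in the answer.
sample = ("blah xzuyguhbc ibcbb bqw 2 extract1  \u263a\ufe0f jbjhcb "
          "6 extract2 \U0001f645 bjvcvvv\n")
result = extract_between_digit_and_emoji(sample)
print(result)  # [' extract1  ', ' extract2 ']
```

Unlike the original script, this version works on one string at a time, so it can be called for every line of input instead of only the last one.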
    
