I need to extract the text between a number and an emoticon in a string.
Example text:
blah xzuyguhbc ibcbb bqw 2 extract1 ☺️ jbjhcb 6 extract2
Here's my stab at a solution. I'm not sure it will work in all circumstances. The trick is to convert all Unicode emojis into plain escaped text. This can be done by following this post. Then you can match the emoji just like any normal text. Note that it won't work if the literal string \u or \U appears in the text you are searching.
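To make the escaping step concrete, here is a small Python 3 sketch (the variable names and sample strings are mine, just for illustration). It shows how an emoji turns into plain \u text, and why a literal backslash followed by u in the input causes trouble: the backslash is doubled, so the second one still sits right before a u.

```python
# Python 3 sketch of the escaping step; names and sample values are illustrative.
text = "2 extract1 \u263a\ufe0f"  # a digit, some text, then the smiling-face emoji
escaped = text.encode("unicode-escape").decode("ascii")
print(escaped)  # 2 extract1 \u263a\ufe0f  (now plain ASCII characters)

# A literal backslash in the input gets escaped (doubled) as well,
# so a pattern looking for backslash + u can match in the wrong place.
literal = r"C:\users"
escaped_literal = literal.encode("unicode-escape").decode("ascii")
print(escaped_literal)  # C:\\users
```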
Example: copy your string into a file, let's call it emo.
In terminal:
$ cat emo | python stackoverflow.py
blah xzuyguhbc ibcbb bqw 2 extract1 \u263a\ufe0f jbjhcb 6 extract2 \U0001f645 bjvcvvv\n
------------------------
[' extract1 ', ' extract2 ']
Where the stackoverflow.py file is:
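import fileinput
import re

# Python 2: read lines from stdin or from files named on the command line
for line in fileinput.input():
    # Decode the UTF-8 bytes, then escape every non-ASCII character
    teststring = unicode(line, 'utf-8').encode('unicode-escape')
    print teststring
    print "------------------------"
    # Capture the text after "<whitespace><digit>" up to the next literal \u or \U
    m = re.findall(r'(?<=\s\d)(.*?)(?=\\[uU])', teststring)
    print m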
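The script above is Python 2. On Python 3 the same idea could look roughly like the sketch below. This is not the original script: the function name extract_between_number_and_emoji and the sample string are made up for illustration, and it assumes the emoji are outside the Latin-1 range (characters in that range are escaped as \x.. and would not be matched).

```python
import re


def extract_between_number_and_emoji(text):
    """Return the pieces of text between '<whitespace><digit>' and an emoji.

    Hypothetical helper for illustration; same approach as the script above.
    """
    # Escape non-ASCII characters, e.g. U+263A becomes the literal text "\u263a".
    escaped = text.encode("unicode-escape").decode("ascii")
    # Lazily capture what follows whitespace + one digit, up to the next
    # literal backslash followed by u or U.
    return re.findall(r"(?<=\s\d)(.*?)(?=\\[uU])", escaped)


sample = ("blah xzuyguhbc ibcbb bqw 2 extract1 \u263a\ufe0f "
          "jbjhcb 6 extract2 \U0001f645 bjvcvvv")
print(extract_between_number_and_emoji(sample))
```

As in the original, the lookbehind only checks a single digit, and text after a number is only captured when an emoji actually follows it.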