I would like to iterate over a string and output all emojis.
I'm trying to iterate over the characters and check them against an emoji list.
However, python se
Try this:
import re
re.findall(r'[^\w\s,]', my_list[0])
The regex r'[^\w\s,]' matches any single character that is not a word character, whitespace, or a comma. Note that this also picks up punctuation and other symbols, not only emoji.
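To see what the pattern actually returns, here is a minimal sketch run under Python 3. The contents of my_list are a made-up sample, since the original input isn't shown.

```python
import re

# Hypothetical sample input; the real my_list from the question is not shown.
my_list = ["Hello, world! \U0001f60d ok_1"]

# [^\w\s,] matches one character that is not a word character
# (letters, digits, underscore), not whitespace, and not a comma.
matches = re.findall(r'[^\w\s,]', my_list[0])

# The emoji is returned, but so is the exclamation mark.
print(matches)
```

If you only want emoji, you would still need to filter the result against your emoji list, because punctuation such as "!" comes through as well.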
Python before 3.3 stores Unicode internally as UTF-16 (narrow build) or UTF-32 (wide build) and, due to a leaky abstraction, exposes this detail to the user. UTF-16 represents Unicode characters above U+FFFF as surrogate pairs, so on a narrow build each such character shows up as two code units. Either use a wide Python build or switch to Python 3.3 or later to fix the issue.
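If you are not sure which kind of build you are on, sys.maxunicode tells you. The following is a small sketch; the helper name is_narrow_build is just made up for illustration.

```python
import sys

def is_narrow_build():
    # sys.maxunicode is 0xFFFF on a narrow build and 0x10FFFF on a
    # wide build and on every Python 3.3 or later.
    return sys.maxunicode == 0xFFFF

s = u'Test \U0001f60d'

# On a narrow build the emoji occupies two code units, so len(s) is 7;
# otherwise it is a single code point and len(s) is 6.
expected_len = 7 if is_narrow_build() else 6
print(is_narrow_build(), len(s) == expected_len)
```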
One way of dealing with a narrow build is to match the surrogate pairs:
Python 2.7 (narrow build):
>>> s = u'Test \U0001f60d'
>>> len(s)
7
>>> re.findall(u'(?:[\ud800-\udbff][\udc00-\udfff])|.',s)
[u'T', u'e', u's', u't', u' ', u'\U0001f60d']
Python 3.6:
>>> s = 'Test \U0001f60d'
>>> len(s)
6
>>> list(s)
['T', 'e', 's', 't', ' ', '