问题
This question has been asked here Python : How to remove all emojis Without a solution, I have as step towards the solution. But need help finishing it off.
I went and got all the emoji hex code points from the emoji site: https://www.unicode.org/emoji/charts/emoji-ordering.txt
I then read in the file like so:
file = open('emoji-ordering.txt')
temp = file.readline()
final_list = []
while temp != '':
#print(temp)
if not temp[0] == '#' :
utf_8_values = ((temp.split(';')[0]).rstrip()).split(' ')
values = ["u\\"+(word[0]+((8 - len(word[2:]))*'0' + word[2:]).rstrip()) for word in utf_8_values]
#print(values[0])
final_list = final_list + values
temp = file.readline()
print(final_list)
I hoped this would give me unicode literals. It does not, my goal is to get unicode literals so I can use part of the solution from the last question and be able to exclude all emojis. Any ideas what we need to get a solution?
回答1:
First install emoji:
pip install emoji
or
pip3 install emoji
So do this:
import emoji
def give_emoji_free_text(self, text):
allchars = [str for str in text]
emoji_list = [c for c in allchars if c in emoji.UNICODE_EMOJI]
clean_text = ' '.join([str for str in text.split() if not any(i in str for i in emoji_list)])
return clean_text
text = give_emoji_free_text(text)
This work for me!
Or you can try:
emoji_pattern = re.compile("["
u"\U0001F600-\U0001F64F" # emoticons
u"\U0001F300-\U0001F5FF" # symbols & pictographs
u"\U0001F680-\U0001F6FF" # transport & map symbols
u"\U0001F1E0-\U0001F1FF" # flags (iOS)
u"\U0001F1F2-\U0001F1F4" # Macau flag
u"\U0001F1E6-\U0001F1FF" # flags
u"\U0001F600-\U0001F64F"
u"\U00002702-\U000027B0"
u"\U000024C2-\U0001F251"
u"\U0001f926-\U0001f937"
u"\U0001F1F2"
u"\U0001F1F4"
u"\U0001F620"
u"\u200d"
u"\u2640-\u2642"
"]+", flags=re.UNICODE)
text = emoji_pattern.sub(r'', text)
回答2:
Here's a Python script that uses the emoji library's get_emoji_regexp()
.
It reads text from a file and writes the emoji-free text to another file.
import emoji
import re
def strip_emoji(text):
print(emoji.emoji_count(text))
new_text = re.sub(emoji.get_emoji_regexp(), r"", text)
return new_text
with open("my_file.md", "r") as file:
old_text = file.read()
no_emoji_text = strip_emoji(old_text)
with open("file.md", "w+") as new_file:
new_file.write(no_emoji_text)
来源:https://stackoverflow.com/questions/51217909/removing-all-emojis-from-text