Question
I scraped tweets from the Twitter API with the rtweet package,
but I don't know how to work with text that contains emojis. They appear in the form '\U0001f600', and every regex I have tried so far has failed to match any part of them.
For example
text = 'text text. \U0001f600'
grepl('U',text)
returns FALSE, and
grepl('000',text)
also returns FALSE.
Another problem is that they are often stuck to the preceding word (for example, i am here\U0001f600).
So how can I make R recognize emojis in that format? What pattern can I pass to grepl so that it returns TRUE for any emoji in that format?
Answer 1:
In R there tends to be a package for most things. In this case it is textclean, which brings along the lexicon package and its many dictionaries. textclean offers two functions you can use here: replace_emoji and replace_emoji_identifier.
text = c("text text. \U0001f600", "i am here\U0001f600")
# replace emoji with identifier:
textclean::replace_emoji_identifier(text)
[1] "text text. lexiconvygwtlyrpywfarytvfis " "i am here lexiconvygwtlyrpywfarytvfis "
# replace emoji with text representation
textclean::replace_emoji(text)
[1] "text text. grinning face " "i am here grinning face "
Next, you could use sentimentr for sentiment scoring on the emojis, or quanteda for text analysis. If you just want to check for the presence of an emoji, as in your expected output:
grepl("lexicon[[:alpha:]]{20}", textclean::replace_emoji_identifier(text))
[1] TRUE TRUE
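As a side note on why the original grepl calls return FALSE: R parses '\U0001f600' as an escape for a single Unicode character, so the string never contains a literal "U" or "000". The sketch below shows this in base R and adds a rough package-free check. The code point ranges it uses are my own assumption of where most emojis live, not a complete definition of "emoji", and the third test string is added only for illustration. It also assumes a UTF-8 capable R session.

```r
text <- c("text text. \U0001f600", "i am here\U0001f600", "no emoji here")

# The escape becomes one character with code point 0x1F600.
n_chars <- nchar("\U0001f600")
code_point <- utf8ToInt("\U0001f600")

# Match one specific emoji literally (no regex interpretation).
has_grinning <- grepl("\U0001f600", text, fixed = TRUE)

# Rough heuristic (assumed ranges, not exhaustive): flag any code point
# in U+1F000..U+1FAFF or U+2600..U+27BF.
has_emoji <- vapply(text, function(x) {
  cp <- utf8ToInt(enc2utf8(x))
  any((cp >= 0x1F000 & cp <= 0x1FAFF) | (cp >= 0x2600 & cp <= 0x27BF))
}, logical(1), USE.NAMES = FALSE)

print(has_grinning)
print(has_emoji)
```

For anything beyond a quick check, the dictionary-based textclean approach above is likely more reliable, since it does not depend on hand-picked ranges.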
Source: https://stackoverflow.com/questions/53071434/r-tweets-with-emojis