R tweets with emojis

。_饼干妹妹 提交于 2020-01-03 05:08:25

问题


I scrapped tweets from the twitter API and the package rtweet but I don't know how to work with text with emojis because they are in the form '\U0001f600' and all the regex code that I tried failed until now. I can't get anything of it.

For example

 text = 'text text. \U0001f600'
 grepl('U',text)

Give me FALSE

 grepl('000',text)

Also give me FALSE.

Another problem is that they are often sticked to the word before (for example i am here\U0001f600 )

So how can I make R recognize emojis of that format? What can I put in the grepl that will return me TRUE for any emojis of that format?


回答1:


In R there tends to be a package for most things. And in this case textclean and with it comes the lexicon package which has a lot of dictionaries. Using textclean you have 2 functions you can use, replace_emoji and replace_emoji_identifier

text = c("text text. \U0001f600", "i am here\U0001f600")

# replace emoji with identifier:
textclean::replace_emoji_identifier(text)
[1] "text text. lexiconvygwtlyrpywfarytvfis " "i am here lexiconvygwtlyrpywfarytvfis " 

# replace emoji with text representation
textclean::replace_emoji(text)
[1] "text text. grinning face " "i am here grinning face " 

Next you could use sentimentr to use sentiment scoring on the emoji's or for text analysis quanteda. If you just want to check the presence as in your expected output:

grepl("lexicon[[:alpha:]]{20}", textclean::replace_emoji_identifier(text))
[1] TRUE TRUE


来源:https://stackoverflow.com/questions/53071434/r-tweets-with-emojis

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!