Creating edges (rows) for several mentions in one tweet

Posted by 本小妞迷上赌 on 2019-12-11 14:23:49

Question


I have retrieved many tweets from Twitter using the R package twitteR.

Having done this successfully, my goal is to create edges for a network analysis based on the mentions in those tweets. For this purpose I used the following code to extract the Twitter usernames mentioned in each tweet:

library(stringr)

tweets <- read.csv(file="tweets.csv")

# str_extract_all() returns a list with one character vector of handles per tweet
tweets$mentions <- str_extract_all(tweets$text, "@\\w+")

Some tweets mention more than one username, for example "usernameA, usernameB and usernameC", but these all end up together in one row. I would like to duplicate each such row so that the tweet appears once per username it mentions, with only one username per row in the end. Let me illustrate what I mean using the same example:

At the moment I have a row with two columns (text, mentions):

  1. "text of the tweet"; "usernameA, usernameB, usernameC"

I would like to have three rows in this case:

  1. "text of the tweet"; "usernameA"
  2. "text of the tweet"; "usernameB"
  3. "text of the tweet"; "usernameC"

My problems are:

  1. How do I get R to check for entries in a given column that hold a list (c("usernameA", "usernameB", ...))?
  2. How do I tell R to duplicate such an entry x-1 times (x = number of mentions)?
  3. How do I get R to leave only one username in each row?

Answer 1:


You can use plyr for this: split the data frame of tweets by the text column and return one row per mention:

plyr::ddply(tweets, c("text"), function(x){
    mention <- unlist(stringr::str_extract_all(x$text, "@\\w+"))
    # some tweets do not contain mentions, making this necessary:
    if (length(mention) > 0){
        return(data.frame(mention = mention))
    } else {
        return(data.frame(mention = NA))    
    }
})

Example:

tweets <- data.frame(text = c("A tweet with text and @user1 and @user2.",
                              "Another tweet @user3 and @user4 should hear about."))

Running the ddply call above on this example returns:

                                                text mention
1           A tweet with text and @user1 and @user2.  @user1
2           A tweet with text and @user1 and @user2.  @user2
3 Another tweet @user3 and @user4 should hear about.  @user3
4 Another tweet @user3 and @user4 should hear about.  @user4
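
For comparison, the same reshaping can be sketched in base R without plyr or stringr. This is only an illustration under stated assumptions: the example data (including the third tweet without a mention) is made up, and `edges` and `n` are hypothetical names, not part of the answer above. It uses `lengths()` to count the mentions per tweet and `rep()` to repeat each row that many times.

```r
# Hypothetical example data; the column name mirrors the question.
tweets <- data.frame(
  text = c("A tweet with text and @user1 and @user2.",
           "Another tweet @user3 and @user4 should hear about.",
           "A tweet without any mention."),
  stringsAsFactors = FALSE
)

# One character vector of handles per tweet (a list, like str_extract_all()).
mentions <- regmatches(tweets$text, gregexpr("@\\w+", tweets$text))

# Tweets without mentions get a single NA so that they keep one row.
mentions[lengths(mentions) == 0] <- list(NA_character_)

# Number of rows each tweet needs in the result.
n <- lengths(mentions)

# Repeat each row once per mention, then attach one handle per row.
edges <- tweets[rep(seq_len(nrow(tweets)), times = n), , drop = FALSE]
edges$mention <- unlist(mentions)
rownames(edges) <- NULL
```

Here `lengths(mentions)` addresses the first question (how many usernames an entry holds), `rep()` the second (duplicating rows), and `unlist()` the third (one username per row).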



Answer 2:


I tried your code with different examples and it works great. The trouble I don't know how to handle starts when my tweets come from a column of an existing data.frame, so that I build tweets like this:

tweets<-data.frame(text=(table$variable))

instead of

tweets <- data.frame(text = c("A tweet with text and @user1 and @user2.",
                              "Another tweet @user3 and @user4 should hear about."))

The format does not appear to change, yet after running your code I get numbers instead of handles (indeed, the number of '@' characters in the text).
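
One plausible cause, offered as an assumption rather than a confirmed diagnosis: in R versions before 4.0, `data.frame()` converts character columns to factors by default, and a factor is stored as integer level codes that can surface in place of the text. The sketch below uses a made-up vector `variable` as a stand-in for `table$variable` to show the difference, and one way to force a character column.

```r
# Hypothetical stand-in for the poster's table$variable, stored as a factor.
variable <- factor(c("Hi @user3", "Hello @user1", "Hi @user3"))

# A factor is backed by integer level codes, not by the text itself.
codes <- as.integer(variable)

# Converting explicitly and disabling factor conversion keeps the text.
tweets <- data.frame(text = as.character(variable), stringsAsFactors = FALSE)

is.character(tweets$text)
```

Checking `str(tweets)` before running the extraction shows whether the `text` column is a factor or a character vector.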




Answer 3:


Dave's answer returns handles instead of numbers for a generic data frame if you add stringsAsFactors=FALSE to the data.frame() call that returns the mentions:

plyr::ddply(mydata, c("text"), function(x){
  mention <- unlist(stringr::str_extract_all(x$text, "@\\w+"))
  # some tweets do not contain mentions, making this necessary:
  if (length(mention) > 0){
    return(data.frame(mention = mention,stringsAsFactors=FALSE))
  } else {
    return(data.frame(mention = NA))    
  }
})
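
Since the stated goal is an edge list for network analysis, a possible next step is to pair each tweet's author with the users it mentions. The sketch below is an assumption-laden illustration: the `screenName` column, the sample data, and the `from`/`to` names are all hypothetical and do not come from the answers above. Tweets without mentions simply contribute no edges.

```r
# Hypothetical data: screenName is assumed to hold the author of each tweet.
tweets <- data.frame(
  screenName = c("alice", "bob"),
  text = c("Thanks @user1 and @user2!", "No mentions here."),
  stringsAsFactors = FALSE
)

# One character vector of handles per tweet.
mentions <- regmatches(tweets$text, gregexpr("@\\w+", tweets$text))

# One row per (author, mentioned user) pair; the leading "@" is stripped.
edges <- data.frame(
  from = rep(tweets$screenName, times = lengths(mentions)),
  to   = sub("^@", "", unlist(mentions)),
  stringsAsFactors = FALSE
)
```

A two-column data frame of this shape is a common input format for network packages, although the exact column names they expect vary.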


Source: https://stackoverflow.com/questions/26756338/creating-edges-rows-for-several-mentions-in-one-tweet
