Question
I have retrieved a large number of tweets from Twitter using the R package twitteR.
My goal now is to create edges for a network analysis based on the mentions in those tweets. To do this, I used the following code to extract the Twitter usernames mentioned in each tweet:
library(stringr)
tweets <- read.csv(file = "tweets.csv")
tweets$mentions <- str_extract_all(tweets$text, "@\\w+")
Some tweets mention more than one username, for example "usernameA, usernameB and usernameC", but all of those mentions end up together in one row. I would like to duplicate each row whose tweet mentions more than one username, so that the row appears once per mentioned username and only one username shows up in each resulting row. Let me illustrate what I mean using the same example:
At the moment I have a row with two columns (text, mentions):
- "text of the tweet"; "usernameA, usernameB, usernameC"
I would like to have three rows in this case:
- "text of the tweet"; "usernameA"
- "text of the tweet"; "usernameB"
- "text of the tweet"; "usernameC"
My problems are:
- How do I make R check for entries in a specified column that consist of a list (c("usernameA", "usernameB", ...))?
- How do I tell R to duplicate such an entry x-1 times (x = number of mentions)?
- How do I get R to keep only one username in each row?
Answer 1:
You can use plyr for this and split the data frame of tweets by the text column:
plyr::ddply(tweets, c("text"), function(x){
  mention <- unlist(stringr::str_extract_all(x$text, "@\\w+"))
  # some tweets do not contain mentions, making this necessary:
  if (length(mention) > 0){
    return(data.frame(mention = mention))
  } else {
    return(data.frame(mention = NA))
  }
})
Example:
tweets <- data.frame(text = c("A tweet with text and @user1 and @user2.",
                              "Another tweet @user3 and @user4 should hear about."))
Running the above function returns:
                                                text mention
1           A tweet with text and @user1 and @user2.  @user1
2           A tweet with text and @user1 and @user2.  @user2
3 Another tweet @user3 and @user4 should hear about.  @user3
4 Another tweet @user3 and @user4 should hear about.  @user4
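For comparison, the same row expansion can be sketched in base R without plyr or stringr. This is only an illustrative alternative, not part of the original answer: the names n_mentions, row_index and edges are made up for this sketch, and the third tweet is added here to show how a tweet without mentions is handled. It touches on the three sub-questions in turn: lengths() reports how many mentions each entry holds, rep() repeats the row index once per mention, and unlist() leaves one handle per row.

```r
# Base R sketch: one row per mention (hypothetical variable names).
tweets <- data.frame(
  text = c("A tweet with text and @user1 and @user2.",
           "Another tweet @user3 and @user4 should hear about.",
           "A tweet without any mention."),
  stringsAsFactors = FALSE
)

# regmatches() + gregexpr() return a list with one character vector per tweet.
mentions <- regmatches(tweets$text, gregexpr("@\\w+", tweets$text))

# Number of mentions per tweet; entries > 1 are the rows that need duplicating.
n_mentions <- lengths(mentions)

# Give tweets without mentions a single NA so they are not dropped.
mentions[n_mentions == 0] <- list(NA_character_)

# Repeat each row index once per mention, then attach the flattened handles.
row_index <- rep(seq_len(nrow(tweets)), times = lengths(mentions))
edges <- data.frame(
  text = tweets$text[row_index],
  mention = unlist(mentions),
  stringsAsFactors = FALSE
)
```

With the three example tweets, edges should contain five rows: two for each of the first two tweets and one row with an NA mention for the third.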
Answer 2:
I tried your code with several examples and it works great. The problem I don't know how to handle arises when the tweets come from a column of an existing data.frame, so that I build tweets like this:
tweets <- data.frame(text = table$variable)
instead of
tweets <- data.frame(text = c("A tweet with text and @user1 and @user2.",
                              "Another tweet @user3 and @user4 should hear about."))
The format does not appear to change, but after running your code I get numbers instead of handles (in fact, the number of '@' characters in the text).
Answer 3:
Dave's answer returns handles instead of numbers for a generic data frame if you add stringsAsFactors=FALSE:
plyr::ddply(mydata, c("text"), function(x){
  mention <- unlist(stringr::str_extract_all(x$text, "@\\w+"))
  # some tweets do not contain mentions, making this necessary:
  if (length(mention) > 0){
    return(data.frame(mention = mention, stringsAsFactors = FALSE))
  } else {
    return(data.frame(mention = NA))
  }
})
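The thread does not explain why numbers appeared in the first place. One plausible cause, stated here as an assumption rather than a confirmed diagnosis, is factor conversion: in R versions before 4.0.0, data.frame() turns strings into factors by default, and a factor coerced to integer yields its internal level codes rather than its labels. The small sketch below (with a made-up data frame df) requests that conversion explicitly so the effect is visible on any R version.

```r
# Explicitly request factor conversion to mimic the pre-4.0 default.
df <- data.frame(mention = c("@user2", "@user1", "@user2"),
                 stringsAsFactors = TRUE)

# Coercing a factor to integer gives level codes, not the handles.
codes <- as.integer(df$mention)      # 2 1 2

# Going through as.character() recovers the original strings.
handles <- as.character(df$mention)  # "@user2" "@user1" "@user2"
```

If a column already contains such codes, as.character() on the original factor (not on the integers) is the way back to the handles.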
Source: https://stackoverflow.com/questions/26756338/creating-edges-rows-for-several-mentions-in-one-tweet