I am using the twitteR package in R to extract tweets based on their IDs, but I am unable to do this for multiple tweet IDs without hitting the rate limit.
I have come across the same issue recently. For retrieving tweets in bulk, Twitter recommends using the lookup method provided by its API; that way you can get up to 100 tweets per request. Unfortunately, this has not been implemented in the twitteR package yet, so I've tried to hack together a quick function (by re-using lots of code from the twitteR package) to use that API method:
lookupStatus <- function(ids, ...) {
  # Validate each ID, re-using twitteR's internal checker
  lapply(ids, twitteR:::check_id)
  # The statuses/lookup endpoint accepts at most 100 IDs per request,
  # so split the vector of IDs into batches of up to 100
  batches <- split(ids, ceiling(seq_along(ids) / 100))
  results <- lapply(batches, function(batch) {
    params <- parseIDs(batch)
    statuses <- twitteR:::twInterfaceObj$doAPICall(
      paste("statuses", "lookup", sep = "/"),
      params = params, ...
    )
    twitteR:::import_statuses(statuses)
  })
  # Flatten the per-batch results into a single list of status objects
  return(unlist(results))
}
parseIDs <- function(ids) {
  id_list <- list()
  if (length(ids) > 0) {
    # The endpoint expects the IDs as one comma-separated string
    id_list$id <- paste(ids, collapse = ",")
  }
  return(id_list)
}
Make sure that your vector of ids is of class character; otherwise very large IDs cause problems, because R stores plain numbers as doubles, which cannot represent 64-bit tweet IDs exactly.
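To see the problem, compare two numeric IDs that differ only in their last digit (built from the example ID used below):
ids_num <- c(432656548536401920, 432656548536401921)
ids_num[1] == ids_num[2]
#> TRUE  # both literals round to the same double, so one ID is silently lost
as.character(432656548536401921)
#> "4.32656548536402e+17"  # also not usable as an API parameter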
Use the function like this:
ids <- c("432656548536401920", "332526548546401821")
tweets <- lookupStatus(ids, retryOnRateLimit=100)
Setting a high retryOnRateLimit ensures you get all your tweets, even if your vector of IDs has more than 18,000 entries (100 IDs per request, 180 requests per 15-minute window).
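If you want to estimate up front how long a large job will take, the arithmetic is simple (a rough sketch assuming the limits above; n_ids is a hypothetical count):
n_ids      <- 50000                     # hypothetical number of tweet IDs
n_requests <- ceiling(n_ids / 100)      # 500 requests of up to 100 IDs each
n_windows  <- ceiling(n_requests / 180) # spread over 3 rate-limit windows
# Worst case roughly (n_windows - 1) * 15 minutes of waiting,
# i.e. about 30 minutes for 50,000 IDs.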
As usual, you can turn the tweets into a data frame with twListToDF(tweets).
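Putting it all together, a complete session might look like this (the file name and the OAuth credentials are placeholders; note colClasses = "character", which keeps the IDs from being coerced to doubles on import):
library(twitteR)

# Authenticate with your own app credentials (placeholders here)
setup_twitter_oauth(consumer_key, consumer_secret,
                    access_token, access_secret)

# Read the IDs as strings so they are never mangled as doubles
ids <- read.csv("tweet_ids.csv", colClasses = "character")$id

tweets <- lookupStatus(ids, retryOnRateLimit = 100)
df     <- twListToDF(tweets)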