Removing stopwords from a user-defined corpus in R

When I run into tm problems I often end up just editing the original text.

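For removing words it's a little awkward, but you can paste together a regex from tm's stopword list. The list is all lower-case and the regex is case-sensitive, so this works best on text that has already been lower-cased and stripped of punctuation.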

stopwords_regex = paste(tm::stopwords('en'), collapse = '\\b|\\b')
stopwords_regex = paste0('\\b', stopwords_regex, '\\b')
documents = stringr::str_replace_all(documents, stopwords_regex, '')

> documents
[1] "     toast  breakfast"             " coffee  morning  excellent"      
[3] " lunch lets   pancakes"            "later   day  will   talks"        
[5] " talks   first day  great"         " second day   good presentations "
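The leftover runs of spaces and the case sensitivity can both be handled inside stringr. The following is a minimal sketch, not the only way to do it: it assumes the tm and stringr packages are installed, and the two-element `documents` vector and the `cleaned` name are made up for illustration.

```r
library(tm)       # only used for its stopwords() list
library(stringr)

# Illustrative input; any character vector works
documents <- c("She had toast for breakfast",
               "The coffee this morning was excellent")

# One alternation of all stopwords, wrapped in word boundaries
stopwords_regex <- paste0("\\b(?:",
                          paste(stopwords("en"), collapse = "|"),
                          ")\\b")

# ignore_case = TRUE lets the lower-case list match "She" and "The"
cleaned <- str_replace_all(documents,
                           regex(stopwords_regex, ignore_case = TRUE),
                           "")

# str_squish() trims the ends and collapses internal runs of whitespace
cleaned <- str_squish(cleaned)
cleaned
# [1] "toast breakfast"          "coffee morning excellent"
```

Punctuation is left alone here, so contractions such as "let's" may only be partly removed; strip punctuation first if that matters.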

Try using tm's tm_map function to transform the corpus. It works in my case:

> documents = c("She had toast for breakfast",
+  "The coffee this morning was excellent", 
+  "For lunch let's all have pancakes", 
+  "Later in the day, there will be more talks", 
+  "The talks on the first day were great", 
+  "The second day should have good presentations too")
> library(tm)
Loading required package: NLP
> documents <- Corpus(VectorSource(documents))
> documents = tm_map(documents, content_transformer(tolower))
> documents = tm_map(documents, removePunctuation)
> documents = tm_map(documents, removeWords, stopwords("english"))
> documents
Metadata:  corpus specific: 0, document level (indexed): 0
Content:  documents: 6

This yields

> documents[[1]]$content
[1] "  toast  breakfast"
> documents[[2]]$content
[1] " coffee  morning  excellent"
> documents[[3]]$content
[1] " lunch lets   pancakes"
> documents[[4]]$content
[1] "later   day  will   talks"
> documents[[5]]$content
[1] " talks   first day  great"
> documents[[6]]$content
[1] " second day   good presentations "
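The removed words leave extra spaces behind. As a rough sketch of one way to tidy that up, tm's `stripWhitespace` transformation can be added to the same pipeline, followed by `trimws()` once the text is pulled back out. This assumes tm is installed; it uses `VCorpus` rather than `Corpus`, and the names `docs`, `corpus` and `cleaned` are placeholders.

```r
library(tm)

# Illustrative input
docs <- c("She had toast for breakfast",
          "The coffee this morning was excellent")

corpus <- VCorpus(VectorSource(docs))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
# Collapse runs of whitespace into single spaces
corpus <- tm_map(corpus, stripWhitespace)

# Pull the text back out as a plain character vector and trim the ends
cleaned <- trimws(unname(sapply(corpus, as.character)))
cleaned
# [1] "toast breakfast"          "coffee morning excellent"
```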

You can also use the quanteda package to remove stopwords. First make sure your text has been tokenized (for example with tokens()), then use the following:

x <- tokens_select(x, stopwords(), selection = "remove")
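For completeness, here is a small end-to-end sketch of the quanteda route, assuming the package is installed. It uses `tokens_remove()`, which quanteda documents as shorthand for `tokens_select()` with `selection = "remove"`; the input vector and variable names are illustrative only.

```r
library(quanteda)

# Illustrative input
docs <- c("She had toast for breakfast",
          "The coffee this morning was excellent")

# Tokenize first, dropping punctuation along the way
toks <- tokens(docs, remove_punct = TRUE)

# Pattern matching is case-insensitive by default, so "She" and "The" go too
toks <- tokens_remove(toks, stopwords("en"))

# Optional: glue each document's remaining tokens back into one string
cleaned <- unname(vapply(as.list(toks), paste, character(1), collapse = " "))
cleaned
# [1] "toast breakfast"          "coffee morning excellent"
```

Note that the original capitalization of the kept tokens is preserved; add `tokens_tolower()` if lower-case output is wanted.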