I have a list of documents that have already been tokenized:
dat <- list(c(\"texaco\", \"canada\", \"lowered\", \"contract\", \"price\", \"pay\", \"crude
Here's one way with embed.
# Append all n-grams of length n to the original tokens
find_ngrams <- function(x, n) {
  if (n == 1) return(x)
  # embed() returns each window in reverse order, hence rev() before pasting
  c(x, apply(embed(x, n), 1, function(row) paste(rev(row), collapse = " ")))
}
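Applied per document with lapply, using the data above:

# Each document keeps its tokens and gains the bigrams:
# "texaco canada", "canada lowered", ..., "pay crude"
lapply(dat, find_ngrams, n = 2)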
There seems to be a bug in your function. If you fix that, we can do a benchmark.
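The comment doesn't pin down the bug, but one likely candidate is that embed(x, n) stops with "wrong embedding dimension" whenever a document has fewer than n tokens. Here is a guarded sketch under that assumption (find_ngrams_safe is my name, not from the thread), plus a minimal benchmark skeleton using the microbenchmark package:

library(microbenchmark)

# Guarded variant: documents shorter than n contribute no n-grams
# instead of raising an error from embed()
find_ngrams_safe <- function(x, n) {
  if (n == 1 || length(x) < n) return(x)
  c(x, apply(embed(x, n), 1, function(row) paste(rev(row), collapse = " ")))
}

# Benchmark skeleton: swap in any competing implementation as a second argument
microbenchmark(
  embed_based = lapply(dat, find_ngrams_safe, n = 2),
  times = 100
)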