Converting a list of tokens to n-grams

悲&欢浪女 2021-01-15 16:51

I have a list of documents that have already been tokenized:

dat <- list(c(\"texaco\", \"canada\", \"lowered\", \"contract\", \"price\", \"pay\", 
\"crude         


        

1 Answer
  • 2021-01-15 17:37

    Here's one way with embed.
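
    embed() does the windowing here; for a small character vector it returns one window per row, with the latest token first (which is why the function below reverses each row):

    embed(c("a", "b", "c", "d"), 2)
    #      [,1] [,2]
    # [1,] "b"  "a"
    # [2,] "c"  "b"
    # [3,] "d"  "c"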

    find_ngrams <- function(x, n) {
        # Unigrams are just the tokens themselves
        if (n == 1) return(x)
        # embed() slides a length-n window over x, one window per row, with
        # the columns in reverse order, hence rev(row) before pasting.
        # The original tokens are returned together with the n-grams.
        c(x, apply(embed(x, n), 1, function(row) paste(rev(row), collapse=' ')))
    }
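
    To run it over the whole tokenized list, a usage sketch with lapply (n = 2 is just an illustrative choice):

    ngrams <- lapply(dat, find_ngrams, n = 2)
    # each element now holds the original tokens followed by the bigrams,
    # e.g. "texaco canada", "canada lowered", ...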
    

    There seems to be a bug in your function. If you fix that, we can do a benchmark.
