What algorithm do I need to find n-grams?

無奈伤痛 2020-12-04 16:56

What algorithm is used for finding n-grams?

Supposing my input data is an array of words and the size of the n-grams I want to find, what algorithm should I use?

7 answers
  •  有刺的猬
    2020-12-04 17:50

    Usually n-grams are calculated to find their frequency distribution, so yes, it does matter how many times each n-gram appears.
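
    For word-level n-grams the basic algorithm is just a sliding window: take every run of n consecutive words, then tabulate how often each run occurs. Here is a minimal base-R sketch of that idea (the word_ngrams helper is only for illustration, it is not a library function):

        # Illustration only: slide a window of length n over a word vector
        # and return the frequency distribution of the resulting n-grams.
        word_ngrams <- function(words, n) {
          starts <- seq_len(max(length(words) - n + 1, 0))
          grams <- vapply(starts,
                          function(i) paste(words[i:(i + n - 1)], collapse = " "),
                          character(1))
          sort(table(grams), decreasing = TRUE)
        }

        # Example: bigrams from a toy sentence
        word_ngrams(c("the", "cat", "sat", "on", "the", "mat"), 2)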

    Also, do you want character-level or word-level n-grams? I have written code for finding character-level n-grams from a CSV file in R, using the package 'tau'. You can find it here. (A word-level variant is sketched after the code below.)

    Here is the code I wrote:

        library(tau)
        # read the raw text from the CSV file (no header)
        temp <- read.csv("/home/aravi/Documents/sample/csv/ex.csv", header = FALSE, stringsAsFactors = FALSE)
        # count character n-grams up to length 4, splitting on whitespace and punctuation
        r <- textcnt(temp, method = "ngram", n = 4L, split = "[[:space:][:punct:]]+", decreasing = TRUE)
        # put the counts in a data frame together with each n-gram's length
        a <- data.frame(counts = unclass(r), size = nchar(names(r)))
        # group the results by n-gram size (1-grams, 2-grams, ...)
        b <- split(a, a$size)
        b
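
    If you want word-level rather than character-level n-grams, textcnt can also be run with method = "string", which counts word n-grams instead of character n-grams. A small sketch of that variant (the sample text here is made up):

        library(tau)
        # word-level n-grams: method = "string" counts sequences of words
        txt <- "the cat sat on the mat the cat sat on"
        w <- textcnt(txt, method = "string", n = 2L,
                     split = "[[:space:][:punct:]]+", decreasing = TRUE)
        data.frame(counts = unclass(w))  # same tabulation step as above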
    

    Cheers!
