What algorithm is used for finding n-grams?
Supposing my input data is an array of words and the size of the n-grams I want to find, what algorithm should I use?
Usually n-grams are computed to find their frequency distribution, so yes, it does matter how many times each n-gram appears.
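If you just want the algorithm itself, a sliding window over the word array is the standard approach: take every run of n consecutive words and tally the runs. Here is a minimal base-R sketch (the function name and sample input are illustrative, not from any package):

word_ngrams <- function(words, n) {
  if (length(words) < n) return(character(0))
  # one n-gram starts at each position 1 .. length(words) - n + 1
  sapply(seq_len(length(words) - n + 1),
         function(i) paste(words[i:(i + n - 1)], collapse = " "))
}

words <- c("the", "quick", "brown", "fox", "jumps", "over", "the", "quick", "fox")
sort(table(word_ngrams(words, 2)), decreasing = TRUE)  # bigram frequencies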
You should also decide whether you want character-level or word-level n-grams. I have written code for finding character-level n-grams from a CSV file in R; I used the package 'tau' for that, which you can find here.
Here is the code I wrote:
library(tau)

# read the file as plain text; keep strings as characters, not factors
temp <- read.csv("/home/aravi/Documents/sample/csv/ex.csv", header = FALSE, stringsAsFactors = FALSE)
# count character n-grams of length up to 4, splitting on whitespace and punctuation, most frequent first
r <- textcnt(temp, method = "ngram", n = 4L, split = "[[:space:][:punct:]]+", decreasing = TRUE)
# pair each n-gram's count with its length, then group the n-grams by size
a <- data.frame(counts = unclass(r), size = nchar(names(r)))
b <- split(a, a$size)
b
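The same package also handles the word-level case: with method = "string", textcnt counts word n-grams instead of character n-grams. A short sketch, with an illustrative input string:

library(tau)
txt <- "the quick brown fox jumps over the quick fox"
# word bigrams, split on whitespace/punctuation, most frequent first
textcnt(txt, method = "string", n = 2L, split = "[[:space:][:punct:]]+", decreasing = TRUE)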
Cheers!