Counting bi-gram frequencies

感情败类 2021-02-06 17:10

I've written a piece of code that essentially counts word frequencies and inserts them into an ARFF file for use with Weka. I'd like to alter it so that it can count bi-gram f

4 answers
  •  借酒劲吻你
    2021-02-06 18:15

    This should get you started:

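    def bigrams(words):
        """Yield (previous word, current word) pairs from an iterable of words."""
        wprev = None  # there is no previous word before the first one
        for w in words:
            yield (wprev, w)
            wprev = w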
    

    Note that the first bigram is (None, w1) where w1 is the first word, so you have a special bigram that marks start-of-text. If you also want an end-of-text bigram, add yield (wprev, None) after the loop.
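
To get from the generator to actual frequencies, one option is to feed the pairs into collections.Counter. The following is only a sketch: count_bigrams is a hypothetical helper name, and splitting on whitespace after lowercasing is an assumed tokenization that may not match what your existing word-frequency code does.

```python
from collections import Counter


def bigrams(words):
    """Yield (previous word, current word) pairs; the first pair is (None, w1)."""
    wprev = None
    for w in words:
        yield (wprev, w)
        wprev = w


def count_bigrams(text):
    """Hypothetical helper: count bigrams in lowercased, whitespace-split text."""
    words = text.lower().split()  # assumed tokenization, swap in your own
    return Counter(bigrams(words))


counts = count_bigrams("the cat sat on the mat the cat ran")
for pair, n in counts.most_common(3):
    print(pair, n)
```

Because the counts are keyed by tuples, the start-of-text marker shows up as a pair whose first element is None. If you do not want that pair in the output, drop it before writing the frequencies anywhere.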
