I've written a piece of code that essentially counts word frequencies and inserts them into an ARFF file for use with Weka. I'd like to alter it so that it can count bi-gram f
This should get you started:
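```python
def bigrams(words):
    """Yield each pair of consecutive words from the iterable `words`."""
    wprev = None  # no previous word yet, so the first pair is (None, w1)
    for w in words:
        yield (wprev, w)
        wprev = w
```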
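Because the generator yields plain tuples, you can feed it straight into `collections.Counter` to get bigram frequencies. This is a minimal sketch: `bigram_frequencies` is a hypothetical helper, and the lower-casing plus whitespace tokenisation are assumptions standing in for whatever tokeniser your word-count code already uses.

```python
from collections import Counter


def bigrams(words):
    """Yield each pair of consecutive words; the first pair is (None, w1)."""
    wprev = None
    for w in words:
        yield (wprev, w)
        wprev = w


def bigram_frequencies(text):
    # Hypothetical helper: naive lower-cased whitespace tokenisation,
    # then count every (previous, current) pair.
    words = text.lower().split()
    return Counter(bigrams(words))


freqs = bigram_frequencies("the cat sat on the cat mat")
print(freqs[("the", "cat")])  # 2
```

The keys of the resulting `Counter` are `(wprev, w)` tuples, so they play the same role the single words played in your unigram counts.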
Note that the first bigram is `(None, w1)`, where `w1` is the first word, so you get a special bigram that marks start-of-text. If you also want an end-of-text bigram, add `yield (wprev, None)` after the loop.
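With that extra `yield` in place, the generator brackets the text with both markers. A sketch of the modified version (the name `bigrams_with_end` is just illustrative):

```python
def bigrams_with_end(words):
    """Like bigrams(), but also yield (last_word, None) as an end-of-text marker."""
    wprev = None
    for w in words:
        yield (wprev, w)
        wprev = w
    # Runs once after the loop finishes: marks end-of-text.
    yield (wprev, None)


print(list(bigrams_with_end(["a", "b"])))
# [(None, 'a'), ('a', 'b'), ('b', None)]
```

One edge case to be aware of: for an empty input the loop never runs, so the only pair produced is `(None, None)`. Skip or filter that pair if empty documents can occur in your data.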