How to get the probability of bigrams in a text of sentences?
Question

I have a text that contains many sentences. How can I use nltk.ngrams to process it? This is my code:

    import nltk
    from nltk.util import ngrams

    sequence = nltk.tokenize.word_tokenize(raw)
    bigram = ngrams(sequence, 2)
    freq_dist = nltk.FreqDist(bigram)
    prob_dist = nltk.MLEProbDist(freq_dist)
    number_of_bigrams = freq_dist.N()

However, the code above treats the whole text as a single sequence. The sentences are separate, though, and I assume the last word of one sentence is unrelated to the first word of the next. How can I build bigram counts (and probabilities) that respect sentence boundaries?
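One idea I had, as a rough sketch, is to split the text with nltk.sent_tokenize first and then collect the bigrams of each sentence into a single FreqDist, so that no bigram spans two sentences. I am not sure whether this is the right approach (the `raw` string below is just placeholder text for illustration):

    import nltk
    from nltk.util import ngrams

    raw = "This is the first sentence. Here is another one."  # placeholder input

    # Count bigrams sentence by sentence so no bigram crosses a sentence boundary.
    freq_dist = nltk.FreqDist()
    for sentence in nltk.sent_tokenize(raw):
        tokens = nltk.word_tokenize(sentence)
        freq_dist.update(ngrams(tokens, 2))

    prob_dist = nltk.MLEProbDist(freq_dist)
    number_of_bigrams = freq_dist.N()

Is something like this correct, or does nltk offer a more idiomatic way to handle per-sentence bigrams?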